1 Commits

Author SHA1 Message Date
72afec773e Integrate heuristics into lexer selection 2024-08-24 21:35:06 -03:00
303 changed files with 364 additions and 649 deletions

1
.gitignore vendored
@ -9,4 +9,3 @@ shard.lock
.vscode/
.crystal/
venv/
.croupier

3
.md.rb
@ -1,3 +0,0 @@
exclude_rule 'MD033' # Inline HTML
exclude_rule 'MD005' # 3-space indent for lists
exclude_rule 'MD024' # Repeated headings

1
.mdlrc
@ -1 +0,0 @@
style ".md.rb"

@ -1,35 +0,0 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- id: check-merge-conflict
- repo: https://github.com/jumanjihouse/pre-commit-hooks
rev: 3.0.0
hooks:
- id: shellcheck
- id: markdownlint
exclude: '^content'
- repo: https://github.com/mrtazz/checkmake
rev: 0.2.2
hooks:
- id: checkmake
exclude: lexers/makefile.xml
- repo: https://github.com/python-jsonschema/check-jsonschema
rev: 0.29.2
hooks:
- id: check-github-workflows
- repo: https://github.com/commitizen-tools/commitizen
rev: v3.29.0 # automatically updated by Commitizen
hooks:
- id: commitizen
- id: commitizen-branch
stages:
- post-commit
- push

@ -1,41 +0,0 @@
# Changelog
All notable changes to this project will be documented in this file.
## [0.6.4] - 2024-08-28
### 🐛 Bug Fixes
- Ameba
- Variable bame in Hacefile
### 📚 Documentation
- Mention AUR package
### ⚙️ Miscellaneous Tasks
- Pre-commit hooks
- Git-cliff config
- Started changelog
- Force conventional commit messages
- Force conventional commit messages
- Updated pre-commit
### Build
- Switch from Makefile to Hacefile
- Added do_release script
- Fix markdown check
### Bump
- Release v0.6.4
## [0.6.1] - 2024-08-25
### 📚 Documentation
- Improve readme and help message
<!-- generated by git-cliff -->

@ -1,115 +0,0 @@
variables:
FLAGS: "-d --error-trace"
NAME: "tartrazine"
tasks:
build:
default: true
dependencies:
- src
- shard.lock
- shard.yml
- Hacefile.yml
- lexers/*xml
- styles/*xml
outputs:
- bin/{{NAME}}
commands: |
shards build {{FLAGS}}
get-deps:
dependencies:
- shard.yml
outputs:
- shard.lock
commands: |
shards install
build-release:
phony: true
always_run: true
commands: |
hace build FLAGS="--release"
install:
phony: true
always_run: true
dependencies:
- bin/hace
commands: |
rm ${HOME}/.local/bin/{{NAME}}
cp bin/hace ${HOME}/.local/bin/{{NAME}}
static:
outputs:
- bin/{{NAME}}-static-linux-amd64
- bin/{{NAME}}-static-linux-arm64
commands: |
hace clean
./build_static.sh
test:
dependencies:
- src
- spec
- shard.lock
- shard.yml
commands: |
crystal spec -v --error-trace
phony: true
always_run: true
lint:
dependencies:
- src
- spec
- shard.lock
- shard.yml
commands: |
crystal tool format src/*.cr spec/*.cr
ameba --fix
always_run: true
phony: true
docs:
dependencies:
- src
- shard.lock
- shard.yml
- README.md
commands: |
crystal docs
outputs:
- docs/index.html
pre-commit:
default: true
outputs:
- .git/hooks/commit-msg
- .git/hooks/pre-commit
dependencies:
- .pre-commit-config.yaml
commands: |
pre-commit install --hook-type commit-msg
pre-commit install
clean:
phony: true
always_run: true
commands: |
rm -rf shard.lock bin lib
coverage:
dependencies:
- src
- spec
- shard.lock
- shard.yml
commands: |
shards install
crystal build -o bin/run_tests src/run_tests.cr
rm -rf coverage/
mkdir coverage
kcov --clean --include-path=./src ${PWD}/coverage ./bin/run_tests
outputs:
- coverage/index.html

7
Makefile Normal file
@ -0,0 +1,7 @@
build: $(wildcard src/**/*.cr) $(wildcard lexers/*xml) $(wildcard styles/*xml) shard.yml
shards build -Dstrict_multi_assign -Dno_number_autocast -d --error-trace
release: $(wildcard src/**/*.cr) $(wildcard lexers/*xml) $(wildcard styles/*xml) shard.yml
shards build --release
static: $(wildcard src/**/*.cr) $(wildcard lexers/*xml) $(wildcard styles/*xml) shard.yml
shards build --release --static
strip bin/tartrazine

@ -2,22 +2,44 @@
Tartrazine is a library to syntax-highlight code. It is
a port of [Pygments](https://pygments.org/) to
[Crystal](https://crystal-lang.org/).
[Crystal](https://crystal-lang.org/). Kind of.
It also provides a CLI tool which can be used to highlight many things in many styles.
The CLI tool can be used to highlight many things in many styles.
Currently Tartrazine supports 247 languages and has 331 themes (63 from Chroma,
the rest are base16 themes via [Sixteen](https://github.com/ralsina/sixteen)
# A port of what? Why "kind of"?
Pygments is a staple of the Python ecosystem, and it's great.
It lets you highlight code in many languages, and it has many
themes. Chroma is "Pygments for Go", it's actually a port of
Pygments to Go, and it's great too.
I wanted that in Crystal, so I started this project. But I did
not read much of the Pygments code. Or much of Chroma's.
Chroma has taken most of the Pygments lexers and turned them into
XML descriptions. What I did was take those XML files from Chroma
and a pile of test cases from Pygments, and I slapped them together
until the tests passed and my code produced the same output as
Chroma. Think of it as *extreme TDD*.
Currently the pass rate for tests in the supported languages
is `96.8%`, which is *not bad for a couple days hacking*.
This only covers the RegexLexers, which are the most common ones,
but it means the supported languages are a subset of Chroma's, which
is a subset of Pygments'.
Currently Tartrazine supports ... 248 languages.
It has 331 themes (63 from Chroma, the rest are base16 themes via
[Sixteen](https://github.com/ralsina/sixteen)
## Installation
If you are using Arch: Use yay or your favourite AUR helper, package name is `tartrazine`.
From prebuilt binaries:
Each release provides statically-linked binaries that should
work on any Linux. Get them from the [releases page](https://github.com/ralsina/tartrazine/releases)
and put them in your PATH.
work on any Linux. Get them from the [releases page](https://github.com/ralsina/tartrazine/releases) and put them in your PATH.
To build from source:
@ -30,14 +52,14 @@ To build from source:
Show a syntax highlighted version of a C source file in your terminal:
```shell
tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers -f terminal
$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers -f terminal
```
Generate a standalone HTML file from a C source file with the syntax highlighted:
```shell
$ tartrazine whatever.c -t catppuccin-macchiato --line-numbers \
--standalone -f html -o whatever.html
$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers \
--standalone -f html -o whatever.html
```
## Usage as a Library
@ -65,30 +87,3 @@ puts formatter.format(File.read(ARGV[0]), lexer)
## Contributors
- [Roberto Alsina](https://github.com/ralsina) - creator and maintainer
## A port of what, and why "kind of"
Pygments is a staple of the Python ecosystem, and it's great.
It lets you highlight code in many languages, and it has many
themes. Chroma is "Pygments for Go", it's actually a port of
Pygments to Go, and it's great too.
I wanted that in Crystal, so I started this project. But I did
not read much of the Pygments code. Or much of Chroma's.
Chroma has taken most of the Pygments lexers and turned them into
XML descriptions. What I did was take those XML files from Chroma
and a pile of test cases from Pygments, and I slapped them together
until the tests passed and my code produced the same output as
Chroma. Think of it as [*extreme TDD*](https://ralsina.me/weblog/posts/tartrazine-reimplementing-pygments.html)
Currently the pass rate for tests in the supported languages
is `96.8%`, which is *not bad for a couple days hacking*.
This only covers the RegexLexers, which are the most common ones,
but it means the supported languages are a subset of Chroma's, which
is a subset of Pygments' and DelegatingLexers (useful for things like template languages)
Then performance was bad, so I hacked and hacked and made it significantly
[faster than chroma](https://ralsina.me/weblog/posts/a-tale-of-optimization.html)
which is fun.

@ -8,8 +8,8 @@
* ✅ Implement lexer loader that respects aliases
* ✅ Implement lexer loader by file extension
* ✅ Add --line-numbers to terminal formatter
* Implement lexer loader by mime type
* Implement lexer loader by mime type
* ✅ Implement Delegating lexers
* ✅ Add RstLexer
* Add Mako template lexer
* Implement heuristic lexer detection
* Implement heuristic lexer detection

@ -7,10 +7,10 @@ docker run --rm --privileged \
# Build for AMD64
docker build . -f Dockerfile.static -t tartrazine-builder
docker run -ti --rm -v "$PWD":/app --user="$UID" tartrazine-builder /bin/sh -c "cd /app && rm -rf lib shard.lock && shards build --static --release"
docker run -ti --rm -v "$PWD":/app --user="$UID" tartrazine-builder /bin/sh -c "cd /app && rm -rf lib shard.lock && make static"
mv bin/tartrazine bin/tartrazine-static-linux-amd64
# Build for ARM64
docker build . -f Dockerfile.static --platform linux/arm64 -t tartrazine-builder
docker run -ti --rm -v "$PWD":/app --platform linux/arm64 --user="$UID" tartrazine-builder /bin/sh -c "cd /app && rm -rf lib shard.lock && shards build --static --release"
docker run -ti --rm -v "$PWD":/app --platform linux/arm64 --user="$UID" tartrazine-builder /bin/sh -c "cd /app && rm -rf lib shard.lock && make static"
mv bin/tartrazine bin/tartrazine-static-linux-arm64

@ -1,79 +0,0 @@
# git-cliff ~ default configuration file
# https://git-cliff.org/docs/configuration
#
# Lines starting with "#" are comments.
# Configuration options are organized into tables and keys.
# See documentation for more information on available options.
[changelog]
# template for the changelog header
header = """
# Changelog\n
All notable changes to this project will be documented in this file.\n
"""
# template for the changelog body
# https://keats.github.io/tera/docs/#introduction
body = """
{% if version %}\
## [{{ version | trim_start_matches(pat="v") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
{% else %}\
## [unreleased]
{% endif %}\
{% for group, commits in commits | group_by(attribute="group") %}
### {{ group | striptags | trim | upper_first }}
{% for commit in commits %}
- {% if commit.scope %}*({{ commit.scope }})* {% endif %}\
{% if commit.breaking %}[**breaking**] {% endif %}\
{{ commit.message | upper_first }}\
{% endfor %}
{% endfor %}\n
"""
# template for the changelog footer
footer = """
<!-- generated by git-cliff -->
"""
# remove the leading and trailing s
trim = true
# postprocessors
postprocessors = [
# { pattern = '<REPO>', replace = "https://github.com/orhun/git-cliff" }, # replace repository URL
]
[git]
# parse the commits based on https://www.conventionalcommits.org
conventional_commits = true
# filter out the commits that are not conventional
filter_unconventional = true
# process each line of a commit as an individual commit
split_commits = false
# regex for preprocessing the commit messages
commit_preprocessors = [
# Replace issue numbers
#{ pattern = '\((\w+\s)?#([0-9]+)\)', replace = "([#${2}](<REPO>/issues/${2}))"},
# Check spelling of the commit with https://github.com/crate-ci/typos
# If the spelling is incorrect, it will be automatically fixed.
#{ pattern = '.*', replace_command = 'typos --write-changes -' },
]
# regex for parsing and grouping commits
commit_parsers = [
{ message = "^feat", group = "<!-- 0 -->🚀 Features" },
{ message = "^fix", group = "<!-- 1 -->🐛 Bug Fixes" },
{ message = "^doc", group = "<!-- 3 -->📚 Documentation" },
{ message = "^perf", group = "<!-- 4 -->⚡ Performance" },
{ message = "^refactor", group = "<!-- 2 -->🚜 Refactor" },
{ message = "^style", group = "<!-- 5 -->🎨 Styling" },
{ message = "^test", group = "<!-- 6 -->🧪 Testing" },
{ message = "^chore\\(release\\): prepare for", skip = true },
{ message = "^chore\\(deps.*\\)", skip = true },
{ message = "^chore\\(pr\\)", skip = true },
{ message = "^chore\\(pull\\)", skip = true },
{ message = "^chore|^ci", group = "<!-- 7 -->⚙️ Miscellaneous Tasks" },
{ body = ".*security", group = "<!-- 8 -->🛡️ Security" },
{ message = "^revert", group = "<!-- 9 -->◀️ Revert" },
]
# filter out the commits that are not matched by commit parsers
filter_commits = false
# sort the tags topologically
topo_order = false
# sort the commits inside sections by oldest/newest order
sort_commits = "oldest"
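
Reviewer note: the removed `commit_parsers` table maps a commit message to a changelog group by regex. The sketch below models that in plain shell, assuming that parsers are tried in order and the first match wins, and that an unmatched conventional commit falls back to its type (which would explain the `Build` and `Bump` headings in the removed changelog). `group_for` is a hypothetical helper, the group names are simplified (no emoji or ordering comments), and the `body = ".*security"` parser is left out; this is not git-cliff's implementation.

```shell
#!/bin/sh
# Hypothetical model of the message-based parsers in the removed cliff.toml.
# Each heredoc line is "regex;group"; "skip" stands for `skip = true`.
group_for() {
    msg=$1
    while IFS=';' read -r pattern group; do
        # First matching parser decides the group (assumed behaviour).
        if printf '%s\n' "$msg" | grep -Eq -- "$pattern"; then
            printf '%s\n' "$group"
            return 0
        fi
    done <<'EOF'
^feat;Features
^fix;Bug Fixes
^doc;Documentation
^perf;Performance
^refactor;Refactor
^style;Styling
^test;Testing
^chore\(release\): prepare for;skip
^chore\(deps.*\);skip
^chore\(pr\);skip
^chore\(pull\);skip
^chore|^ci;Miscellaneous Tasks
^revert;Revert
EOF
    # No parser matched: fall back to the conventional-commit type,
    # i.e. everything before the first "(" or ":".
    printf '%s\n' "${msg%%[(:]*}"
}
```

For example, `group_for 'chore(deps): bump x'` prints `skip`, while `group_for 'chore: pre-commit hooks'` reaches the later, broader `^chore|^ci` parser and prints `Miscellaneous Tasks`. The ordering matters: moving the broad pattern above the `skip` entries would make them unreachable.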

@ -1,15 +0,0 @@
#!/bin/bash
set e
PKGNAME=$(basename "$PWD")
VERSION=$(git cliff --bumped-version |cut -dv -f2)
sed "s/^version:.*$/version: $VERSION/g" -i shard.yml
git add shard.yml
hace lint test
git cliff --bump -o
git commit -a -m "bump: Release v$VERSION"
git tag "v$VERSION"
git push --tags
hace static
gh release create "v$VERSION" "bin/$PKGNAME-static-linux-amd64" "bin/$PKGNAME-static-linux-arm64" --title "Release v$VERSION" --notes "$(git cliff -l -s all)"
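
Reviewer note: two details of the removed release script are easy to misread. `set e` (no dash) does not enable exit-on-error; it replaces the positional parameters with the word `e`, so `set -e` was probably what was meant. The version is derived by cutting the tag on the letter `v` and taking the second field. The sketch below isolates both behaviours; the function names are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: strip the leading "v" from a tag the way the removed
# script did, by taking the second "v"-delimited field.
strip_v() {
    printf '%s\n' "$1" | cut -dv -f2
}

# `set e` overwrites the positional parameters with the single word "e".
# The body runs in a subshell so the caller's parameters are untouched.
positional_after_set_e() (
    set e
    printf '%s\n' "$1"
)

# `set -e` is the option that aborts on the first failing command;
# when it is active, the letter "e" shows up in "$-".
errexit_enabled() (
    set -e
    case $- in
        *e*) echo yes ;;
        *)   echo no ;;
    esac
)
```

Calling `strip_v v0.6.4` prints `0.6.4`. Because the cut is on every `v`, a tag containing a second `v` would be truncated at it, so a prefix strip such as `${tag#v}` would be a more robust choice if the script were revived.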

@ -127,3 +127,4 @@
</state>
</rules>
</lexer>

@ -52,3 +52,4 @@
</state>
</rules>
</lexer>

@ -151,4 +151,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>
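
Reviewer note: hunks like the one above, where `</lexer>` is removed and re-added with no visible difference, usually mean the only change is whitespace or the final newline of the file (the "No newline at end of file" marker does not survive in this rendering). That reading is an assumption here, not something the diff states. A minimal way to check a given lexer file locally, using a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed when the last byte of the file is a newline.
# Command substitution strips trailing newlines, so the captured text is
# empty exactly when that last byte is "\n" (an empty file also passes).
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}
```

Running it as `ends_with_newline lexers/makefile.xml || echo "missing final newline"` on both sides of the commit would confirm or rule out the newline explanation for a particular file.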

@ -63,4 +63,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -65,4 +65,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -160,4 +160,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -318,4 +318,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -63,3 +63,4 @@
</state>
</rules>
</lexer>

@ -72,4 +72,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -55,3 +55,4 @@
</state>
</rules>
</lexer>

@ -105,4 +105,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -314,4 +314,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -71,4 +71,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -56,4 +56,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -127,4 +127,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -171,4 +171,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -306,4 +306,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -123,4 +123,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -75,3 +75,4 @@
</state>
</rules>
</lexer>

@ -67,3 +67,4 @@
</state>
</rules>
</lexer>

@ -92,4 +92,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -94,4 +94,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -217,4 +217,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -22,4 +22,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -657,4 +657,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -19,3 +19,4 @@
</state>
</rules>
</lexer>

@ -149,4 +149,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -81,4 +81,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -138,4 +138,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -25,4 +25,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -80,4 +80,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -48,4 +48,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -118,4 +118,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -328,4 +328,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -119,4 +119,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -134,4 +134,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -148,4 +148,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -89,4 +89,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -131,4 +131,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -140,4 +140,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -52,4 +52,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -87,4 +87,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -87,4 +87,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -207,4 +207,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -181,4 +181,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -133,4 +133,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -759,4 +759,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -320,4 +320,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -369,4 +369,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -130,4 +130,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -210,4 +210,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -14,4 +14,4 @@
<rule pattern="([-A-Za-z0-9]+)(\[[^\] \t=]+\])?([ \t]*)(=)([ \t]*)([^\n]*)([ \t\n]*\n)"><bygroups><token type="NameAttribute"/><token type="NameNamespace"/><token type="TextWhitespace"/><token type="Operator"/><token type="TextWhitespace"/><token type="LiteralString"/><token type="TextWhitespace"/></bygroups></rule>
</state>
</rules>
</lexer>
</lexer>

@ -49,4 +49,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -150,4 +150,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -54,4 +54,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -165,4 +165,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -173,4 +173,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -87,4 +87,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -741,4 +741,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -116,4 +116,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -129,4 +129,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -163,4 +163,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -409,4 +409,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -65,4 +65,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -156,4 +156,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -75,4 +75,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -68,4 +68,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -260,4 +260,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -62,4 +62,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -286,4 +286,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -111,4 +111,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -85,4 +85,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -144,4 +144,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -269,4 +269,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -140,4 +140,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -186,4 +186,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -146,4 +146,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -156,4 +156,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -101,4 +101,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -213,4 +213,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -44,4 +44,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -42,4 +42,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -68,4 +68,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -154,4 +154,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -157,4 +157,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -108,4 +108,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -15,7 +15,7 @@
</rule>
<rule pattern="\.\."> // Spread operator
<token type="Operator"/>
</rule>
</rule>
<rule pattern="\^(?=\()"> // Sort operator
<token type="Operator"/>
</rule>
@ -54,7 +54,7 @@
</rule>
<rule pattern="(\+|-)?(0|[1-9]\d*)(\.\d+[eE](\+|-)?\d+|[eE](\+|-)?\d+|\.\d+)">
<token type="LiteralNumberFloat"/>
</rule>
</rule>
<rule pattern="(\+|-)?(0|[1-9]\d*)">
<token type="LiteralNumberInteger"/>
</rule>
@ -75,9 +75,9 @@
</rule>
<!-- NOTE: This expression matches everything remaining, which should be only JSONata names.
Therefore, it has been left as last intentionally -->
<rule pattern="[a-zA-Z0-9_]*">
<rule pattern="[a-zA-Z0-9_]*">
<token type="Name"/>
</rule>
</state>
</rules>
</lexer>
</lexer>

@ -397,4 +397,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>

Some files were not shown because too many files have changed in this diff.