Integrate heuristics into lexer selection

Make it work again
2025-06-08 04:30:26 -03:00 · 2024-08-24 21:35:06 -03:00 · 2024-08-24 20:53:14 -03:00 · 2024-08-24 20:09:29 -03:00 · 2024-08-24 19:59:05 -03:00 · 2024-08-24 19:55:56 -03:00
28 changed files with 15443 additions and 1415 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -8,3 +8,4 @@ pygments/
 shard.lock
 .vscode/
 .crystal/
+venv/
--- a/README.md
+++ b/README.md
@@ -29,7 +29,7 @@ This only covers the RegexLexers, which are the most common ones,
 but it means the supported languages are a subset of Chroma's, which
 is a subset of Pygments'.

-Currently Tartrazine supports ... 241 languages.
+Currently Tartrazine supports ... 248 languages.

 It has 331 themes (63 from Chroma, the rest are base16 themes via
 [Sixteen](https://github.com/ralsina/sixteen)
@@ -47,7 +47,22 @@ To build from source:
 2. Run `make` to build the `tartrazine` binary
 3. Copy the binary somewhere in your PATH.

-## Usage
+## Usage as a CLI tool
+
+Show a syntax highlighted version of a C source file in your terminal:
+
+```shell
+$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers -f terminal
+```
+
+Generate a standalone HTML file from a C source file with the syntax highlighted:
+
+```shell
+$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers \
+  --standalone -f html -o whatever.html
+```
+
+## Usage as a Library

 This works:

@@ -56,7 +71,9 @@ require "tartrazine"

 lexer = Tartrazine.lexer("crystal")
 theme = Tartrazine.theme("catppuccin-macchiato")
-puts Tartrazine::Html.new.format(File.read(ARGV[0]), lexer, theme)
+formatter = Tartrazine::Html.new
+formatter.theme = theme
+puts formatter.format(File.read(ARGV[0]), lexer)
 ```

 ## Contributing
--- a/TODO.md
+++ b/TODO.md
@@ -9,4 +9,7 @@
 * ✅ Implement lexer loader by file extension
 * ✅ Add --line-numbers to terminal formatter
 * Implement lexer loader by mime type
-* Implement Pygment's "DelegateLexer"
+* ✅ Implement Delegating lexers
+* ✅ Add RstLexer
+* Add Mako template lexer
+* Implement heuristic lexer detection
--- a/lexers/LICENSE-heuristics
+++ b/lexers/LICENSE-heuristics
--- a/lexers/LiquidLexer.xml
+++ b/lexers/LiquidLexer.xml
@@ -0,0 +1,130 @@
+
+<lexer>
+  <config>
+    <name>liquid</name>
+    <alias>liquid</alias>
+    <filename>*.liquid</filename>
+  </config>
+  <rules>
+    <state name="root">
+      <rule pattern="[^{]+"><token type="Text"/></rule>
+      <rule pattern="(\{%)(\s*)"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="tag-or-block"/></rule>
+      <rule pattern="(\{\{)(\s*)([^\s}]+)"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><usingself state="generic"/></bygroups><push state="output"/></rule>
+      <rule pattern="\{"><token type="Text"/></rule>
+    </state>
+    <state name="tag-or-block">
+      <rule pattern="(if|unless|elsif|case)(?=\s+)"><token type="KeywordReserved"/><push state="condition"/></rule>
+      <rule pattern="(when)(\s+)"><bygroups><token type="KeywordReserved"/><token type="TextWhitespace"/></bygroups><combined state="end-of-block" state="whitespace" state="generic"/></rule>
+      <rule pattern="(else)(\s*)(%\})"><bygroups><token type="KeywordReserved"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
+      <rule pattern="(capture)(\s+)([^\s%]+)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><usingself state="variable"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
+      <rule pattern="(comment)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="comment"/></rule>
+      <rule pattern="(raw)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="raw"/></rule>
+      <rule pattern="(end(case|unless|if))(\s*)(%\})"><bygroups><token type="KeywordReserved"/>None<token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
+      <rule pattern="(end([^\s%]+))(\s*)(%\})"><bygroups><token type="NameTag"/>None<token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
+      <rule pattern="(cycle)(\s+)(?:([^\s:]*)(:))?(\s*)"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><usingself state="generic"/><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="variable-tag-markup"/></rule>
+      <rule pattern="([^\s%]+)(\s*)"><bygroups><token type="NameTag"/><token type="TextWhitespace"/></bygroups><push state="tag-markup"/></rule>
+    </state>
+    <state name="output">
+      <rule><include state="whitespace"/></rule>
+      <rule pattern="\}\}"><token type="Punctuation"/><pop depth="1"/></rule>
+      <rule pattern="\|"><token type="Punctuation"/><push state="filters"/></rule>
+    </state>
+    <state name="filters">
+      <rule><include state="whitespace"/></rule>
+      <rule pattern="\}\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
+      <rule pattern="([^\s|:]+)(:?)(\s*)"><bygroups><token type="NameFunction"/><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="filter-markup"/></rule>
+    </state>
+    <state name="filter-markup">
+      <rule pattern="\|"><token type="Punctuation"/><pop depth="1"/></rule>
+      <rule><include state="end-of-tag"/></rule>
+      <rule><include state="default-param-markup"/></rule>
+    </state>
+    <state name="condition">
+      <rule><include state="end-of-block"/></rule>
+      <rule><include state="whitespace"/></rule>
+      <rule pattern="([^\s=!&gt;&lt;]+)(\s*)([=!&gt;&lt;]=?)(\s*)(\S+)(\s*)(%\})"><bygroups><usingself state="generic"/><token type="TextWhitespace"/><token type="Operator"/><token type="TextWhitespace"/><usingself state="generic"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups></rule>
+      <rule pattern="\b!"><token type="Operator"/></rule>
+      <rule pattern="\bnot\b"><token type="OperatorWord"/></rule>
+      <rule pattern="([\w.\&#x27;&quot;]+)(\s+)(contains)(\s+)([\w.\&#x27;&quot;]+)"><bygroups><usingself state="generic"/><token type="TextWhitespace"/><token type="OperatorWord"/><token type="TextWhitespace"/><usingself state="generic"/></bygroups></rule>
+      <rule><include state="generic"/></rule>
+      <rule><include state="whitespace"/></rule>
+    </state>
+    <state name="generic-value">
+      <rule><include state="generic"/></rule>
+      <rule><include state="end-at-whitespace"/></rule>
+    </state>
+    <state name="operator">
+      <rule pattern="(\s*)((=|!|&gt;|&lt;)=?)(\s*)"><bygroups><token type="TextWhitespace"/><token type="Operator"/>None<token type="TextWhitespace"/></bygroups><pop depth="1"/></rule>
+      <rule pattern="(\s*)(\bcontains\b)(\s*)"><bygroups><token type="TextWhitespace"/><token type="OperatorWord"/><token type="TextWhitespace"/></bygroups><pop depth="1"/></rule>
+    </state>
+    <state name="end-of-tag">
+      <rule pattern="\}\}"><token type="Punctuation"/><pop depth="1"/></rule>
+    </state>
+    <state name="end-of-block">
+      <rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
+    </state>
+    <state name="end-at-whitespace">
+      <rule pattern="\s+"><token type="TextWhitespace"/><pop depth="1"/></rule>
+    </state>
+    <state name="param-markup">
+      <rule><include state="whitespace"/></rule>
+      <rule pattern="([^\s=:]+)(\s*)(=|:)"><bygroups><token type="NameAttribute"/><token type="TextWhitespace"/><token type="Operator"/></bygroups></rule>
+      <rule pattern="(\{\{)(\s*)([^\s}])(\s*)(\}\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><usingself state="variable"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups></rule>
+      <rule><include state="string"/></rule>
+      <rule><include state="number"/></rule>
+      <rule><include state="keyword"/></rule>
+      <rule pattern=","><token type="Punctuation"/></rule>
+    </state>
+    <state name="default-param-markup">
+      <rule><include state="param-markup"/></rule>
+      <rule pattern="."><token type="Text"/></rule>
+    </state>
+    <state name="variable-param-markup">
+      <rule><include state="param-markup"/></rule>
+      <rule><include state="variable"/></rule>
+      <rule pattern="."><token type="Text"/></rule>
+    </state>
+    <state name="tag-markup">
+      <rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
+      <rule><include state="default-param-markup"/></rule>
+    </state>
+    <state name="variable-tag-markup">
+      <rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
+      <rule><include state="variable-param-markup"/></rule>
+    </state>
+    <state name="keyword">
+      <rule pattern="\b(false|true)\b"><token type="KeywordConstant"/></rule>
+    </state>
+    <state name="variable">
+      <rule pattern="[a-zA-Z_]\w*"><token type="NameVariable"/></rule>
+      <rule pattern="(?&lt;=\w)\.(?=\w)"><token type="Punctuation"/></rule>
+    </state>
+    <state name="string">
+      <rule pattern="&#x27;[^&#x27;]*&#x27;"><token type="LiteralStringSingle"/></rule>
+      <rule pattern="&quot;[^&quot;]*&quot;"><token type="LiteralStringDouble"/></rule>
+    </state>
+    <state name="number">
+      <rule pattern="\d+\.\d+"><token type="LiteralNumberFloat"/></rule>
+      <rule pattern="\d+"><token type="LiteralNumberInteger"/></rule>
+    </state>
+    <state name="generic">
+      <rule><include state="keyword"/></rule>
+      <rule><include state="string"/></rule>
+      <rule><include state="number"/></rule>
+      <rule><include state="variable"/></rule>
+    </state>
+    <state name="whitespace">
+      <rule pattern="[ \t]+"><token type="TextWhitespace"/></rule>
+    </state>
+    <state name="comment">
+      <rule pattern="(\{%)(\s*)(endcomment)(\s*)(%\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="#pop" state="#pop"/></rule>
+      <rule pattern="."><token type="Comment"/></rule>
+    </state>
+    <state name="raw">
+      <rule pattern="[^{]+"><token type="Text"/></rule>
+      <rule pattern="(\{%)(\s*)(endraw)(\s*)(%\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
+      <rule pattern="\{"><token type="Text"/></rule>
+    </state>
+  </rules>
+</lexer>
+
--- a/lexers/VelocityLexer.xml
+++ b/lexers/VelocityLexer.xml
@@ -0,0 +1,55 @@
+
+<lexer>
+  <config>
+    <name>Velocity</name>
+    <alias>velocity</alias>
+    <filename>*.vm</filename>
+    <filename>*.fhtml</filename>
+    <dot_all>true</dot_all>
+  </config>
+  <rules>
+    <state name="root">
+      <rule pattern="[^{#$]+"><token type="Other"/></rule>
+      <rule pattern="(#)(\*.*?\*)(#)"><bygroups><token type="CommentPreproc"/><token type="Comment"/><token type="CommentPreproc"/></bygroups></rule>
+      <rule pattern="(##)(.*?$)"><bygroups><token type="CommentPreproc"/><token type="Comment"/></bygroups></rule>
+      <rule pattern="(#\{?)([a-zA-Z_]\w*)(\}?)(\s?\()"><bygroups><token type="CommentPreproc"/><token type="NameFunction"/><token type="CommentPreproc"/><token type="Punctuation"/></bygroups><push state="directiveparams"/></rule>
+      <rule pattern="(#\{?)([a-zA-Z_]\w*)(\}|\b)"><bygroups><token type="CommentPreproc"/><token type="NameFunction"/><token type="CommentPreproc"/></bygroups></rule>
+      <rule pattern="\$!?\{?"><token type="Punctuation"/><push state="variable"/></rule>
+    </state>
+    <state name="variable">
+      <rule pattern="[a-zA-Z_]\w*"><token type="NameVariable"/></rule>
+      <rule pattern="\("><token type="Punctuation"/><push state="funcparams"/></rule>
+      <rule pattern="(\.)([a-zA-Z_]\w*)"><bygroups><token type="Punctuation"/><token type="NameVariable"/></bygroups><push/></rule>
+      <rule pattern="\}"><token type="Punctuation"/><pop depth="1"/></rule>
+      <rule><pop depth="1"/></rule>
+    </state>
+    <state name="directiveparams">
+      <rule pattern="(&amp;&amp;|\|\||==?|!=?|[-&lt;&gt;+*%&amp;|^/])|\b(eq|ne|gt|lt|ge|le|not|in)\b"><token type="Operator"/></rule>
+      <rule pattern="\["><token type="Operator"/><push state="rangeoperator"/></rule>
+      <rule pattern="\b[a-zA-Z_]\w*\b"><token type="NameFunction"/></rule>
+      <rule><include state="funcparams"/></rule>
+    </state>
+    <state name="rangeoperator">
+      <rule pattern="\.\."><token type="Operator"/></rule>
+      <rule><include state="funcparams"/></rule>
+      <rule pattern="\]"><token type="Operator"/><pop depth="1"/></rule>
+    </state>
+    <state name="funcparams">
+      <rule pattern="\$!?\{?"><token type="Punctuation"/><push state="variable"/></rule>
+      <rule pattern="\s+"><token type="Text"/></rule>
+      <rule pattern="[,:]"><token type="Punctuation"/></rule>
+      <rule pattern="&quot;(\\\\|\\[^\\]|[^&quot;\\])*&quot;"><token type="LiteralStringDouble"/></rule>
+      <rule pattern="&#x27;(\\\\|\\[^\\]|[^&#x27;\\])*&#x27;"><token type="LiteralStringSingle"/></rule>
+      <rule pattern="0[xX][0-9a-fA-F]+[Ll]?"><token type="LiteralNumber"/></rule>
+      <rule pattern="\b[0-9]+\b"><token type="LiteralNumber"/></rule>
+      <rule pattern="(true|false|null)\b"><token type="KeywordConstant"/></rule>
+      <rule pattern="\("><token type="Punctuation"/><push/></rule>
+      <rule pattern="\)"><token type="Punctuation"/><pop depth="1"/></rule>
+      <rule pattern="\{"><token type="Punctuation"/><push/></rule>
+      <rule pattern="\}"><token type="Punctuation"/><pop depth="1"/></rule>
+      <rule pattern="\["><token type="Punctuation"/><push/></rule>
+      <rule pattern="\]"><token type="Punctuation"/><pop depth="1"/></rule>
+    </state>
+  </rules>
+</lexer>
+
--- a/lexers/bbcode.xml
+++ b/lexers/bbcode.xml
@@ -0,0 +1,22 @@
+
+<lexer>
+  <config>
+    <name>BBCode</name>
+    <alias>bbcode</alias>
+    <mime_type>text/x-bbcode</mime_type>
+  </config>
+  <rules>
+    <state name="root">
+      <rule pattern="[^[]+"><token type="Text"/></rule>
+      <rule pattern="\[/?\w+"><token type="Keyword"/><push state="tag"/></rule>
+      <rule pattern="\["><token type="Text"/></rule>
+    </state>
+    <state name="tag">
+      <rule pattern="\s+"><token type="Text"/></rule>
+      <rule pattern="(\w+)(=)(&quot;?[^\s&quot;\]]+&quot;?)"><bygroups><token type="NameAttribute"/><token type="Operator"/><token type="LiteralString"/></bygroups></rule>
+      <rule pattern="(=)(&quot;?[^\s&quot;\]]+&quot;?)"><bygroups><token type="Operator"/><token type="LiteralString"/></bygroups></rule>
+      <rule pattern="\]"><token type="Keyword"/><pop depth="1"/></rule>
+    </state>
+  </rules>
+</lexer>
+
--- a/lexers/groff.xml
+++ b/lexers/groff.xml
@@ -3,6 +3,7 @@
    <name>Groff</name>
    <alias>groff</alias>
    <alias>nroff</alias>
+    <alias>roff</alias>
    <alias>man</alias>
    <filename>*.[1-9]</filename>
    <filename>*.1p</filename>
--- a/heuristics/heuristics.yml
+++ b/heuristics/heuristics.yml
@@ -30,12 +30,12 @@
 disambiguations:
 - extensions: ['.1', '.2', '.3', '.4', '.5', '.6', '.7', '.8', '.9']
  rules:
-  - language: Roff Manpage
+  - language: man
    and:
    - named_pattern: mdoc-date
    - named_pattern: mdoc-title
    - named_pattern: mdoc-heading
-  - language: Roff Manpage
+  - language: man
    and:
    - named_pattern: man-title
    - named_pattern: man-heading
@@ -43,12 +43,12 @@ disambiguations:
    pattern: '^\.(?:[A-Za-z]{2}(?:\s|$)|\\")'
 - extensions: ['.1in', '.1m', '.1x', '.3in', '.3m', '.3p', '.3pm', '.3qt', '.3x', '.man', '.mdoc']
  rules:
-  - language: Roff Manpage
+  - language: man
    and:
    - named_pattern: mdoc-date
    - named_pattern: mdoc-title
    - named_pattern: mdoc-heading
-  - language: Roff Manpage
+  - language: man
    and:
    - named_pattern: man-title
    - named_pattern: man-heading
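Review note: the rules above now return `man`, which groff.xml lists as a lexer alias, so the heuristic result can presumably be passed straight to lexer lookup. Below is a minimal Python sketch of how such a disambiguation entry could be evaluated. The regexes for the named patterns are placeholders (the real ones live in heuristics.yml and are not reproduced here), and `matches` and `disambiguate` are hypothetical names, not part of Tartrazine.

```python
import re

# Placeholder regexes: stand-ins for the real named patterns in heuristics.yml.
NAMED_PATTERNS = {
    "man-title": r"^\.TH\s+\S+",
    "man-heading": r"^\.SH\s+\S+",
}

# A trimmed-down entry shaped like the YAML above (assumed structure).
RULES = [
    {
        "extensions": [".1", ".man"],
        "rules": [
            {
                "language": "man",
                "and": [
                    {"named_pattern": "man-title"},
                    {"named_pattern": "man-heading"},
                ],
            },
        ],
    },
]


def matches(cond, text):
    """Return True when a single rule or condition holds for the text."""
    if "and" in cond:
        return all(matches(c, text) for c in cond["and"])
    if "named_pattern" in cond:
        regex = NAMED_PATTERNS[cond["named_pattern"]]
        return re.search(regex, text, re.M) is not None
    if "pattern" in cond:
        return re.search(cond["pattern"], text, re.M) is not None
    return True  # a rule without conditions always matches


def disambiguate(filename, text):
    """Return the language of the first matching rule, or None."""
    for entry in RULES:
        if not any(filename.endswith(ext) for ext in entry["extensions"]):
            continue
        for rule in entry["rules"]:
            if matches(rule, text):
                return rule["language"]
    return None
```

With this shape, every condition under `and` has to match before a rule's language is chosen, and a file whose extension is not listed is never considered.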
--- a/lexers/markdown.xml
+++ b/lexers/markdown.xml
@@ -0,0 +1,56 @@
+
+<lexer>
+  <config>
+    <name>Markdown</name>
+    <alias>markdown</alias>
+    <alias>md</alias>
+    <filename>*.md</filename>
+    <filename>*.markdown</filename>
+    <mime_type>text/x-markdown</mime_type>
+  </config>
+  <rules>
+    <state name="root">
+      <rule pattern="(^#[^#].+)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
+      <rule pattern="(^#{2,6}[^#].+)(\n)"><bygroups><token type="GenericSubheading"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(.+)(\n)(=+)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(.+)(\n)(-+)(\n)"><bygroups><token type="GenericSubheading"/><token type="Text"/><token type="GenericSubheading"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(\s*)([*-] )(\[[ xX]\])( .+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><token type="Keyword"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*)([*-])(\s)(.+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><token type="TextWhitespace"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*)([0-9]+\.)( .+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*&gt;\s)(.+\n)"><bygroups><token type="Keyword"/><token type="GenericEmph"/></bygroups></rule>
+      <rule pattern="^(```\n)([\w\W]*?)(^```$)">
+      <bygroups>
+        <token type="LiteralStringBacktick"/>
+        <token type="Text"/>
+        <token type="LiteralStringBacktick"/>
+      </bygroups>
+      </rule>
+      <rule pattern="^(```)(\w+)(\n)([\w\W]*?)(^```$)">
+        <bygroups>
+          <token type="LiteralStringBacktick"/>
+          <token type="NameLabel"/>  
+          <token type="TextWhitespace"/>
+          <UsingByGroup lexer="2" content="4"/>  
+          <token type="LiteralStringBacktick"/>
+        </bygroups>
+      </rule>
+      <rule><include state="inline"/></rule>
+    </state>
+    <state name="inline">
+      <rule pattern="\\."><token type="Text"/></rule>
+      <rule pattern="([^`]?)(`[^`\n]+`)"><bygroups><token type="Text"/><token type="LiteralStringBacktick"/></bygroups></rule>
+      <rule pattern="([^\*]?)(\*\*[^* \n][^*\n]*\*\*)"><bygroups><token type="Text"/><token type="GenericStrong"/></bygroups></rule>
+      <rule pattern="([^_]?)(__[^_ \n][^_\n]*__)"><bygroups><token type="Text"/><token type="GenericStrong"/></bygroups></rule>
+      <rule pattern="([^\*]?)(\*[^* \n][^*\n]*\*)"><bygroups><token type="Text"/><token type="GenericEmph"/></bygroups></rule>
+      <rule pattern="([^_]?)(_[^_ \n][^_\n]*_)"><bygroups><token type="Text"/><token type="GenericEmph"/></bygroups></rule>
+      <rule pattern="([^~]?)(~~[^~ \n][^~\n]*~~)"><bygroups><token type="Text"/><token type="GenericDeleted"/></bygroups></rule>
+      <rule pattern="[@#][\w/:]+"><token type="NameEntity"/></rule>
+      <rule pattern="(!?\[)([^]]+)(\])(\()([^)]+)(\))"><bygroups><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="Text"/><token type="NameAttribute"/><token type="Text"/></bygroups></rule>
+      <rule pattern="(\[)([^]]+)(\])(\[)([^]]*)(\])"><bygroups><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="Text"/><token type="NameLabel"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(\s*\[)([^]]*)(\]:\s*)(.+)"><bygroups><token type="Text"/><token type="NameLabel"/><token type="Text"/><token type="NameAttribute"/></bygroups></rule>
+      <rule pattern="[^\\\s]+"><token type="Text"/></rule>
+      <rule pattern="."><token type="Text"/></rule>
+    </state>
+  </rules>
+</lexer>
+
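Review note: the `UsingByGroup lexer="2" content="4"` action in the fenced-code rule reads as "take the lexer name from group 2 and lex group 4 with it". Here is a rough Python sketch of that idea, under that assumption. The `LEXERS` registry and the `lex_fenced_block` helper are hypothetical, and the fallback to plain `Text` for unknown languages is a choice made for this sketch rather than confirmed Tartrazine behavior.

```python
import re

# Hypothetical stand-ins for real lexers: each maps text to (type, value) tokens.
LEXERS = {
    "python": lambda text: [("Code.python", text)],
}

# Same shape as the fenced-code rule above; `{3} stands for three backticks.
FENCE = re.compile(r"^(`{3})(\w+)(\n)([\w\W]*?)(^`{3}$)", re.M)


def lex_fenced_block(source):
    """Tokenize one fenced block, delegating group 4 to the lexer named by group 2."""
    m = FENCE.search(source)
    if m is None:
        return []
    # Unknown languages fall back to a single Text token (sketch-only choice).
    sub_lexer = LEXERS.get(m.group(2), lambda text: [("Text", text)])
    return [
        ("LiteralStringBacktick", m.group(1)),
        ("NameLabel", m.group(2)),
        ("TextWhitespace", m.group(3)),
        *sub_lexer(m.group(4)),
        ("LiteralStringBacktick", m.group(5)),
    ]
```

The rst.xml rule with `content="9,10,11"` would follow the same pattern, with the content assembled from several groups instead of one.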
--- a/lexers/moinwiki.xml
+++ b/lexers/moinwiki.xml
@@ -0,0 +1,34 @@
+
+<lexer>
+  <config>
+    <name>MoinMoin/Trac Wiki markup</name>
+    <alias>trac-wiki</alias>
+    <alias>moin</alias>
+    <mime_type>text/x-trac-wiki</mime_type>
+    <case_insensitive>true</case_insensitive>
+  </config>
+  <rules>
+    <state name="root">
+      <rule pattern="^#.*$"><token type="Comment"/></rule>
+      <rule pattern="(!)(\S+)"><bygroups><token type="Keyword"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(=+)([^=]+)(=+)(\s*#.+)?$"><bygroups><token type="GenericHeading"/><usingself state="root"/><token type="GenericHeading"/><token type="LiteralString"/></bygroups></rule>
+      <rule pattern="(\{\{\{)(\n#!.+)?"><bygroups><token type="NameBuiltin"/><token type="NameNamespace"/></bygroups><push state="codeblock"/></rule>
+      <rule pattern="(\&#x27;\&#x27;\&#x27;?|\|\||`|__|~~|\^|,,|::)"><token type="Comment"/></rule>
+      <rule pattern="^( +)([.*-])( )"><bygroups><token type="Text"/><token type="NameBuiltin"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^( +)([a-z]{1,5}\.)( )"><bygroups><token type="Text"/><token type="NameBuiltin"/><token type="Text"/></bygroups></rule>
+      <rule pattern="\[\[\w+.*?\]\]"><token type="Keyword"/></rule>
+      <rule pattern="(\[[^\s\]]+)(\s+[^\]]+?)?(\])"><bygroups><token type="Keyword"/><token type="LiteralString"/><token type="Keyword"/></bygroups></rule>
+      <rule pattern="^----+$"><token type="Keyword"/></rule>
+      <rule pattern="[^\n\&#x27;\[{!_~^,|]+"><token type="Text"/></rule>
+      <rule pattern="\n"><token type="Text"/></rule>
+      <rule pattern="."><token type="Text"/></rule>
+    </state>
+    <state name="codeblock">
+      <rule pattern="\}\}\}"><token type="NameBuiltin"/><pop depth="1"/></rule>
+      <rule pattern="\{\{\{"><token type="Text"/><push/></rule>
+      <rule pattern="[^{}]+"><token type="CommentPreproc"/></rule>
+      <rule pattern="."><token type="CommentPreproc"/></rule>
+    </state>
+  </rules>
+</lexer>
+
--- a/lexers/rst.xml
+++ b/lexers/rst.xml
@@ -0,0 +1,76 @@
+
+<lexer>
+  <config>
+    <name>reStructuredText</name>
+    <alias>restructuredtext</alias>
+    <alias>rst</alias>
+    <alias>rest</alias>
+    <filename>*.rst</filename>
+    <filename>*.rest</filename>
+    <mime_type>text/x-rst</mime_type>
+    <mime_type>text/prs.fallenstein.rst</mime_type>
+  </config>
+  <rules>
+    <state name="root">
+      <rule pattern="^(=+|-+|`+|:+|\.+|\&#x27;+|&quot;+|~+|\^+|_+|\*+|\++|#+)([ \t]*\n)(.+)(\n)(\1)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(\S.*)(\n)(={3,}|-{3,}|`{3,}|:{3,}|\.{3,}|\&#x27;{3,}|&quot;{3,}|~{3,}|\^{3,}|_{3,}|\*{3,}|\+{3,}|#{3,})(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(\s*)([-*+])( .+\n(?:\1  .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*)([0-9#ivxlcmIVXLCM]+\.)( .+\n(?:\1  .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*)(\(?[0-9#ivxlcmIVXLCM]+\))( .+\n(?:\1  .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*)([A-Z]+\.)( .+\n(?:\1  .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*)(\(?[A-Za-z]+\))( .+\n(?:\1  .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^(\s*)(\|)( .+\n(?:\|  .+\n)*)"><bygroups><token type="Text"/><token type="Operator"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^( *\.\.)(\s*)((?:source)?code(?:-block)?)(::)([ \t]*)([^\n]+)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\8.*)?\n)+)"> 
+        <bygroups>
+          <token type="Punctuation"/>
+          <token type="Text"/>
+          <token type="OperatorWord"/>
+          <token type="Punctuation"/>
+          <token type="Text"/>
+          <token type="Keyword"/>
+          <token type="Text"/>
+          <token type="Text"/>
+          <UsingByGroup lexer="6" content="9,10,11"/>
+        </bygroups>
+      </rule>
+      <rule pattern="^( *\.\.)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))">
+        <bygroups>
+          <token type="Punctuation"/>
+          <token type="Text"/>
+          <token type="OperatorWord"/>
+          <token type="Punctuation"/>
+          <token type="Text"/>
+          <usingself state="inline"/>
+        </bygroups>
+      </rule>
+      <rule pattern="^( *\.\.)(\s*)(_(?:[^:\\]|\\.)+:)(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^( *\.\.)(\s*)(\[.+\])(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^( *\.\.)(\s*)(\|.+\|)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="OperatorWord"/><token type="Punctuation"/><token type="Text"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="^ *\.\..*(\n( +.*\n|\n)+)?"><token type="Comment"/></rule>
+      <rule pattern="^( *)(:(?:\\\\|\\:|[^:\n])+:(?=\s))([ \t]*)"><bygroups><token type="Text"/><token type="NameClass"/><token type="Text"/></bygroups></rule>
+      <rule pattern="^(\S.*(?&lt;!::)\n)((?:(?: +.*)\n)+)"><bygroups><usingself state="inline"/><usingself state="inline"/></bygroups></rule>
+      <rule pattern="(::)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\3.*)?\n)+)"><bygroups><token type="LiteralStringEscape"/><token type="Text"/><token type="LiteralString"/><token type="LiteralString"/><token type="Text"/><token type="LiteralString"/></bygroups></rule>
+      <rule><include state="inline"/></rule>
+    </state>
+    <state name="inline">
+      <rule pattern="\\."><token type="Text"/></rule>
+      <rule pattern="``"><token type="LiteralString"/><push state="literal"/></rule>
+      <rule pattern="(`.+?)(&lt;.+?&gt;)(`__?)"><bygroups><token type="LiteralString"/><token type="LiteralStringInterpol"/><token type="LiteralString"/></bygroups></rule>
+      <rule pattern="`.+?`__?"><token type="LiteralString"/></rule>
+      <rule pattern="(`.+?`)(:[a-zA-Z0-9:-]+?:)?"><bygroups><token type="NameVariable"/><token type="NameAttribute"/></bygroups></rule>
+      <rule pattern="(:[a-zA-Z0-9:-]+?:)(`.+?`)"><bygroups><token type="NameAttribute"/><token type="NameVariable"/></bygroups></rule>
+      <rule pattern="\*\*.+?\*\*"><token type="GenericStrong"/></rule>
+      <rule pattern="\*.+?\*"><token type="GenericEmph"/></rule>
+      <rule pattern="\[.*?\]_"><token type="LiteralString"/></rule>
+      <rule pattern="&lt;.+?&gt;"><token type="NameTag"/></rule>
+      <rule pattern="[^\\\n\[*`:]+"><token type="Text"/></rule>
+      <rule pattern="."><token type="Text"/></rule>
+    </state>
+    <state name="literal">
+      <rule pattern="[^`]+"><token type="LiteralString"/></rule>
+      <rule pattern="``((?=$)|(?=[-/:.,; \n\x00‐‑‒–— &#x27;&quot;\)\]\}&gt;’”»!\?]))"><token type="LiteralString"/><pop depth="1"/></rule>
+      <rule pattern="`"><token type="LiteralString"/></rule>
+    </state>
+  </rules>
+</lexer>
+
--- a/scripts/lexer_metadata.py
+++ b/scripts/lexer_metadata.py
@@ -40,15 +40,18 @@ for fname in glob.glob("lexers/*.xml"):
 with open("src/constants/lexers.cr", "w") as f:
    f.write("module Tartrazine\n")
    f.write("  LEXERS_BY_NAME = {\n")
-    for k, v in lexer_by_name.items():
+    for k in sorted(lexer_by_name.keys()):
+        v = lexer_by_name[k]
        f.write(f'"{k}" => "{v}", \n')
    f.write("}\n")
    f.write("  LEXERS_BY_MIMETYPE = {\n")
-    for k, v in lexer_by_mimetype.items():
+    for k in sorted(lexer_by_mimetype.keys()):
+        v = lexer_by_mimetype[k]
        f.write(f'"{k}" => "{v}", \n')
    f.write("}\n")
    f.write("  LEXERS_BY_FILENAME = {\n")
-    for k, v in lexer_by_filename.items():
+    for k in sorted(lexer_by_filename.keys()):
+        v = lexer_by_filename[k]
        f.write(f'"{k}" => {str(list(v)).replace("'", "\"")}, \n')
    f.write("}\n")
    f.write("end\n")
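Review note: iterating over sorted keys presumably makes the generated src/constants/lexers.cr independent of the order in which the XML files were read, so regenerating it yields stable diffs. A small standalone Python sketch of the same idea follows; `write_mapping` and `render` are hypothetical helpers, not functions from the script.

```python
import io


def write_mapping(out, name, mapping):
    """Write a Crystal hash literal with its keys in sorted order."""
    out.write(f"  {name} = {{\n")
    for key in sorted(mapping):
        out.write(f'"{key}" => "{mapping[key]}", \n')
    out.write("}\n")


def render(mapping):
    """Render one mapping to a string, for comparison."""
    buf = io.StringIO()
    write_mapping(buf, "LEXERS_BY_NAME", mapping)
    return buf.getvalue()
```

Two dictionaries with the same entries inserted in a different order render to identical text, which is the property the change relies on.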
--- a/shard.yml
+++ b/shard.yml
@@ -1,5 +1,5 @@
 name: tartrazine
-version: 0.4.0
+version: 0.6.0

 authors:
  - Roberto Alsina <roberto.alsina@gmail.com>
--- a/spec/tartrazine_spec.cr
+++ b/spec/tartrazine_spec.cr
@@ -72,8 +72,8 @@ end

 # Helper that creates lexer and tokenizes
 def tokenize(lexer_name, text)
-  lexer = Tartrazine.lexer(lexer_name)
-  lexer.tokenize(text)
+  tokenizer = Tartrazine.lexer(lexer_name).tokenizer(text)
+  Tartrazine::Lexer.collapse_tokens(tokenizer.to_a)
 end

 # Helper that tokenizes using chroma to validate the lexer
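Review note: the spec helper now collects the tokenizer output and passes it through `Tartrazine::Lexer.collapse_tokens`. Assuming that method merges runs of adjacent tokens sharing a type (its body is not part of this diff, so this is a guess at the behavior), the operation could be sketched in Python as:

```python
def collapse_tokens(tokens):
    """Merge runs of adjacent (type, value) tokens that share a type."""
    result = []
    for tok_type, value in tokens:
        if result and result[-1][0] == tok_type:
            # Same type as the previous token: extend its value.
            result[-1] = (tok_type, result[-1][1] + value)
        else:
            result.append((tok_type, value))
    return result
```

Collapsing before comparison would make the specs insensitive to how a lexer happens to split a run of same-typed text, which helps when validating against another implementation's output.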
--- a/src/actions.cr
+++ b/src/actions.cr
@@ -8,19 +8,32 @@ require "./tartrazine"
 # perform a list of actions. These actions can emit tokens
 # or change the state machine.
 module Tartrazine
-  class Action
-    property type : String
-    property xml : XML::Node
+  enum ActionType
+    Bygroups
+    Combined
+    Include
+    Pop
+    Push
+    Token
+    Using
+    Usingbygroup
+    Usingself
+  end
+
+  struct Action
    property actions : Array(Action) = [] of Action

-    property token_type : String = ""
-    property states_to_push : Array(String) = [] of String
-    property depth = 0
-    property lexer_name : String = ""
-    property states_to_combine : Array(String) = [] of String
+    @content_index : Array(Int32) = [] of Int32
+    @depth : Int32 = 0
+    @lexer_index : Int32 = 0
+    @lexer_name : String = ""
+    @states : Array(String) = [] of String
+    @states_to_push : Array(String) = [] of String
+    @token_type : String = ""
+    @type : ActionType = ActionType::Token

-    def initialize(@type : String, @xml : XML::Node?)
-      # Extract information from the XML node we will use later
+    def initialize(t : String, xml : XML::Node?)
+      @type = ActionType.parse(t.capitalize)

      # Some actions may have actions in them, like this:
      # <bygroups>
@@ -31,61 +44,56 @@ module Tartrazine
      #
      # The token actions match with the first 2 groups in the regex
      # the using action matches the 3rd and shunts it to another lexer
-
-      known_types = %w(token push pop bygroups using usingself include combined)
-      raise Exception.new(
-        "Unknown action type: #{@type}") unless known_types.includes? @type
-
-      @xml.children.each do |node|
+      xml.children.each do |node|
        next unless node.element?
        @actions << Action.new(node.name, node)
      end

+      # Prefetch the attributes we need from the XML and keep them
      case @type
-      when "token"
-        @token_type = xml["type"]? || ""
-      when "push"
+      when ActionType::Token
+        @token_type = xml["type"]
+      when ActionType::Push
        @states_to_push = xml.attributes.select { |attrib|
          attrib.name == "state"
-        }.map &.content || [] of String
-      when "pop"
-        @depth = xml["depth"]?.try &.to_i || 0
-      when "using"
-        @lexer_name = xml["lexer"]?.try &.downcase || ""
-      when "combined"
-        @states_to_combine = xml.attributes.select { |attrib|
+        }.map &.content
+      when ActionType::Pop
+        @depth = xml["depth"].to_i
+      when ActionType::Using
+        @lexer_name = xml["lexer"].downcase
+      when ActionType::Combined
+        @states = xml.attributes.select { |attrib|
          attrib.name == "state"
        }.map &.content
+      when ActionType::Usingbygroup
+        @lexer_index = xml["lexer"].to_i
+        @content_index = xml["content"].split(",").map(&.to_i)
      end
    end

    # ameba:disable Metrics/CyclomaticComplexity
-    def emit(match : MatchData, lexer : Lexer, match_group = 0) : Array(Token)
-      case type
-      when "token"
+    def emit(match : MatchData, tokenizer : Tokenizer, match_group = 0) : Array(Token)
+      case @type
+      when ActionType::Token
        raise Exception.new "Can't have a token without a match" if match.empty?
        [Token.new(type: @token_type, value: String.new(match[match_group].value))]
-      when "push"
-        if @states_to_push.empty?
-          # Push without a state means push the current state
-          @states_to_push = [lexer.state_stack.last]
-        end
-        @states_to_push.each do |state|
-          if state == "#pop"
+      when ActionType::Push
+        to_push = @states_to_push.empty? ? [tokenizer.state_stack.last] : @states_to_push
+        to_push.each do |state|
+          if state == "#pop" && tokenizer.state_stack.size > 1
            # Pop the state
-            lexer.state_stack.pop
+            tokenizer.state_stack.pop
          else
            # Really push
-            lexer.state_stack << state
+            tokenizer.state_stack << state
          end
        end
        [] of Token
-      when "pop"
-        if lexer.state_stack.size > @depth
-          lexer.state_stack.pop(@depth)
-        end
+      when ActionType::Pop
+        to_pop = [@depth, tokenizer.state_stack.size - 1].min
+        tokenizer.state_stack.pop(to_pop)
        [] of Token
-      when "bygroups"
+      when ActionType::Bygroups
        # FIXME: handle
        # ><bygroups>
        # <token type="Punctuation"/>
@ -94,7 +102,7 @@ module Tartrazine
        #
        # where that None means skipping a group
        #
-        raise Exception.new "Can't have a token without a match" if match.empty?
+        raise Exception.new "Can't have a token without a match" if match.nil?

        # Each group matches an action. If the group match is empty,
        # the action is skipped.
@ -103,33 +111,47 @@ module Tartrazine
          begin
            next if match[i + 1].size == 0
          rescue IndexError
-            # No match for the last group
+            # FIXME: This should not actually happen
+            # No match for this group
            next
          end
-          result += e.emit(match, lexer, i + 1)
+          result += e.emit(match, tokenizer, i + 1)
        end
        result
-      when "using"
+      when ActionType::Using
        # Shunt to another lexer entirely
        return [] of Token if match.empty?
-        Tartrazine.lexer(@lexer_name).tokenize(String.new(match[match_group].value), usingself: true)
-      when "usingself"
+        Tartrazine.lexer(@lexer_name).tokenizer(
+          String.new(match[match_group].value),
+          secondary: true).to_a
+      when ActionType::Usingself
        # Shunt to another copy of this lexer
        return [] of Token if match.empty?
-        new_lexer = Lexer.from_xml(lexer.xml)
-        new_lexer.tokenize(String.new(match[match_group].value), usingself: true)
-      when "combined"
-        # Combine two states into one anonymous state
-        new_state = @states_to_combine.map { |name|
-          lexer.states[name]
+        tokenizer.lexer.tokenizer(
+          String.new(match[match_group].value),
+          secondary: true).to_a
+      when ActionType::Combined
+        # Combine two or more states into one anonymous state
+        new_state = @states.map { |name|
+          tokenizer.lexer.states[name]
        }.reduce { |state1, state2|
          state1 + state2
        }
-        lexer.states[new_state.name] = new_state
-        lexer.state_stack << new_state.name
+        tokenizer.lexer.states[new_state.name] = new_state
+        tokenizer.state_stack << new_state.name
        [] of Token
+      when ActionType::Usingbygroup
+        # Shunt to content-specified lexer
+        return [] of Token if match.empty?
+        content = ""
+        @content_index.each do |i|
+          content += String.new(match[i].value)
+        end
+        Tartrazine.lexer(String.new(match[@lexer_index].value)).tokenizer(
+          content,
+          secondary: true).to_a
      else
-        raise Exception.new("Unhandled action type: #{type}")
+        raise Exception.new("Unknown action type: #{@type}")
      end
    end
  end
--- a/src/bytes_regex.cr
+++ b/src/bytes_regex.cr
@ -3,7 +3,7 @@ module BytesRegex

  class Regex
    def initialize(pattern : String, multiline = false, dotall = false, ignorecase = false, anchored = false)
-      flags = LibPCRE2::UTF | LibPCRE2::DUPNAMES | LibPCRE2::UCP | LibPCRE2::NO_UTF_CHECK
+      flags = LibPCRE2::UTF | LibPCRE2::UCP | LibPCRE2::NO_UTF_CHECK
      flags |= LibPCRE2::MULTILINE if multiline
      flags |= LibPCRE2::DOTALL if dotall
      flags |= LibPCRE2::CASELESS if ignorecase
@ -31,7 +31,6 @@ module BytesRegex
    end

    def match(str : Bytes, pos = 0) : Array(Match)
-      match = [] of Match
      rc = LibPCRE2.match(
        @re,
        str,
@ -40,24 +39,25 @@ module BytesRegex
        LibPCRE2::NO_UTF_CHECK,
        @match_data,
        nil)
-      if rc >= 0
+      if rc > 0
        ovector = LibPCRE2.get_ovector_pointer(@match_data)
-        (0...rc).each do |i|
+        (0...rc).map do |i|
          m_start = ovector[2 * i]
-          m_size = ovector[2 * i + 1] - m_start
-          if m_size == 0
+          m_end = ovector[2 * i + 1]
+          if m_start == m_end
            m_value = Bytes.new(0)
          else
-            m_value = str[m_start...m_start + m_size]
+            m_value = str[m_start...m_end]
          end
-          match << Match.new(m_value, m_start, m_size)
+          Match.new(m_value, m_start, m_end - m_start)
        end
+      else
+        [] of Match
      end
-      match
    end
  end

-  class Match
+  struct Match
    property value : Bytes
    property start : UInt64
    property size : UInt64
--- a/src/constants/lexers.cr
+++ b/src/constants/lexers.cr
--- a/src/formatter.cr
+++ b/src/formatter.cr
@ -12,6 +12,10 @@ module Tartrazine
    property theme : Theme = Tartrazine.theme("default-dark")

    # Format the text using the given lexer.
+    def format(text : String, lexer : Lexer, io : IO) : Nil
+      raise Exception.new("Not implemented")
+    end
+
    def format(text : String, lexer : Lexer) : String
      raise Exception.new("Not implemented")
    end
--- a/src/formatters/ansi.cr
+++ b/src/formatters/ansi.cr
@ -7,17 +7,27 @@ module Tartrazine
    def initialize(@theme : Theme = Tartrazine.theme("default-dark"), @line_numbers : Bool = false)
    end

+    private def line_label(i : Int32) : String
+      "#{i + 1}".rjust(4).ljust(5)
+    end
+
    def format(text : String, lexer : Lexer) : String
-      output = String.build do |outp|
-        lexer.group_tokens_in_lines(lexer.tokenize(text)).each_with_index do |line, i|
-          label = line_numbers? ? "#{i + 1}".rjust(4).ljust(5) : ""
-          outp << label
-          line.each do |token|
-            outp << colorize(token[:value], token[:type])
-          end
+      outp = String::Builder.new("")
+      format(text, lexer, outp)
+      outp.to_s
+    end
+
+    def format(text : String, lexer : BaseLexer, outp : IO) : Nil
+      tokenizer = lexer.tokenizer(text)
+      i = 0
+      outp << line_label(i) if line_numbers?
+      tokenizer.each do |token|
+        outp << colorize(token[:value], token[:type])
+        if token[:value].includes?("\n")
+          i += 1
+          outp << line_label(i) if line_numbers?
        end
      end
-      output
    end

    def colorize(text : String, token : String) : String
--- a/src/formatters/html.cr
+++ b/src/formatters/html.cr
@ -1,5 +1,6 @@
 require "../constants/token_abbrevs.cr"
 require "../formatter"
+require "html"

 module Tartrazine
  class Html < Formatter
@ -34,46 +35,52 @@ module Tartrazine
    end

    def format(text : String, lexer : Lexer) : String
-      text = format_text(text, lexer)
-      if standalone?
-        text = wrap_standalone(text)
-      end
-      text
+      outp = String::Builder.new("")
+      format(text, lexer, outp)
+      outp.to_s
+    end
+
+    def format(text : String, lexer : BaseLexer, io : IO) : Nil
+      pre, post = wrap_standalone
+      io << pre if standalone?
+      format_text(text, lexer, io)
+      io << post if standalone?
    end

    # Wrap text into a full HTML document, including the CSS for the theme
-    def wrap_standalone(text) : String
+    def wrap_standalone
      output = String.build do |outp|
        outp << "<!DOCTYPE html><html><head><style>"
        outp << style_defs
        outp << "</style></head><body>"
-        outp << text
-        outp << "</body></html>"
      end
-      output
+      {output.to_s, "</body></html>"}
    end

-    def format_text(text : String, lexer : Lexer) : String
-      lines = lexer.group_tokens_in_lines(lexer.tokenize(text))
-      output = String.build do |outp|
-        if surrounding_pre?
-          pre_style = wrap_long_lines? ? "style=\"white-space: pre-wrap; word-break: break-word;\"" : ""
-          outp << "<pre class=\"#{get_css_class("Background")}\" #{pre_style}>"
-        end
-        outp << "<code class=\"#{get_css_class("Background")}\">"
-        lines.each_with_index(offset: line_number_start - 1) do |line, i|
-          line_label = line_numbers? ? "#{i + 1}".rjust(4).ljust(5) : ""
-          line_class = highlighted?(i + 1) ? "class=\"#{get_css_class("LineHighlight")}\"" : ""
-          line_id = linkable_line_numbers? ? "id=\"#{line_number_id_prefix}#{i + 1}\"" : ""
-          outp << "<span #{line_id} #{line_class} style=\"user-select: none;\">#{line_label} </span>"
-          line.each do |token|
-            fragment = "<span class=\"#{get_css_class(token[:type])}\">#{token[:value]}</span>"
-            outp << fragment
-          end
-        end
-        outp << "</code></pre>"
+    private def line_label(i : Int32) : String
+      line_label = "#{i + 1}".rjust(4).ljust(5)
+      line_class = highlighted?(i + 1) ? "class=\"#{get_css_class("LineHighlight")}\"" : ""
+      line_id = linkable_line_numbers? ? "id=\"#{line_number_id_prefix}#{i + 1}\"" : ""
+      "<span #{line_id} #{line_class} style=\"user-select: none;\">#{line_label} </span>"
+    end
+
+    def format_text(text : String, lexer : BaseLexer, outp : IO)
+      tokenizer = lexer.tokenizer(text)
+      i = 0
+      if surrounding_pre?
+        pre_style = wrap_long_lines? ? "style=\"white-space: pre-wrap; word-break: break-word;\"" : ""
+        outp << "<pre class=\"#{get_css_class("Background")}\" #{pre_style}>"
      end
-      output
+      outp << "<code class=\"#{get_css_class("Background")}\">"
+      outp << line_label(i) if line_numbers?
+      tokenizer.each do |token|
+        outp << "<span class=\"#{get_css_class(token[:type])}\">#{HTML.escape(token[:value])}</span>"
+        if token[:value].ends_with? "\n"
+          i += 1
+          outp << line_label(i) if line_numbers?
+        end
+      end
+      outp << "</code></pre>"
    end

    # ameba:disable Metrics/CyclomaticComplexity
@ -104,15 +111,17 @@ module Tartrazine

    # Given a token type, return the CSS class to use.
    def get_css_class(token : String) : String
-      return class_prefix + Abbreviations[token] if theme.styles.has_key?(token)
-
-      # Themes don't contain information for each specific
-      # token type. However, they may contain information
-      # for a parent style. Worst case, we go to the root
-      # (Background) style.
-      class_prefix + Abbreviations[theme.style_parents(token).reverse.find { |parent|
-        theme.styles.has_key?(parent)
-      }]
+      if !theme.styles.has_key? token
+        # Themes don't contain information for each specific
+        # token type. However, they may contain information
+        # for a parent style. Worst case, we go to the root
+        # (Background) style.
+        parent = theme.style_parents(token).reverse.find { |dad|
+          theme.styles.has_key?(dad)
+        }
+        theme.styles[token] = theme.styles[parent]
+      end
+      class_prefix + Abbreviations[token]
    end

    # Is this line in the highlighted ranges?
--- a/src/formatters/json.cr
+++ b/src/formatters/json.cr
@ -4,8 +4,15 @@ module Tartrazine
  class Json < Formatter
    property name = "json"

-    def format(text : String, lexer : Lexer, _theme : Theme) : String
-      lexer.tokenize(text).to_json
+    def format(text : String, lexer : BaseLexer) : String
+      outp = String::Builder.new("")
+      format(text, lexer, outp)
+      outp.to_s
+    end
+
+    def format(text : String, lexer : BaseLexer, io : IO) : Nil
+      tokenizer = lexer.tokenizer(text)
+      io << Tartrazine::Lexer.collapse_tokens(tokenizer.to_a).to_json
    end
  end
 end
--- a/src/heuristics.cr
+++ b/src/heuristics.cr
@ -1,8 +1,12 @@
 require "yaml"

-module Tartrazine
-  # Use linguist's heuristics to disambiguate between languages
-
+# Use linguist's heuristics to disambiguate between languages
+# This is *shamelessly* stolen from https://github.com/github-linguist/linguist
+# and ported to Crystal. Deepest thanks to the authors of Linguist
+# for licensing it liberally.
+#
+# Consider this code (c) 2017 GitHub, Inc. even if I wrote it.
+module Linguist
  class Heuristic
    include YAML::Serializable

@ -34,12 +38,13 @@ module Tartrazine
    end
  end

-  class Rule
+  class LangRule
    include YAML::Serializable
    property pattern : (String | Array(String))?
    property negative_pattern : (String | Array(String))?
    property named_pattern : String?
-    property and : Array(Rule)?
+    property and : Array(LangRule)?
+    property language : String | Array(String)?

    # ameba:disable Metrics/CyclomaticComplexity
    def match(content, named_patterns)
@ -68,17 +73,9 @@ module Tartrazine
        result = p_arr.any? { |pat| ::Regex.new(pat).matches?(content) }
      end
      if and
-        result = and.as(Array(Rule)).all?(&.match(content, named_patterns))
+        result = and.as(Array(LangRule)).all?(&.match(content, named_patterns))
      end
      result
    end
  end
-
-  class LangRule < Rule
-    include YAML::Serializable
-    property language : String | Array(String)
-  end
 end
-
-# h = Tartrazine::Heuristic.from_yaml(File.read("heuristics/heuristics.yml"))
-# p! h.run(ARGV[0], File.read(ARGV[0]))
--- a/src/lexer.cr
+++ b/src/lexer.cr
@ -4,111 +4,169 @@ require "./constants/lexers"
 module Tartrazine
  class LexerFiles
    extend BakedFileSystem
-
    bake_folder "../lexers", __DIR__
  end

  # Get the lexer object for a language name
  # FIXME: support mimetypes
-  def self.lexer(name : String? = nil, filename : String? = nil) : Lexer
-    if name.nil? && filename.nil?
+  def self.lexer(name : String? = nil, filename : String? = nil) : BaseLexer
+    return lexer_by_name(name) if name && name != "autodetect"
+    return lexer_by_filename(filename) if filename
+
+    Lexer.from_xml(LexerFiles.get("/#{LEXERS_BY_NAME["plaintext"]}.xml").gets_to_end)
+  end
+
+  private def self.lexer_by_name(name : String) : BaseLexer
+    lexer_file_name = LEXERS_BY_NAME.fetch(name.downcase, nil)
+    return create_delegating_lexer(name) if lexer_file_name.nil? && name.includes? "+"
+    raise Exception.new("Unknown lexer: #{name}") if lexer_file_name.nil?
+
+    Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
+  end
+
+  private def self.lexer_by_filename(filename : String) : BaseLexer
+    candidates = Set(String).new
+    LEXERS_BY_FILENAME.each do |k, v|
+      candidates += v.to_set if File.match?(k, File.basename(filename))
+    end
+
+    case candidates.size
+    when 0
      lexer_file_name = LEXERS_BY_NAME["plaintext"]
-    elsif name && name != "autodetect"
-      lexer_file_name = LEXERS_BY_NAME[name.downcase]
+    when 1
+      lexer_file_name = candidates.first
    else
-      # Guess by filename
-      candidates = Set(String).new
-      LEXERS_BY_FILENAME.each do |k, v|
-        candidates += v.to_set if File.match?(k, File.basename(filename.to_s))
-      end
-      case candidates.size
-      when 0
-        lexer_file_name = LEXERS_BY_NAME["plaintext"]
-      when 1
-        lexer_file_name = candidates.first
-      else
-        raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}")
+      lexer_file_name = self.lexer_by_content(filename)
+      begin
+        return self.lexer(lexer_file_name)
+      rescue ex : Exception
+        raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}, heuristics suggest #{lexer_file_name} but there is no matching lexer.")
      end
    end
+
    Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
  end

+  private def self.lexer_by_content(fname : String) : String?
+    h = Linguist::Heuristic.from_yaml(LexerFiles.get("/heuristics.yml").gets_to_end)
+    result = h.run(fname, File.read(fname))
+    case result
+    when Nil
+      raise Exception.new "No lexer found for #{fname}"
+    when String
+      result.as(String)
+    when Array(String)
+      result.first
+    end
+  end
+
+  private def self.create_delegating_lexer(name : String) : BaseLexer
+    language, root = name.split("+", 2)
+    language_lexer = lexer(language)
+    root_lexer = lexer(root)
+    DelegatingLexer.new(language_lexer, root_lexer)
+  end
+
  # Return a list of all lexers
  def self.lexers : Array(String)
    LEXERS_BY_NAME.keys.sort!
  end

+  # A token, the output of the tokenizer
+  alias Token = NamedTuple(type: String, value: String)
+
+  abstract class BaseTokenizer
+  end
+
+  class Tokenizer < BaseTokenizer
+    include Iterator(Token)
+    property lexer : BaseLexer
+    property text : Bytes
+    property pos : Int32 = 0
+    @dq = Deque(Token).new
+    property state_stack = ["root"]
+
+    def initialize(@lexer : BaseLexer, text : String, secondary = false)
+      # Respect the `ensure_nl` config option
+      if text.size > 0 && text[-1] != '\n' && @lexer.config[:ensure_nl] && !secondary
+        text += "\n"
+      end
+      @text = text.to_slice
+    end
+
+    def next : Iterator::Stop | Token
+      if @dq.size > 0
+        return @dq.shift
+      end
+      if pos == @text.size
+        return stop
+      end
+
+      matched = false
+      while @pos < @text.size
+        @lexer.states[@state_stack.last].rules.each do |rule|
+          matched, new_pos, new_tokens = rule.match(@text, @pos, self)
+          if matched
+            @pos = new_pos
+            split_tokens(new_tokens).each { |token| @dq << token }
+            break
+          end
+        end
+        if !matched
+          if @text[@pos] == 10u8
+            @dq << {type: "Text", value: "\n"}
+            @state_stack = ["root"]
+          else
+            @dq << {type: "Error", value: String.new(@text[@pos..@pos])}
+          end
+          @pos += 1
+          break
+        end
+      end
+      self.next
+    end
+
+    # If a token contains newlines, split it into one token per line
+    def split_tokens(tokens : Array(Token)) : Array(Token)
+      split_tokens = [] of Token
+      tokens.each do |token|
+        if token[:value].includes?("\n")
+          values = token[:value].split("\n")
+          values.each_with_index do |value, index|
+            value += "\n" if index < values.size - 1
+            split_tokens << {type: token[:type], value: value}
+          end
+        else
+          split_tokens << token
+        end
+      end
+      split_tokens
+    end
+  end
+
+  abstract class BaseLexer
+    property config = {
+      name:             "",
+      priority:         0.0,
+      case_insensitive: false,
+      dot_all:          false,
+      not_multiline:    false,
+      ensure_nl:        false,
+    }
+    property states = {} of String => State
+
+    def tokenizer(text : String, secondary = false) : BaseTokenizer
+      Tokenizer.new(self, text, secondary)
+    end
+  end
+
  # This implements a lexer for Pygments RegexLexers as expressed
  # in Chroma's XML serialization.
  #
  # For explanations on what actions and states do
  # the Pygments documentation is a good place to start.
  # https://pygments.org/docs/lexerdevelopment/
-  class Lexer
-    property config = {
-      name:             "",
-      aliases:          [] of String,
-      filenames:        [] of String,
-      mime_types:       [] of String,
-      priority:         0.0,
-      case_insensitive: false,
-      dot_all:          false,
-      not_multiline:    false,
-      ensure_nl:        false,
-    }
-    property xml : String = ""
-
-    property states = {} of String => State
-
-    property state_stack = ["root"]
-
-    # Turn the text into a list of tokens. The `usingself` parameter
-    # is true when the lexer is being used to tokenize a string
-    # from a larger text that is already being tokenized.
-    # So, when it's true, we don't modify the text.
-    def tokenize(text : String, usingself = false) : Array(Token)
-      @state_stack = ["root"]
-      tokens = [] of Token
-      pos = 0
-      matched = false
-
-      # Respect the `ensure_nl` config option
-      if text.size > 0 && text[-1] != '\n' && config[:ensure_nl] && !usingself
-        text += "\n"
-      end
-
-      text_bytes = text.to_slice
-      # Loop through the text, applying rules
-      while pos < text_bytes.size
-        state = states[@state_stack.last]
-        # Log.trace { "Stack is #{@state_stack} State is #{state.name}, pos is #{pos}, text is #{text[pos..pos + 10]}" }
-        state.rules.each do |rule|
-          matched, new_pos, new_tokens = rule.match(text_bytes, pos, self)
-          if matched
-            # Move position forward, save the tokens,
-            # tokenize from the new position
-            # Log.trace { "MATCHED: #{rule.xml}" }
-            pos = new_pos
-            tokens += new_tokens
-            break
-          end
-          # Log.trace { "NOT MATCHED: #{rule.xml}" }
-        end
-        # If no rule matches, emit an error token
-        unless matched
-          if text_bytes[pos] == 10u8
-            # at EOL, reset state to "root"
-            tokens << {type: "Text", value: "\n"}
-            @state_stack = ["root"]
-          else
-            tokens << {type: "Error", value: String.new(text_bytes[pos..pos])}
-          end
-          pos += 1
-        end
-      end
-      Lexer.collapse_tokens(tokens)
-    end
-
+  class Lexer < BaseLexer
    # Collapse consecutive tokens of the same type for easier comparison
    # and smaller output
    def self.collapse_tokens(tokens : Array(Tartrazine::Token)) : Array(Tartrazine::Token)
@ -131,34 +189,8 @@ module Tartrazine
      result
    end

-    # Group tokens into lines, splitting them when a newline is found
-    def group_tokens_in_lines(tokens : Array(Token)) : Array(Array(Token))
-      split_tokens = [] of Token
-      tokens.each do |token|
-        if token[:value].includes?("\n")
-          values = token[:value].split("\n")
-          values.each_with_index do |value, index|
-            value += "\n" if index < values.size - 1
-            split_tokens << {type: token[:type], value: value}
-          end
-        else
-          split_tokens << token
-        end
-      end
-      lines = [Array(Token).new]
-      split_tokens.each do |token|
-        lines.last << token
-        if token[:value].includes?("\n")
-          lines << Array(Token).new
-        end
-      end
-      lines
-    end
-
-    # ameba:disable Metrics/CyclomaticComplexity
    def self.from_xml(xml : String) : Lexer
      l = Lexer.new
-      l.xml = xml
      lexer = XML.parse(xml).first_element_child
      if lexer
        config = lexer.children.find { |node|
@ -167,9 +199,6 @@ module Tartrazine
        if config
          l.config = {
            name:             xml_to_s(config, name) || "",
-            aliases:          xml_to_a(config, _alias) || [] of String,
-            filenames:        xml_to_a(config, filename) || [] of String,
-            mime_types:       xml_to_a(config, mime_type) || [] of String,
            priority:         xml_to_f(config, priority) || 0.0,
            not_multiline:    xml_to_s(config, not_multiline) == "true",
            dot_all:          xml_to_s(config, dot_all) == "true",
@ -219,12 +248,66 @@ module Tartrazine
    end
  end

+  # A lexer that takes two lexers as arguments: a root lexer
+  # and a language lexer. Everything is scanned using the
+  # language lexer; afterwards, all `Other` tokens are lexed
+  # using the root lexer.
+  #
+  # This is useful for things like template languages, where
+  # you have Jinja + HTML or Jinja + CSS and so on.
+  class DelegatingLexer < BaseLexer
+    property language_lexer : BaseLexer
+    property root_lexer : BaseLexer
+
+    def initialize(@language_lexer : BaseLexer, @root_lexer : BaseLexer)
+    end
+
+    def tokenizer(text : String, secondary = false) : DelegatingTokenizer
+      DelegatingTokenizer.new(self, text, secondary)
+    end
+  end
+
+  # This Tokenizer works with a DelegatingLexer. It first tokenizes
+  # using the language lexer, and "Other" tokens are tokenized using
+  # the root lexer.
+  class DelegatingTokenizer < BaseTokenizer
+    include Iterator(Token)
+    @dq = Deque(Token).new
+    @language_tokenizer : BaseTokenizer
+
+    def initialize(@lexer : DelegatingLexer, text : String, secondary = false)
+      # Respect the `ensure_nl` config option
+      if text.size > 0 && text[-1] != '\n' && @lexer.config[:ensure_nl] && !secondary
+        text += "\n"
+      end
+      @language_tokenizer = @lexer.language_lexer.tokenizer(text, true)
+    end
+
+    def next : Iterator::Stop | Token
+      if @dq.size > 0
+        return @dq.shift
+      end
+      token = @language_tokenizer.next
+      if token.is_a? Iterator::Stop
+        return stop
+      elsif token.as(Token).[:type] == "Other"
+        root_tokenizer = @lexer.root_lexer.tokenizer(token.as(Token).[:value], true)
+        root_tokenizer.each do |root_token|
+          @dq << root_token
+        end
+      else
+        @dq << token.as(Token)
+      end
+      self.next
+    end
+  end
+
  # A Lexer state. A state has a name and a list of rules.
  # The state machine has a state stack containing references
  # to states to decide which rules to apply.
-  class State
+  struct State
    property name : String = ""
-    property rules = [] of Rule
+    property rules = [] of BaseRule

    def +(other : State)
      new_state = State.new
@ -233,7 +316,4 @@ module Tartrazine
      new_state
    end
  end
-
-  # A token, the output of the tokenizer
-  alias Token = NamedTuple(type: String, value: String)
 end
--- a/src/main.cr
+++ b/src/main.cr
@ -1,18 +1,6 @@
 require "docopt"
 require "./**"

-# Performance data (in milliseconds):
-#
-# Docopt parsing:            0.5
-# Instantiating a theme:     0.1
-# Instantiating a formatter: 1.0
-# Instantiating a lexer:     2.0
-# Tokenizing crycco.cr:     16.0
-# Formatting:                0.5
-# I/O:                       1.5
-# ---------------------------------
-# Total:                    21.6
-
 HELP = <<-HELP
 tartrazine: a syntax highlighting tool

@ -32,7 +20,8 @@ Usage:
 Options:
  -f <formatter>      Format to use (html, terminal, json)
  -t <theme>          Theme to use, see --list-themes [default: default-dark]
-  -l <lexer>          Lexer (language) to use, see --list-lexers [default: autodetect]
+  -l <lexer>          Lexer (language) to use, see --list-lexers. Use more than
+                      one lexer with "+" (e.g. jinja+yaml) [default: autodetect]
  -o <output>         Output file. Default is stdout.
  --standalone        Generate a standalone HTML file, which includes
                      all style information. If not given, it will generate just
@ -89,20 +78,20 @@ if options["-f"]

  if formatter.is_a?(Tartrazine::Html) && options["--css"]
    File.open("#{options["-t"].as(String)}.css", "w") do |outf|
-      outf.puts formatter.style_defs
+      outf << formatter.style_defs
    end
    exit 0
  end

  lexer = Tartrazine.lexer(name: options["-l"].as(String), filename: options["FILE"].as(String))
+
  input = File.open(options["FILE"].as(String)).gets_to_end
-  output = formatter.format(input, lexer)

  if options["-o"].nil?
-    puts output
+    outf = STDOUT
  else
-    File.open(options["-o"].as(String), "w") do |outf|
-      outf.puts output
-    end
+    outf = File.open(options["-o"].as(String), "w")
  end
+  formatter.format(input, lexer, outf)
+  outf.close
 end
--- a/src/rules.cr
+++ b/src/rules.cr
@ -15,28 +15,11 @@ module Tartrazine
  alias Match = BytesRegex::Match
  alias MatchData = Array(Match)

-  class Rule
-    property pattern : Regex = Regex.new ""
-    property actions : Array(Action) = [] of Action
+  abstract struct BaseRule
+    abstract def match(text : Bytes, pos : Int32, tokenizer : Tokenizer) : Tuple(Bool, Int32, Array(Token))
+    abstract def initialize(node : XML::Node)

-    def match(text : Bytes, pos, lexer) : Tuple(Bool, Int32, Array(Token))
-      match = pattern.match(text, pos)
-      # We don't match if the match doesn't move the cursor
-      # because that causes infinite loops
-      return false, pos, [] of Token if match.empty? || match[0].size == 0
-      tokens = [] of Token
-      actions.each do |action|
-        tokens += action.emit(match, lexer)
-      end
-      return true, pos + match[0].size, tokens
-    end
-
-    def initialize(node : XML::Node, multiline, dotall, ignorecase)
-      pattern = node["pattern"]
-      pattern = "(?m)" + pattern if multiline
-      @pattern = Regex.new(pattern, multiline, dotall, ignorecase, true)
-      add_actions(node)
-    end
+    @actions : Array(Action) = [] of Action

    def add_actions(node : XML::Node)
      node.children.each do |child|
@ -46,14 +29,36 @@ module Tartrazine
    end
  end

+  struct Rule < BaseRule
+    property pattern : Regex = Regex.new ""
+
+    def match(text : Bytes, pos, tokenizer) : Tuple(Bool, Int32, Array(Token))
+      match = pattern.match(text, pos)
+
+      # No match
+      return false, pos, [] of Token if match.size == 0
+      return true, pos + match[0].size, @actions.flat_map(&.emit(match, tokenizer))
+    end
+
+    def initialize(node : XML::Node)
+    end
+
+    def initialize(node : XML::Node, multiline, dotall, ignorecase)
+      pattern = node["pattern"]
+      pattern = "(?m)" + pattern if multiline
+      @pattern = Regex.new(pattern, multiline, dotall, ignorecase, true)
+      add_actions(node)
+    end
+  end
+
  # This rule includes another state. If any of the rules of the
  # included state matches, this rule matches.
-  class IncludeStateRule < Rule
-    property state : String = ""
+  struct IncludeStateRule < BaseRule
+    @state : String = ""

-    def match(text, pos, lexer) : Tuple(Bool, Int32, Array(Token))
-      lexer.states[state].rules.each do |rule|
-        matched, new_pos, new_tokens = rule.match(text, pos, lexer)
+    def match(text : Bytes, pos : Int32, tokenizer : Tokenizer) : Tuple(Bool, Int32, Array(Token))
+      tokenizer.@lexer.states[@state].rules.each do |rule|
+        matched, new_pos, new_tokens = rule.match(text, pos, tokenizer)
        return true, new_pos, new_tokens if matched
      end
      return false, pos, [] of Token
@ -69,13 +74,11 @@ module Tartrazine
  end

  # This rule always matches, unconditionally
-  class UnconditionalRule < Rule
-    def match(text, pos, lexer) : Tuple(Bool, Int32, Array(Token))
-      tokens = [] of Token
-      actions.each do |action|
-        tokens += action.emit([] of Match, lexer)
-      end
-      return true, pos, tokens
+  struct UnconditionalRule < BaseRule
+    NO_MATCH = [] of Match
+
+    def match(text, pos, tokenizer) : Tuple(Bool, Int32, Array(Token))
+      return true, pos, @actions.flat_map(&.emit(NO_MATCH, tokenizer))
    end

    def initialize(node : XML::Node)
--- a/src/styles.cr
+++ b/src/styles.cr
@ -9,7 +9,7 @@ require "xml"
 module Tartrazine
  alias Color = Sixteen::Color

-  class ThemeFiles
+  struct ThemeFiles
    extend BakedFileSystem
    bake_folder "../styles", __DIR__
  end
@ -39,7 +39,7 @@ module Tartrazine
    themes.to_a.sort!
  end

-  class Style
+  struct Style
    # These properties are tri-state.
    # true means it's set
    # false means it's not set
@ -79,7 +79,7 @@ module Tartrazine
    end
  end

-  class Theme
+  struct Theme
    property name : String = ""

    property styles = {} of String => Style
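
Note on the `class` to `struct` changes above: as a general Crystal point (not a claim about this PR's motivation), structs are passed by value and classes by reference, so mutating a struct argument inside a method does not affect the caller's copy. A small sketch with hypothetical types `PointStruct` and `PointClass`, which are not part of Tartrazine:

```crystal
# Hypothetical types for illustration only.
struct PointStruct
  property x : Int32

  def initialize(@x : Int32)
  end
end

class PointClass
  property x : Int32

  def initialize(@x : Int32)
  end
end

# Increments `x` on whatever it receives.
def bump(value)
  value.x += 1
end

s = PointStruct.new(1)
c = PointClass.new(1)

bump(s) # operates on a copy of the struct
bump(c) # operates on the shared class instance

puts s.x # => 1
puts c.x # => 2
```

Code that relies on a `Style` or `Theme` being mutated through a method argument would therefore behave differently once the type is a struct.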
--- a/x2.html
+++ b/x2.html
Author	SHA1	Message	Date
Roberto Alsina	72afec773e	Integrate heuristics into lexer selection	2024-08-24 21:35:06 -03:00
Roberto Alsina	a5926af518	Comments	2024-08-24 20:53:14 -03:00
Roberto Alsina	fc9f834bc8	Make it work again	2024-08-24 20:09:29 -03:00
Roberto Alsina	58fd42d936	Rebase to main	2024-08-24 19:59:05 -03:00
Roberto Alsina	5a88a51f3e	Implement heuristics from linguist	2024-08-24 19:55:56 -03:00
Roberto Alsina	fd7c6fa4b3	Sort of working?	2024-08-24 19:55:56 -03:00
Roberto Alsina	6264bfc754	Beginning deserialization of data	2024-08-24 19:55:56 -03:00
Roberto Alsina	38196d6e96	Rst lexer	2024-08-24 19:49:02 -03:00
Roberto Alsina	c6cd74e339	248 languages	2024-08-23 14:49:01 -03:00
Roberto Alsina	17c66a6572	typo	2024-08-23 14:46:26 -03:00
Roberto Alsina	cd7e150aae	Merge pull request #1 from ralgozino/docs/improve-v0.6.0-instructions (docs: improve readme and help message)	2024-08-23 14:45:56 -03:00
Ramiro Algozino	176b8e9bc9	docs: improve readme and help message. Add example for printing output to the terminal; fix example for usage as CLI tool (missing -f flag); add instructions in the help message for combining lexers	2024-08-23 18:30:14 +02:00
Roberto Alsina	d8ddf5d8b6	v0.6.0	2024-08-23 10:39:08 -03:00
Roberto Alsina	06556877ef	Merge branch 'more_lexers'	2024-08-23 10:34:17 -03:00
Roberto Alsina	3d5d073471	Implemented usingbygroup action, so code-in-markdown works	2024-08-23 10:20:03 -03:00
Roberto Alsina	a2884c4c78	Refactor	2024-08-22 21:58:21 -03:00
Roberto Alsina	bd3df10d2c	Use classes instead of structs to allow properties of the same type	2024-08-22 21:52:59 -03:00
Roberto Alsina	0f3b7fc3c5	Initial implementation of delegatinglexer	2024-08-22 20:55:08 -03:00
Roberto Alsina	7f4296e9d7	Some template lexers	2024-08-22 16:11:30 -03:00
Roberto Alsina	f883065092	Fix weird bug	2024-08-22 15:00:17 -03:00
Roberto Alsina	746abe53ea	Fix weird bug	2024-08-22 14:58:05 -03:00
Roberto Alsina	90971e8f1b	Generate constants sorted so git diffs are smaller	2024-08-22 10:24:09 -03:00
Roberto Alsina	057879c6ee	oops	2024-08-22 10:11:36 -03:00
Roberto Alsina	215d53e173	3 more lexers (markdown moinwiki bbcode)	2024-08-21 22:21:38 -03:00
Roberto Alsina	f435d7df21	0.5.1	2024-08-21 21:22:36 -03:00
Roberto Alsina	5b0a1789dc	v0.5.0	2024-08-21 21:22:36 -03:00
Roberto Alsina	76ef1fea41	Fix example code in README	2024-08-21 21:22:36 -03:00
Roberto Alsina	3ebedec6c1	Make formatter a bit more convenient	2024-08-19 11:26:34 -03:00
Roberto Alsina	57e63f2308	Make formatter a bit more convenient	2024-08-19 11:20:08 -03:00
Roberto Alsina	4a598a575b	Make formatter a bit more convenient	2024-08-19 11:18:54 -03:00
Roberto Alsina	9042138053	Make formatter a bit more convenient	2024-08-19 11:17:44 -03:00
Roberto Alsina	fa647e898a	Make formatter a bit more convenient	2024-08-19 10:15:02 -03:00
Roberto Alsina	ad92929a10	Make formatter a bit more convenient	2024-08-19 09:59:01 -03:00
Roberto Alsina	bb952a44b8	Use IO for output	2024-08-16 17:25:33 -03:00
Roberto Alsina	ae03e4612e	todo management	2024-08-16 14:05:34 -03:00
Roberto Alsina	471b2f5050	updated	2024-08-16 14:03:05 -03:00
Roberto Alsina	5a3b08e716	lint	2024-08-16 14:01:16 -03:00
Roberto Alsina	9ebb9f2765	Fix off-by-1	2024-08-16 13:36:11 -03:00
Roberto Alsina	7538fc76aa	Tokenize via an iterator, makes everything much faster	2024-08-16 13:27:02 -03:00
Roberto Alsina	788577b226	Fix comment	2024-08-15 23:56:52 -03:00
Roberto Alsina	1f01146b1f	Minor cleanup	2024-08-15 23:21:21 -03:00
Roberto Alsina	9041b763ea	Remove unused bits of lexer config	2024-08-15 23:17:49 -03:00
Roberto Alsina	ada30915c3	Idiomatic changes	2024-08-15 23:16:29 -03:00
Roberto Alsina	78eff45ea0	Idiomatic changes	2024-08-15 23:11:49 -03:00
Roberto Alsina	e817aedd60	Idiomatic changes	2024-08-15 22:41:24 -03:00
Roberto Alsina	20d6b65346	More idiomatic	2024-08-15 22:01:50 -03:00
Roberto Alsina	cb09dff9f1	Minor cleanup	2024-08-15 21:35:06 -03:00
Roberto Alsina	b589726352	Make action a struct, guard against popping too much	2024-08-15 21:16:17 -03:00
Roberto Alsina	a3a7b5bd9a	Many cleanups	2024-08-15 21:10:25 -03:00
Roberto Alsina	58e8dac038	Make usingself MUCH cheaper, since it was called many times when parsing C	2024-08-15 19:20:12 -03:00
Roberto Alsina	f72a40f095	Oops, escape things in HTML formatter!	2024-08-15 17:12:29 -03:00
Roberto Alsina	bf257a5b82	cleanup	2024-08-15 17:05:03 -03:00
Roberto Alsina	029495590c	cleanup	2024-08-15 17:04:48 -03:00
Roberto Alsina	115debdec6	Allocate match_data once	2024-08-15 17:04:16 -03:00
Roberto Alsina	4612db58fe	Prefetch XML data	2024-08-15 17:03:58 -03:00
Roberto Alsina	f45a86c83a	ignore	2024-08-15 16:35:58 -03:00