5 Commits

32 changed files with 1626 additions and 16573 deletions

.gitignore

@@ -7,5 +7,3 @@ chroma/
pygments/
shard.lock
.vscode/
.crystal/
venv/


@@ -4,17 +4,17 @@ Tartrazine is a library to syntax-highlight code. It is
a port of [Pygments](https://pygments.org/) to
[Crystal](https://crystal-lang.org/). Kind of.
The CLI tool can be used to highlight many things in many styles.
It's not currently usable because it's not finished, but:
* The lexers work for the implemented languages
* The provided styles work
* There is a very very simple HTML formatter
# A port of what? Why "kind of"?
Pygments is a staple of the Python ecosystem, and it's great.
It lets you highlight code in many languages, and it has many
themes. Chroma is "Pygments for Go": it's actually a port of
Pygments to Go, and it's great too.
I wanted that in Crystal, so I started this project. But I did
not read much of the Pygments code. Or much of Chroma's.
Because I did not read the Pygments code. And this is actually
based on [Chroma](https://github.com/alecthomas/chroma) ...
although I did not read that code either.
Chroma has taken most of the Pygments lexers and turned them into
XML descriptions. What I did was take those XML files from Chroma
@@ -29,7 +29,7 @@ This only covers the RegexLexers, which are the most common ones,
but it means the supported languages are a subset of Chroma's, which
is a subset of Pygments'.
Currently Tartrazine supports ... 247 languages.
Currently Tartrazine supports ... 241 languages.
It has 331 themes (63 from Chroma, the rest are base16 themes via
[Sixteen](https://github.com/ralsina/sixteen))
@@ -47,22 +47,7 @@ To build from source:
2. Run `make` to build the `tartrazine` binary
3. Copy the binary somewhere in your PATH.
## Usage as a CLI tool
Show a syntax-highlighted version of a C source file in your terminal:
```shell
$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers -f terminal
```
Generate a standalone HTML file from a C source file with the syntax highlighted:
```shell
$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers \
--standalone -f html -o whatever.html
```
## Usage as a Library
## Usage
This works:
@@ -71,9 +56,7 @@ require "tartrazine"
lexer = Tartrazine.lexer("crystal")
theme = Tartrazine.theme("catppuccin-macchiato")
formatter = Tartrazine::Html.new
formatter.theme = theme
puts formatter.format(File.read(ARGV[0]), lexer)
puts Tartrazine::Html.new.format(File.read(ARGV[0]), lexer, theme)
```
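The same call, written out as a minimal sketch that writes the output to a file instead of printing it (the input string is only an illustration):
```crystal
require "tartrazine"

# Pick a lexer and a theme by name, format a source string to HTML,
# and save the result.
lexer = Tartrazine.lexer("crystal")
theme = Tartrazine.theme("catppuccin-macchiato")
html = Tartrazine::Html.new.format(%(puts "hello"), lexer, theme)
File.write("hello.html", html)
```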
## Contributing
@@ -86,4 +69,4 @@ puts formatter.format(File.read(ARGV[0]), lexer)
## Contributors
- [Roberto Alsina](https://github.com/ralsina) - creator and maintainer
- [Roberto Alsina](https://github.com/ralsina) - creator and maintainer


@@ -8,8 +8,4 @@
* ✅ Implement lexer loader that respects aliases
* ✅ Implement lexer loader by file extension
* ✅ Add --line-numbers to terminal formatter
* Implement lexer loader by mime type
* ✅ Implement Delegating lexers
* ✅ Add RstLexer
* Add Mako template lexer
* Implement heuristic lexer detection
* Implement lexer loader by mime type


@@ -1,22 +0,0 @@
Copyright (c) 2017 GitHub, Inc.
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.


@@ -1,130 +0,0 @@
<lexer>
<config>
<name>liquid</name>
<alias>liquid</alias>
<filename>*.liquid</filename>
</config>
<rules>
<state name="root">
<rule pattern="[^{]+"><token type="Text"/></rule>
<rule pattern="(\{%)(\s*)"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="tag-or-block"/></rule>
<rule pattern="(\{\{)(\s*)([^\s}]+)"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><usingself state="generic"/></bygroups><push state="output"/></rule>
<rule pattern="\{"><token type="Text"/></rule>
</state>
<state name="tag-or-block">
<rule pattern="(if|unless|elsif|case)(?=\s+)"><token type="KeywordReserved"/><push state="condition"/></rule>
<rule pattern="(when)(\s+)"><bygroups><token type="KeywordReserved"/><token type="TextWhitespace"/></bygroups><combined state="end-of-block" state="whitespace" state="generic"/></rule>
<rule pattern="(else)(\s*)(%\})"><bygroups><token type="KeywordReserved"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
<rule pattern="(capture)(\s+)([^\s%]+)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><usingself state="variable"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
<rule pattern="(comment)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="comment"/></rule>
<rule pattern="(raw)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="raw"/></rule>
<rule pattern="(end(case|unless|if))(\s*)(%\})"><bygroups><token type="KeywordReserved"/>None<token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
<rule pattern="(end([^\s%]+))(\s*)(%\})"><bygroups><token type="NameTag"/>None<token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
<rule pattern="(cycle)(\s+)(?:([^\s:]*)(:))?(\s*)"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><usingself state="generic"/><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="variable-tag-markup"/></rule>
<rule pattern="([^\s%]+)(\s*)"><bygroups><token type="NameTag"/><token type="TextWhitespace"/></bygroups><push state="tag-markup"/></rule>
</state>
<state name="output">
<rule><include state="whitespace"/></rule>
<rule pattern="\}\}"><token type="Punctuation"/><pop depth="1"/></rule>
<rule pattern="\|"><token type="Punctuation"/><push state="filters"/></rule>
</state>
<state name="filters">
<rule><include state="whitespace"/></rule>
<rule pattern="\}\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
<rule pattern="([^\s|:]+)(:?)(\s*)"><bygroups><token type="NameFunction"/><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="filter-markup"/></rule>
</state>
<state name="filter-markup">
<rule pattern="\|"><token type="Punctuation"/><pop depth="1"/></rule>
<rule><include state="end-of-tag"/></rule>
<rule><include state="default-param-markup"/></rule>
</state>
<state name="condition">
<rule><include state="end-of-block"/></rule>
<rule><include state="whitespace"/></rule>
<rule pattern="([^\s=!&gt;&lt;]+)(\s*)([=!&gt;&lt;]=?)(\s*)(\S+)(\s*)(%\})"><bygroups><usingself state="generic"/><token type="TextWhitespace"/><token type="Operator"/><token type="TextWhitespace"/><usingself state="generic"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups></rule>
<rule pattern="\b!"><token type="Operator"/></rule>
<rule pattern="\bnot\b"><token type="OperatorWord"/></rule>
<rule pattern="([\w.\&#x27;&quot;]+)(\s+)(contains)(\s+)([\w.\&#x27;&quot;]+)"><bygroups><usingself state="generic"/><token type="TextWhitespace"/><token type="OperatorWord"/><token type="TextWhitespace"/><usingself state="generic"/></bygroups></rule>
<rule><include state="generic"/></rule>
<rule><include state="whitespace"/></rule>
</state>
<state name="generic-value">
<rule><include state="generic"/></rule>
<rule><include state="end-at-whitespace"/></rule>
</state>
<state name="operator">
<rule pattern="(\s*)((=|!|&gt;|&lt;)=?)(\s*)"><bygroups><token type="TextWhitespace"/><token type="Operator"/>None<token type="TextWhitespace"/></bygroups><pop depth="1"/></rule>
<rule pattern="(\s*)(\bcontains\b)(\s*)"><bygroups><token type="TextWhitespace"/><token type="OperatorWord"/><token type="TextWhitespace"/></bygroups><pop depth="1"/></rule>
</state>
<state name="end-of-tag">
<rule pattern="\}\}"><token type="Punctuation"/><pop depth="1"/></rule>
</state>
<state name="end-of-block">
<rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
</state>
<state name="end-at-whitespace">
<rule pattern="\s+"><token type="TextWhitespace"/><pop depth="1"/></rule>
</state>
<state name="param-markup">
<rule><include state="whitespace"/></rule>
<rule pattern="([^\s=:]+)(\s*)(=|:)"><bygroups><token type="NameAttribute"/><token type="TextWhitespace"/><token type="Operator"/></bygroups></rule>
<rule pattern="(\{\{)(\s*)([^\s}])(\s*)(\}\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><usingself state="variable"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups></rule>
<rule><include state="string"/></rule>
<rule><include state="number"/></rule>
<rule><include state="keyword"/></rule>
<rule pattern=","><token type="Punctuation"/></rule>
</state>
<state name="default-param-markup">
<rule><include state="param-markup"/></rule>
<rule pattern="."><token type="Text"/></rule>
</state>
<state name="variable-param-markup">
<rule><include state="param-markup"/></rule>
<rule><include state="variable"/></rule>
<rule pattern="."><token type="Text"/></rule>
</state>
<state name="tag-markup">
<rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
<rule><include state="default-param-markup"/></rule>
</state>
<state name="variable-tag-markup">
<rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
<rule><include state="variable-param-markup"/></rule>
</state>
<state name="keyword">
<rule pattern="\b(false|true)\b"><token type="KeywordConstant"/></rule>
</state>
<state name="variable">
<rule pattern="[a-zA-Z_]\w*"><token type="NameVariable"/></rule>
<rule pattern="(?&lt;=\w)\.(?=\w)"><token type="Punctuation"/></rule>
</state>
<state name="string">
<rule pattern="&#x27;[^&#x27;]*&#x27;"><token type="LiteralStringSingle"/></rule>
<rule pattern="&quot;[^&quot;]*&quot;"><token type="LiteralStringDouble"/></rule>
</state>
<state name="number">
<rule pattern="\d+\.\d+"><token type="LiteralNumberFloat"/></rule>
<rule pattern="\d+"><token type="LiteralNumberInteger"/></rule>
</state>
<state name="generic">
<rule><include state="keyword"/></rule>
<rule><include state="string"/></rule>
<rule><include state="number"/></rule>
<rule><include state="variable"/></rule>
</state>
<state name="whitespace">
<rule pattern="[ \t]+"><token type="TextWhitespace"/></rule>
</state>
<state name="comment">
<rule pattern="(\{%)(\s*)(endcomment)(\s*)(%\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="#pop" state="#pop"/></rule>
<rule pattern="."><token type="Comment"/></rule>
</state>
<state name="raw">
<rule pattern="[^{]+"><token type="Text"/></rule>
<rule pattern="(\{%)(\s*)(endraw)(\s*)(%\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
<rule pattern="\{"><token type="Text"/></rule>
</state>
</rules>
</lexer>


@@ -1,55 +0,0 @@
<lexer>
<config>
<name>Velocity</name>
<alias>velocity</alias>
<filename>*.vm</filename>
<filename>*.fhtml</filename>
<dot_all>true</dot_all>
</config>
<rules>
<state name="root">
<rule pattern="[^{#$]+"><token type="Other"/></rule>
<rule pattern="(#)(\*.*?\*)(#)"><bygroups><token type="CommentPreproc"/><token type="Comment"/><token type="CommentPreproc"/></bygroups></rule>
<rule pattern="(##)(.*?$)"><bygroups><token type="CommentPreproc"/><token type="Comment"/></bygroups></rule>
<rule pattern="(#\{?)([a-zA-Z_]\w*)(\}?)(\s?\()"><bygroups><token type="CommentPreproc"/><token type="NameFunction"/><token type="CommentPreproc"/><token type="Punctuation"/></bygroups><push state="directiveparams"/></rule>
<rule pattern="(#\{?)([a-zA-Z_]\w*)(\}|\b)"><bygroups><token type="CommentPreproc"/><token type="NameFunction"/><token type="CommentPreproc"/></bygroups></rule>
<rule pattern="\$!?\{?"><token type="Punctuation"/><push state="variable"/></rule>
</state>
<state name="variable">
<rule pattern="[a-zA-Z_]\w*"><token type="NameVariable"/></rule>
<rule pattern="\("><token type="Punctuation"/><push state="funcparams"/></rule>
<rule pattern="(\.)([a-zA-Z_]\w*)"><bygroups><token type="Punctuation"/><token type="NameVariable"/></bygroups><push/></rule>
<rule pattern="\}"><token type="Punctuation"/><pop depth="1"/></rule>
<rule><pop depth="1"/></rule>
</state>
<state name="directiveparams">
<rule pattern="(&amp;&amp;|\|\||==?|!=?|[-&lt;&gt;+*%&amp;|^/])|\b(eq|ne|gt|lt|ge|le|not|in)\b"><token type="Operator"/></rule>
<rule pattern="\["><token type="Operator"/><push state="rangeoperator"/></rule>
<rule pattern="\b[a-zA-Z_]\w*\b"><token type="NameFunction"/></rule>
<rule><include state="funcparams"/></rule>
</state>
<state name="rangeoperator">
<rule pattern="\.\."><token type="Operator"/></rule>
<rule><include state="funcparams"/></rule>
<rule pattern="\]"><token type="Operator"/><pop depth="1"/></rule>
</state>
<state name="funcparams">
<rule pattern="\$!?\{?"><token type="Punctuation"/><push state="variable"/></rule>
<rule pattern="\s+"><token type="Text"/></rule>
<rule pattern="[,:]"><token type="Punctuation"/></rule>
<rule pattern="&quot;(\\\\|\\[^\\]|[^&quot;\\])*&quot;"><token type="LiteralStringDouble"/></rule>
<rule pattern="&#x27;(\\\\|\\[^\\]|[^&#x27;\\])*&#x27;"><token type="LiteralStringSingle"/></rule>
<rule pattern="0[xX][0-9a-fA-F]+[Ll]?"><token type="LiteralNumber"/></rule>
<rule pattern="\b[0-9]+\b"><token type="LiteralNumber"/></rule>
<rule pattern="(true|false|null)\b"><token type="KeywordConstant"/></rule>
<rule pattern="\("><token type="Punctuation"/><push/></rule>
<rule pattern="\)"><token type="Punctuation"/><pop depth="1"/></rule>
<rule pattern="\{"><token type="Punctuation"/><push/></rule>
<rule pattern="\}"><token type="Punctuation"/><pop depth="1"/></rule>
<rule pattern="\["><token type="Punctuation"/><push/></rule>
<rule pattern="\]"><token type="Punctuation"/><pop depth="1"/></rule>
</state>
</rules>
</lexer>


@@ -1,22 +0,0 @@
<lexer>
<config>
<name>BBCode</name>
<alias>bbcode</alias>
<mime_type>text/x-bbcode</mime_type>
</config>
<rules>
<state name="root">
<rule pattern="[^[]+"><token type="Text"/></rule>
<rule pattern="\[/?\w+"><token type="Keyword"/><push state="tag"/></rule>
<rule pattern="\["><token type="Text"/></rule>
</state>
<state name="tag">
<rule pattern="\s+"><token type="Text"/></rule>
<rule pattern="(\w+)(=)(&quot;?[^\s&quot;\]]+&quot;?)"><bygroups><token type="NameAttribute"/><token type="Operator"/><token type="LiteralString"/></bygroups></rule>
<rule pattern="(=)(&quot;?[^\s&quot;\]]+&quot;?)"><bygroups><token type="Operator"/><token type="LiteralString"/></bygroups></rule>
<rule pattern="\]"><token type="Keyword"/><pop depth="1"/></rule>
</state>
</rules>
</lexer>
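All of these XML lexer definitions share one execution model: each `<state>` holds an ordered list of `<rule>`s; the tokenizer tries the rules of the state on top of a state stack against the unconsumed input, emits the token of the first pattern that matches, and then pushes or pops states as the rule directs. Here is a minimal Crystal sketch of that loop, hardcoding the two BBCode states above (illustrative only, not Tartrazine's actual implementation):
```crystal
record Rule,
  pattern : Regex,      # anchored with \A, tried against the unconsumed input
  token : Symbol,       # token type to emit on a match
  push : String? = nil, # state to push after the match
  pop : Bool = false    # whether to pop the current state

# A hardcoded mirror of the BBCode <state> definitions above.
STATES = {
  "root" => [
    Rule.new(/\A[^\[]+/, :text),
    Rule.new(/\A\[\/?\w+/, :keyword, push: "tag"),
    Rule.new(/\A\[/, :text),
  ],
  "tag" => [
    Rule.new(/\A\s+/, :text),
    Rule.new(/\A\w+="?[^\s"\]]+"?/, :attribute),
    Rule.new(/\A\]/, :keyword, pop: true),
  ],
}

def tokenize(input : String) : Array({Symbol, String})
  stack = ["root"]
  tokens = [] of {Symbol, String}
  until input.empty?
    matched = false
    STATES[stack.last].each do |rule|
      next unless md = rule.pattern.match(input)
      tokens << {rule.token, md[0]}
      input = input[md[0].size..] # consume the match
      if new_state = rule.push
        stack << new_state
      end
      stack.pop if rule.pop && stack.size > 1
      matched = true
      break
    end
    break unless matched # no rule matched; a real lexer would emit an Error token
  end
  tokens
end

p tokenize(%([b]bold[/b] plain))
```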


@@ -3,7 +3,6 @@
<name>Groff</name>
<alias>groff</alias>
<alias>nroff</alias>
<alias>roff</alias>
<alias>man</alias>
<filename>*.[1-9]</filename>
<filename>*.1p</filename>
@@ -88,4 +87,4 @@
</rule>
</state>
</rules>
</lexer>
</lexer>


@@ -1,913 +0,0 @@
# A collection of simple regexp-based rules that can be applied to content
# to disambiguate languages with the same file extension.
#
# There are two top-level keys: disambiguations and named_patterns.
#
# disambiguations - a list of disambiguation rules, one for each
# extension or group of extensions.
# extensions - an array of file extensions that this block applies to.
# rules - list of rules that are applied in order to the content
# of a file with a matching extension. Rules are evaluated
# until one of them matches. If none matches, no language
# is returned.
# language - Language to be returned if the rule matches.
# pattern - Ruby-compatible regular expression that makes the rule
# match. If no pattern is specified, the rule always matches.
# Pattern can be a string with a single regular expression
# or an array of strings that will be merged in a single
# regular expression (with union).
# and - An and block merges multiple rules and checks that all of
# them must match.
# negative_pattern - Same as pattern, but checks for absence of matches.
# named_pattern - A pattern can be reused by specifying it in the
# named_patterns section and referencing it here by its
# key.
# named_patterns - Key-value map of reusable named patterns.
#
# Please keep this list alphabetized.
#
---
disambiguations:
- extensions: ['.1', '.2', '.3', '.4', '.5', '.6', '.7', '.8', '.9']
rules:
- language: man
and:
- named_pattern: mdoc-date
- named_pattern: mdoc-title
- named_pattern: mdoc-heading
- language: man
and:
- named_pattern: man-title
- named_pattern: man-heading
- language: Roff
pattern: '^\.(?:[A-Za-z]{2}(?:\s|$)|\\")'
- extensions: ['.1in', '.1m', '.1x', '.3in', '.3m', '.3p', '.3pm', '.3qt', '.3x', '.man', '.mdoc']
rules:
- language: man
and:
- named_pattern: mdoc-date
- named_pattern: mdoc-title
- named_pattern: mdoc-heading
- language: man
and:
- named_pattern: man-title
- named_pattern: man-heading
- language: Roff
- extensions: ['.al']
rules:
# AL pattern source from https://github.com/microsoft/AL/blob/master/grammar/alsyntax.tmlanguage - keyword.other.applicationobject.al
- language: AL
and:
- pattern: '\b(?i:(CODEUNIT|PAGE|PAGEEXTENSION|PAGECUSTOMIZATION|DOTNET|ENUM|ENUMEXTENSION|VALUE|QUERY|REPORT|TABLE|TABLEEXTENSION|XMLPORT|PROFILE|CONTROLADDIN|REPORTEXTENSION|INTERFACE|PERMISSIONSET|PERMISSIONSETEXTENSION|ENTITLEMENT))\b'
# Open-ended fallback to Perl AutoLoader
- language: Perl
- extensions: ['.app']
rules:
- language: Erlang
pattern: '^\{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[\w@]*|''[^'']+'')\s*,\s*\[(?:.|[\r\n])*\]\s*\}\.[ \t]*$'
- extensions: ['.as']
rules:
- language: ActionScript
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:\{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)*)?\)))'
- extensions: ['.asc']
rules:
- language: Public Key
pattern: '^(----[- ]BEGIN|ssh-(rsa|dss)) '
- language: AsciiDoc
pattern: '^[=-]+\s|\{\{[A-Za-z]'
- language: AGS Script
pattern: '^(\/\/.+|((import|export)\s+)?(function|int|float|char)\s+((room|repeatedly|on|game)_)?([A-Za-z]+[A-Za-z_0-9]+)\s*[;\(])'
- extensions: ['.asm']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- extensions: ['.asy']
rules:
- language: LTspice Symbol
pattern: '^SymbolType[ \t]'
- language: Asymptote
- extensions: ['.bas']
rules:
- language: FreeBasic
pattern: '^[ \t]*#(?i)(?:define|endif|endmacro|ifn?def|include|lang|macro)(?:$|\s)'
- language: BASIC
pattern: '\A\s*\d'
- language: VBA
and:
- named_pattern: vb-module
- named_pattern: vba
- language: Visual Basic 6.0
named_pattern: vb-module
- extensions: ['.bb']
rules:
- language: BlitzBasic
pattern: '(<^\s*; |End Function)'
- language: BitBake
pattern: '^(# |include|require|inherit)\b'
- language: Clojure
pattern: '\((def|defn|defmacro|let)\s'
- extensions: ['.bf']
rules:
- language: Beef
pattern: '(?-m)^\s*using\s+(System|Beefy)(\.(.*))?;\s*$'
- language: HyPhy
pattern:
- '(?-m)^\s*#include\s+".*";\s*$'
- '\sfprintf\s*\('
- language: Brainfuck
pattern: '(>\+>|>\+<)'
- extensions: ['.bi']
rules:
- language: FreeBasic
pattern: '^[ \t]*#(?i)(?:define|endif|endmacro|ifn?def|if|include|lang|macro)(?:$|\s)'
- extensions: ['.bs']
rules:
- language: Bikeshed
pattern: '^(?i:<pre\s+class)\s*=\s*(''|\"|\b)metadata\b\1[^>\r\n]*>'
- language: BrighterScript
pattern:
- (?i:^\s*(?=^sub\s)(?:sub\s*\w+\(.*?\))|(?::\s*sub\(.*?\))$)
- (?i:^\s*(end\ssub)$)
- (?i:^\s*(?=^function\s)(?:function\s*\w+\(.*?\)\s*as\s*\w*)|(?::\s*function\(.*?\)\s*as\s*\w*)$)
- (?i:^\s*(end\sfunction)$)
- language: Bluespec BH
pattern: '^package\s+[A-Za-z_][A-Za-z0-9_'']*(?:\s*\(|\s+where)'
- extensions: ['.builds']
rules:
- language: XML
pattern: '^(\s*)(?i:<Project|<Import|<Property|<?xml|xmlns)'
- extensions: ['.ch']
rules:
- language: xBase
pattern: '^\s*#\s*(?i:if|ifdef|ifndef|define|command|xcommand|translate|xtranslate|include|pragma|undef)\b'
- extensions: ['.cl']
rules:
- language: Common Lisp
pattern: '^\s*\((?i:defun|in-package|defpackage) '
- language: Cool
pattern: '^class'
- language: OpenCL
pattern: '\/\* |\/\/ |^\}'
- extensions: ['.cls']
rules:
- language: Visual Basic 6.0
and:
- named_pattern: vb-class
- pattern: '^\s*BEGIN(?:\r?\n|\r)\s*MultiUse\s*=.*(?:\r?\n|\r)\s*Persistable\s*='
- language: VBA
named_pattern: vb-class
- language: TeX
pattern: '^\s*\\(?:NeedsTeXFormat|ProvidesClass)\{'
- language: ObjectScript
pattern: '^Class\s'
- extensions: ['.cmp']
rules:
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- extensions: ['.cs']
rules:
- language: Smalltalk
pattern: '![\w\s]+methodsFor: '
- language: 'C#'
pattern: '^\s*(using\s+[A-Z][\s\w.]+;|namespace\s*[\w\.]+\s*(\{|;)|\/\/)'
- extensions: ['.csc']
rules:
- language: GSC
named_pattern: gsc
- extensions: ['.csl']
rules:
- language: XML
pattern: '(?i:^\s*(<\?xml|xmlns))'
- language: Kusto
pattern: '(^\|\s*(where|extend|project|limit|summarize))|(^\.\w+)'
- extensions: ['.d']
rules:
- language: D
# see http://dlang.org/spec/grammar
# ModuleDeclaration | ImportDeclaration | FuncDeclaration | unittest
pattern: '^module\s+[\w.]*\s*;|import\s+[\w\s,.:]*;|\w+\s+\w+\s*\(.*\)(?:\(.*\))?\s*\{[^}]*\}|unittest\s*(?:\(.*\))?\s*\{[^}]*\}'
- language: DTrace
# see http://dtrace.org/guide/chp-prog.html, http://dtrace.org/guide/chp-profile.html, http://dtrace.org/guide/chp-opt.html
pattern: '^(\w+:\w*:\w*:\w*|BEGIN|END|provider\s+|(tick|profile)-\w+\s+\{[^}]*\}|#pragma\s+D\s+(option|attributes|depends_on)\s|#pragma\s+ident\s)'
- language: Makefile
# path/target : dependency \
# target : \
# : dependency
# path/file.ext1 : some/path/../file.ext2
pattern: '([\/\\].*:\s+.*\s\\$|: \\$|^[ %]:|^[\w\s\/\\.]+\w+\.\w+\s*:\s+[\w\s\/\\.]+\w+\.\w+)'
- extensions: ['.dsp']
rules:
- language: Microsoft Developer Studio Project
pattern: '# Microsoft Developer Studio Generated Build File'
- language: Faust
pattern: '\bprocess\s*[(=]|\b(library|import)\s*\(\s*"|\bdeclare\s+(name|version|author|copyright|license)\s+"'
- extensions: ['.e']
rules:
- language: E
pattern:
- '^\s*(def|var)\s+(.+):='
- '^\s*(def|to)\s+(\w+)(\(.+\))?\s+\{'
- '^\s*(when)\s+(\(.+\))\s+->\s+\{'
- language: Eiffel
pattern:
- '^\s*\w+\s*(?:,\s*\w+)*[:]\s*\w+\s'
- '^\s*\w+\s*(?:\(\s*\w+[:][^)]+\))?(?:[:]\s*\w+)?(?:--.+\s+)*\s+(?:do|local)\s'
- '^\s*(?:across|deferred|elseif|ensure|feature|from|inherit|inspect|invariant|note|once|require|undefine|variant|when)\s*$'
- language: Euphoria
named_pattern: euphoria
- extensions: ['.ecl']
rules:
- language: ECLiPSe
pattern: '^[^#]+:-'
- language: ECL
pattern: ':='
- extensions: ['.es']
rules:
- language: Erlang
pattern: '^\s*(?:%%|main\s*\(.*?\)\s*->)'
- language: JavaScript
pattern: '\/\/|("|'')use strict\1|export\s+default\s|\/\*(?:.|[\r\n])*?\*\/'
- extensions: ['.ex']
rules:
- language: Elixir
pattern:
- '^\s*@moduledoc\s'
- '^\s*(?:cond|import|quote|unless)\s'
- '^\s*def(?:exception|impl|macro|module|protocol)[(\s]'
- language: Euphoria
named_pattern: euphoria
- extensions: ['.f']
rules:
- language: Forth
pattern: '^: '
- language: Filebench WML
pattern: 'flowop'
- language: Fortran
named_pattern: fortran
- extensions: ['.for']
rules:
- language: Forth
pattern: '^: '
- language: Fortran
named_pattern: fortran
- extensions: ['.fr']
rules:
- language: Forth
pattern: '^(: |also |new-device|previous )'
- language: Frege
pattern: '^\s*(import|module|package|data|type) '
- language: Text
- extensions: ['.frm']
rules:
- language: VBA
and:
- named_pattern: vb-form
- pattern: '^\s*Begin\s+\{[0-9A-Z\-]*\}\s?'
- language: Visual Basic 6.0
and:
- named_pattern: vb-form
- pattern: '^\s*Begin\s+VB\.Form\s+'
- extensions: ['.fs']
rules:
- language: Forth
pattern: '^(: |new-device)'
- language: 'F#'
pattern: '^\s*(#light|import|let|module|namespace|open|type)'
- language: GLSL
pattern: '^\s*(#version|precision|uniform|varying|vec[234])'
- language: Filterscript
pattern: '#include|#pragma\s+(rs|version)|__attribute__'
- extensions: ['.ftl']
rules:
- language: FreeMarker
pattern: '^(?:<|[a-zA-Z-][a-zA-Z0-9_-]+[ \t]+\w)|\$\{\w+[^\r\n]*?\}|^[ \t]*(?:<#--.*?-->|<#([a-z]+)(?=\s|>)[^>]*>.*?</#\1>|\[#--.*?--\]|\[#([a-z]+)(?=\s|\])[^\]]*\].*?\[#\2\])'
- language: Fluent
pattern: '^-?[a-zA-Z][a-zA-Z0-9_-]* *=|\{\$-?[a-zA-Z][-\w]*(?:\.[a-zA-Z][-\w]*)?\}'
- extensions: ['.g']
rules:
- language: GAP
pattern: '\s*(Declare|BindGlobal|KeyDependentOperation|Install(Method|GlobalFunction)|SetPackageInfo)'
- language: G-code
pattern: '^[MG][0-9]+(?:\r?\n|\r)'
- extensions: ['.gd']
rules:
- language: GAP
pattern: '\s*(Declare|BindGlobal|KeyDependentOperation)'
- language: GDScript
pattern: '\s*(extends|var|const|enum|func|class|signal|tool|yield|assert|onready)'
- extensions: ['.gml']
rules:
- language: XML
pattern: '(?i:^\s*(<\?xml|xmlns))'
- language: Graph Modeling Language
pattern: '(?i:^\s*(graph|node)\s+\[$)'
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*$'
- language: Game Maker Language
- extensions: ['.gs']
rules:
- language: GLSL
pattern: '^#version\s+[0-9]+\b'
- language: Gosu
pattern: '^uses (java|gw)\.'
- language: Genie
pattern: '^\[indent=[0-9]+\]'
- extensions: ['.gsc']
rules:
- language: GSC
named_pattern: gsc
- extensions: ['.gsh']
rules:
- language: GSC
named_pattern: gsc
- extensions: ['.gts']
rules:
- language: Gerber Image
pattern: '^G0.'
- language: Glimmer TS
negative_pattern: '^G0.'
- extensions: ['.h']
rules:
- language: Objective-C
named_pattern: objectivec
- language: C++
named_pattern: cpp
- language: C
- extensions: ['.hh']
rules:
- language: Hack
pattern: '<\?hh'
- extensions: ['.html']
rules:
- language: Ecmarkup
pattern: '<emu-(?:alg|annex|biblio|clause|eqn|example|figure|gann|gmod|gprose|grammar|intro|not-ref|note|nt|prodref|production|rhs|table|t|xref)(?:$|\s|>)'
- language: HTML
- extensions: ['.i']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- language: SWIG
pattern: '^[ \t]*%[a-z_]+\b|^%[{}]$'
- extensions: ['.ice']
rules:
- language: JSON
pattern: '\A\s*[{\[]'
- language: Slice
- extensions: ['.inc']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- language: PHP
pattern: '^<\?(?:php)?'
- language: SourcePawn
pattern:
- '^public\s+(?:SharedPlugin(?:\s+|:)__pl_\w+\s*=(?:\s*\{)?|(?:void\s+)?__pl_\w+_SetNTVOptional\(\)(?:\s*\{)?)'
- '^methodmap\s+\w+\s+<\s+\w+'
- '^\s*MarkNativeAsOptional\s*\('
- language: NASL
pattern:
- '^\s*include\s*\(\s*(?:"|'')[\\/\w\-\.:\s]+\.(?:nasl|inc)\s*(?:"|'')\s*\)\s*;'
- '^\s*(?:global|local)_var\s+(?:\w+(?:\s*=\s*[\w\-"'']+)?\s*)(?:,\s*\w+(?:\s*=\s*[\w\-"'']+)?\s*)*+\s*;'
- '^\s*namespace\s+\w+\s*\{'
- '^\s*object\s+\w+\s*(?:extends\s+\w+(?:::\w+)?)?\s*\{'
- '^\s*(?:public\s+|private\s+|\s*)function\s+\w+\s*\([\w\s,]*\)\s*\{'
- language: POV-Ray SDL
pattern: '^\s*#(declare|local|macro|while)\s'
- language: Pascal
pattern:
- '(?i:^\s*\{\$(?:mode|ifdef|undef|define)[ ]+[a-z0-9_]+\})'
- '^\s*end[.;]\s*$'
- language: BitBake
pattern: '^inherit(\s+[\w.-]+)+\s*$'
- extensions: ['.json']
rules:
- language: OASv2-json
pattern: '"swagger":\s?"2.[0-9.]+"'
- language: OASv3-json
pattern: '"openapi":\s?"3.[0-9.]+"'
- language: JSON
- extensions: ['.l']
rules:
- language: Common Lisp
pattern: '\(def(un|macro)\s'
- language: Lex
pattern: '^(%[%{}]xs|<.*>)'
- language: Roff
pattern: '^\.[A-Za-z]{2}(\s|$)'
- language: PicoLisp
pattern: '^\((de|class|rel|code|data|must)\s'
- extensions: ['.lean']
rules:
- language: Lean
pattern: '^import [a-z]'
- language: Lean 4
pattern: '^import [A-Z]'
- extensions: ['.ls']
rules:
- language: LoomScript
pattern: '^\s*package\s*[\w\.\/\*\s]*\s*\{'
- language: LiveScript
- extensions: ['.lsp', '.lisp']
rules:
- language: Common Lisp
pattern: '^\s*\((?i:defun|in-package|defpackage) '
- language: NewLisp
pattern: '^\s*\(define '
- extensions: ['.m']
rules:
- language: Objective-C
named_pattern: objectivec
- language: Mercury
pattern: ':- module'
- language: MUF
pattern: '^: '
- language: M
pattern: '^\s*;'
- language: Mathematica
and:
- pattern: '\(\*'
- pattern: '\*\)$'
- language: MATLAB
pattern: '^\s*%'
- language: Limbo
pattern: '^\w+\s*:\s*module\s*\{'
- extensions: ['.m4']
rules:
- language: M4Sugar
pattern:
- 'AC_DEFUN|AC_PREREQ|AC_INIT'
- '^_?m4_'
- language: 'M4'
- extensions: ['.mask']
rules:
- language: Unity3D Asset
pattern: 'tag:unity3d.com'
- extensions: ['.mc']
rules:
- language: Win32 Message File
pattern: '(?i)^[ \t]*(?>\/\*\s*)?MessageId=|^\.$'
- language: M4
pattern: '^dnl|^divert\((?:-?\d+)?\)|^\w+\(`[^\r\n]*?''[),]'
- language: Monkey C
pattern: '\b(?:using|module|function|class|var)\s+\w'
- extensions: ['.md']
rules:
- language: Markdown
pattern:
- '(^[-A-Za-z0-9=#!\*\[|>])|<\/'
- '\A\z'
- language: GCC Machine Description
pattern: '^(;;|\(define_)'
- language: Markdown
- extensions: ['.ml']
rules:
- language: OCaml
pattern: '(^\s*module)|let rec |match\s+(\S+\s)+with'
- language: Standard ML
pattern: '=> |case\s+(\S+\s)+of'
- extensions: ['.mod']
rules:
- language: XML
pattern: '<!ENTITY '
- language: NMODL
pattern: '\b(NEURON|INITIAL|UNITS)\b'
- language: Modula-2
pattern: '^\s*(?i:MODULE|END) [\w\.]+;'
- language: [Linux Kernel Module, AMPL]
- extensions: ['.mojo']
rules:
- language: Mojo
pattern: '^\s*(alias|def|from|fn|import|struct|trait)\s'
- language: XML
pattern: '^\s*<\?xml'
- extensions: ['.ms']
rules:
- language: Roff
pattern: '^[.''][A-Za-z]{2}(\s|$)'
- language: Unix Assembly
and:
- negative_pattern: '/\*'
- pattern: '^\s*\.(?:include\s|globa?l\s|[A-Za-z][_A-Za-z0-9]*:)'
- language: MAXScript
- extensions: ['.n']
rules:
- language: Roff
pattern: '^[.'']'
- language: Nemerle
pattern: '^(module|namespace|using)\s'
- extensions: ['.ncl']
rules:
- language: XML
pattern: '^\s*<\?xml\s+version'
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- language: Text
pattern: 'THE_TITLE'
- extensions: ['.nl']
rules:
- language: NL
pattern: '^(b|g)[0-9]+ '
- language: NewLisp
- extensions: ['.nu']
rules:
- language: Nushell
pattern: '^\s*(import|export|module|def|let|let-env) '
- language: Nu
- extensions: ['.odin']
rules:
- language: Object Data Instance Notation
pattern: '(?:^|<)\s*[A-Za-z0-9_]+\s*=\s*<'
- language: Odin
pattern: 'package\s+\w+|\b(?:im|ex)port\s*"[\w:./]+"|\w+\s*::\s*(?:proc|struct)\s*\(|^\s*//\s'
- extensions: ['.p']
rules:
- language: Gnuplot
pattern:
- '^s?plot\b'
- '^set\s+(term|terminal|out|output|[xy]tics|[xy]label|[xy]range|style)\b'
- language: OpenEdge ABL
- extensions: ['.php']
rules:
- language: Hack
pattern: '<\?hh'
- language: PHP
pattern: '<\?[^h]'
- extensions: ['.pkl']
rules:
- language: Pkl
pattern:
- '^\s*(module|import|amends|extends|local|const|fixed|abstract|open|class|typealias|@\w+)\b'
- '^\s*[a-zA-Z0-9_$]+\s*(=|{|:)|^\s*`[^`]+`\s*(=|{|:)|for\s*\(|when\s*\('
- language: Pickle
- extensions: ['.pl']
rules:
- language: Prolog
pattern: '^[^#]*:-'
- language: Perl
and:
- negative_pattern: '^\s*use\s+v6\b'
- named_pattern: perl
- language: Raku
named_pattern: raku
- extensions: ['.plist']
rules:
- language: XML Property List
pattern: '^\s*(?:<\?xml\s|<!DOCTYPE\s+plist|<plist(?:\s+version\s*=\s*(["''])\d+(?:\.\d+)?\1)?\s*>\s*$)'
- language: OpenStep Property List
- extensions: ['.plt']
rules:
- language: Prolog
pattern: '^\s*:-'
- extensions: ['.pm']
rules:
- language: Perl
and:
- negative_pattern: '^\s*use\s+v6\b'
- named_pattern: perl
- language: Raku
named_pattern: raku
- language: X PixMap
pattern: '^\s*\/\* XPM \*\/'
- extensions: ['.pod']
rules:
- language: Pod 6
pattern: '^[\s&&[^\r\n]]*=(comment|begin pod|begin para|item\d+)'
- language: Pod
- extensions: ['.pp']
rules:
- language: Pascal
pattern: '^\s*end[.;]'
- language: Puppet
pattern: '^\s+\w+\s+=>\s'
- extensions: ['.pro']
rules:
- language: Proguard
pattern: '^-(include\b.*\.pro$|keep\b|keepclassmembers\b|keepattributes\b)'
- language: Prolog
pattern: '^[^\[#]+:-'
- language: INI
pattern: 'last_client='
- language: QMake
and:
- pattern: HEADERS
- pattern: SOURCES
- language: IDL
pattern: '^\s*(?i:function|pro|compile_opt) \w[ \w,:]*$'
- extensions: ['.properties']
rules:
- language: INI
and:
- named_pattern: key_equals_value
- pattern: '^[;\[]'
- language: Java Properties
and:
- named_pattern: key_equals_value
- pattern: '^[#!]'
- language: INI
named_pattern: key_equals_value
- language: Java Properties
pattern: '^[^#!][^:]*:'
- extensions: ['.q']
rules:
- language: q
pattern: '((?i:[A-Z.][\w.]*:\{)|^\\(cd?|d|l|p|ts?) )'
- language: HiveQL
pattern: '(?i:SELECT\s+[\w*,]+\s+FROM|(CREATE|ALTER|DROP)\s(DATABASE|SCHEMA|TABLE))'
- extensions: ['.qs']
rules:
- language: Q#
pattern: '^((\/{2,3})?\s*(namespace|operation)\b)'
- language: Qt Script
pattern: '(\w+\.prototype\.\w+|===|\bvar\b)'
- extensions: ['.r']
rules:
- language: Rebol
pattern: '(?i:\bRebol\b)'
- language: Rez
pattern: '(#include\s+["<](Types\.r|Carbon\/Carbon\.r)[">])|((resource|data|type)\s+''[A-Za-z0-9]{4}''\s+((\(.*\)\s+){0,1}){)'
- language: R
pattern: '<-|^\s*#'
- extensions: ['.re']
rules:
- language: Reason
pattern:
- '^\s*module\s+type\s'
- '^\s*(?:include|open)\s+\w+\s*;\s*$'
- '^\s*let\s+(?:module\s\w+\s*=\s*\{|\w+:\s+.*=.*;\s*$)'
- language: C++
pattern:
- '^\s*#(?:(?:if|ifdef|define|pragma)\s+\w|\s*include\s+<[^>]+>)'
- '^\s*template\s*<'
- extensions: ['.res']
rules:
- language: ReScript
pattern:
- '^\s*(let|module|type)\s+\w*\s+=\s+'
- '^\s*(?:include|open)\s+\w+\s*$'
- extensions: ['.rno']
rules:
- language: RUNOFF
pattern: '(?i:^\.!|^\f|\f$|^\.end lit(?:eral)?\b|^\.[a-zA-Z].*?;\.[a-zA-Z](?:[; \t])|\^\*[^\s*][^*]*\\\*(?=$|\s)|^\.c;[ \t]*\w+)'
- language: Roff
pattern: '^\.\\" '
- extensions: ['.rpy']
rules:
- language: Python
pattern: '^(import|from|class|def)\s'
- language: "Ren'Py"
- extensions: ['.rs']
rules:
- language: Rust
pattern: '^(use |fn |mod |pub |macro_rules|impl|#!?\[)'
- language: RenderScript
pattern: '#include|#pragma\s+(rs|version)|__attribute__'
- language: XML
pattern: '^\s*<\?xml'
- extensions: ['.s']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- extensions: ['.sc']
rules:
- language: SuperCollider
pattern: '(?i:\^(this|super)\.|^\s*~\w+\s*=\.)'
- language: Scala
pattern: '(^\s*import (scala|java)\.|^\s*class\b)'
- extensions: ['.scd']
rules:
- language: SuperCollider
pattern: '(?i:\^(this|super)\.|^\s*(~\w+\s*=\.|SynthDef\b))'
- language: Markdown
# Markdown syntax for scdoc
pattern: '^#+\s+(NAME|SYNOPSIS|DESCRIPTION)'
- extensions: ['.sol']
rules:
- language: Solidity
pattern: '\bpragma\s+solidity\b|\b(?:abstract\s+)?contract\s+(?!\d)[a-zA-Z0-9$_]+(?:\s+is\s+(?:[a-zA-Z0-9$_][^\{]*?)?)?\s*\{'
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- extensions: ['.sql']
rules:
# Postgres
- language: PLpgSQL
pattern: '(?i:^\\i\b|AS\s+\$\$|LANGUAGE\s+''?plpgsql''?|BEGIN(\s+WORK)?\s*;)'
# IBM db2
- language: SQLPL
pattern: '(?i:ALTER\s+MODULE|MODE\s+DB2SQL|\bSYS(CAT|PROC)\.|ASSOCIATE\s+RESULT\s+SET|\bEND!\s*$)'
# Oracle
- language: PLSQL
pattern: '(?i:\$\$PLSQL_|XMLTYPE|systimestamp|\.nextval|CONNECT\s+BY|AUTHID\s+(DEFINER|CURRENT_USER)|constructor\W+function)'
# T-SQL
- language: TSQL
pattern: '(?i:^\s*GO\b|BEGIN(\s+TRY|\s+CATCH)|OUTPUT\s+INSERTED|DECLARE\s+@|\[dbo\])'
- language: SQL
- extensions: ['.srt']
rules:
- language: SubRip Text
pattern: '^(\d{2}:\d{2}:\d{2},\d{3})\s*(-->)\s*(\d{2}:\d{2}:\d{2},\d{3})$'
- extensions: ['.st']
rules:
- language: StringTemplate
pattern: '\$\w+[($]|(.)!\s*.+?\s*!\1|<!\s*.+?\s*!>|\[!\s*.+?\s*!\]|\{!\s*.+?\s*!\}'
- language: Smalltalk
pattern: '\A\s*[\[{(^"''\w#]|[a-zA-Z_]\w*\s*:=\s*[a-zA-Z_]\w*|class\s*>>\s*[a-zA-Z_]\w*|^[a-zA-Z_]\w*\s+[a-zA-Z_]\w*:|^Class\s*\{|if(?:True|False):\s*\['
- extensions: ['.star']
rules:
- language: STAR
pattern: '^loop_\s*$'
- language: Starlark
- extensions: ['.stl']
rules:
- language: STL
pattern: '\A\s*solid(?:$|\s)[\s\S]*^endsolid(?:$|\s)'
- extensions: ['.sw']
rules:
- language: Sway
pattern: '^\s*(?:(?:abi|dep|fn|impl|mod|pub|trait)\s|#\[)'
- language: XML
pattern: '^\s*<\?xml\s+version'
- extensions: ['.t']
rules:
- language: Perl
and:
- negative_pattern: '^\s*use\s+v6\b'
- named_pattern: perl
- language: Raku
pattern: '^\s*(?:use\s+v6\b|\bmodule\b|\bmy\s+class\b)'
- language: Turing
pattern: '^\s*%[ \t]+|^\s*var\s+\w+(\s*:\s*\w+)?\s*:=\s*\w+'
- extensions: ['.tag']
rules:
- language: Java Server Pages
pattern: '<%[@!=\s]?\s*(taglib|tag|include|attribute|variable)\s'
- extensions: ['.tlv']
rules:
- language: TL-Verilog
pattern: '^\\.{0,10}TLV_version'
- extensions: ['.toc']
rules:
- language: World of Warcraft Addon Data
pattern: '^## |@no-lib-strip@'
- language: TeX
pattern: '^\\(contentsline|defcounter|beamer|boolfalse)'
- extensions: ['.ts']
rules:
- language: XML
pattern: '<TS\b'
- language: TypeScript
- extensions: ['.tst']
rules:
- language: GAP
pattern: 'gap> '
# Heads up - we don't usually write heuristics like this (with no regex match)
- language: Scilab
- extensions: ['.tsx']
rules:
- language: TSX
pattern: '^\s*(import.+(from\s+|require\()[''"]react|\/\/\/\s*<reference\s)'
- language: XML
pattern: '(?i:^\s*<\?xml\s+version)'
- extensions: ['.txt']
rules:
# The following RegExp is simply a collapsed and simplified form of the
# VIM_MODELINE pattern in `./lib/linguist/strategy/modeline.rb`.
- language: Vim Help File
pattern: '(?:(?:^|[ \t])(?:vi|Vi(?=m))(?:m[<=>]?[0-9]+|m)?|[ \t]ex)(?=:(?=[ \t]*set?[ \t][^\r\n:]+:)|:(?![ \t]*set?[ \t]))(?:(?:[ \t]*:[ \t]*|[ \t])\w*(?:[ \t]*=(?:[^\\\s]|\\.)*)?)*[ \t:](?:filetype|ft|syntax)[ \t]*=(help)(?=$|\s|:)'
- language: Adblock Filter List
pattern: |-
(?x)\A
\[
(?<version>
(?:
[Aa]d[Bb]lock
(?:[ \t][Pp]lus)?
|
u[Bb]lock
(?:[ \t][Oo]rigin)?
|
[Aa]d[Gg]uard
)
(?:[ \t] \d+(?:\.\d+)*+)?
)
(?:
[ \t]?;[ \t]?
\g<version>
)*+
\]
# HACK: This is a contrived use of heuristics needed to address
# an unusual edge-case. See https://git.io/JULye for discussion.
- language: Text
- extensions: ['.typ']
rules:
- language: Typst
pattern: '^#(import|show|let|set)'
- language: XML
- extensions: ['.url']
rules:
- language: INI
pattern: '^\[InternetShortcut\](?:\r?\n|\r)(?>[^\s\[][^\r\n]*(?:\r?\n|\r))*URL='
- extensions: ['.v']
rules:
- language: Coq
pattern: '(?:^|\s)(?:Proof|Qed)\.(?:$|\s)|(?:^|\s)Require[ \t]+(Import|Export)\s'
- language: Verilog
pattern: '^[ \t]*module\s+[^\s()]+\s+\#?\(|^[ \t]*`(?:define|ifdef|ifndef|include|timescale)|^[ \t]*always[ \t]+@|^[ \t]*initial[ \t]+(begin|@)'
- language: V
pattern: '\$(?:if|else)[ \t]|^[ \t]*fn\s+[^\s()]+\(.*?\).*?\{|^[ \t]*for\s*\{'
- extensions: ['.vba']
rules:
- language: Vim Script
pattern: '^UseVimball'
- language: VBA
- extensions: ['.w']
rules:
- language: OpenEdge ABL
pattern: '&ANALYZE-SUSPEND _UIB-CODE-BLOCK _CUSTOM _DEFINITIONS'
- language: CWeb
pattern: '^@(<|\w+\.)'
- extensions: ['.x']
rules:
- language: DirectX 3D File
pattern: '^xof 030(2|3)(?:txt|bin|tzip|bzip)\b'
- language: RPC
pattern: '\b(program|version)\s+\w+\s*\{|\bunion\s+\w+\s+switch\s*\('
- language: Logos
pattern: '^%(end|ctor|hook|group)\b'
- language: Linker Script
pattern: 'OUTPUT_ARCH\(|OUTPUT_FORMAT\(|SECTIONS'
- extensions: ['.yaml', '.yml']
rules:
- language: MiniYAML
pattern: '^\t+.*?[^\s:].*?:'
negative_pattern: '---'
- language: OASv2-yaml
pattern: 'swagger:\s?''?"?2.[0-9.]+''?"?'
- language: OASv3-yaml
pattern: 'openapi:\s?''?"?3.[0-9.]+''?"?'
- language: YAML
- extensions: ['.yy']
rules:
- language: JSON
pattern: '\"modelName\"\:\s*\"GM'
- language: Yacc
named_patterns:
cpp:
- '^\s*#\s*include <(cstdint|string|vector|map|list|array|bitset|queue|stack|forward_list|unordered_map|unordered_set|(i|o|io)stream)>'
- '^\s*template\s*<'
- '^[ \t]*(try|constexpr)'
- '^[ \t]*catch\s*\('
- '^[ \t]*(class|(using[ \t]+)?namespace)\s+\w+'
- '^[ \t]*(private|public|protected):$'
- '__has_cpp_attribute|__cplusplus >'
- 'std::\w+'
euphoria:
- '^\s*namespace\s'
- '^\s*(?:public\s+)?include\s'
- '^\s*(?:(?:public|export|global)\s+)?(?:atom|constant|enum|function|integer|object|procedure|sequence|type)\s'
fortran: '^(?i:[c*][^abd-z]| (subroutine|program|end|data)\s|\s*!)'
gsc:
- '^\s*#\s*(?:using|insert|include|define|namespace)[ \t]+\w'
- '^\s*(?>(?:autoexec|private)\s+){0,2}function\s+(?>(?:autoexec|private)\s+){0,2}\w+\s*\('
- '\b(?:level|self)[ \t]+thread[ \t]+(?:\[\[[ \t]*(?>\w+\.)*\w+[ \t]*\]\]|\w+)[ \t]*\([^\r\n\)]*\)[ \t]*;'
- '^[ \t]*#[ \t]*(?:precache|using_animtree)[ \t]*\('
key_equals_value: '^[^#!;][^=]*='
m68k:
- '(?im)\bmoveq(?:\.l)?\s+#(?:\$-?[0-9a-f]{1,3}|%[0-1]{1,8}|-?[0-9]{1,3}),\s*d[0-7]\b'
- '(?im)^\s*move(?:\.[bwl])?\s+(?:sr|usp),\s*[^\s]+'
- '(?im)^\s*move\.[bwl]\s+.*\b[ad]\d'
- '(?im)^\s*movem\.[bwl]\b'
- '(?im)^\s*move[mp](?:\.[wl])?\b'
- '(?im)^\s*btst\b'
- '(?im)^\s*dbra\b'
man-heading: '^[.''][ \t]*SH +(?:[^"\s]+|"[^"\s]+)'
man-title: '^[.''][ \t]*TH +(?:[^"\s]+|"[^"]+") +"?(?:[1-9]|@[^\s@]+@)'
mdoc-date: '^[.''][ \t]*Dd +(?:[^"\s]+|"[^"]+")'
mdoc-heading: '^[.''][ \t]*Sh +(?:[^"\s]|"[^"]+")'
mdoc-title: '^[.''][ \t]*Dt +(?:[^"\s]+|"[^"]+") +"?(?:[1-9]|@[^\s@]+@)'
objectivec: '^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])'
perl:
- '\buse\s+(?:strict\b|v?5\b)'
- '^\s*use\s+(?:constant|overload)\b'
- '^\s*(?:\*|(?:our\s*)?@)EXPORT\s*='
- '^\s*package\s+[^\W\d]\w*(?:::\w+)*\s*(?:[;{]|\sv?\d)'
- '[\s$][^\W\d]\w*(?::\w+)*->[a-zA-Z_\[({]'
raku: '^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)'
vb-class: '^[ ]*VERSION [0-9]\.[0-9] CLASS'
vb-form: '^[ ]*VERSION [0-9]\.[0-9]{2}'
vb-module: '^[ ]*Attribute VB_Name = '
vba:
- '\b(?:VBA|[vV]ba)(?:\b|[0-9A-Z_])'
# VBA7 new 64-bit features
- '^[ ]*(?:Public|Private)? Declare PtrSafe (?:Sub|Function)\b'
- '^[ ]*#If Win64\b'
- '^[ ]*(?:Dim|Const) [0-9a-zA-Z_]*[ ]*As Long(?:Ptr|Long)\b'
# Top module declarations unique to VBA
- '^[ ]*Option (?:Private Module|Compare (?:Database|Text|Binary))\b'
# General VBA libraries and objects
- '(?: |\()(?:Access|Excel|Outlook|PowerPoint|Visio|Word|VBIDE)\.\w'
- '\b(?:(?:Active)?VBProjects?|VBComponents?|Application\.(?:VBE|ScreenUpdating))\b'
# AutoCAD, Outlook, PowerPoint and Word objects
- '\b(?:ThisDrawing|AcadObject|Active(?:Explorer|Inspector|Window\.Presentation|Presentation|Document)|Selection\.(?:Find|Paragraphs))\b'
# Excel objects
- '\b(?:(?:This|Active)?Workbooks?|Worksheets?|Active(?:Sheet|Chart|Cell)|WorksheetFunction)\b'
- '\b(?:Range\(".*|Cells\([0-9a-zA-Z_]*, (?:[0-9a-zA-Z_]*|"[a-zA-Z]{1,3}"))\)'
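The header comment at the top of this file specifies the evaluation model: for a matching extension, rules run in order and the first one whose pattern matches (and whose `negative_pattern`, if any, does not) names the language; a rule with no pattern always matches; if nothing matches, no language is returned. A hedged Crystal sketch of that loop, omitting `and` groups and `named_patterns` for brevity (names hypothetical):
```crystal
record Heuristic,
  language : String,
  pattern : Regex? = nil, # no pattern means the rule always matches
  negative : Regex? = nil # must NOT match for the rule to apply

def disambiguate(content : String, rules : Array(Heuristic)) : String?
  rules.each do |rule|
    if neg = rule.negative
      next if content.matches?(neg)
    end
    if pat = rule.pattern
      return rule.language if content.matches?(pat)
    else
      return rule.language # unconditional fallback rule
    end
  end
  nil # no rule matched: no language is returned
end

# Two of the ".pl" rules above, reduced to this sketch's shape:
pl_rules = [
  Heuristic.new("Prolog", /^[^#]*:-/m),
  Heuristic.new("Perl"),
]
p disambiguate("main :- write(hello).", pl_rules) # => "Prolog"
```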


@@ -1,56 +0,0 @@
<lexer>
<config>
<name>Markdown</name>
<alias>markdown</alias>
<alias>md</alias>
<filename>*.md</filename>
<filename>*.markdown</filename>
<mime_type>text/x-markdown</mime_type>
</config>
<rules>
<state name="root">
<rule pattern="(^#[^#].+)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
<rule pattern="(^#{2,6}[^#].+)(\n)"><bygroups><token type="GenericSubheading"/><token type="Text"/></bygroups></rule>
<rule pattern="^(.+)(\n)(=+)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
<rule pattern="^(.+)(\n)(-+)(\n)"><bygroups><token type="GenericSubheading"/><token type="Text"/><token type="GenericSubheading"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\s*)([*-] )(\[[ xX]\])( .+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><token type="Keyword"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)([*-])(\s)(.+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><token type="TextWhitespace"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)([0-9]+\.)( .+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*&gt;\s)(.+\n)"><bygroups><token type="Keyword"/><token type="GenericEmph"/></bygroups></rule>
<rule pattern="^(```\n)([\w\W]*?)(^```$)">
<bygroups>
<token type="LiteralStringBacktick"/>
<token type="Text"/>
<token type="LiteralStringBacktick"/>
</bygroups>
</rule>
<rule pattern="^(```)(\w+)(\n)([\w\W]*?)(^```$)">
<bygroups>
<token type="LiteralStringBacktick"/>
<token type="NameLabel"/>
<token type="TextWhitespace"/>
<UsingByGroup lexer="2" content="4"/>
<token type="LiteralStringBacktick"/>
</bygroups>
</rule>
<rule><include state="inline"/></rule>
</state>
<state name="inline">
<rule pattern="\\."><token type="Text"/></rule>
<rule pattern="([^`]?)(`[^`\n]+`)"><bygroups><token type="Text"/><token type="LiteralStringBacktick"/></bygroups></rule>
<rule pattern="([^\*]?)(\*\*[^* \n][^*\n]*\*\*)"><bygroups><token type="Text"/><token type="GenericStrong"/></bygroups></rule>
<rule pattern="([^_]?)(__[^_ \n][^_\n]*__)"><bygroups><token type="Text"/><token type="GenericStrong"/></bygroups></rule>
<rule pattern="([^\*]?)(\*[^* \n][^*\n]*\*)"><bygroups><token type="Text"/><token type="GenericEmph"/></bygroups></rule>
<rule pattern="([^_]?)(_[^_ \n][^_\n]*_)"><bygroups><token type="Text"/><token type="GenericEmph"/></bygroups></rule>
<rule pattern="([^~]?)(~~[^~ \n][^~\n]*~~)"><bygroups><token type="Text"/><token type="GenericDeleted"/></bygroups></rule>
<rule pattern="[@#][\w/:]+"><token type="NameEntity"/></rule>
<rule pattern="(!?\[)([^]]+)(\])(\()([^)]+)(\))"><bygroups><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="Text"/><token type="NameAttribute"/><token type="Text"/></bygroups></rule>
<rule pattern="(\[)([^]]+)(\])(\[)([^]]*)(\])"><bygroups><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="Text"/><token type="NameLabel"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\s*\[)([^]]*)(\]:\s*)(.+)"><bygroups><token type="Text"/><token type="NameLabel"/><token type="Text"/><token type="NameAttribute"/></bygroups></rule>
<rule pattern="[^\\\s]+"><token type="Text"/></rule>
<rule pattern="."><token type="Text"/></rule>
</state>
</rules>
</lexer>


@@ -1,34 +0,0 @@
<lexer>
<config>
<name>MoinMoin/Trac Wiki markup</name>
<alias>trac-wiki</alias>
<alias>moin</alias>
<mime_type>text/x-trac-wiki</mime_type>
<case_insensitive>true</case_insensitive>
</config>
<rules>
<state name="root">
<rule pattern="^#.*$"><token type="Comment"/></rule>
<rule pattern="(!)(\S+)"><bygroups><token type="Keyword"/><token type="Text"/></bygroups></rule>
<rule pattern="^(=+)([^=]+)(=+)(\s*#.+)?$"><bygroups><token type="GenericHeading"/><usingself state="root"/><token type="GenericHeading"/><token type="LiteralString"/></bygroups></rule>
<rule pattern="(\{\{\{)(\n#!.+)?"><bygroups><token type="NameBuiltin"/><token type="NameNamespace"/></bygroups><push state="codeblock"/></rule>
<rule pattern="(\&#x27;\&#x27;\&#x27;?|\|\||`|__|~~|\^|,,|::)"><token type="Comment"/></rule>
<rule pattern="^( +)([.*-])( )"><bygroups><token type="Text"/><token type="NameBuiltin"/><token type="Text"/></bygroups></rule>
<rule pattern="^( +)([a-z]{1,5}\.)( )"><bygroups><token type="Text"/><token type="NameBuiltin"/><token type="Text"/></bygroups></rule>
<rule pattern="\[\[\w+.*?\]\]"><token type="Keyword"/></rule>
<rule pattern="(\[[^\s\]]+)(\s+[^\]]+?)?(\])"><bygroups><token type="Keyword"/><token type="LiteralString"/><token type="Keyword"/></bygroups></rule>
<rule pattern="^----+$"><token type="Keyword"/></rule>
<rule pattern="[^\n\&#x27;\[{!_~^,|]+"><token type="Text"/></rule>
<rule pattern="\n"><token type="Text"/></rule>
<rule pattern="."><token type="Text"/></rule>
</state>
<state name="codeblock">
<rule pattern="\}\}\}"><token type="NameBuiltin"/><pop depth="1"/></rule>
<rule pattern="\{\{\{"><token type="Text"/><push/></rule>
<rule pattern="[^{}]+"><token type="CommentPreproc"/></rule>
<rule pattern="."><token type="CommentPreproc"/></rule>
</state>
</rules>
</lexer>


@@ -1,76 +0,0 @@
<lexer>
<config>
<name>reStructuredText</name>
<alias>restructuredtext</alias>
<alias>rst</alias>
<alias>rest</alias>
<filename>*.rst</filename>
<filename>*.rest</filename>
<mime_type>text/x-rst</mime_type>
<mime_type>text/prs.fallenstein.rst</mime_type>
</config>
<rules>
<state name="root">
<rule pattern="^(=+|-+|`+|:+|\.+|\&#x27;+|&quot;+|~+|\^+|_+|\*+|\++|#+)([ \t]*\n)(.+)(\n)(\1)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\S.*)(\n)(={3,}|-{3,}|`{3,}|:{3,}|\.{3,}|\&#x27;{3,}|&quot;{3,}|~{3,}|\^{3,}|_{3,}|\*{3,}|\+{3,}|#{3,})(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\s*)([-*+])( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)([0-9#ivxlcmIVXLCM]+\.)( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)(\(?[0-9#ivxlcmIVXLCM]+\))( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)([A-Z]+\.)( .+\n(?:\1 .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)(\(?[A-Za-z]+\))( .+\n(?:\1 .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)(\|)( .+\n(?:\| .+\n)*)"><bygroups><token type="Text"/><token type="Operator"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^( *\.\.)(\s*)((?:source)?code(?:-block)?)(::)([ \t]*)([^\n]+)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\8.*)?\n)+)">
<bygroups>
<token type="Punctuation"/>
<token type="Text"/>
<token type="OperatorWord"/>
<token type="Punctuation"/>
<token type="Text"/>
<token type="Keyword"/>
<token type="Text"/>
<token type="Text"/>
<UsingByGroup lexer="6" content="9,10,11"/>
</bygroups>
</rule>
<rule pattern="^( *\.\.)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))">
<bygroups>
<token type="Punctuation"/>
<token type="Text"/>
<token type="OperatorWord"/>
<token type="Punctuation"/>
<token type="Text"/>
<usingself state="inline"/>
</bygroups>
</rule>
<rule pattern="^( *\.\.)(\s*)(_(?:[^:\\]|\\.)+:)(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^( *\.\.)(\s*)(\[.+\])(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^( *\.\.)(\s*)(\|.+\|)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="OperatorWord"/><token type="Punctuation"/><token type="Text"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^ *\.\..*(\n( +.*\n|\n)+)?"><token type="Comment"/></rule>
<rule pattern="^( *)(:(?:\\\\|\\:|[^:\n])+:(?=\s))([ \t]*)"><bygroups><token type="Text"/><token type="NameClass"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\S.*(?&lt;!::)\n)((?:(?: +.*)\n)+)"><bygroups><usingself state="inline"/><usingself state="inline"/></bygroups></rule>
<rule pattern="(::)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\3.*)?\n)+)"><bygroups><token type="LiteralStringEscape"/><token type="Text"/><token type="LiteralString"/><token type="LiteralString"/><token type="Text"/><token type="LiteralString"/></bygroups></rule>
<rule><include state="inline"/></rule>
</state>
<state name="inline">
<rule pattern="\\."><token type="Text"/></rule>
<rule pattern="``"><token type="LiteralString"/><push state="literal"/></rule>
<rule pattern="(`.+?)(&lt;.+?&gt;)(`__?)"><bygroups><token type="LiteralString"/><token type="LiteralStringInterpol"/><token type="LiteralString"/></bygroups></rule>
<rule pattern="`.+?`__?"><token type="LiteralString"/></rule>
<rule pattern="(`.+?`)(:[a-zA-Z0-9:-]+?:)?"><bygroups><token type="NameVariable"/><token type="NameAttribute"/></bygroups></rule>
<rule pattern="(:[a-zA-Z0-9:-]+?:)(`.+?`)"><bygroups><token type="NameAttribute"/><token type="NameVariable"/></bygroups></rule>
<rule pattern="\*\*.+?\*\*"><token type="GenericStrong"/></rule>
<rule pattern="\*.+?\*"><token type="GenericEmph"/></rule>
<rule pattern="\[.*?\]_"><token type="LiteralString"/></rule>
<rule pattern="&lt;.+?&gt;"><token type="NameTag"/></rule>
<rule pattern="[^\\\n\[*`:]+"><token type="Text"/></rule>
<rule pattern="."><token type="Text"/></rule>
</state>
<state name="literal">
<rule pattern="[^`]+"><token type="LiteralString"/></rule>
<rule pattern="``((?=$)|(?=[-/:.,; \n\x00 &#x27;&quot;\)\]\}&gt;’”»!\?]))"><token type="LiteralString"/><pop depth="1"/></rule>
<rule pattern="`"><token type="LiteralString"/></rule>
</state>
</rules>
</lexer>


@@ -40,18 +40,15 @@ for fname in glob.glob("lexers/*.xml"):
with open("src/constants/lexers.cr", "w") as f:
f.write("module Tartrazine\n")
f.write(" LEXERS_BY_NAME = {\n")
for k in sorted(lexer_by_name.keys()):
v = lexer_by_name[k]
for k, v in lexer_by_name.items():
f.write(f'"{k}" => "{v}", \n')
f.write("}\n")
f.write(" LEXERS_BY_MIMETYPE = {\n")
for k in sorted(lexer_by_mimetype.keys()):
v = lexer_by_mimetype[k]
for k, v in lexer_by_mimetype.items():
f.write(f'"{k}" => "{v}", \n')
f.write("}\n")
f.write(" LEXERS_BY_FILENAME = {\n")
for k in sorted(lexer_by_filename.keys()):
v = lexer_by_filename[k]
f.write(f'"{k}" => {str(sorted(list(v))).replace("'", "\"")}, \n')
for k, v in lexer_by_filename.items():
f.write(f'"{k}" => {str(list(v)).replace("'", "\"")}, \n')
f.write("}\n")
f.write("end\n")


@@ -1,5 +1,5 @@
name: tartrazine
version: 0.6.1
version: 0.2.0
authors:
- Roberto Alsina <roberto.alsina@gmail.com>


@@ -14,18 +14,15 @@ unicode_problems = {
"#{__DIR__}/tests/java/test_string_literals.txt",
"#{__DIR__}/tests/json/test_strings.txt",
"#{__DIR__}/tests/systemd/example1.txt",
"#{__DIR__}/tests/c++/test_unicode_identifiers.txt",
}
# These testcases fail because of differences in the way chroma and tartrazine tokenize
# but tartrazine is correct
bad_in_chroma = {
"#{__DIR__}/tests/bash_session/test_comment_after_prompt.txt",
"#{__DIR__}/tests/html/javascript_backtracking.txt",
"#{__DIR__}/tests/java/test_default.txt",
"#{__DIR__}/tests/java/test_multiline_string.txt",
"#{__DIR__}/tests/java/test_numeric_literals.txt",
"#{__DIR__}/tests/octave/test_multilinecomment.txt",
"#{__DIR__}/tests/php/test_string_escaping_run.txt",
"#{__DIR__}/tests/python_2/test_cls_builtin.txt",
}
@@ -33,14 +30,22 @@ bad_in_chroma = {
known_bad = {
"#{__DIR__}/tests/bash_session/fake_ps2_prompt.txt",
"#{__DIR__}/tests/bash_session/prompt_in_output.txt",
"#{__DIR__}/tests/bash_session/ps2_prompt.txt",
"#{__DIR__}/tests/bash_session/test_newline_in_echo_no_ps2.txt",
"#{__DIR__}/tests/bash_session/test_newline_in_echo_ps2.txt",
"#{__DIR__}/tests/bash_session/test_newline_in_ls_no_ps2.txt",
"#{__DIR__}/tests/bash_session/test_newline_in_ls_ps2.txt",
"#{__DIR__}/tests/bash_session/ps2_prompt.txt",
"#{__DIR__}/tests/bash_session/test_newline_in_ls_no_ps2.txt",
"#{__DIR__}/tests/bash_session/test_virtualenv.txt",
"#{__DIR__}/tests/bash_session/test_newline_in_echo_ps2.txt",
"#{__DIR__}/tests/c/test_string_resembling_decl_end.txt",
"#{__DIR__}/tests/html/css_backtracking.txt",
"#{__DIR__}/tests/mcfunction/data.txt",
"#{__DIR__}/tests/mcfunction/selectors.txt",
"#{__DIR__}/tests/php/anonymous_class.txt",
"#{__DIR__}/tests/html/javascript_unclosed.txt",
# BAD FOR ONIGMO
"#{__DIR__}/tests/json/test_backtracking.txt",
}
# Tests that fail because of a limitation in PCRE2
@@ -56,6 +61,7 @@ describe Tartrazine do
end
else
it "parses #{testcase}".split("/")[-2...].join("/") do
p! testcase
text = File.read(testcase).split("---input---\n").last.split("---tokens---").first
lexer_name = File.basename(File.dirname(testcase)).downcase
unless failing_lexers.includes?(lexer_name) ||
@@ -72,8 +78,8 @@ end
# Helper that creates lexer and tokenizes
def tokenize(lexer_name, text)
tokenizer = Tartrazine.lexer(lexer_name).tokenizer(text)
Tartrazine::Lexer.collapse_tokens(tokenizer.to_a)
lexer = Tartrazine.lexer(lexer_name)
lexer.tokenize(text)
end
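
A quick, hedged sketch of calling this helper from a spec; the printed token types depend on the lexer, so none are asserted:

```crystal
# Usage sketch for the tokenize helper above (same spec file,
# so Tartrazine is already required).
tokens = tokenize("crystal", "puts 1\n")
tokens.each { |t| puts "#{t[:type]} => #{t[:value].inspect}" }
```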
# Helper that tokenizes using chroma to validate the lexer

View File

@@ -8,33 +8,12 @@ require "./tartrazine"
# perform a list of actions. These actions can emit tokens
# or change the state machine.
module Tartrazine
enum ActionType
Bygroups
Combined
Include
Pop
Push
Token
Using
Usingbygroup
Usingself
end
struct Action
class Action
property type : String
property xml : XML::Node
property actions : Array(Action) = [] of Action
@content_index : Array(Int32) = [] of Int32
@depth : Int32 = 0
@lexer_index : Int32 = 0
@lexer_name : String = ""
@states : Array(String) = [] of String
@states_to_push : Array(String) = [] of String
@token_type : String = ""
@type : ActionType = ActionType::Token
def initialize(t : String, xml : XML::Node?)
@type = ActionType.parse(t.capitalize)
def initialize(@type : String, @xml : XML::Node?)
# Some actions may have actions in them, like this:
# <bygroups>
# <token type="GenericPrompt"/>
@@ -44,56 +23,48 @@ module Tartrazine
#
# The token actions match with the first 2 groups in the regex
# the using action matches the 3rd and shunts it to another lexer
xml.children.each do |node|
@xml.children.each do |node|
next unless node.element?
@actions << Action.new(node.name, node)
end
# Prefetch the attributes we need from the XML and keep them
case @type
when ActionType::Token
@token_type = xml["type"]
when ActionType::Push
@states_to_push = xml.attributes.select { |attrib|
attrib.name == "state"
}.map &.content
when ActionType::Pop
@depth = xml["depth"].to_i
when ActionType::Using
@lexer_name = xml["lexer"].downcase
when ActionType::Combined
@states = xml.attributes.select { |attrib|
attrib.name == "state"
}.map &.content
when ActionType::Usingbygroup
@lexer_index = xml["lexer"].to_i
@content_index = xml["content"].split(",").map(&.to_i)
end
end
# ameba:disable Metrics/CyclomaticComplexity
def emit(match : MatchData, tokenizer : Tokenizer, match_group = 0) : Array(Token)
case @type
when ActionType::Token
raise Exception.new "Can't have a token without a match" if match.empty?
[Token.new(type: @token_type, value: String.new(match[match_group].value))]
when ActionType::Push
to_push = @states_to_push.empty? ? [tokenizer.state_stack.last] : @states_to_push
to_push.each do |state|
if state == "#pop" && tokenizer.state_stack.size > 1
def emit(match, lexer : Lexer, match_group = 0) : Array(Token)
case type
when "token"
raise Exception.new "Can't have a token without a match" if match.nil?
[Token.new(type: xml["type"], value: match[match_group].as(Onigmo::Match).value)]
when "push"
states_to_push = xml.attributes.select { |attrib|
attrib.name == "state"
}.map &.content
if states_to_push.empty?
# Push without a state means push the current state
states_to_push = [lexer.state_stack.last]
end
states_to_push.each do |state|
if state == "#pop"
# Pop the state
tokenizer.state_stack.pop
Log.trace { "Popping state" }
lexer.state_stack.pop
else
# Really push
tokenizer.state_stack << state
lexer.state_stack << state
Log.trace { "Pushed #{lexer.state_stack}" }
end
end
[] of Token
when ActionType::Pop
to_pop = [@depth, tokenizer.state_stack.size - 1].min
tokenizer.state_stack.pop(to_pop)
when "pop"
depth = xml["depth"].to_i
Log.trace { "Popping #{depth} states" }
if lexer.state_stack.size <= depth
Log.trace { "Can't pop #{depth} states, only have #{lexer.state_stack.size}" }
else
lexer.state_stack.pop(depth)
end
[] of Token
when ActionType::Bygroups
when "bygroups"
# FIXME: handle
# ><bygroups>
# <token type="Punctuation"/>
@@ -108,50 +79,38 @@ module Tartrazine
# the action is skipped.
result = [] of Token
@actions.each_with_index do |e, i|
begin
next if match[i + 1].size == 0
rescue IndexError
# FIXME: This should not actually happen
# No match for this group
next
end
result += e.emit(match, tokenizer, i + 1)
next if match[i + 1]?.nil?
result += e.emit(match, lexer, i + 1)
end
result
when ActionType::Using
when "using"
# Shunt to another lexer entirely
return [] of Token if match.empty?
Tartrazine.lexer(@lexer_name).tokenizer(
String.new(match[match_group].value),
secondary: true).to_a
when ActionType::Usingself
return [] of Token if match.nil?
lexer_name = xml["lexer"].downcase
Log.trace { "to tokenize: #{match[match_group]}" }
Tartrazine.lexer(lexer_name).tokenize(match[match_group].as(Onigmo::Match).value, usingself: true)
when "usingself"
# Shunt to another copy of this lexer
return [] of Token if match.empty?
tokenizer.lexer.tokenizer(
String.new(match[match_group].value),
secondary: true).to_a
when ActionType::Combined
# Combine two or more states into one anonymous state
new_state = @states.map { |name|
tokenizer.lexer.states[name]
return [] of Token if match.nil?
new_lexer = Lexer.from_xml(lexer.xml)
Log.trace { "to tokenize: #{match[match_group]}" }
new_lexer.tokenize(match[match_group].as(Onigmo::Match).value, usingself: true)
when "combined"
# Combine two states into one anonymous state
states = xml.attributes.select { |attrib|
attrib.name == "state"
}.map &.content
new_state = states.map { |name|
lexer.states[name]
}.reduce { |state1, state2|
state1 + state2
}
tokenizer.lexer.states[new_state.name] = new_state
tokenizer.state_stack << new_state.name
lexer.states[new_state.name] = new_state
lexer.state_stack << new_state.name
[] of Token
when ActionType::Usingbygroup
# Shunt to content-specified lexer
return [] of Token if match.empty?
content = ""
@content_index.each do |i|
content += String.new(match[i].value)
end
Tartrazine.lexer(String.new(match[@lexer_index].value)).tokenizer(
content,
secondary: true).to_a
else
raise Exception.new("Unknown action type: #{@type}")
raise Exception.new("Unknown action type: #{type}: #{xml}")
end
end
end
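
For orientation, a hedged sketch of building a single Action by hand; both initializers in this hunk take the node name plus the XML node, so the sketch works either way. The emit call is left commented because it needs a live match and tokenizer:

```crystal
require "xml"

# Build a token action the same way the rule loader does.
node = XML.parse(%(<token type="Text"/>)).first_element_child.not_nil!
action = Tartrazine::Action.new(node.name, node)
# Given a match from a rule, it would emit tokens:
# tokens = action.emit(match, tokenizer) # (or lexer, per version)
```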

View File

@@ -1,73 +0,0 @@
module BytesRegex
extend self
class Regex
def initialize(pattern : String, multiline = false, dotall = false, ignorecase = false, anchored = false)
flags = LibPCRE2::UTF | LibPCRE2::UCP | LibPCRE2::NO_UTF_CHECK
flags |= LibPCRE2::MULTILINE if multiline
flags |= LibPCRE2::DOTALL if dotall
flags |= LibPCRE2::CASELESS if ignorecase
flags |= LibPCRE2::ANCHORED if anchored
if @re = LibPCRE2.compile(
pattern,
pattern.bytesize,
flags,
out errorcode,
out erroroffset,
nil)
else
msg = String.new(256) do |buffer|
bytesize = LibPCRE2.get_error_message(errorcode, buffer, 256)
{bytesize, 0}
end
raise Exception.new "Error #{msg} compiling regex at offset #{erroroffset}"
end
@match_data = LibPCRE2.match_data_create_from_pattern(@re, nil)
end
def finalize
LibPCRE2.match_data_free(@match_data)
LibPCRE2.code_free(@re)
end
def match(str : Bytes, pos = 0) : Array(Match)
rc = LibPCRE2.match(
@re,
str,
str.size,
pos,
LibPCRE2::NO_UTF_CHECK,
@match_data,
nil)
if rc > 0
ovector = LibPCRE2.get_ovector_pointer(@match_data)
(0...rc).map do |i|
m_start = ovector[2 * i]
m_end = ovector[2 * i + 1]
if m_start == m_end
m_value = Bytes.new(0)
else
m_value = str[m_start...m_end]
end
Match.new(m_value, m_start, m_end - m_start)
end
else
[] of Match
end
end
end
struct Match
property value : Bytes
property start : UInt64
property size : UInt64
def initialize(@value : Bytes, @start : UInt64, @size : UInt64)
end
end
end
# pattern = "foo"
# str = "foo bar"
# re = BytesRegex::Regex.new(pattern)
# p! String.new(re.match(str.to_slice)[0].value)
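
The commented lines at the bottom hint at usage; here is an uncommented, hedged version of the same sketch (this module is deleted by this diff, so it only runs against the older tree):

```crystal
re = BytesRegex::Regex.new("fo+")
matches = re.match("foo bar".to_slice)
puts String.new(matches[0].value) unless matches.empty? # => "foo"
```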

File diff suppressed because it is too large.

View File

@@ -9,19 +9,12 @@ module Tartrazine
# This is the base class for all formatters.
abstract class Formatter
property name : String = ""
property theme : Theme = Tartrazine.theme("default-dark")
# Format the text using the given lexer.
def format(text : String, lexer : Lexer, io : IO = nil) : Nil
def format(text : String, lexer : Lexer, theme : Theme) : String
raise Exception.new("Not implemented")
end
def format(text : String, lexer : Lexer) : String
raise Exception.new("Not implemented")
end
# Return the styles, if the formatter supports it.
def style_defs : String
def get_style_defs(theme : Theme) : String
raise Exception.new("Not implemented")
end
end
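
As a rough sketch, a custom formatter against the theme-as-argument API in this hunk; PlainText is a hypothetical name, not part of the library:

```crystal
# Hypothetical formatter that drops all markup and keeps token values.
class PlainText < Tartrazine::Formatter
  def format(text : String, lexer : Tartrazine::Lexer, theme : Tartrazine::Theme) : String
    lexer.tokenize(text).map { |token| token[:value] }.join
  end
end

puts PlainText.new.format("puts 1", Tartrazine.lexer("crystal"), Tartrazine.theme("default-dark"))
```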

View File

@@ -4,33 +4,20 @@ module Tartrazine
class Ansi < Formatter
property? line_numbers : Bool = false
def initialize(@theme : Theme = Tartrazine.theme("default-dark"), @line_numbers : Bool = false)
end
private def line_label(i : Int32) : String
"#{i + 1}".rjust(4).ljust(5)
end
def format(text : String, lexer : Lexer) : String
outp = String::Builder.new("")
format(text, lexer, outp)
outp.to_s
end
def format(text : String, lexer : BaseLexer, outp : IO) : Nil
tokenizer = lexer.tokenizer(text)
i = 0
outp << line_label(i) if line_numbers?
tokenizer.each do |token|
outp << colorize(token[:value], token[:type])
if token[:value].includes?("\n")
i += 1
outp << line_label(i) if line_numbers?
def format(text : String, lexer : Lexer, theme : Theme) : String
output = String.build do |outp|
lexer.group_tokens_in_lines(lexer.tokenize(text)).each_with_index do |line, i|
label = line_numbers? ? "#{i + 1}".rjust(4).ljust(5) : ""
outp << label
line.each do |token|
outp << colorize(token[:value], token[:type], theme)
end
end
end
output
end
def colorize(text : String, token : String) : String
def colorize(text : String, token : String, theme : Theme) : String
style = theme.styles.fetch(token, nil)
return text if style.nil?
if theme.styles.has_key?(token)

View File

@@ -1,6 +1,5 @@
require "../constants/token_abbrevs.cr"
require "../formatter"
require "html"
module Tartrazine
class Html < Formatter
@@ -16,78 +15,56 @@ module Tartrazine
property? standalone : Bool = false
property? surrounding_pre : Bool = true
property? wrap_long_lines : Bool = false
property weight_of_bold : Int32 = 600
property? weight_of_bold : Int32 = 600
property theme : Theme
def initialize(@theme : Theme = Tartrazine.theme("default-dark"), *,
@highlight_lines = [] of Range(Int32, Int32),
@class_prefix : String = "",
@line_number_id_prefix = "line-",
@line_number_start = 1,
@tab_width = 8,
@line_numbers : Bool = false,
@linkable_line_numbers : Bool = true,
@standalone : Bool = false,
@surrounding_pre : Bool = true,
@wrap_long_lines : Bool = false,
@weight_of_bold : Int32 = 600)
end
def format(text : String, lexer : Lexer) : String
outp = String::Builder.new("")
format(text, lexer, outp)
outp.to_s
end
def format(text : String, lexer : BaseLexer, io : IO) : Nil
pre, post = wrap_standalone
io << pre if standalone?
format_text(text, lexer, io)
io << post if standalone?
def format(text : String, lexer : Lexer, theme : Theme) : String
text = format_text(text, lexer, theme)
if standalone?
text = wrap_standalone(text, theme)
end
text
end
# Wrap text into a full HTML document, including the CSS for the theme
def wrap_standalone
def wrap_standalone(text, theme) : String
output = String.build do |outp|
outp << "<!DOCTYPE html><html><head><style>"
outp << style_defs
outp << get_style_defs(theme)
outp << "</style></head><body>"
outp << text
outp << "</body></html>"
end
{output.to_s, "</body></html>"}
output
end
private def line_label(i : Int32) : String
line_label = "#{i + 1}".rjust(4).ljust(5)
line_class = highlighted?(i + 1) ? "class=\"#{get_css_class("LineHighlight")}\"" : ""
line_id = linkable_line_numbers? ? "id=\"#{line_number_id_prefix}#{i + 1}\"" : ""
"<span #{line_id} #{line_class} style=\"user-select: none;\">#{line_label} </span>"
end
def format_text(text : String, lexer : BaseLexer, outp : IO)
tokenizer = lexer.tokenizer(text)
i = 0
if surrounding_pre?
pre_style = wrap_long_lines? ? "style=\"white-space: pre-wrap; word-break: break-word;\"" : ""
outp << "<pre class=\"#{get_css_class("Background")}\" #{pre_style}>"
end
outp << "<code class=\"#{get_css_class("Background")}\">"
outp << line_label(i) if line_numbers?
tokenizer.each do |token|
outp << "<span class=\"#{get_css_class(token[:type])}\">#{HTML.escape(token[:value])}</span>"
if token[:value].ends_with? "\n"
i += 1
outp << line_label(i) if line_numbers?
def format_text(text : String, lexer : Lexer, theme : Theme) : String
lines = lexer.group_tokens_in_lines(lexer.tokenize(text))
output = String.build do |outp|
if surrounding_pre?
pre_style = wrap_long_lines? ? "style=\"white-space: pre-wrap; word-break: break-word;\"" : ""
outp << "<pre class=\"#{get_css_class("Background", theme)}\" #{pre_style}>"
end
outp << "<code class=\"#{get_css_class("Background", theme)}\">"
lines.each_with_index(offset: line_number_start - 1) do |line, i|
line_label = line_numbers? ? "#{i + 1}".rjust(4).ljust(5) : ""
line_class = highlighted?(i + 1) ? "class=\"#{get_css_class("LineHighlight", theme)}\"" : ""
line_id = linkable_line_numbers? ? "id=\"#{line_number_id_prefix}#{i + 1}\"" : ""
outp << "<span #{line_id} #{line_class} style=\"user-select: none;\">#{line_label} </span>"
line.each do |token|
fragment = "<span class=\"#{get_css_class(token[:type], theme)}\">#{token[:value]}</span>"
outp << fragment
end
end
outp << "</code></pre>"
end
outp << "</code></pre>"
output
end
# ameba:disable Metrics/CyclomaticComplexity
def style_defs : String
def get_style_defs(theme : Theme) : String
output = String.build do |outp|
theme.styles.each do |token, style|
outp << ".#{get_css_class(token)} {"
outp << ".#{get_css_class(token, theme)} {"
# These are set or nil
outp << "color: ##{style.color.try &.hex};" if style.color
outp << "background-color: ##{style.background.try &.hex};" if style.background
@@ -110,21 +87,18 @@ module Tartrazine
end
# Given a token type, return the CSS class to use.
def get_css_class(token : String) : String
if !theme.styles.has_key? token
# Themes don't contain information for each specific
# token type. However, they may contain information
# for a parent style. Worst case, we go to the root
# (Background) style.
parent = theme.style_parents(token).reverse.find { |dad|
theme.styles.has_key?(dad)
}
theme.styles[token] = theme.styles[parent]
end
class_prefix + Abbreviations[token]
def get_css_class(token, theme)
return class_prefix + Abbreviations[token] if theme.styles.has_key?(token)
# Themes don't contain information for each specific
# token type. However, they may contain information
# for a parent style. Worst case, we go to the root
# (Background) style.
class_prefix + Abbreviations[theme.style_parents(token).reverse.find { |parent|
theme.styles.has_key?(parent)
}]
end
# Is this line in the highlighted ranges?
def highlighted?(line : Int) : Bool
highlight_lines.any?(&.includes?(line))
end
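
To make the fallback concrete, a hedged walk-through using the two-argument form from this hunk; the exact parent chain is an assumption about what style_parents yields:

```crystal
theme = Tartrazine.theme("default-dark")
# If the theme defines no "LiteralStringDouble" style, style_parents
# presumably yields ancestors up to "Background", and the deepest one
# the theme does define supplies the CSS class.
puts Tartrazine::Html.new.get_css_class("LiteralStringDouble", theme)
```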

View File

@@ -4,15 +4,8 @@ module Tartrazine
class Json < Formatter
property name = "json"
def format(text : String, lexer : BaseLexer) : String
outp = String::Builder.new("")
format(text, lexer, outp)
outp.to_s
end
def format(text : String, lexer : BaseLexer, io : IO) : Nil
tokenizer = lexer.tokenizer(text)
io << Tartrazine::Lexer.collapse_tokens(tokenizer.to_a).to_json
def format(text : String, lexer : Lexer, _theme : Theme) : String
lexer.tokenize(text).to_json
end
end
end

View File

@@ -1,81 +0,0 @@
require "yaml"
# Use linguist's heuristics to disambiguate between languages
# This is *shamelessly* stolen from https://github.com/github-linguist/linguist
# and ported to Crystal. Deepest thanks to the authors of Linguist
# for licensing it liberally.
#
# Consider this code (c) 2017 GitHub, Inc. even if I wrote it.
module Linguist
class Heuristic
include YAML::Serializable
property disambiguations : Array(Disambiguation)
property named_patterns : Hash(String, String | Array(String))
# Run the heuristics on the given filename and content
def run(filename, content)
ext = File.extname filename
disambiguation = disambiguations.find do |item|
item.extensions.includes? ext
end
disambiguation.try &.run(content, named_patterns)
end
end
class Disambiguation
include YAML::Serializable
property extensions : Array(String)
property rules : Array(LangRule)
def run(content, named_patterns)
rules.each do |rule|
if rule.match(content, named_patterns)
return rule.language
end
end
nil
end
end
class LangRule
include YAML::Serializable
property pattern : (String | Array(String))?
property negative_pattern : (String | Array(String))?
property named_pattern : String?
property and : Array(LangRule)?
property language : String | Array(String)?
# ameba:disable Metrics/CyclomaticComplexity
def match(content, named_patterns)
# This rule matches without conditions
return true if !pattern && !negative_pattern && !named_pattern && !and
if pattern
p_arr = [] of String
p_arr << pattern.as(String) if pattern.is_a? String
p_arr = pattern.as(Array(String)) if pattern.is_a? Array(String)
return true if p_arr.any? { |pat| ::Regex.new(pat).matches?(content) }
end
if negative_pattern
p_arr = [] of String
p_arr << negative_pattern.as(String) if negative_pattern.is_a? String
p_arr = negative_pattern.as(Array(String)) if negative_pattern.is_a? Array(String)
return true if p_arr.none? { |pat| ::Regex.new(pat).matches?(content) }
end
if named_pattern
p_arr = [] of String
if named_patterns[named_pattern].is_a? String
p_arr << named_patterns[named_pattern].as(String)
else
p_arr = named_patterns[named_pattern].as(Array(String))
end
result = p_arr.any? { |pat| ::Regex.new(pat).matches?(content) }
end
if and
result = and.as(Array(LangRule)).all?(&.match(content, named_patterns))
end
result
end
end
end
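
A hedged sketch of driving these heuristics directly, reading the baked heuristics.yml that the lexer loader uses; the filename is illustrative:

```crystal
yaml = Tartrazine::LexerFiles.get("/heuristics.yml").gets_to_end
heuristic = Linguist::Heuristic.from_yaml(yaml)
# Returns a language (or list of candidates) when a rule for the
# ".h" extension matches the content, nil otherwise.
p! heuristic.run("ambiguous.h", File.read("ambiguous.h"))
```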

View File

@@ -1,172 +1,107 @@
require "baked_file_system"
require "./constants/lexers"
module Tartrazine
class LexerFiles
extend BakedFileSystem
bake_folder "../lexers", __DIR__
end
# Get the lexer object for a language name
# FIXME: support mimetypes
def self.lexer(name : String? = nil, filename : String? = nil) : BaseLexer
return lexer_by_name(name) if name && name != "autodetect"
return lexer_by_filename(filename) if filename
Lexer.from_xml(LexerFiles.get("/#{LEXERS_BY_NAME["plaintext"]}.xml").gets_to_end)
end
private def self.lexer_by_name(name : String) : BaseLexer
lexer_file_name = LEXERS_BY_NAME.fetch(name.downcase, nil)
return create_delegating_lexer(name) if lexer_file_name.nil? && name.includes? "+"
raise Exception.new("Unknown lexer: #{name}") if lexer_file_name.nil?
Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
end
private def self.lexer_by_filename(filename : String) : BaseLexer
candidates = Set(String).new
LEXERS_BY_FILENAME.each do |k, v|
candidates += v.to_set if File.match?(k, File.basename(filename))
end
case candidates.size
when 0
def self.lexer(name : String? = nil, filename : String? = nil) : Lexer
if name.nil? && filename.nil?
lexer_file_name = LEXERS_BY_NAME["plaintext"]
when 1
lexer_file_name = candidates.first
elsif name && name != "autodetect"
lexer_file_name = LEXERS_BY_NAME[name.downcase]
else
lexer_file_name = self.lexer_by_content(filename)
begin
return self.lexer(lexer_file_name)
rescue ex : Exception
raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}, heuristics suggest #{lexer_file_name} but there is no matching lexer.")
# Guess by filename
candidates = Set(String).new
LEXERS_BY_FILENAME.each do |k, v|
candidates += v.to_set if File.match?(k, File.basename(filename.to_s))
end
case candidates.size
when 0
lexer_file_name = LEXERS_BY_NAME["plaintext"]
when 1
lexer_file_name = candidates.first
else
raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}")
end
end
Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
end
private def self.lexer_by_content(fname : String) : String?
h = Linguist::Heuristic.from_yaml(LexerFiles.get("/heuristics.yml").gets_to_end)
result = h.run(fname, File.read(fname))
case result
when Nil
raise Exception.new "No lexer found for #{fname}"
when String
result.as(String)
when Array(String)
result.first
end
end
private def self.create_delegating_lexer(name : String) : BaseLexer
language, root = name.split("+", 2)
language_lexer = lexer(language)
root_lexer = lexer(root)
DelegatingLexer.new(language_lexer, root_lexer)
end
# Return a list of all lexers
def self.lexers : Array(String)
LEXERS_BY_NAME.keys.sort!
end
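
For reference, hedged examples of the three lookup paths above; the delegating form assumes both component lexers exist under those names:

```crystal
require "tartrazine"

lexer = Tartrazine.lexer("c")                # by name or alias
lexer = Tartrazine.lexer(filename: "x.cr")   # by filename pattern
lexer = Tartrazine.lexer("jinja+html")       # delegating "language+root"
```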
# A token, the output of the tokenizer
alias Token = NamedTuple(type: String, value: String)
abstract class BaseTokenizer
end
class Tokenizer < BaseTokenizer
include Iterator(Token)
property lexer : BaseLexer
property text : Bytes
property pos : Int32 = 0
@dq = Deque(Token).new
property state_stack = ["root"]
def initialize(@lexer : BaseLexer, text : String, secondary = false)
# Respect the `ensure_nl` config option
if text.size > 0 && text[-1] != '\n' && @lexer.config[:ensure_nl] && !secondary
text += "\n"
end
@text = text.to_slice
end
def next : Iterator::Stop | Token
if @dq.size > 0
return @dq.shift
end
if pos == @text.size
return stop
end
matched = false
while @pos < @text.size
@lexer.states[@state_stack.last].rules.each do |rule|
matched, new_pos, new_tokens = rule.match(@text, @pos, self)
if matched
@pos = new_pos
split_tokens(new_tokens).each { |token| @dq << token }
break
end
end
if !matched
if @text[@pos] == 10u8
@dq << {type: "Text", value: "\n"}
@state_stack = ["root"]
else
@dq << {type: "Error", value: String.new(@text[@pos..@pos])}
end
@pos += 1
break
end
end
self.next
end
# If a token contains a newline, split it into two tokens
def split_tokens(tokens : Array(Token)) : Array(Token)
split_tokens = [] of Token
tokens.each do |token|
if token[:value].includes?("\n")
values = token[:value].split("\n")
values.each_with_index do |value, index|
value += "\n" if index < values.size - 1
split_tokens << {type: token[:type], value: value}
end
else
split_tokens << token
end
end
split_tokens
end
end
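
A small worked example of the splitting rule above; `tok` stands for any Tartrazine::Tokenizer obtained from lexer.tokenizer:

```crystal
# Sketch: a token spanning two lines is split at the newline boundary.
tokens = [{type: "Text", value: "a\nb"}] of Tartrazine::Token
p! tok.split_tokens(tokens)
# => [{type: "Text", value: "a\n"}, {type: "Text", value: "b"}]
```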
abstract class BaseLexer
property config = {
name: "",
priority: 0.0,
case_insensitive: false,
dot_all: false,
not_multiline: false,
ensure_nl: false,
}
property states = {} of String => State
def tokenizer(text : String, secondary = false) : BaseTokenizer
Tokenizer.new(self, text, secondary)
end
end
# This implements a lexer for Pygments RegexLexers as expressed
# in Chroma's XML serialization.
#
# For explanations on what actions and states do
# the Pygments documentation is a good place to start.
# https://pygments.org/docs/lexerdevelopment/
class Lexer < BaseLexer
class Lexer
property config = {
name: "",
aliases: [] of String,
filenames: [] of String,
mime_types: [] of String,
priority: 0.0,
case_insensitive: false,
dot_all: false,
not_multiline: false,
ensure_nl: false,
}
property xml : String = ""
property states = {} of String => State
property state_stack = ["root"]
# Turn the text into a list of tokens. The `usingself` parameter
# is true when the lexer is being used to tokenize a string
# from a larger text that is already being tokenized.
# So, when it's true, we don't modify the text.
def tokenize(text, usingself = false) : Array(Token)
@state_stack = ["root"]
tokens = [] of Token
pos = 0
matched = false
# Respect the `ensure_nl` config option
if text.size > 0 && text[-1] != '\n' && config[:ensure_nl] && !usingself
text += "\n"
end
# Loop through the text, applying rules
while pos < text.size
state = states[@state_stack.last]
# Log.trace { "Stack is #{@state_stack} State is #{state.name}, pos is #{pos}, text is #{text[pos..pos + 10]}" }
state.rules.each do |rule|
matched, new_pos, new_tokens = rule.match(text, pos, self)
if matched
# Move position forward, save the tokens,
# tokenize from the new position
# Log.trace { "MATCHED: #{rule.xml}" }
pos = new_pos
tokens += new_tokens
break
end
# Log.trace { "NOT MATCHED: #{rule.xml}" }
end
# If no rule matches, emit an error token
unless matched
# Log.trace { "Error at #{pos}" }
tokens << {type: "Error", value: "#{text[pos]}"}
pos += 1
end
end
Lexer.collapse_tokens(tokens)
end
# Collapse consecutive tokens of the same type for easier comparison
# and smaller output
def self.collapse_tokens(tokens : Array(Tartrazine::Token)) : Array(Tartrazine::Token)
@@ -189,8 +124,34 @@ module Tartrazine
result
end
# Group tokens into lines, splitting them when a newline is found
def group_tokens_in_lines(tokens : Array(Token)) : Array(Array(Token))
split_tokens = [] of Token
tokens.each do |token|
if token[:value].includes?("\n")
values = token[:value].split("\n")
values.each_with_index do |value, index|
value += "\n" if index < values.size - 1
split_tokens << {type: token[:type], value: value}
end
else
split_tokens << token
end
end
lines = [Array(Token).new]
split_tokens.each do |token|
lines.last << token
if token[:value].includes?("\n")
lines << Array(Token).new
end
end
lines
end
# ameba:disable Metrics/CyclomaticComplexity
def self.from_xml(xml : String) : Lexer
l = Lexer.new
l.xml = xml
lexer = XML.parse(xml).first_element_child
if lexer
config = lexer.children.find { |node|
@@ -199,6 +160,9 @@ module Tartrazine
if config
l.config = {
name: xml_to_s(config, name) || "",
aliases: xml_to_a(config, _alias) || [] of String,
filenames: xml_to_a(config, filename) || [] of String,
mime_types: xml_to_a(config, mime_type) || [] of String,
priority: xml_to_f(config, priority) || 0.0,
not_multiline: xml_to_s(config, not_multiline) == "true",
dot_all: xml_to_s(config, dot_all) == "true",
@@ -248,66 +212,12 @@ module Tartrazine
end
end
# A lexer that takes two lexers as arguments. A root lexer
# and a language lexer. Everything is scanned using the
# language lexer, afterwards all `Other` tokens are lexed
# using the root lexer.
#
# This is useful for things like template languages, where
# you have Jinja + HTML or Jinja + CSS and so on.
class DelegatingLexer < BaseLexer
property language_lexer : BaseLexer
property root_lexer : BaseLexer
def initialize(@language_lexer : BaseLexer, @root_lexer : BaseLexer)
end
def tokenizer(text : String, secondary = false) : DelegatingTokenizer
DelegatingTokenizer.new(self, text, secondary)
end
end
# This Tokenizer works with a DelegatingLexer. It first tokenizes
# using the language lexer, and "Other" tokens are tokenized using
# the root lexer.
class DelegatingTokenizer < BaseTokenizer
include Iterator(Token)
@dq = Deque(Token).new
@language_tokenizer : BaseTokenizer
def initialize(@lexer : DelegatingLexer, text : String, secondary = false)
# Respect the `ensure_nl` config option
if text.size > 0 && text[-1] != '\n' && @lexer.config[:ensure_nl] && !secondary
text += "\n"
end
@language_tokenizer = @lexer.language_lexer.tokenizer(text, true)
end
def next : Iterator::Stop | Token
if @dq.size > 0
return @dq.shift
end
token = @language_tokenizer.next
if token.is_a? Iterator::Stop
return stop
elsif token.as(Token).[:type] == "Other"
root_tokenizer = @lexer.root_lexer.tokenizer(token.as(Token).[:value], true)
root_tokenizer.each do |root_token|
@dq << root_token
end
else
@dq << token.as(Token)
end
self.next
end
end
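
A hedged sketch of wiring the two lexers by hand, equivalent to what create_delegating_lexer does for "jinja+html" (assuming both names resolve):

```crystal
jinja = Tartrazine.lexer("jinja")
html  = Tartrazine.lexer("html")
lexer = Tartrazine::DelegatingLexer.new(jinja, html)
lexer.tokenizer("<p>{{ name }}</p>").each { |t| p! t }
```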
# A Lexer state. A state has a name and a list of rules.
# The state machine has a state stack containing references
# to states to decide which rules to apply.
struct State
class State
property name : String = ""
property rules = [] of BaseRule
property rules = [] of Rule
def +(other : State)
new_state = State.new
@@ -316,4 +226,7 @@ module Tartrazine
new_state
end
end
# A token, the output of the tokenizer
alias Token = NamedTuple(type: String, value: String)
end

View File

@@ -20,8 +20,7 @@ Usage:
Options:
-f <formatter> Format to use (html, terminal, json)
-t <theme> Theme to use, see --list-themes [default: default-dark]
-l <lexer> Lexer (language) to use, see --list-lexers. Use more than
one lexer with "+" (e.g. jinja+yaml) [default: autodetect]
-l <lexer> Lexer (language) to use, see --list-lexers [default: autodetect]
-o <output> Output file. Default is stdout.
--standalone Generate a standalone HTML file, which includes
all style information. If not given, it will generate just
@@ -55,8 +54,6 @@ if options["--list-formatters"]
exit 0
end
theme = Tartrazine.theme(options["-t"].as(String))
if options["-f"]
formatter = options["-f"].as(String)
case formatter
@@ -64,11 +61,9 @@ if options["-f"]
formatter = Tartrazine::Html.new
formatter.standalone = options["--standalone"] != nil
formatter.line_numbers = options["--line-numbers"] != nil
formatter.theme = theme
when "terminal"
formatter = Tartrazine::Ansi.new
formatter.line_numbers = options["--line-numbers"] != nil
formatter.theme = theme
when "json"
formatter = Tartrazine::Json.new
else
@@ -76,9 +71,11 @@ if options["-f"]
exit 1
end
theme = Tartrazine.theme(options["-t"].as(String))
if formatter.is_a?(Tartrazine::Html) && options["--css"]
File.open("#{options["-t"].as(String)}.css", "w") do |outf|
outf << formatter.style_defs
outf.puts formatter.get_style_defs(theme)
end
exit 0
end
@@ -86,12 +83,13 @@ if options["-f"]
lexer = Tartrazine.lexer(name: options["-l"].as(String), filename: options["FILE"].as(String))
input = File.open(options["FILE"].as(String)).gets_to_end
output = formatter.format(input, lexer, theme)
if options["-o"].nil?
outf = STDOUT
puts output
else
outf = File.open(options["-o"].as(String), "w")
File.open(options["-o"].as(String), "w") do |outf|
outf.puts output
end
end
formatter.format(input, lexer, outf)
outf.close
end

85
src/onigmo.cr Normal file
View File

@@ -0,0 +1,85 @@
@[Link("onigmo")]
@[Link(ldflags: "#{__DIR__}/onigmo/onigwrap.o")]
lib LibOnigmo
type Regex = Pointer(Void)
type Region = Pointer(Void)
fun create = onigwrap_create(pattern : LibC::Char*, len : UInt32,
ignoreCase : Int32,
multiline : Int32,
dotall : Int32) : Regex
fun free = onigwrap_free(re : Regex)
fun region_free = onigwrap_region_free(region : Region)
fun search = onigwrap_search(re : Regex, str : LibC::Char*, offset : UInt32, length : UInt32) : Region
fun num_regs = onigwrap_num_regs(region : Region) : Int32
fun pos = onigwrap_pos(region : Region, index : Int32) : Int32
fun len = onigwrap_len(region : Region, index : Int32) : Int32
end
module Onigmo
class Match
property begin : Int32
property end : Int32
property value : String
def initialize(@begin, @end, @value)
end
def to_s
@value
end
end
class Regex
def initialize(@pattern : String, @ignorecase = false, @multiline = false, @dotall = false)
@re = LibOnigmo.create(@pattern.to_unsafe, @pattern.bytesize, @ignorecase ? 1 : 0, @multiline ? 1 : 0, @dotall ? 1 : 0)
end
def finalize
LibOnigmo.free(@re)
end
def match(str : String, offset = 0)
# The offset argument is a character index, but Onigmo expects a byte index
offset = str.char_index_to_byte_index(offset)
if offset.nil?
raise Exception.new "Invalid offset"
end
region = LibOnigmo.search(@re, str.to_unsafe, offset, str.bytesize)
result = [] of Match?
num_regs = LibOnigmo.num_regs(region)
if num_regs > 0
(0...num_regs).each do |i|
pos = LibOnigmo.pos(region, i)
l = LibOnigmo.len(region, i)
if pos == -1 || l == -1
result << nil
else
b = str.byte_index_to_char_index(pos)
e = str.byte_index_to_char_index(pos + l)
# p! pos, l, b, e, str[pos..]
if b.nil? || e.nil?
raise Exception.new "Invalid substring"
end
v = str[b...e]
result << Match.new(b, b + v.size, v)
end
end
else
return [] of Match
end
LibOnigmo.region_free(region)
result
end
end
end
# pattern = "\\w"
# str = "α"
# re = Onigmo::Regex.new(pattern, false, false, false)
# p! re.match(str)
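
The commented lines above sketch usage; here is an uncommented, hedged version (assumes onigwrap.o has been built and linked as the @[Link] flags declare):

```crystal
re = Onigmo::Regex.new("\\w+", false, false, false)
match = re.match("hola α")
puts match[0].try &.value # => "hola"
```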

94
src/onigmo/onigwrap.c Normal file
View File

@@ -0,0 +1,94 @@
#include "onigmo.h"
regex_t *onigwrap_create(char *pattern, int len, int ignoreCase, int multiline, int dotall)
{
regex_t *reg;
OnigErrorInfo einfo;
OnigOptionType onigOptions = ONIG_OPTION_DEFAULT;
if (ignoreCase == 1)
onigOptions |= ONIG_OPTION_IGNORECASE;
if (multiline == 1)
onigOptions |= ONIG_OPTION_NEGATE_SINGLELINE;
if (dotall == 1)
onigOptions |= ONIG_OPTION_DOTALL;
OnigUChar *stringStart = (OnigUChar*) pattern;
OnigUChar *stringEnd = (OnigUChar*) pattern + len;
int res = onig_new(&reg, stringStart, stringEnd, onigOptions, ONIG_ENCODING_UTF8, ONIG_SYNTAX_PYTHON, &einfo);
return reg;
}
void onigwrap_region_free(OnigRegion *region)
{
onig_region_free(region, 1);
}
void onigwrap_free(regex_t *reg)
{
onig_free(reg);
}
int onigwrap_index_in(regex_t *reg, char *charPtr, int offset, int length)
{
OnigUChar *stringStart = (OnigUChar*) charPtr;
OnigUChar *stringEnd = (OnigUChar*) (charPtr + length);
OnigUChar *stringOffset = (OnigUChar*) (charPtr + offset);
OnigUChar *stringRange = (OnigUChar*) stringEnd;
OnigRegion *region = onig_region_new();
int result = onig_search(reg, stringStart, stringEnd, stringOffset, stringRange, region, ONIG_OPTION_NONE);
onig_region_free(region, 1);
if (result >= 0)
return result >> 1;
if (result == ONIG_MISMATCH)
return -1;
return -2;
}
OnigRegion *onigwrap_search(regex_t *reg, char *charPtr, int offset, int length)
{
OnigUChar *stringStart = (OnigUChar*) charPtr;
OnigUChar *stringEnd = (OnigUChar*) (charPtr + length);
OnigUChar *stringOffset = (OnigUChar*) (charPtr + offset);
OnigUChar *stringRange = (OnigUChar*) stringEnd;
OnigRegion *region = onig_region_new();
int result = onig_search(reg, stringStart, stringEnd, stringOffset, stringRange, region, ONIG_OPTION_NONE);
return region;
}
int onigwrap_num_regs(OnigRegion *region)
{
return region->num_regs;
}
int onigwrap_pos(OnigRegion *region, int nth)
{
if (nth < region->num_regs)
{
int result = region->beg[nth];
if (result < 0)
return -1;
return result;
}
return -1;
}
int onigwrap_len(OnigRegion *region, int nth)
{
if (nth < region->num_regs)
{
int result = region->end[nth] - region->beg[nth];
return result;
}
return -1;
}

32
src/onigmo/onigwrap.h Normal file
View File

@@ -0,0 +1,32 @@
#include "onigmo.h"
#if defined(_WIN32)
#define ONIGWRAP_EXTERN extern __declspec(dllexport)
#else
#define ONIGWRAP_EXTERN extern
#endif
ONIGWRAP_EXTERN
regex_t *onigwrap_create(char *pattern, int len, int ignoreCase, int multiline, int dotall);
ONIGWRAP_EXTERN
void onigwrap_region_free(OnigRegion *region);
ONIGWRAP_EXTERN
void onigwrap_free(regex_t *reg);
ONIGWRAP_EXTERN
int onigwrap_index_in(regex_t *reg, char *charPtr, int offset, int length);
ONIGWRAP_EXTERN
OnigRegion *onigwrap_search(regex_t *reg, char *charPtr, int offset, int length);
ONIGWRAP_EXTERN
int onigwrap_num_regs(OnigRegion *region);
ONIGWRAP_EXTERN
int onigwrap_pos(OnigRegion *region, int nth);
ONIGWRAP_EXTERN
int onigwrap_len(OnigRegion *region, int nth);

View File

@@ -1,9 +1,9 @@
require "./actions"
require "./bytes_regex"
require "./formatter"
require "./lexer"
require "./rules"
require "./styles"
require "./lexer"
require "./onigmo"
# These are lexer rules. They match with the text being parsed
# and perform actions, either emitting tokens or changing the
@@ -11,15 +11,48 @@ require "./styles"
module Tartrazine
# This rule matches via a regex pattern
alias Regex = BytesRegex::Regex
alias Match = BytesRegex::Match
alias MatchData = Array(Match)
alias Regex = Onigmo::Regex
abstract struct BaseRule
abstract def match(text : Bytes, pos : Int32, tokenizer : Tokenizer) : Tuple(Bool, Int32, Array(Token))
abstract def initialize(node : XML::Node)
class Rule
property pattern : Regex = Regex.new ""
property pattern2 : ::Regex = ::Regex.new ""
property actions : Array(Action) = [] of Action
property xml : String = "foo"
@actions : Array(Action) = [] of Action
def match(text, pos, lexer) : Tuple(Bool, Int32, Array(Token))
match = pattern.match(text, pos)
match2 = pattern2.match(text, pos)
# We don't match if the match doesn't move the cursor
# because that causes infinite loops
# The `match.begin > pos` is the same as the ANCHORED option
return false, pos, [] of Token if match.empty? || match[0].nil? || match[0].try { |m| m.begin > pos }
# p! match.map(&.to_s), match2, text[pos-1..pos + 20],"----------------------"
# Log.trace { "#{match}, #{pattern.inspect}, #{text}, #{pos}" }
tokens = [] of Token
# Emit the tokens
actions.each do |action|
# Emit the token
tokens += action.emit(match, lexer)
end
# Log.trace { "#{xml}, #{match[0].end}, #{tokens}" }
return true, pos + match[0].as(Onigmo::Match).value.size, tokens
end
def initialize(node : XML::Node, multiline, dotall, ignorecase)
@xml = node.to_s
pattern = node["pattern"]
# flags = Regex::Options::ANCHORED
flags = ::Regex::Options::NO_UTF_CHECK
# MULTILINE implies DOTALL which we don't want, so we
# use in-pattern flag (?m) instead
flags |= ::Regex::Options::MULTILINE if multiline
pattern = "(?m)" + pattern if multiline
flags |= ::Regex::Options::DOTALL if dotall
flags |= ::Regex::Options::IGNORE_CASE if ignorecase
@pattern = Regex.new(pattern, ignorecase, multiline, dotall)
@pattern2 = ::Regex.new(pattern, flags)
add_actions(node)
end
def add_actions(node : XML::Node)
node.children.each do |child|
@@ -29,42 +62,23 @@ module Tartrazine
end
end
struct Rule < BaseRule
property pattern : Regex = Regex.new ""
def match(text : Bytes, pos, tokenizer) : Tuple(Bool, Int32, Array(Token))
match = pattern.match(text, pos)
# No match
return false, pos, [] of Token if match.size == 0
return true, pos + match[0].size, @actions.flat_map(&.emit(match, tokenizer))
end
def initialize(node : XML::Node)
end
def initialize(node : XML::Node, multiline, dotall, ignorecase)
pattern = node["pattern"]
pattern = "(?m)" + pattern if multiline
@pattern = Regex.new(pattern, multiline, dotall, ignorecase, true)
add_actions(node)
end
end
# This rule includes another state. If any of the rules of the
# included state matches, this rule matches.
struct IncludeStateRule < BaseRule
@state : String = ""
class IncludeStateRule < Rule
property state : String = ""
def match(text : Bytes, pos : Int32, tokenizer : Tokenizer) : Tuple(Bool, Int32, Array(Token))
tokenizer.@lexer.states[@state].rules.each do |rule|
matched, new_pos, new_tokens = rule.match(text, pos, tokenizer)
def match(text, pos, lexer) : Tuple(Bool, Int32, Array(Token))
Log.trace { "Including state #{state} from #{lexer.state_stack.last}" }
lexer.states[state].rules.each do |rule|
matched, new_pos, new_tokens = rule.match(text, pos, lexer)
Log.trace { "#{xml}, #{new_pos}, #{new_tokens}" } if matched
return true, new_pos, new_tokens if matched
end
return false, pos, [] of Token
end
def initialize(node : XML::Node)
@xml = node.to_s
include_node = node.children.find { |child|
child.name == "include"
}
@@ -74,14 +88,17 @@ module Tartrazine
end
# This rule always matches, unconditionally
struct UnconditionalRule < BaseRule
NO_MATCH = [] of Match
def match(text, pos, tokenizer) : Tuple(Bool, Int32, Array(Token))
return true, pos, @actions.flat_map(&.emit(NO_MATCH, tokenizer))
class UnconditionalRule < Rule
def match(text, pos, lexer) : Tuple(Bool, Int32, Array(Token))
tokens = [] of Token
actions.each do |action|
tokens += action.emit(nil, lexer)
end
return true, pos, tokens
end
def initialize(node : XML::Node)
@xml = node.to_s
add_actions(node)
end
end

View File

@@ -9,7 +9,7 @@ require "xml"
module Tartrazine
alias Color = Sixteen::Color
struct ThemeFiles
class ThemeFiles
extend BakedFileSystem
bake_folder "../styles", __DIR__
end
@@ -39,7 +39,7 @@ module Tartrazine
themes.to_a.sort!
end
struct Style
class Style
# These properties are tri-state.
# true means it's set
# false means it's not set
@@ -79,7 +79,7 @@ module Tartrazine
end
end
struct Theme
class Theme
property name : String = ""
property styles = {} of String => Style

View File

@@ -11,7 +11,7 @@ require "xml"
module Tartrazine
extend self
VERSION = {{ `shards version #{__DIR__}`.chomp.stringify }}
VERSION = "0.2.0"
Log = ::Log.for("tartrazine")
end

13485
x2.html

File diff suppressed because it is too large.