mirror of https://github.com/ralsina/tartrazine.git
synced 2025-07-01 20:37:08 -03:00

Compare commits — 1 commit: 10842f7074
.gitignore (vendored, 1 changed line)

@@ -8,4 +8,3 @@ pygments/
shard.lock
.vscode/
.crystal/
venv/
README.md (75 changed lines)

@@ -2,11 +2,36 @@

Tartrazine is a library to syntax-highlight code. It is
a port of [Pygments](https://pygments.org/) to
[Crystal](https://crystal-lang.org/).
[Crystal](https://crystal-lang.org/). Kind of.

It also provides a CLI tool which can be used to highlight many things in many styles.
The CLI tool can be used to highlight many things in many styles.

Currently Tartrazine supports 247 languages, and it has 331 themes (63 from Chroma, the rest are base16 themes via
# A port of what? Why "kind of"?

Pygments is a staple of the Python ecosystem, and it's great.
It lets you highlight code in many languages, and it has many
themes. Chroma is "Pygments for Go"; it's actually a port of
Pygments to Go, and it's great too.

I wanted that in Crystal, so I started this project. But I did
not read much of the Pygments code. Or much of Chroma's.

Chroma has taken most of the Pygments lexers and turned them into
XML descriptions. What I did was take those XML files from Chroma
and a pile of test cases from Pygments, and I slapped them together
until the tests passed and my code produced the same output as
Chroma. Think of it as *extreme TDD*.

Currently the pass rate for tests in the supported languages
is `96.8%`, which is *not bad for a couple days hacking*.

This only covers the RegexLexers, which are the most common ones,
but it means the supported languages are a subset of Chroma's, which
is a subset of Pygments'.

Currently Tartrazine supports ... 241 languages.

It has 331 themes (63 from Chroma, the rest are base16 themes via
[Sixteen](https://github.com/ralsina/sixteen))

## Installation
@@ -24,17 +49,9 @@ To build from source:

## Usage as a CLI tool

Show a syntax highlighted version of a C source file in your terminal:

```shell
$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers -f terminal
```

Generate a standalone HTML file from a C source file with the syntax highlighted:

```shell
$ tartrazine whatever.c -t catppuccin-macchiato --line-numbers \
    --standalone -f html -o whatever.html
$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers \
    --standalone -o whatever.html
```

## Usage as a Library
@@ -46,9 +63,7 @@ require "tartrazine"

lexer = Tartrazine.lexer("crystal")
theme = Tartrazine.theme("catppuccin-macchiato")
formatter = Tartrazine::Html.new
formatter.theme = theme
puts formatter.format(File.read(ARGV[0]), lexer)
puts Tartrazine::Html.new.format(File.read(ARGV[0]), lexer, theme)
```

## Contributing
@@ -61,30 +76,4 @@ puts formatter.format(File.read(ARGV[0]), lexer)

## Contributors

- [Roberto Alsina](https://github.com/ralsina) - creator and maintainer

## A port of what? Why "kind of"?

Pygments is a staple of the Python ecosystem, and it's great.
It lets you highlight code in many languages, and it has many
themes. Chroma is "Pygments for Go"; it's actually a port of
Pygments to Go, and it's great too.

I wanted that in Crystal, so I started this project. But I did
not read much of the Pygments code. Or much of Chroma's.

Chroma has taken most of the Pygments lexers and turned them into
XML descriptions. What I did was take those XML files from Chroma
and a pile of test cases from Pygments, and I slapped them together
until the tests passed and my code produced the same output as
Chroma. Think of it as [*extreme TDD*](https://ralsina.me/weblog/posts/tartrazine-reimplementing-pygments.html).

Currently the pass rate for tests in the supported languages
is `96.8%`, which is *not bad for a couple days hacking*.

This only covers the RegexLexers, which are the most common ones,
but it means the supported languages are a subset of Chroma's, which
is a subset of Pygments', and DelegatingLexers (useful for things like template languages).

Then performance was bad, so I hacked and hacked and made it
significantly [faster than chroma](https://ralsina.me/weblog/posts/a-tale-of-optimization.html), which is fun.
- [Roberto Alsina](https://github.com/ralsina) - creator and maintainer
TODO.md (7 changed lines)

@@ -8,8 +8,5 @@
* ✅ Implement lexer loader that respects aliases
* ✅ Implement lexer loader by file extension
* ✅ Add --line-numbers to terminal formatter
* ✅ Implement lexer loader by mime type
* ✅ Implement Delegating lexers
* ✅ Add RstLexer
* Add Mako template lexer
* ✅ Implement heuristic lexer detection
* Implement lexer loader by mime type
* Implement Delegating lexers
@@ -1,22 +0,0 @@
Copyright (c) 2017 GitHub, Inc.

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
@@ -1,130 +0,0 @@

<lexer>
  <config>
    <name>liquid</name>
    <alias>liquid</alias>
    <filename>*.liquid</filename>
  </config>
  <rules>
    <state name="root">
      <rule pattern="[^{]+"><token type="Text"/></rule>
      <rule pattern="(\{%)(\s*)"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="tag-or-block"/></rule>
      <rule pattern="(\{\{)(\s*)([^\s}]+)"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><usingself state="generic"/></bygroups><push state="output"/></rule>
      <rule pattern="\{"><token type="Text"/></rule>
    </state>
    <state name="tag-or-block">
      <rule pattern="(if|unless|elsif|case)(?=\s+)"><token type="KeywordReserved"/><push state="condition"/></rule>
      <rule pattern="(when)(\s+)"><bygroups><token type="KeywordReserved"/><token type="TextWhitespace"/></bygroups><combined state="end-of-block" state="whitespace" state="generic"/></rule>
      <rule pattern="(else)(\s*)(%\})"><bygroups><token type="KeywordReserved"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
      <rule pattern="(capture)(\s+)([^\s%]+)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><usingself state="variable"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
      <rule pattern="(comment)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="comment"/></rule>
      <rule pattern="(raw)(\s*)(%\})"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="raw"/></rule>
      <rule pattern="(end(case|unless|if))(\s*)(%\})"><bygroups><token type="KeywordReserved"/>None<token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
      <rule pattern="(end([^\s%]+))(\s*)(%\})"><bygroups><token type="NameTag"/>None<token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
      <rule pattern="(cycle)(\s+)(?:([^\s:]*)(:))?(\s*)"><bygroups><token type="NameTag"/><token type="TextWhitespace"/><usingself state="generic"/><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="variable-tag-markup"/></rule>
      <rule pattern="([^\s%]+)(\s*)"><bygroups><token type="NameTag"/><token type="TextWhitespace"/></bygroups><push state="tag-markup"/></rule>
    </state>
    <state name="output">
      <rule><include state="whitespace"/></rule>
      <rule pattern="\}\}"><token type="Punctuation"/><pop depth="1"/></rule>
      <rule pattern="\|"><token type="Punctuation"/><push state="filters"/></rule>
    </state>
    <state name="filters">
      <rule><include state="whitespace"/></rule>
      <rule pattern="\}\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
      <rule pattern="([^\s|:]+)(:?)(\s*)"><bygroups><token type="NameFunction"/><token type="Punctuation"/><token type="TextWhitespace"/></bygroups><push state="filter-markup"/></rule>
    </state>
    <state name="filter-markup">
      <rule pattern="\|"><token type="Punctuation"/><pop depth="1"/></rule>
      <rule><include state="end-of-tag"/></rule>
      <rule><include state="default-param-markup"/></rule>
    </state>
    <state name="condition">
      <rule><include state="end-of-block"/></rule>
      <rule><include state="whitespace"/></rule>
      <rule pattern="([^\s=!><]+)(\s*)([=!><]=?)(\s*)(\S+)(\s*)(%\})"><bygroups><usingself state="generic"/><token type="TextWhitespace"/><token type="Operator"/><token type="TextWhitespace"/><usingself state="generic"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups></rule>
      <rule pattern="\b!"><token type="Operator"/></rule>
      <rule pattern="\bnot\b"><token type="OperatorWord"/></rule>
      <rule pattern="([\w.\'"]+)(\s+)(contains)(\s+)([\w.\'"]+)"><bygroups><usingself state="generic"/><token type="TextWhitespace"/><token type="OperatorWord"/><token type="TextWhitespace"/><usingself state="generic"/></bygroups></rule>
      <rule><include state="generic"/></rule>
      <rule><include state="whitespace"/></rule>
    </state>
    <state name="generic-value">
      <rule><include state="generic"/></rule>
      <rule><include state="end-at-whitespace"/></rule>
    </state>
    <state name="operator">
      <rule pattern="(\s*)((=|!|>|<)=?)(\s*)"><bygroups><token type="TextWhitespace"/><token type="Operator"/>None<token type="TextWhitespace"/></bygroups><pop depth="1"/></rule>
      <rule pattern="(\s*)(\bcontains\b)(\s*)"><bygroups><token type="TextWhitespace"/><token type="OperatorWord"/><token type="TextWhitespace"/></bygroups><pop depth="1"/></rule>
    </state>
    <state name="end-of-tag">
      <rule pattern="\}\}"><token type="Punctuation"/><pop depth="1"/></rule>
    </state>
    <state name="end-of-block">
      <rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
    </state>
    <state name="end-at-whitespace">
      <rule pattern="\s+"><token type="TextWhitespace"/><pop depth="1"/></rule>
    </state>
    <state name="param-markup">
      <rule><include state="whitespace"/></rule>
      <rule pattern="([^\s=:]+)(\s*)(=|:)"><bygroups><token type="NameAttribute"/><token type="TextWhitespace"/><token type="Operator"/></bygroups></rule>
      <rule pattern="(\{\{)(\s*)([^\s}])(\s*)(\}\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><usingself state="variable"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups></rule>
      <rule><include state="string"/></rule>
      <rule><include state="number"/></rule>
      <rule><include state="keyword"/></rule>
      <rule pattern=","><token type="Punctuation"/></rule>
    </state>
    <state name="default-param-markup">
      <rule><include state="param-markup"/></rule>
      <rule pattern="."><token type="Text"/></rule>
    </state>
    <state name="variable-param-markup">
      <rule><include state="param-markup"/></rule>
      <rule><include state="variable"/></rule>
      <rule pattern="."><token type="Text"/></rule>
    </state>
    <state name="tag-markup">
      <rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
      <rule><include state="default-param-markup"/></rule>
    </state>
    <state name="variable-tag-markup">
      <rule pattern="%\}"><token type="Punctuation"/><push state="#pop" state="#pop"/></rule>
      <rule><include state="variable-param-markup"/></rule>
    </state>
    <state name="keyword">
      <rule pattern="\b(false|true)\b"><token type="KeywordConstant"/></rule>
    </state>
    <state name="variable">
      <rule pattern="[a-zA-Z_]\w*"><token type="NameVariable"/></rule>
      <rule pattern="(?<=\w)\.(?=\w)"><token type="Punctuation"/></rule>
    </state>
    <state name="string">
      <rule pattern="'[^']*'"><token type="LiteralStringSingle"/></rule>
      <rule pattern=""[^"]*""><token type="LiteralStringDouble"/></rule>
    </state>
    <state name="number">
      <rule pattern="\d+\.\d+"><token type="LiteralNumberFloat"/></rule>
      <rule pattern="\d+"><token type="LiteralNumberInteger"/></rule>
    </state>
    <state name="generic">
      <rule><include state="keyword"/></rule>
      <rule><include state="string"/></rule>
      <rule><include state="number"/></rule>
      <rule><include state="variable"/></rule>
    </state>
    <state name="whitespace">
      <rule pattern="[ \t]+"><token type="TextWhitespace"/></rule>
    </state>
    <state name="comment">
      <rule pattern="(\{%)(\s*)(endcomment)(\s*)(%\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><push state="#pop" state="#pop"/></rule>
      <rule pattern="."><token type="Comment"/></rule>
    </state>
    <state name="raw">
      <rule pattern="[^{]+"><token type="Text"/></rule>
      <rule pattern="(\{%)(\s*)(endraw)(\s*)(%\})"><bygroups><token type="Punctuation"/><token type="TextWhitespace"/><token type="NameTag"/><token type="TextWhitespace"/><token type="Punctuation"/></bygroups><pop depth="1"/></rule>
      <rule pattern="\{"><token type="Text"/></rule>
    </state>
  </rules>
</lexer>
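The lexer XML files removed above all encode the same state-machine model: named states hold an ordered list of regex rules, and matching a rule emits tokens and can push or pop states. A minimal Python sketch of that tokenization loop (illustrative only — not tartrazine's or Chroma's actual code; the two-state rule table below is a made-up miniature of the liquid lexer's `root`/`output` states):

```python
import re

# Each state maps to ordered (pattern, token_type, action) rules.
# The lexer keeps a stack of state names; only the top state's rules
# are tried, in order, and the first match wins.
RULES = {
    "root": [
        (re.compile(r"[^{]+"), "Text", None),
        (re.compile(r"\{\{"), "Punctuation", ("push", "output")),
        (re.compile(r"\{"), "Text", None),
    ],
    "output": [
        (re.compile(r"\}\}"), "Punctuation", ("pop",)),
        (re.compile(r"[^}]+"), "Text", None),
    ],
}

def tokenize(text: str):
    stack, pos, tokens = ["root"], 0, []
    while pos < len(text):
        for pattern, ttype, action in RULES[stack[-1]]:
            m = pattern.match(text, pos)
            if not m:
                continue
            tokens.append((ttype, m.group()))
            pos = m.end()
            if action is not None:
                if action[0] == "push":
                    stack.append(action[1])
                else:  # "pop"
                    stack.pop()
            break
        else:
            # No rule matched: emit the offending character and move on.
            tokens.append(("Error", text[pos]))
            pos += 1
    return tokens
```

The real lexers add `bygroups`, `include`, `combined`, and `usingself` on top of this, but the matching loop has the same shape.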
@@ -1,55 +0,0 @@

<lexer>
  <config>
    <name>Velocity</name>
    <alias>velocity</alias>
    <filename>*.vm</filename>
    <filename>*.fhtml</filename>
    <dot_all>true</dot_all>
  </config>
  <rules>
    <state name="root">
      <rule pattern="[^{#$]+"><token type="Other"/></rule>
      <rule pattern="(#)(\*.*?\*)(#)"><bygroups><token type="CommentPreproc"/><token type="Comment"/><token type="CommentPreproc"/></bygroups></rule>
      <rule pattern="(##)(.*?$)"><bygroups><token type="CommentPreproc"/><token type="Comment"/></bygroups></rule>
      <rule pattern="(#\{?)([a-zA-Z_]\w*)(\}?)(\s?\()"><bygroups><token type="CommentPreproc"/><token type="NameFunction"/><token type="CommentPreproc"/><token type="Punctuation"/></bygroups><push state="directiveparams"/></rule>
      <rule pattern="(#\{?)([a-zA-Z_]\w*)(\}|\b)"><bygroups><token type="CommentPreproc"/><token type="NameFunction"/><token type="CommentPreproc"/></bygroups></rule>
      <rule pattern="\$!?\{?"><token type="Punctuation"/><push state="variable"/></rule>
    </state>
    <state name="variable">
      <rule pattern="[a-zA-Z_]\w*"><token type="NameVariable"/></rule>
      <rule pattern="\("><token type="Punctuation"/><push state="funcparams"/></rule>
      <rule pattern="(\.)([a-zA-Z_]\w*)"><bygroups><token type="Punctuation"/><token type="NameVariable"/></bygroups><push/></rule>
      <rule pattern="\}"><token type="Punctuation"/><pop depth="1"/></rule>
      <rule><pop depth="1"/></rule>
    </state>
    <state name="directiveparams">
      <rule pattern="(&&|\|\||==?|!=?|[-<>+*%&|^/])|\b(eq|ne|gt|lt|ge|le|not|in)\b"><token type="Operator"/></rule>
      <rule pattern="\["><token type="Operator"/><push state="rangeoperator"/></rule>
      <rule pattern="\b[a-zA-Z_]\w*\b"><token type="NameFunction"/></rule>
      <rule><include state="funcparams"/></rule>
    </state>
    <state name="rangeoperator">
      <rule pattern="\.\."><token type="Operator"/></rule>
      <rule><include state="funcparams"/></rule>
      <rule pattern="\]"><token type="Operator"/><pop depth="1"/></rule>
    </state>
    <state name="funcparams">
      <rule pattern="\$!?\{?"><token type="Punctuation"/><push state="variable"/></rule>
      <rule pattern="\s+"><token type="Text"/></rule>
      <rule pattern="[,:]"><token type="Punctuation"/></rule>
      <rule pattern=""(\\\\|\\[^\\]|[^"\\])*""><token type="LiteralStringDouble"/></rule>
      <rule pattern="'(\\\\|\\[^\\]|[^'\\])*'"><token type="LiteralStringSingle"/></rule>
      <rule pattern="0[xX][0-9a-fA-F]+[Ll]?"><token type="LiteralNumber"/></rule>
      <rule pattern="\b[0-9]+\b"><token type="LiteralNumber"/></rule>
      <rule pattern="(true|false|null)\b"><token type="KeywordConstant"/></rule>
      <rule pattern="\("><token type="Punctuation"/><push/></rule>
      <rule pattern="\)"><token type="Punctuation"/><pop depth="1"/></rule>
      <rule pattern="\{"><token type="Punctuation"/><push/></rule>
      <rule pattern="\}"><token type="Punctuation"/><pop depth="1"/></rule>
      <rule pattern="\["><token type="Punctuation"/><push/></rule>
      <rule pattern="\]"><token type="Punctuation"/><pop depth="1"/></rule>
    </state>
  </rules>
</lexer>
@@ -1,22 +0,0 @@

<lexer>
  <config>
    <name>BBCode</name>
    <alias>bbcode</alias>
    <mime_type>text/x-bbcode</mime_type>
  </config>
  <rules>
    <state name="root">
      <rule pattern="[^[]+"><token type="Text"/></rule>
      <rule pattern="\[/?\w+"><token type="Keyword"/><push state="tag"/></rule>
      <rule pattern="\["><token type="Text"/></rule>
    </state>
    <state name="tag">
      <rule pattern="\s+"><token type="Text"/></rule>
      <rule pattern="(\w+)(=)("?[^\s"\]]+"?)"><bygroups><token type="NameAttribute"/><token type="Operator"/><token type="LiteralString"/></bygroups></rule>
      <rule pattern="(=)("?[^\s"\]]+"?)"><bygroups><token type="Operator"/><token type="LiteralString"/></bygroups></rule>
      <rule pattern="\]"><token type="Keyword"/><pop depth="1"/></rule>
    </state>
  </rules>
</lexer>
@@ -3,7 +3,6 @@
    <name>Groff</name>
    <alias>groff</alias>
    <alias>nroff</alias>
    <alias>roff</alias>
    <alias>man</alias>
    <filename>*.[1-9]</filename>
    <filename>*.1p</filename>
@@ -88,4 +87,4 @@
      </rule>
    </state>
  </rules>
</lexer>
</lexer>
@@ -1,913 +0,0 @@

# A collection of simple regexp-based rules that can be applied to content
# to disambiguate languages with the same file extension.
#
# There are two top-level keys: disambiguations and named_patterns.
#
# disambiguations - a list of disambiguation rules, one for each
#                   extension or group of extensions.
# extensions - an array of file extensions that this block applies to.
# rules - list of rules that are applied in order to the content
#         of a file with a matching extension. Rules are evaluated
#         until one of them matches. If none matches, no language
#         is returned.
# language - Language to be returned if the rule matches.
# pattern - Ruby-compatible regular expression that makes the rule
#           match. If no pattern is specified, the rule always matches.
#           Pattern can be a string with a single regular expression
#           or an array of strings that will be merged in a single
#           regular expression (with union).
# and - An and block merges multiple rules and checks that all of
#       them must match.
# negative_pattern - Same as pattern, but checks for absence of matches.
# named_pattern - A pattern can be reused by specifying it in the
#                 named_patterns section and referencing it here by its
#                 key.
# named_patterns - Key-value map of reusable named patterns.
#
# Please keep this list alphabetized.
#
---
disambiguations:
- extensions: ['.1', '.2', '.3', '.4', '.5', '.6', '.7', '.8', '.9']
  rules:
  - language: man
    and:
    - named_pattern: mdoc-date
    - named_pattern: mdoc-title
    - named_pattern: mdoc-heading
  - language: man
    and:
    - named_pattern: man-title
    - named_pattern: man-heading
  - language: Roff
    pattern: '^\.(?:[A-Za-z]{2}(?:\s|$)|\\")'
- extensions: ['.1in', '.1m', '.1x', '.3in', '.3m', '.3p', '.3pm', '.3qt', '.3x', '.man', '.mdoc']
  rules:
  - language: man
    and:
    - named_pattern: mdoc-date
    - named_pattern: mdoc-title
    - named_pattern: mdoc-heading
  - language: man
    and:
    - named_pattern: man-title
    - named_pattern: man-heading
  - language: Roff
- extensions: ['.al']
  rules:
  # AL pattern source from https://github.com/microsoft/AL/blob/master/grammar/alsyntax.tmlanguage - keyword.other.applicationobject.al
  - language: AL
    and:
    - pattern: '\b(?i:(CODEUNIT|PAGE|PAGEEXTENSION|PAGECUSTOMIZATION|DOTNET|ENUM|ENUMEXTENSION|VALUE|QUERY|REPORT|TABLE|TABLEEXTENSION|XMLPORT|PROFILE|CONTROLADDIN|REPORTEXTENSION|INTERFACE|PERMISSIONSET|PERMISSIONSETEXTENSION|ENTITLEMENT))\b'
  # Open-ended fallback to Perl AutoLoader
  - language: Perl
- extensions: ['.app']
  rules:
  - language: Erlang
    pattern: '^\{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[\w@]*|''[^'']+'')\s*,\s*\[(?:.|[\r\n])*\]\s*\}\.[ \t]*$'
- extensions: ['.as']
  rules:
  - language: ActionScript
    pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:\{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)*)?\)))'
- extensions: ['.asc']
  rules:
  - language: Public Key
    pattern: '^(----[- ]BEGIN|ssh-(rsa|dss)) '
  - language: AsciiDoc
    pattern: '^[=-]+\s|\{\{[A-Za-z]'
  - language: AGS Script
    pattern: '^(\/\/.+|((import|export)\s+)?(function|int|float|char)\s+((room|repeatedly|on|game)_)?([A-Za-z]+[A-Za-z_0-9]+)\s*[;\(])'
- extensions: ['.asm']
  rules:
  - language: Motorola 68K Assembly
    named_pattern: m68k
- extensions: ['.asy']
  rules:
  - language: LTspice Symbol
    pattern: '^SymbolType[ \t]'
  - language: Asymptote
- extensions: ['.bas']
  rules:
  - language: FreeBasic
    pattern: '^[ \t]*#(?i)(?:define|endif|endmacro|ifn?def|include|lang|macro)(?:$|\s)'
  - language: BASIC
    pattern: '\A\s*\d'
  - language: VBA
    and:
    - named_pattern: vb-module
    - named_pattern: vba
  - language: Visual Basic 6.0
    named_pattern: vb-module
- extensions: ['.bb']
  rules:
  - language: BlitzBasic
    pattern: '(<^\s*; |End Function)'
  - language: BitBake
    pattern: '^(# |include|require|inherit)\b'
  - language: Clojure
    pattern: '\((def|defn|defmacro|let)\s'
- extensions: ['.bf']
  rules:
  - language: Beef
    pattern: '(?-m)^\s*using\s+(System|Beefy)(\.(.*))?;\s*$'
  - language: HyPhy
    pattern:
    - '(?-m)^\s*#include\s+".*";\s*$'
    - '\sfprintf\s*\('
  - language: Brainfuck
    pattern: '(>\+>|>\+<)'
- extensions: ['.bi']
  rules:
  - language: FreeBasic
    pattern: '^[ \t]*#(?i)(?:define|endif|endmacro|ifn?def|if|include|lang|macro)(?:$|\s)'
- extensions: ['.bs']
  rules:
  - language: Bikeshed
    pattern: '^(?i:<pre\s+class)\s*=\s*(''|\"|\b)metadata\b\1[^>\r\n]*>'
  - language: BrighterScript
    pattern:
    - (?i:^\s*(?=^sub\s)(?:sub\s*\w+\(.*?\))|(?::\s*sub\(.*?\))$)
    - (?i:^\s*(end\ssub)$)
    - (?i:^\s*(?=^function\s)(?:function\s*\w+\(.*?\)\s*as\s*\w*)|(?::\s*function\(.*?\)\s*as\s*\w*)$)
    - (?i:^\s*(end\sfunction)$)
  - language: Bluespec BH
    pattern: '^package\s+[A-Za-z_][A-Za-z0-9_'']*(?:\s*\(|\s+where)'
- extensions: ['.builds']
  rules:
  - language: XML
    pattern: '^(\s*)(?i:<Project|<Import|<Property|<?xml|xmlns)'
- extensions: ['.ch']
  rules:
  - language: xBase
    pattern: '^\s*#\s*(?i:if|ifdef|ifndef|define|command|xcommand|translate|xtranslate|include|pragma|undef)\b'
- extensions: ['.cl']
  rules:
  - language: Common Lisp
    pattern: '^\s*\((?i:defun|in-package|defpackage) '
  - language: Cool
    pattern: '^class'
  - language: OpenCL
    pattern: '\/\* |\/\/ |^\}'
- extensions: ['.cls']
  rules:
  - language: Visual Basic 6.0
    and:
    - named_pattern: vb-class
    - pattern: '^\s*BEGIN(?:\r?\n|\r)\s*MultiUse\s*=.*(?:\r?\n|\r)\s*Persistable\s*='
  - language: VBA
    named_pattern: vb-class
  - language: TeX
    pattern: '^\s*\\(?:NeedsTeXFormat|ProvidesClass)\{'
  - language: ObjectScript
    pattern: '^Class\s'
- extensions: ['.cmp']
  rules:
  - language: Gerber Image
    pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- extensions: ['.cs']
  rules:
  - language: Smalltalk
    pattern: '![\w\s]+methodsFor: '
  - language: 'C#'
    pattern: '^\s*(using\s+[A-Z][\s\w.]+;|namespace\s*[\w\.]+\s*(\{|;)|\/\/)'
- extensions: ['.csc']
  rules:
  - language: GSC
    named_pattern: gsc
- extensions: ['.csl']
  rules:
  - language: XML
    pattern: '(?i:^\s*(<\?xml|xmlns))'
  - language: Kusto
    pattern: '(^\|\s*(where|extend|project|limit|summarize))|(^\.\w+)'
- extensions: ['.d']
  rules:
  - language: D
    # see http://dlang.org/spec/grammar
    # ModuleDeclaration | ImportDeclaration | FuncDeclaration | unittest
    pattern: '^module\s+[\w.]*\s*;|import\s+[\w\s,.:]*;|\w+\s+\w+\s*\(.*\)(?:\(.*\))?\s*\{[^}]*\}|unittest\s*(?:\(.*\))?\s*\{[^}]*\}'
  - language: DTrace
    # see http://dtrace.org/guide/chp-prog.html, http://dtrace.org/guide/chp-profile.html, http://dtrace.org/guide/chp-opt.html
    pattern: '^(\w+:\w*:\w*:\w*|BEGIN|END|provider\s+|(tick|profile)-\w+\s+\{[^}]*\}|#pragma\s+D\s+(option|attributes|depends_on)\s|#pragma\s+ident\s)'
  - language: Makefile
    # path/target : dependency \
    # target : \
    #  : dependency
    # path/file.ext1 : some/path/../file.ext2
    pattern: '([\/\\].*:\s+.*\s\\$|: \\$|^[ %]:|^[\w\s\/\\.]+\w+\.\w+\s*:\s+[\w\s\/\\.]+\w+\.\w+)'
- extensions: ['.dsp']
  rules:
  - language: Microsoft Developer Studio Project
    pattern: '# Microsoft Developer Studio Generated Build File'
  - language: Faust
    pattern: '\bprocess\s*[(=]|\b(library|import)\s*\(\s*"|\bdeclare\s+(name|version|author|copyright|license)\s+"'
- extensions: ['.e']
  rules:
  - language: E
    pattern:
    - '^\s*(def|var)\s+(.+):='
    - '^\s*(def|to)\s+(\w+)(\(.+\))?\s+\{'
    - '^\s*(when)\s+(\(.+\))\s+->\s+\{'
  - language: Eiffel
    pattern:
    - '^\s*\w+\s*(?:,\s*\w+)*[:]\s*\w+\s'
    - '^\s*\w+\s*(?:\(\s*\w+[:][^)]+\))?(?:[:]\s*\w+)?(?:--.+\s+)*\s+(?:do|local)\s'
    - '^\s*(?:across|deferred|elseif|ensure|feature|from|inherit|inspect|invariant|note|once|require|undefine|variant|when)\s*$'
  - language: Euphoria
    named_pattern: euphoria
- extensions: ['.ecl']
  rules:
  - language: ECLiPSe
    pattern: '^[^#]+:-'
  - language: ECL
    pattern: ':='
- extensions: ['.es']
  rules:
  - language: Erlang
    pattern: '^\s*(?:%%|main\s*\(.*?\)\s*->)'
  - language: JavaScript
    pattern: '\/\/|("|'')use strict\1|export\s+default\s|\/\*(?:.|[\r\n])*?\*\/'
- extensions: ['.ex']
  rules:
  - language: Elixir
    pattern:
    - '^\s*@moduledoc\s'
    - '^\s*(?:cond|import|quote|unless)\s'
    - '^\s*def(?:exception|impl|macro|module|protocol)[(\s]'
  - language: Euphoria
    named_pattern: euphoria
- extensions: ['.f']
  rules:
  - language: Forth
    pattern: '^: '
  - language: Filebench WML
    pattern: 'flowop'
  - language: Fortran
    named_pattern: fortran
- extensions: ['.for']
  rules:
  - language: Forth
    pattern: '^: '
  - language: Fortran
    named_pattern: fortran
- extensions: ['.fr']
  rules:
  - language: Forth
    pattern: '^(: |also |new-device|previous )'
  - language: Frege
    pattern: '^\s*(import|module|package|data|type) '
  - language: Text
- extensions: ['.frm']
  rules:
  - language: VBA
    and:
    - named_pattern: vb-form
    - pattern: '^\s*Begin\s+\{[0-9A-Z\-]*\}\s?'
  - language: Visual Basic 6.0
    and:
    - named_pattern: vb-form
    - pattern: '^\s*Begin\s+VB\.Form\s+'
- extensions: ['.fs']
  rules:
  - language: Forth
    pattern: '^(: |new-device)'
  - language: 'F#'
    pattern: '^\s*(#light|import|let|module|namespace|open|type)'
  - language: GLSL
    pattern: '^\s*(#version|precision|uniform|varying|vec[234])'
  - language: Filterscript
    pattern: '#include|#pragma\s+(rs|version)|__attribute__'
- extensions: ['.ftl']
  rules:
  - language: FreeMarker
    pattern: '^(?:<|[a-zA-Z-][a-zA-Z0-9_-]+[ \t]+\w)|\$\{\w+[^\r\n]*?\}|^[ \t]*(?:<#--.*?-->|<#([a-z]+)(?=\s|>)[^>]*>.*?</#\1>|\[#--.*?--\]|\[#([a-z]+)(?=\s|\])[^\]]*\].*?\[#\2\])'
  - language: Fluent
    pattern: '^-?[a-zA-Z][a-zA-Z0-9_-]* *=|\{\$-?[a-zA-Z][-\w]*(?:\.[a-zA-Z][-\w]*)?\}'
- extensions: ['.g']
  rules:
  - language: GAP
    pattern: '\s*(Declare|BindGlobal|KeyDependentOperation|Install(Method|GlobalFunction)|SetPackageInfo)'
  - language: G-code
    pattern: '^[MG][0-9]+(?:\r?\n|\r)'
- extensions: ['.gd']
  rules:
  - language: GAP
    pattern: '\s*(Declare|BindGlobal|KeyDependentOperation)'
  - language: GDScript
    pattern: '\s*(extends|var|const|enum|func|class|signal|tool|yield|assert|onready)'
- extensions: ['.gml']
  rules:
  - language: XML
    pattern: '(?i:^\s*(<\?xml|xmlns))'
  - language: Graph Modeling Language
    pattern: '(?i:^\s*(graph|node)\s+\[$)'
  - language: Gerber Image
    pattern: '^[DGMT][0-9]{2}\*$'
  - language: Game Maker Language
- extensions: ['.gs']
  rules:
  - language: GLSL
    pattern: '^#version\s+[0-9]+\b'
  - language: Gosu
    pattern: '^uses (java|gw)\.'
  - language: Genie
    pattern: '^\[indent=[0-9]+\]'
- extensions: ['.gsc']
  rules:
  - language: GSC
    named_pattern: gsc
- extensions: ['.gsh']
  rules:
  - language: GSC
    named_pattern: gsc
- extensions: ['.gts']
  rules:
  - language: Gerber Image
    pattern: '^G0.'
  - language: Glimmer TS
    negative_pattern: '^G0.'
- extensions: ['.h']
  rules:
  - language: Objective-C
    named_pattern: objectivec
  - language: C++
    named_pattern: cpp
  - language: C
- extensions: ['.hh']
  rules:
  - language: Hack
    pattern: '<\?hh'
- extensions: ['.html']
  rules:
  - language: Ecmarkup
    pattern: '<emu-(?:alg|annex|biblio|clause|eqn|example|figure|gann|gmod|gprose|grammar|intro|not-ref|note|nt|prodref|production|rhs|table|t|xref)(?:$|\s|>)'
  - language: HTML
- extensions: ['.i']
  rules:
  - language: Motorola 68K Assembly
    named_pattern: m68k
  - language: SWIG
    pattern: '^[ \t]*%[a-z_]+\b|^%[{}]$'
- extensions: ['.ice']
  rules:
  - language: JSON
    pattern: '\A\s*[{\[]'
  - language: Slice
- extensions: ['.inc']
  rules:
  - language: Motorola 68K Assembly
    named_pattern: m68k
  - language: PHP
    pattern: '^<\?(?:php)?'
  - language: SourcePawn
    pattern:
    - '^public\s+(?:SharedPlugin(?:\s+|:)__pl_\w+\s*=(?:\s*\{)?|(?:void\s+)?__pl_\w+_SetNTVOptional\(\)(?:\s*\{)?)'
    - '^methodmap\s+\w+\s+<\s+\w+'
    - '^\s*MarkNativeAsOptional\s*\('
  - language: NASL
    pattern:
    - '^\s*include\s*\(\s*(?:"|'')[\\/\w\-\.:\s]+\.(?:nasl|inc)\s*(?:"|'')\s*\)\s*;'
|
||||
- '^\s*(?:global|local)_var\s+(?:\w+(?:\s*=\s*[\w\-"'']+)?\s*)(?:,\s*\w+(?:\s*=\s*[\w\-"'']+)?\s*)*+\s*;'
|
||||
- '^\s*namespace\s+\w+\s*\{'
|
||||
- '^\s*object\s+\w+\s*(?:extends\s+\w+(?:::\w+)?)?\s*\{'
|
||||
- '^\s*(?:public\s+|private\s+|\s*)function\s+\w+\s*\([\w\s,]*\)\s*\{'
|
||||
- language: POV-Ray SDL
|
||||
pattern: '^\s*#(declare|local|macro|while)\s'
|
||||
- language: Pascal
|
||||
pattern:
|
||||
- '(?i:^\s*\{\$(?:mode|ifdef|undef|define)[ ]+[a-z0-9_]+\})'
|
||||
- '^\s*end[.;]\s*$'
|
||||
- language: BitBake
|
||||
pattern: '^inherit(\s+[\w.-]+)+\s*$'
|
||||
- extensions: ['.json']
|
||||
rules:
|
||||
- language: OASv2-json
|
||||
pattern: '"swagger":\s?"2.[0-9.]+"'
|
||||
- language: OASv3-json
|
||||
pattern: '"openapi":\s?"3.[0-9.]+"'
|
||||
- language: JSON
|
||||
- extensions: ['.l']
|
||||
rules:
|
||||
- language: Common Lisp
|
||||
pattern: '\(def(un|macro)\s'
|
||||
- language: Lex
|
||||
pattern: '^(%[%{}]xs|<.*>)'
|
||||
- language: Roff
|
||||
pattern: '^\.[A-Za-z]{2}(\s|$)'
|
||||
- language: PicoLisp
|
||||
pattern: '^\((de|class|rel|code|data|must)\s'
|
||||
- extensions: ['.lean']
|
||||
rules:
|
||||
- language: Lean
|
||||
pattern: '^import [a-z]'
|
||||
- language: Lean 4
|
||||
pattern: '^import [A-Z]'
|
||||
- extensions: ['.ls']
|
||||
rules:
|
||||
- language: LoomScript
|
||||
pattern: '^\s*package\s*[\w\.\/\*\s]*\s*\{'
|
||||
- language: LiveScript
|
||||
- extensions: ['.lsp', '.lisp']
|
||||
rules:
|
||||
- language: Common Lisp
|
||||
pattern: '^\s*\((?i:defun|in-package|defpackage) '
|
||||
- language: NewLisp
|
||||
pattern: '^\s*\(define '
|
||||
- extensions: ['.m']
|
||||
rules:
|
||||
- language: Objective-C
|
||||
named_pattern: objectivec
|
||||
- language: Mercury
|
||||
pattern: ':- module'
|
||||
- language: MUF
|
||||
pattern: '^: '
|
||||
- language: M
|
||||
pattern: '^\s*;'
|
||||
- language: Mathematica
|
||||
and:
|
||||
- pattern: '\(\*'
|
||||
- pattern: '\*\)$'
|
||||
- language: MATLAB
|
||||
pattern: '^\s*%'
|
||||
- language: Limbo
|
||||
pattern: '^\w+\s*:\s*module\s*\{'
|
||||
- extensions: ['.m4']
|
||||
rules:
|
||||
- language: M4Sugar
|
||||
pattern:
|
||||
- 'AC_DEFUN|AC_PREREQ|AC_INIT'
|
||||
- '^_?m4_'
|
||||
- language: 'M4'
|
||||
- extensions: ['.mask']
|
||||
rules:
|
||||
- language: Unity3D Asset
|
||||
pattern: 'tag:unity3d.com'
|
||||
- extensions: ['.mc']
|
||||
rules:
|
||||
- language: Win32 Message File
|
||||
pattern: '(?i)^[ \t]*(?>\/\*\s*)?MessageId=|^\.$'
|
||||
- language: M4
|
||||
pattern: '^dnl|^divert\((?:-?\d+)?\)|^\w+\(`[^\r\n]*?''[),]'
|
||||
- language: Monkey C
|
||||
pattern: '\b(?:using|module|function|class|var)\s+\w'
|
||||
- extensions: ['.md']
|
||||
rules:
|
||||
- language: Markdown
|
||||
pattern:
|
||||
- '(^[-A-Za-z0-9=#!\*\[|>])|<\/'
|
||||
- '\A\z'
|
||||
- language: GCC Machine Description
|
||||
pattern: '^(;;|\(define_)'
|
||||
- language: Markdown
|
||||
- extensions: ['.ml']
|
||||
rules:
|
||||
- language: OCaml
|
||||
pattern: '(^\s*module)|let rec |match\s+(\S+\s)+with'
|
||||
- language: Standard ML
|
||||
pattern: '=> |case\s+(\S+\s)+of'
|
||||
- extensions: ['.mod']
|
||||
rules:
|
||||
- language: XML
|
||||
pattern: '<!ENTITY '
|
||||
- language: NMODL
|
||||
pattern: '\b(NEURON|INITIAL|UNITS)\b'
|
||||
- language: Modula-2
|
||||
pattern: '^\s*(?i:MODULE|END) [\w\.]+;'
|
||||
- language: [Linux Kernel Module, AMPL]
|
||||
- extensions: ['.mojo']
|
||||
rules:
|
||||
- language: Mojo
|
||||
pattern: '^\s*(alias|def|from|fn|import|struct|trait)\s'
|
||||
- language: XML
|
||||
pattern: '^\s*<\?xml'
|
||||
- extensions: ['.ms']
|
||||
rules:
|
||||
- language: Roff
|
||||
pattern: '^[.''][A-Za-z]{2}(\s|$)'
|
||||
- language: Unix Assembly
|
||||
and:
|
||||
- negative_pattern: '/\*'
|
||||
- pattern: '^\s*\.(?:include\s|globa?l\s|[A-Za-z][_A-Za-z0-9]*:)'
|
||||
- language: MAXScript
|
||||
- extensions: ['.n']
|
||||
rules:
|
||||
- language: Roff
|
||||
pattern: '^[.'']'
|
||||
- language: Nemerle
|
||||
pattern: '^(module|namespace|using)\s'
|
||||
- extensions: ['.ncl']
|
||||
rules:
|
||||
- language: XML
|
||||
pattern: '^\s*<\?xml\s+version'
|
||||
- language: Gerber Image
|
||||
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
|
||||
- language: Text
|
||||
pattern: 'THE_TITLE'
|
||||
- extensions: ['.nl']
|
||||
rules:
|
||||
- language: NL
|
||||
pattern: '^(b|g)[0-9]+ '
|
||||
- language: NewLisp
|
||||
- extensions: ['.nu']
|
||||
rules:
|
||||
- language: Nushell
|
||||
pattern: '^\s*(import|export|module|def|let|let-env) '
|
||||
- language: Nu
|
||||
- extensions: ['.odin']
|
||||
rules:
|
||||
- language: Object Data Instance Notation
|
||||
pattern: '(?:^|<)\s*[A-Za-z0-9_]+\s*=\s*<'
|
||||
- language: Odin
|
||||
pattern: 'package\s+\w+|\b(?:im|ex)port\s*"[\w:./]+"|\w+\s*::\s*(?:proc|struct)\s*\(|^\s*//\s'
|
||||
- extensions: ['.p']
|
||||
rules:
|
||||
- language: Gnuplot
|
||||
pattern:
|
||||
- '^s?plot\b'
|
||||
- '^set\s+(term|terminal|out|output|[xy]tics|[xy]label|[xy]range|style)\b'
|
||||
- language: OpenEdge ABL
|
||||
- extensions: ['.php']
|
||||
rules:
|
||||
- language: Hack
|
||||
pattern: '<\?hh'
|
||||
- language: PHP
|
||||
pattern: '<\?[^h]'
|
||||
- extensions: ['.pkl']
|
||||
rules:
|
||||
- language: Pkl
|
||||
pattern:
|
||||
- '^\s*(module|import|amends|extends|local|const|fixed|abstract|open|class|typealias|@\w+)\b'
|
||||
- '^\s*[a-zA-Z0-9_$]+\s*(=|{|:)|^\s*`[^`]+`\s*(=|{|:)|for\s*\(|when\s*\('
|
||||
- language: Pickle
|
||||
- extensions: ['.pl']
|
||||
rules:
|
||||
- language: Prolog
|
||||
pattern: '^[^#]*:-'
|
||||
- language: Perl
|
||||
and:
|
||||
- negative_pattern: '^\s*use\s+v6\b'
|
||||
- named_pattern: perl
|
||||
- language: Raku
|
||||
named_pattern: raku
|
||||
- extensions: ['.plist']
|
||||
rules:
|
||||
- language: XML Property List
|
||||
pattern: '^\s*(?:<\?xml\s|<!DOCTYPE\s+plist|<plist(?:\s+version\s*=\s*(["''])\d+(?:\.\d+)?\1)?\s*>\s*$)'
|
||||
- language: OpenStep Property List
|
||||
- extensions: ['.plt']
|
||||
rules:
|
||||
- language: Prolog
|
||||
pattern: '^\s*:-'
|
||||
- extensions: ['.pm']
|
||||
rules:
|
||||
- language: Perl
|
||||
and:
|
||||
- negative_pattern: '^\s*use\s+v6\b'
|
||||
- named_pattern: perl
|
||||
- language: Raku
|
||||
named_pattern: raku
|
||||
- language: X PixMap
|
||||
pattern: '^\s*\/\* XPM \*\/'
|
||||
- extensions: ['.pod']
|
||||
rules:
|
||||
- language: Pod 6
|
||||
pattern: '^[\s&&[^\r\n]]*=(comment|begin pod|begin para|item\d+)'
|
||||
- language: Pod
|
||||
- extensions: ['.pp']
|
||||
rules:
|
||||
- language: Pascal
|
||||
pattern: '^\s*end[.;]'
|
||||
- language: Puppet
|
||||
pattern: '^\s+\w+\s+=>\s'
|
||||
- extensions: ['.pro']
|
||||
rules:
|
||||
- language: Proguard
|
||||
pattern: '^-(include\b.*\.pro$|keep\b|keepclassmembers\b|keepattributes\b)'
|
||||
- language: Prolog
|
||||
pattern: '^[^\[#]+:-'
|
||||
- language: INI
|
||||
pattern: 'last_client='
|
||||
- language: QMake
|
||||
and:
|
||||
- pattern: HEADERS
|
||||
- pattern: SOURCES
|
||||
- language: IDL
|
||||
pattern: '^\s*(?i:function|pro|compile_opt) \w[ \w,:]*$'
|
||||
- extensions: ['.properties']
|
||||
rules:
|
||||
- language: INI
|
||||
and:
|
||||
- named_pattern: key_equals_value
|
||||
- pattern: '^[;\[]'
|
||||
- language: Java Properties
|
||||
and:
|
||||
- named_pattern: key_equals_value
|
||||
- pattern: '^[#!]'
|
||||
- language: INI
|
||||
named_pattern: key_equals_value
|
||||
- language: Java Properties
|
||||
pattern: '^[^#!][^:]*:'
|
||||
- extensions: ['.q']
|
||||
rules:
|
||||
- language: q
|
||||
pattern: '((?i:[A-Z.][\w.]*:\{)|^\\(cd?|d|l|p|ts?) )'
|
||||
- language: HiveQL
|
||||
pattern: '(?i:SELECT\s+[\w*,]+\s+FROM|(CREATE|ALTER|DROP)\s(DATABASE|SCHEMA|TABLE))'
|
||||
- extensions: ['.qs']
|
||||
rules:
|
||||
- language: Q#
|
||||
pattern: '^((\/{2,3})?\s*(namespace|operation)\b)'
|
||||
- language: Qt Script
|
||||
pattern: '(\w+\.prototype\.\w+|===|\bvar\b)'
|
||||
- extensions: ['.r']
|
||||
rules:
|
||||
- language: Rebol
|
||||
pattern: '(?i:\bRebol\b)'
|
||||
- language: Rez
|
||||
pattern: '(#include\s+["<](Types\.r|Carbon\/Carbon\.r)[">])|((resource|data|type)\s+''[A-Za-z0-9]{4}''\s+((\(.*\)\s+){0,1}){)'
|
||||
- language: R
|
||||
pattern: '<-|^\s*#'
|
||||
- extensions: ['.re']
|
||||
rules:
|
||||
- language: Reason
|
||||
pattern:
|
||||
- '^\s*module\s+type\s'
|
||||
- '^\s*(?:include|open)\s+\w+\s*;\s*$'
|
||||
- '^\s*let\s+(?:module\s\w+\s*=\s*\{|\w+:\s+.*=.*;\s*$)'
|
||||
- language: C++
|
||||
pattern:
|
||||
- '^\s*#(?:(?:if|ifdef|define|pragma)\s+\w|\s*include\s+<[^>]+>)'
|
||||
- '^\s*template\s*<'
|
||||
- extensions: ['.res']
|
||||
rules:
|
||||
- language: ReScript
|
||||
pattern:
|
||||
- '^\s*(let|module|type)\s+\w*\s+=\s+'
|
||||
- '^\s*(?:include|open)\s+\w+\s*$'
|
||||
- extensions: ['.rno']
|
||||
rules:
|
||||
- language: RUNOFF
|
||||
pattern: '(?i:^\.!|^\f|\f$|^\.end lit(?:eral)?\b|^\.[a-zA-Z].*?;\.[a-zA-Z](?:[; \t])|\^\*[^\s*][^*]*\\\*(?=$|\s)|^\.c;[ \t]*\w+)'
|
||||
- language: Roff
|
||||
pattern: '^\.\\" '
|
||||
- extensions: ['.rpy']
|
||||
rules:
|
||||
- language: Python
|
||||
pattern: '^(import|from|class|def)\s'
|
||||
- language: "Ren'Py"
|
||||
- extensions: ['.rs']
|
||||
rules:
|
||||
- language: Rust
|
||||
pattern: '^(use |fn |mod |pub |macro_rules|impl|#!?\[)'
|
||||
- language: RenderScript
|
||||
pattern: '#include|#pragma\s+(rs|version)|__attribute__'
|
||||
- language: XML
|
||||
pattern: '^\s*<\?xml'
|
||||
- extensions: ['.s']
|
||||
rules:
|
||||
- language: Motorola 68K Assembly
|
||||
named_pattern: m68k
|
||||
- extensions: ['.sc']
|
||||
rules:
|
||||
- language: SuperCollider
|
||||
pattern: '(?i:\^(this|super)\.|^\s*~\w+\s*=\.)'
|
||||
- language: Scala
|
||||
pattern: '(^\s*import (scala|java)\.|^\s*class\b)'
|
||||
- extensions: ['.scd']
|
||||
rules:
|
||||
- language: SuperCollider
|
||||
pattern: '(?i:\^(this|super)\.|^\s*(~\w+\s*=\.|SynthDef\b))'
|
||||
- language: Markdown
|
||||
# Markdown syntax for scdoc
|
||||
pattern: '^#+\s+(NAME|SYNOPSIS|DESCRIPTION)'
|
||||
- extensions: ['.sol']
|
||||
rules:
|
||||
- language: Solidity
|
||||
pattern: '\bpragma\s+solidity\b|\b(?:abstract\s+)?contract\s+(?!\d)[a-zA-Z0-9$_]+(?:\s+is\s+(?:[a-zA-Z0-9$_][^\{]*?)?)?\s*\{'
|
||||
- language: Gerber Image
|
||||
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
|
||||
- extensions: ['.sql']
|
||||
rules:
|
||||
# Postgres
|
||||
- language: PLpgSQL
|
||||
pattern: '(?i:^\\i\b|AS\s+\$\$|LANGUAGE\s+''?plpgsql''?|BEGIN(\s+WORK)?\s*;)'
|
||||
# IBM db2
|
||||
- language: SQLPL
|
||||
pattern: '(?i:ALTER\s+MODULE|MODE\s+DB2SQL|\bSYS(CAT|PROC)\.|ASSOCIATE\s+RESULT\s+SET|\bEND!\s*$)'
|
||||
# Oracle
|
||||
- language: PLSQL
|
||||
pattern: '(?i:\$\$PLSQL_|XMLTYPE|systimestamp|\.nextval|CONNECT\s+BY|AUTHID\s+(DEFINER|CURRENT_USER)|constructor\W+function)'
|
||||
# T-SQL
|
||||
- language: TSQL
|
||||
pattern: '(?i:^\s*GO\b|BEGIN(\s+TRY|\s+CATCH)|OUTPUT\s+INSERTED|DECLARE\s+@|\[dbo\])'
|
||||
- language: SQL
|
||||
- extensions: ['.srt']
|
||||
rules:
|
||||
- language: SubRip Text
|
||||
pattern: '^(\d{2}:\d{2}:\d{2},\d{3})\s*(-->)\s*(\d{2}:\d{2}:\d{2},\d{3})$'
|
||||
- extensions: ['.st']
|
||||
rules:
|
||||
- language: StringTemplate
|
||||
pattern: '\$\w+[($]|(.)!\s*.+?\s*!\1|<!\s*.+?\s*!>|\[!\s*.+?\s*!\]|\{!\s*.+?\s*!\}'
|
||||
- language: Smalltalk
|
||||
pattern: '\A\s*[\[{(^"''\w#]|[a-zA-Z_]\w*\s*:=\s*[a-zA-Z_]\w*|class\s*>>\s*[a-zA-Z_]\w*|^[a-zA-Z_]\w*\s+[a-zA-Z_]\w*:|^Class\s*\{|if(?:True|False):\s*\['
|
||||
- extensions: ['.star']
|
||||
rules:
|
||||
- language: STAR
|
||||
pattern: '^loop_\s*$'
|
||||
- language: Starlark
|
||||
- extensions: ['.stl']
|
||||
rules:
|
||||
- language: STL
|
||||
pattern: '\A\s*solid(?:$|\s)[\s\S]*^endsolid(?:$|\s)'
|
||||
- extensions: ['.sw']
|
||||
rules:
|
||||
- language: Sway
|
||||
pattern: '^\s*(?:(?:abi|dep|fn|impl|mod|pub|trait)\s|#\[)'
|
||||
- language: XML
|
||||
pattern: '^\s*<\?xml\s+version'
|
||||
- extensions: ['.t']
|
||||
rules:
|
||||
- language: Perl
|
||||
and:
|
||||
- negative_pattern: '^\s*use\s+v6\b'
|
||||
- named_pattern: perl
|
||||
- language: Raku
|
||||
pattern: '^\s*(?:use\s+v6\b|\bmodule\b|\bmy\s+class\b)'
|
||||
- language: Turing
|
||||
pattern: '^\s*%[ \t]+|^\s*var\s+\w+(\s*:\s*\w+)?\s*:=\s*\w+'
|
||||
- extensions: ['.tag']
|
||||
rules:
|
||||
- language: Java Server Pages
|
||||
pattern: '<%[@!=\s]?\s*(taglib|tag|include|attribute|variable)\s'
|
||||
- extensions: ['.tlv']
|
||||
rules:
|
||||
- language: TL-Verilog
|
||||
pattern: '^\\.{0,10}TLV_version'
|
||||
- extensions: ['.toc']
|
||||
rules:
|
||||
- language: World of Warcraft Addon Data
|
||||
pattern: '^## |@no-lib-strip@'
|
||||
- language: TeX
|
||||
pattern: '^\\(contentsline|defcounter|beamer|boolfalse)'
|
||||
- extensions: ['.ts']
|
||||
rules:
|
||||
- language: XML
|
||||
pattern: '<TS\b'
|
||||
- language: TypeScript
|
||||
- extensions: ['.tst']
|
||||
rules:
|
||||
- language: GAP
|
||||
pattern: 'gap> '
|
||||
# Heads up - we don't usually write heuristics like this (with no regex match)
|
||||
- language: Scilab
|
||||
- extensions: ['.tsx']
|
||||
rules:
|
||||
- language: TSX
|
||||
pattern: '^\s*(import.+(from\s+|require\()[''"]react|\/\/\/\s*<reference\s)'
|
||||
- language: XML
|
||||
pattern: '(?i:^\s*<\?xml\s+version)'
|
||||
- extensions: ['.txt']
|
||||
rules:
|
||||
# The following RegExp is simply a collapsed and simplified form of the
|
||||
# VIM_MODELINE pattern in `./lib/linguist/strategy/modeline.rb`.
|
||||
- language: Vim Help File
|
||||
pattern: '(?:(?:^|[ \t])(?:vi|Vi(?=m))(?:m[<=>]?[0-9]+|m)?|[ \t]ex)(?=:(?=[ \t]*set?[ \t][^\r\n:]+:)|:(?![ \t]*set?[ \t]))(?:(?:[ \t]*:[ \t]*|[ \t])\w*(?:[ \t]*=(?:[^\\\s]|\\.)*)?)*[ \t:](?:filetype|ft|syntax)[ \t]*=(help)(?=$|\s|:)'
|
||||
- language: Adblock Filter List
|
||||
pattern: |-
|
||||
(?x)\A
|
||||
\[
|
||||
(?<version>
|
||||
(?:
|
||||
[Aa]d[Bb]lock
|
||||
(?:[ \t][Pp]lus)?
|
||||
|
|
||||
u[Bb]lock
|
||||
(?:[ \t][Oo]rigin)?
|
||||
|
|
||||
[Aa]d[Gg]uard
|
||||
)
|
||||
(?:[ \t] \d+(?:\.\d+)*+)?
|
||||
)
|
||||
(?:
|
||||
[ \t]?;[ \t]?
|
||||
\g<version>
|
||||
)*+
|
||||
\]
|
||||
# HACK: This is a contrived use of heuristics needed to address
|
||||
# an unusual edge-case. See https://git.io/JULye for discussion.
|
||||
- language: Text
|
||||
- extensions: ['.typ']
|
||||
rules:
|
||||
- language: Typst
|
||||
pattern: '^#(import|show|let|set)'
|
||||
- language: XML
|
||||
- extensions: ['.url']
|
||||
rules:
|
||||
- language: INI
|
||||
pattern: '^\[InternetShortcut\](?:\r?\n|\r)(?>[^\s\[][^\r\n]*(?:\r?\n|\r))*URL='
|
||||
- extensions: ['.v']
|
||||
rules:
|
||||
- language: Coq
|
||||
pattern: '(?:^|\s)(?:Proof|Qed)\.(?:$|\s)|(?:^|\s)Require[ \t]+(Import|Export)\s'
|
||||
- language: Verilog
|
||||
pattern: '^[ \t]*module\s+[^\s()]+\s+\#?\(|^[ \t]*`(?:define|ifdef|ifndef|include|timescale)|^[ \t]*always[ \t]+@|^[ \t]*initial[ \t]+(begin|@)'
|
||||
- language: V
|
||||
pattern: '\$(?:if|else)[ \t]|^[ \t]*fn\s+[^\s()]+\(.*?\).*?\{|^[ \t]*for\s*\{'
|
||||
- extensions: ['.vba']
|
||||
rules:
|
||||
- language: Vim Script
|
||||
pattern: '^UseVimball'
|
||||
- language: VBA
|
||||
- extensions: ['.w']
|
||||
rules:
|
||||
- language: OpenEdge ABL
|
||||
pattern: '&ANALYZE-SUSPEND _UIB-CODE-BLOCK _CUSTOM _DEFINITIONS'
|
||||
- language: CWeb
|
||||
pattern: '^@(<|\w+\.)'
|
||||
- extensions: ['.x']
|
||||
rules:
|
||||
- language: DirectX 3D File
|
||||
pattern: '^xof 030(2|3)(?:txt|bin|tzip|bzip)\b'
|
||||
- language: RPC
|
||||
pattern: '\b(program|version)\s+\w+\s*\{|\bunion\s+\w+\s+switch\s*\('
|
||||
- language: Logos
|
||||
pattern: '^%(end|ctor|hook|group)\b'
|
||||
- language: Linker Script
|
||||
pattern: 'OUTPUT_ARCH\(|OUTPUT_FORMAT\(|SECTIONS'
|
||||
- extensions: ['.yaml', '.yml']
|
||||
rules:
|
||||
- language: MiniYAML
|
||||
pattern: '^\t+.*?[^\s:].*?:'
|
||||
negative_pattern: '---'
|
||||
- language: OASv2-yaml
|
||||
pattern: 'swagger:\s?''?"?2.[0-9.]+''?"?'
|
||||
- language: OASv3-yaml
|
||||
pattern: 'openapi:\s?''?"?3.[0-9.]+''?"?'
|
||||
- language: YAML
|
||||
- extensions: ['.yy']
|
||||
rules:
|
||||
- language: JSON
|
||||
pattern: '\"modelName\"\:\s*\"GM'
|
||||
- language: Yacc
|
||||
named_patterns:
|
||||
cpp:
|
||||
- '^\s*#\s*include <(cstdint|string|vector|map|list|array|bitset|queue|stack|forward_list|unordered_map|unordered_set|(i|o|io)stream)>'
|
||||
- '^\s*template\s*<'
|
||||
- '^[ \t]*(try|constexpr)'
|
||||
- '^[ \t]*catch\s*\('
|
||||
- '^[ \t]*(class|(using[ \t]+)?namespace)\s+\w+'
|
||||
- '^[ \t]*(private|public|protected):$'
|
||||
- '__has_cpp_attribute|__cplusplus >'
|
||||
- 'std::\w+'
|
||||
euphoria:
|
||||
- '^\s*namespace\s'
|
||||
- '^\s*(?:public\s+)?include\s'
|
||||
- '^\s*(?:(?:public|export|global)\s+)?(?:atom|constant|enum|function|integer|object|procedure|sequence|type)\s'
|
||||
fortran: '^(?i:[c*][^abd-z]| (subroutine|program|end|data)\s|\s*!)'
|
||||
gsc:
|
||||
- '^\s*#\s*(?:using|insert|include|define|namespace)[ \t]+\w'
|
||||
- '^\s*(?>(?:autoexec|private)\s+){0,2}function\s+(?>(?:autoexec|private)\s+){0,2}\w+\s*\('
|
||||
- '\b(?:level|self)[ \t]+thread[ \t]+(?:\[\[[ \t]*(?>\w+\.)*\w+[ \t]*\]\]|\w+)[ \t]*\([^\r\n\)]*\)[ \t]*;'
|
||||
- '^[ \t]*#[ \t]*(?:precache|using_animtree)[ \t]*\('
|
||||
key_equals_value: '^[^#!;][^=]*='
|
||||
m68k:
|
||||
- '(?im)\bmoveq(?:\.l)?\s+#(?:\$-?[0-9a-f]{1,3}|%[0-1]{1,8}|-?[0-9]{1,3}),\s*d[0-7]\b'
|
||||
- '(?im)^\s*move(?:\.[bwl])?\s+(?:sr|usp),\s*[^\s]+'
|
||||
- '(?im)^\s*move\.[bwl]\s+.*\b[ad]\d'
|
||||
- '(?im)^\s*movem\.[bwl]\b'
|
||||
- '(?im)^\s*move[mp](?:\.[wl])?\b'
|
||||
- '(?im)^\s*btst\b'
|
||||
- '(?im)^\s*dbra\b'
|
||||
man-heading: '^[.''][ \t]*SH +(?:[^"\s]+|"[^"\s]+)'
|
||||
man-title: '^[.''][ \t]*TH +(?:[^"\s]+|"[^"]+") +"?(?:[1-9]|@[^\s@]+@)'
|
||||
mdoc-date: '^[.''][ \t]*Dd +(?:[^"\s]+|"[^"]+")'
|
||||
mdoc-heading: '^[.''][ \t]*Sh +(?:[^"\s]|"[^"]+")'
|
||||
mdoc-title: '^[.''][ \t]*Dt +(?:[^"\s]+|"[^"]+") +"?(?:[1-9]|@[^\s@]+@)'
|
||||
objectivec: '^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])'
|
||||
perl:
|
||||
- '\buse\s+(?:strict\b|v?5\b)'
|
||||
- '^\s*use\s+(?:constant|overload)\b'
|
||||
- '^\s*(?:\*|(?:our\s*)?@)EXPORT\s*='
|
||||
- '^\s*package\s+[^\W\d]\w*(?:::\w+)*\s*(?:[;{]|\sv?\d)'
|
||||
- '[\s$][^\W\d]\w*(?::\w+)*->[a-zA-Z_\[({]'
|
||||
raku: '^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)'
|
||||
vb-class: '^[ ]*VERSION [0-9]\.[0-9] CLASS'
|
||||
vb-form: '^[ ]*VERSION [0-9]\.[0-9]{2}'
|
||||
vb-module: '^[ ]*Attribute VB_Name = '
|
||||
vba:
|
||||
- '\b(?:VBA|[vV]ba)(?:\b|[0-9A-Z_])'
|
||||
# VBA7 new 64-bit features
|
||||
- '^[ ]*(?:Public|Private)? Declare PtrSafe (?:Sub|Function)\b'
|
||||
- '^[ ]*#If Win64\b'
|
||||
- '^[ ]*(?:Dim|Const) [0-9a-zA-Z_]*[ ]*As Long(?:Ptr|Long)\b'
|
||||
# Top module declarations unique to VBA
|
||||
- '^[ ]*Option (?:Private Module|Compare (?:Database|Text|Binary))\b'
|
||||
# General VBA libraries and objects
|
||||
- '(?: |\()(?:Access|Excel|Outlook|PowerPoint|Visio|Word|VBIDE)\.\w'
|
||||
- '\b(?:(?:Active)?VBProjects?|VBComponents?|Application\.(?:VBE|ScreenUpdating))\b'
|
||||
# AutoCAD, Outlook, PowerPoint and Word objects
|
||||
- '\b(?:ThisDrawing|AcadObject|Active(?:Explorer|Inspector|Window\.Presentation|Presentation|Document)|Selection\.(?:Find|Paragraphs))\b'
|
||||
# Excel objects
|
||||
- '\b(?:(?:This|Active)?Workbooks?|Worksheets?|Active(?:Sheet|Chart|Cell)|WorksheetFunction)\b'
|
||||
- '\b(?:Range\(".*|Cells\([0-9a-zA-Z_]*, (?:[0-9a-zA-Z_]*|"[a-zA-Z]{1,3}"))\)'
|
@@ -1,56 +0,0 @@

<lexer>
  <config>
    <name>Markdown</name>
    <alias>markdown</alias>
    <alias>md</alias>
    <filename>*.md</filename>
    <filename>*.markdown</filename>
    <mime_type>text/x-markdown</mime_type>
  </config>
  <rules>
    <state name="root">
      <rule pattern="(^#[^#].+)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
      <rule pattern="(^#{2,6}[^#].+)(\n)"><bygroups><token type="GenericSubheading"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(.+)(\n)(=+)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(.+)(\n)(-+)(\n)"><bygroups><token type="GenericSubheading"/><token type="Text"/><token type="GenericSubheading"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(\s*)([*-] )(\[[ xX]\])( .+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><token type="Keyword"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*)([*-])(\s)(.+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><token type="TextWhitespace"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*)([0-9]+\.)( .+\n)"><bygroups><token type="TextWhitespace"/><token type="Keyword"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*>\s)(.+\n)"><bygroups><token type="Keyword"/><token type="GenericEmph"/></bygroups></rule>
      <rule pattern="^(```\n)([\w\W]*?)(^```$)">
        <bygroups>
          <token type="LiteralStringBacktick"/>
          <token type="Text"/>
          <token type="LiteralStringBacktick"/>
        </bygroups>
      </rule>
      <rule pattern="^(```)(\w+)(\n)([\w\W]*?)(^```$)">
        <bygroups>
          <token type="LiteralStringBacktick"/>
          <token type="NameLabel"/>
          <token type="TextWhitespace"/>
          <UsingByGroup lexer="2" content="4"/>
          <token type="LiteralStringBacktick"/>
        </bygroups>
      </rule>
      <rule><include state="inline"/></rule>
    </state>
    <state name="inline">
      <rule pattern="\\."><token type="Text"/></rule>
      <rule pattern="([^`]?)(`[^`\n]+`)"><bygroups><token type="Text"/><token type="LiteralStringBacktick"/></bygroups></rule>
      <rule pattern="([^\*]?)(\*\*[^* \n][^*\n]*\*\*)"><bygroups><token type="Text"/><token type="GenericStrong"/></bygroups></rule>
      <rule pattern="([^_]?)(__[^_ \n][^_\n]*__)"><bygroups><token type="Text"/><token type="GenericStrong"/></bygroups></rule>
      <rule pattern="([^\*]?)(\*[^* \n][^*\n]*\*)"><bygroups><token type="Text"/><token type="GenericEmph"/></bygroups></rule>
      <rule pattern="([^_]?)(_[^_ \n][^_\n]*_)"><bygroups><token type="Text"/><token type="GenericEmph"/></bygroups></rule>
      <rule pattern="([^~]?)(~~[^~ \n][^~\n]*~~)"><bygroups><token type="Text"/><token type="GenericDeleted"/></bygroups></rule>
      <rule pattern="[@#][\w/:]+"><token type="NameEntity"/></rule>
      <rule pattern="(!?\[)([^]]+)(\])(\()([^)]+)(\))"><bygroups><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="Text"/><token type="NameAttribute"/><token type="Text"/></bygroups></rule>
      <rule pattern="(\[)([^]]+)(\])(\[)([^]]*)(\])"><bygroups><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="Text"/><token type="NameLabel"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(\s*\[)([^]]*)(\]:\s*)(.+)"><bygroups><token type="Text"/><token type="NameLabel"/><token type="Text"/><token type="NameAttribute"/></bygroups></rule>
      <rule pattern="[^\\\s]+"><token type="Text"/></rule>
      <rule pattern="."><token type="Text"/></rule>
    </state>
  </rules>
</lexer>

@@ -1,34 +0,0 @@

<lexer>
  <config>
    <name>MoinMoin/Trac Wiki markup</name>
    <alias>trac-wiki</alias>
    <alias>moin</alias>
    <mime_type>text/x-trac-wiki</mime_type>
    <case_insensitive>true</case_insensitive>
  </config>
  <rules>
    <state name="root">
      <rule pattern="^#.*$"><token type="Comment"/></rule>
      <rule pattern="(!)(\S+)"><bygroups><token type="Keyword"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(=+)([^=]+)(=+)(\s*#.+)?$"><bygroups><token type="GenericHeading"/><usingself state="root"/><token type="GenericHeading"/><token type="LiteralString"/></bygroups></rule>
      <rule pattern="(\{\{\{)(\n#!.+)?"><bygroups><token type="NameBuiltin"/><token type="NameNamespace"/></bygroups><push state="codeblock"/></rule>
      <rule pattern="(\'\'\'?|\|\||`|__|~~|\^|,,|::)"><token type="Comment"/></rule>
      <rule pattern="^( +)([.*-])( )"><bygroups><token type="Text"/><token type="NameBuiltin"/><token type="Text"/></bygroups></rule>
      <rule pattern="^( +)([a-z]{1,5}\.)( )"><bygroups><token type="Text"/><token type="NameBuiltin"/><token type="Text"/></bygroups></rule>
      <rule pattern="\[\[\w+.*?\]\]"><token type="Keyword"/></rule>
      <rule pattern="(\[[^\s\]]+)(\s+[^\]]+?)?(\])"><bygroups><token type="Keyword"/><token type="LiteralString"/><token type="Keyword"/></bygroups></rule>
      <rule pattern="^----+$"><token type="Keyword"/></rule>
      <rule pattern="[^\n\'\[{!_~^,|]+"><token type="Text"/></rule>
      <rule pattern="\n"><token type="Text"/></rule>
      <rule pattern="."><token type="Text"/></rule>
    </state>
    <state name="codeblock">
      <rule pattern="\}\}\}"><token type="NameBuiltin"/><pop depth="1"/></rule>
      <rule pattern="\{\{\{"><token type="Text"/><push/></rule>
      <rule pattern="[^{}]+"><token type="CommentPreproc"/></rule>
      <rule pattern="."><token type="CommentPreproc"/></rule>
    </state>
  </rules>
</lexer>

@@ -1,76 +0,0 @@

<lexer>
  <config>
    <name>reStructuredText</name>
    <alias>restructuredtext</alias>
    <alias>rst</alias>
    <alias>rest</alias>
    <filename>*.rst</filename>
    <filename>*.rest</filename>
    <mime_type>text/x-rst</mime_type>
    <mime_type>text/prs.fallenstein.rst</mime_type>
  </config>
  <rules>
    <state name="root">
      <rule pattern="^(=+|-+|`+|:+|\.+|\'+|"+|~+|\^+|_+|\*+|\++|#+)([ \t]*\n)(.+)(\n)(\1)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(\S.*)(\n)(={3,}|-{3,}|`{3,}|:{3,}|\.{3,}|\'{3,}|"{3,}|~{3,}|\^{3,}|_{3,}|\*{3,}|\+{3,}|#{3,})(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(\s*)([-*+])( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*)([0-9#ivxlcmIVXLCM]+\.)( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*)(\(?[0-9#ivxlcmIVXLCM]+\))( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*)([A-Z]+\.)( .+\n(?:\1 .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*)(\(?[A-Za-z]+\))( .+\n(?:\1 .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^(\s*)(\|)( .+\n(?:\| .+\n)*)"><bygroups><token type="Text"/><token type="Operator"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^( *\.\.)(\s*)((?:source)?code(?:-block)?)(::)([ \t]*)([^\n]+)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\8.*)?\n)+)">
        <bygroups>
          <token type="Punctuation"/>
          <token type="Text"/>
          <token type="OperatorWord"/>
          <token type="Punctuation"/>
          <token type="Text"/>
          <token type="Keyword"/>
          <token type="Text"/>
          <token type="Text"/>
          <UsingByGroup lexer="6" content="9,10,11"/>
        </bygroups>
      </rule>
      <rule pattern="^( *\.\.)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))">
        <bygroups>
          <token type="Punctuation"/>
          <token type="Text"/>
          <token type="OperatorWord"/>
          <token type="Punctuation"/>
          <token type="Text"/>
          <usingself state="inline"/>
        </bygroups>
      </rule>
      <rule pattern="^( *\.\.)(\s*)(_(?:[^:\\]|\\.)+:)(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^( *\.\.)(\s*)(\[.+\])(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^( *\.\.)(\s*)(\|.+\|)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="OperatorWord"/><token type="Punctuation"/><token type="Text"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="^ *\.\..*(\n( +.*\n|\n)+)?"><token type="Comment"/></rule>
      <rule pattern="^( *)(:(?:\\\\|\\:|[^:\n])+:(?=\s))([ \t]*)"><bygroups><token type="Text"/><token type="NameClass"/><token type="Text"/></bygroups></rule>
      <rule pattern="^(\S.*(?<!::)\n)((?:(?: +.*)\n)+)"><bygroups><usingself state="inline"/><usingself state="inline"/></bygroups></rule>
      <rule pattern="(::)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\3.*)?\n)+)"><bygroups><token type="LiteralStringEscape"/><token type="Text"/><token type="LiteralString"/><token type="LiteralString"/><token type="Text"/><token type="LiteralString"/></bygroups></rule>
      <rule><include state="inline"/></rule>
    </state>
    <state name="inline">
      <rule pattern="\\."><token type="Text"/></rule>
      <rule pattern="``"><token type="LiteralString"/><push state="literal"/></rule>
      <rule pattern="(`.+?)(<.+?>)(`__?)"><bygroups><token type="LiteralString"/><token type="LiteralStringInterpol"/><token type="LiteralString"/></bygroups></rule>
      <rule pattern="`.+?`__?"><token type="LiteralString"/></rule>
      <rule pattern="(`.+?`)(:[a-zA-Z0-9:-]+?:)?"><bygroups><token type="NameVariable"/><token type="NameAttribute"/></bygroups></rule>
      <rule pattern="(:[a-zA-Z0-9:-]+?:)(`.+?`)"><bygroups><token type="NameAttribute"/><token type="NameVariable"/></bygroups></rule>
      <rule pattern="\*\*.+?\*\*"><token type="GenericStrong"/></rule>
      <rule pattern="\*.+?\*"><token type="GenericEmph"/></rule>
      <rule pattern="\[.*?\]_"><token type="LiteralString"/></rule>
      <rule pattern="<.+?>"><token type="NameTag"/></rule>
      <rule pattern="[^\\\n\[*`:]+"><token type="Text"/></rule>
      <rule pattern="."><token type="Text"/></rule>
    </state>
    <state name="literal">
      <rule pattern="[^`]+"><token type="LiteralString"/></rule>
      <rule pattern="``((?=$)|(?=[-/:.,; \n\x00‐‑‒–— '"\)\]\}>’”»!\?]))"><token type="LiteralString"/><pop depth="1"/></rule>
      <rule pattern="`"><token type="LiteralString"/></rule>
    </state>
  </rules>
</lexer>

@@ -40,18 +40,15 @@ for fname in glob.glob("lexers/*.xml"):
 
 with open("src/constants/lexers.cr", "w") as f:
     f.write("module Tartrazine\n")
     f.write("  LEXERS_BY_NAME = {\n")
-    for k in sorted(lexer_by_name.keys()):
-        v = lexer_by_name[k]
+    for k, v in lexer_by_name.items():
         f.write(f'"{k}" => "{v}", \n')
     f.write("}\n")
     f.write("  LEXERS_BY_MIMETYPE = {\n")
-    for k in sorted(lexer_by_mimetype.keys()):
-        v = lexer_by_mimetype[k]
+    for k, v in lexer_by_mimetype.items():
         f.write(f'"{k}" => "{v}", \n')
     f.write("}\n")
     f.write("  LEXERS_BY_FILENAME = {\n")
-    for k in sorted(lexer_by_filename.keys()):
-        v = lexer_by_filename[k]
-        f.write(f'"{k}" => {str(sorted(list(v))).replace("'", "\"")}, \n')
+    for k, v in lexer_by_filename.items():
+        f.write(f'"{k}" => {str(list(v)).replace("'", "\"")}, \n')
     f.write("}\n")
     f.write("end\n")
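The loops in the hunk above are the crux of this change: iterating `sorted(...)` keys makes the generated `lexers.cr` byte-for-byte reproducible across runs, while plain `.items()` follows insertion order. A minimal sketch of the difference, using a hypothetical sample of the name-to-file mapping the script builds:

```python
# Hypothetical sample of the name -> lexer-file mapping built by the script.
lexer_by_name = {"zig": "zig", "ada": "ada", "crystal": "crystal"}

# Sorted iteration emits entries in a stable order on every run,
# which keeps the generated constants file diff-friendly.
entries = [f'"{k}" => "{lexer_by_name[k]}", ' for k in sorted(lexer_by_name.keys())]
print(entries)
```

With `.items()` the output order would instead depend on how the dict was populated, so regenerating the file can produce spurious diffs.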
@@ -1,5 +1,5 @@
 name: tartrazine
-version: 0.6.1
+version: 0.5.0
 
 authors:
   - Roberto Alsina <roberto.alsina@gmail.com>
@@ -10,13 +10,11 @@ targets:
 
 dependencies:
   baked_file_system:
-    github: ralsina/baked_file_system
-    branch: master
+    github: schovi/baked_file_system
   base58:
     github: crystal-china/base58.cr
   sixteen:
     github: ralsina/sixteen
-    branch: main
   docopt:
     github: chenkovsky/docopt.cr
@@ -72,7 +72,8 @@ end
 
 # Helper that creates lexer and tokenizes
 def tokenize(lexer_name, text)
-  tokenizer = Tartrazine.lexer(lexer_name).tokenizer(text)
+  lexer = Tartrazine.lexer(lexer_name)
+  tokenizer = Tartrazine::Tokenizer.new(lexer, text)
  Tartrazine::Lexer.collapse_tokens(tokenizer.to_a)
 end
@@ -16,16 +16,13 @@ module Tartrazine
     Push
     Token
     Using
-    Usingbygroup
     Usingself
   end
 
   struct Action
     property actions : Array(Action) = [] of Action
 
-    @content_index : Array(Int32) = [] of Int32
     @depth : Int32 = 0
-    @lexer_index : Int32 = 0
     @lexer_name : String = ""
     @states : Array(String) = [] of String
     @states_to_push : Array(String) = [] of String
@@ -65,9 +62,6 @@ module Tartrazine
         @states = xml.attributes.select { |attrib|
           attrib.name == "state"
         }.map &.content
-      when ActionType::Usingbygroup
-        @lexer_index = xml["lexer"].to_i
-        @content_index = xml["content"].split(",").map(&.to_i)
       end
     end
 
@@ -121,13 +115,15 @@ module Tartrazine
       when ActionType::Using
         # Shunt to another lexer entirely
         return [] of Token if match.empty?
-        Tartrazine.lexer(@lexer_name).tokenizer(
+        Tokenizer.new(
+          Tartrazine.lexer(@lexer_name),
           String.new(match[match_group].value),
           secondary: true).to_a
       when ActionType::Usingself
         # Shunt to another copy of this lexer
         return [] of Token if match.empty?
-        tokenizer.lexer.tokenizer(
+        Tokenizer.new(
+          tokenizer.lexer,
           String.new(match[match_group].value),
           secondary: true).to_a
       when ActionType::Combined
@@ -140,16 +136,6 @@ module Tartrazine
         tokenizer.lexer.states[new_state.name] = new_state
         tokenizer.state_stack << new_state.name
         [] of Token
-      when ActionType::Usingbygroup
-        # Shunt to content-specified lexer
-        return [] of Token if match.empty?
-        content = ""
-        @content_index.each do |i|
-          content += String.new(match[i].value)
-        end
-        Tartrazine.lexer(String.new(match[@lexer_index].value)).tokenizer(
-          content,
-          secondary: true).to_a
       else
         raise Exception.new("Unknown action type: #{@type}")
       end
File diff suppressed because it is too large
@@ -12,10 +12,6 @@ module Tartrazine
     property theme : Theme = Tartrazine.theme("default-dark")
 
     # Format the text using the given lexer.
-    def format(text : String, lexer : Lexer, io : IO = nil) : Nil
-      raise Exception.new("Not implemented")
-    end
-
     def format(text : String, lexer : Lexer) : String
       raise Exception.new("Not implemented")
     end
@@ -11,25 +11,35 @@ module Tartrazine
       "#{i + 1}".rjust(4).ljust(5)
     end
 
-    def format(text : String, lexer : BaseLexer) : String
-      outp = String::Builder.new("")
-      format(text, lexer, outp)
-      outp.to_s
-    end
-
-    def format(text : String, lexer : BaseLexer, outp : IO) : Nil
-      tokenizer = lexer.tokenizer(text)
+    def format(text : String, lexer : Lexer) : String
+      tokenizer = Tokenizer.new(lexer, text)
       i = 0
-      outp << line_label(i) if line_numbers?
-      tokenizer.each do |token|
-        outp << colorize(token[:value], token[:type])
-        if token[:value].includes?("\n")
-          i += 1
-          outp << line_label(i) if line_numbers?
+      output = String.build do |outp|
+        outp << line_label(i) if line_numbers?
+        tokenizer.each do |token|
+          outp << colorize(token[:value], token[:type])
+          if token[:value].includes?("\n")
+            i += 1
+            outp << line_label(i) if line_numbers?
+          end
         end
       end
+      output
     end
 
+    # def format(text : String, lexer : Lexer) : String
+    #   output = String.build do |outp|
+    #     lexer.group_tokens_in_lines(lexer.tokenize(text)).each_with_index do |line, i|
+    #       label = line_numbers? ? "#{i + 1}".rjust(4).ljust(5) : ""
+    #       outp << label
+    #       line.each do |token|
+    #         outp << colorize(token[:value], token[:type])
+    #       end
+    #     end
+    #   end
+    #   output
+    # end
+
     def colorize(text : String, token : String) : String
       style = theme.styles.fetch(token, nil)
       return text if style.nil?
@@ -35,26 +35,23 @@ module Tartrazine
     end
 
     def format(text : String, lexer : Lexer) : String
-      outp = String::Builder.new("")
-      format(text, lexer, outp)
-      outp.to_s
-    end
-
-    def format(text : String, lexer : BaseLexer, io : IO) : Nil
-      pre, post = wrap_standalone
-      io << pre if standalone?
-      format_text(text, lexer, io)
-      io << post if standalone?
+      text = format_text(text, lexer)
+      if standalone?
+        text = wrap_standalone(text)
+      end
+      text
     end
 
     # Wrap text into a full HTML document, including the CSS for the theme
-    def wrap_standalone
+    def wrap_standalone(text) : String
       output = String.build do |outp|
         outp << "<!DOCTYPE html><html><head><style>"
         outp << style_defs
         outp << "</style></head><body>"
+        outp << text
+        outp << "</body></html>"
       end
-      {output.to_s, "</body></html>"}
+      output
     end
 
     private def line_label(i : Int32) : String
@@ -64,23 +61,27 @@ module Tartrazine
       "<span #{line_id} #{line_class} style=\"user-select: none;\">#{line_label} </span>"
     end
 
-    def format_text(text : String, lexer : BaseLexer, outp : IO)
-      tokenizer = lexer.tokenizer(text)
+    def format_text(text : String, lexer : Lexer) : String
+      # lines = lexer.group_tokens_in_lines(lexer.tokenize(text))
+      tokenizer = Tokenizer.new(lexer, text)
       i = 0
-      if surrounding_pre?
-        pre_style = wrap_long_lines? ? "style=\"white-space: pre-wrap; word-break: break-word;\"" : ""
-        outp << "<pre class=\"#{get_css_class("Background")}\" #{pre_style}>"
-      end
-      outp << "<code class=\"#{get_css_class("Background")}\">"
-      outp << line_label(i) if line_numbers?
-      tokenizer.each do |token|
-        outp << "<span class=\"#{get_css_class(token[:type])}\">#{HTML.escape(token[:value])}</span>"
-        if token[:value].ends_with? "\n"
-          i += 1
-          outp << line_label(i) if line_numbers?
+      output = String.build do |outp|
+        if surrounding_pre?
+          pre_style = wrap_long_lines? ? "style=\"white-space: pre-wrap; word-break: break-word;\"" : ""
+          outp << "<pre class=\"#{get_css_class("Background")}\" #{pre_style}>"
+        end
+        outp << "<code class=\"#{get_css_class("Background")}\">"
+        outp << line_label(i) if line_numbers?
+        tokenizer.each do |token|
+          outp << "<span class=\"#{get_css_class(token[:type])}\">#{HTML.escape(token[:value])}</span>"
+          if token[:value].ends_with? "\n"
+            i += 1
+            outp << line_label(i) if line_numbers?
+          end
         end
+        outp << "</code></pre>"
+      end
-      outp << "</code></pre>"
+      output
     end
 
     # ameba:disable Metrics/CyclomaticComplexity
@@ -4,15 +4,8 @@ module Tartrazine
   class Json < Formatter
     property name = "json"
 
-    def format(text : String, lexer : BaseLexer) : String
-      outp = String::Builder.new("")
-      format(text, lexer, outp)
-      outp.to_s
-    end
-
-    def format(text : String, lexer : BaseLexer, io : IO) : Nil
-      tokenizer = lexer.tokenizer(text)
-      io << Tartrazine::Lexer.collapse_tokens(tokenizer.to_a).to_json
+    def format(text : String, lexer : Lexer, _theme : Theme) : String
+      lexer.tokenize(text).to_json
     end
   end
 end
@@ -1,81 +0,0 @@
-require "yaml"
-
-# Use linguist's heuristics to disambiguate between languages
-# This is *shamelessly* stolen from https://github.com/github-linguist/linguist
-# and ported to Crystal. Deepest thanks to the authors of Linguist
-# for licensing it liberally.
-#
-# Consider this code (c) 2017 GitHub, Inc. even if I wrote it.
-module Linguist
-  class Heuristic
-    include YAML::Serializable
-
-    property disambiguations : Array(Disambiguation)
-    property named_patterns : Hash(String, String | Array(String))
-
-    # Run the heuristics on the given filename and content
-    def run(filename, content)
-      ext = File.extname filename
-      disambiguation = disambiguations.find do |item|
-        item.extensions.includes? ext
-      end
-      disambiguation.try &.run(content, named_patterns)
-    end
-  end
-
-  class Disambiguation
-    include YAML::Serializable
-    property extensions : Array(String)
-    property rules : Array(LangRule)
-
-    def run(content, named_patterns)
-      rules.each do |rule|
-        if rule.match(content, named_patterns)
-          return rule.language
-        end
-      end
-      nil
-    end
-  end
-
-  class LangRule
-    include YAML::Serializable
-    property pattern : (String | Array(String))?
-    property negative_pattern : (String | Array(String))?
-    property named_pattern : String?
-    property and : Array(LangRule)?
-    property language : String | Array(String)?
-
-    # ameba:disable Metrics/CyclomaticComplexity
-    def match(content, named_patterns)
-      # This rule matches without conditions
-      return true if !pattern && !negative_pattern && !named_pattern && !and
-
-      if pattern
-        p_arr = [] of String
-        p_arr << pattern.as(String) if pattern.is_a? String
-        p_arr = pattern.as(Array(String)) if pattern.is_a? Array(String)
-        return true if p_arr.any? { |pat| ::Regex.new(pat).matches?(content) }
-      end
-      if negative_pattern
-        p_arr = [] of String
-        p_arr << negative_pattern.as(String) if negative_pattern.is_a? String
-        p_arr = negative_pattern.as(Array(String)) if negative_pattern.is_a? Array(String)
-        return true if p_arr.none? { |pat| ::Regex.new(pat).matches?(content) }
-      end
-      if named_pattern
-        p_arr = [] of String
-        if named_patterns[named_pattern].is_a? String
-          p_arr << named_patterns[named_pattern].as(String)
-        else
-          p_arr = named_patterns[named_pattern].as(Array(String))
-        end
-        result = p_arr.any? { |pat| ::Regex.new(pat).matches?(content) }
-      end
-      if and
-        result = and.as(Array(LangRule)).all?(&.match(content, named_patterns))
-      end
-      result
-    end
-  end
-end
158 src/lexer.cr
@@ -9,72 +9,29 @@ module Tartrazine
 
   # Get the lexer object for a language name
   # FIXME: support mimetypes
-  def self.lexer(name : String? = nil, filename : String? = nil, mimetype : String? = nil) : BaseLexer
-    return lexer_by_name(name) if name && name != "autodetect"
-    return lexer_by_filename(filename) if filename
-    return lexer_by_mimetype(mimetype) if mimetype
-
-    Lexer.from_xml(LexerFiles.get("/#{LEXERS_BY_NAME["plaintext"]}.xml").gets_to_end)
-  end
-
-  private def self.lexer_by_mimetype(mimetype : String) : BaseLexer
-    lexer_file_name = LEXERS_BY_MIMETYPE.fetch(mimetype, nil)
-    raise Exception.new("Unknown mimetype: #{mimetype}") if lexer_file_name.nil?
-
-    Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
-  end
-
-  private def self.lexer_by_name(name : String) : BaseLexer
-    lexer_file_name = LEXERS_BY_NAME.fetch(name.downcase, nil)
-    return create_delegating_lexer(name) if lexer_file_name.nil? && name.includes? "+"
-    raise Exception.new("Unknown lexer: #{name}") if lexer_file_name.nil?
-
-    Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
-  end
-
-  private def self.lexer_by_filename(filename : String) : BaseLexer
-    candidates = Set(String).new
-    LEXERS_BY_FILENAME.each do |k, v|
-      candidates += v.to_set if File.match?(k, File.basename(filename))
-    end
-
-    case candidates.size
-    when 0
+  def self.lexer(name : String? = nil, filename : String? = nil) : Lexer
+    if name.nil? && filename.nil?
       lexer_file_name = LEXERS_BY_NAME["plaintext"]
-    when 1
-      lexer_file_name = candidates.first
+    elsif name && name != "autodetect"
+      lexer_file_name = LEXERS_BY_NAME[name.downcase]
     else
-      lexer_file_name = self.lexer_by_content(filename)
-      begin
-        return self.lexer(lexer_file_name)
-      rescue ex : Exception
-        raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}, heuristics suggest #{lexer_file_name} but there is no matching lexer.")
+      # Guess by filename
+      candidates = Set(String).new
+      LEXERS_BY_FILENAME.each do |k, v|
+        candidates += v.to_set if File.match?(k, File.basename(filename.to_s))
       end
+      case candidates.size
+      when 0
+        lexer_file_name = LEXERS_BY_NAME["plaintext"]
+      when 1
+        lexer_file_name = candidates.first
+      else
+        raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}")
+      end
     end
 
     Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
   end
 
-  private def self.lexer_by_content(fname : String) : String?
-    h = Linguist::Heuristic.from_yaml(LexerFiles.get("/heuristics.yml").gets_to_end)
-    result = h.run(fname, File.read(fname))
-    case result
-    when Nil
-      raise Exception.new "No lexer found for #{fname}"
-    when String
-      result.as(String)
-    when Array(String)
-      result.first
-    end
-  end
-
-  private def self.create_delegating_lexer(name : String) : BaseLexer
-    language, root = name.split("+", 2)
-    language_lexer = lexer(language)
-    root_lexer = lexer(root)
-    DelegatingLexer.new(language_lexer, root_lexer)
-  end
-
   # Return a list of all lexers
   def self.lexers : Array(String)
     LEXERS_BY_NAME.keys.sort!
@@ -83,18 +40,15 @@ module Tartrazine
   # A token, the output of the tokenizer
   alias Token = NamedTuple(type: String, value: String)
 
-  abstract class BaseTokenizer
-  end
-
-  class Tokenizer < BaseTokenizer
+  struct Tokenizer
     include Iterator(Token)
-    property lexer : BaseLexer
+    property lexer : Lexer
     property text : Bytes
     property pos : Int32 = 0
     @dq = Deque(Token).new
     property state_stack = ["root"]
 
-    def initialize(@lexer : BaseLexer, text : String, secondary = false)
+    def initialize(@lexer : Lexer, text : String, secondary = false)
       # Respect the `ensure_nl` config option
       if text.size > 0 && text[-1] != '\n' && @lexer.config[:ensure_nl] && !secondary
         text += "\n"
@@ -152,7 +106,13 @@ module Tartrazine
     end
   end
 
-  abstract class BaseLexer
+  # This implements a lexer for Pygments RegexLexers as expressed
+  # in Chroma's XML serialization.
+  #
+  # For explanations on what actions and states do
+  # the Pygments documentation is a good place to start.
+  # https://pygments.org/docs/lexerdevelopment/
+  struct Lexer
     property config = {
       name: "",
       priority: 0.0,
@@ -163,18 +123,6 @@ module Tartrazine
     }
     property states = {} of String => State
 
-    def tokenizer(text : String, secondary = false) : BaseTokenizer
-      Tokenizer.new(self, text, secondary)
-    end
-  end
-
-  # This implements a lexer for Pygments RegexLexers as expressed
-  # in Chroma's XML serialization.
-  #
-  # For explanations on what actions and states do
-  # the Pygments documentation is a good place to start.
-  # https://pygments.org/docs/lexerdevelopment/
-  class Lexer < BaseLexer
     # Collapse consecutive tokens of the same type for easier comparison
     # and smaller output
     def self.collapse_tokens(tokens : Array(Tartrazine::Token)) : Array(Tartrazine::Token)
@@ -256,60 +204,6 @@ module Tartrazine
       end
     end
 
-  # A lexer that takes two lexers as arguments. A root lexer
-  # and a language lexer. Everything is scalled using the
-  # language lexer, afterwards all `Other` tokens are lexed
-  # using the root lexer.
-  #
-  # This is useful for things like template languages, where
-  # you have Jinja + HTML or Jinja + CSS and so on.
-  class DelegatingLexer < BaseLexer
-    property language_lexer : BaseLexer
-    property root_lexer : BaseLexer
-
-    def initialize(@language_lexer : BaseLexer, @root_lexer : BaseLexer)
-    end
-
-    def tokenizer(text : String, secondary = false) : DelegatingTokenizer
-      DelegatingTokenizer.new(self, text, secondary)
-    end
-  end
-
-  # This Tokenizer works with a DelegatingLexer. It first tokenizes
-  # using the language lexer, and "Other" tokens are tokenized using
-  # the root lexer.
-  class DelegatingTokenizer < BaseTokenizer
-    include Iterator(Token)
-    @dq = Deque(Token).new
-    @language_tokenizer : BaseTokenizer
-
-    def initialize(@lexer : DelegatingLexer, text : String, secondary = false)
-      # Respect the `ensure_nl` config option
-      if text.size > 0 && text[-1] != '\n' && @lexer.config[:ensure_nl] && !secondary
-        text += "\n"
-      end
-      @language_tokenizer = @lexer.language_lexer.tokenizer(text, true)
-    end
-
-    def next : Iterator::Stop | Token
-      if @dq.size > 0
-        return @dq.shift
-      end
-      token = @language_tokenizer.next
-      if token.is_a? Iterator::Stop
-        return stop
-      elsif token.as(Token).[:type] == "Other"
-        root_tokenizer = @lexer.root_lexer.tokenizer(token.as(Token).[:value], true)
-        root_tokenizer.each do |root_token|
-          @dq << root_token
-        end
-      else
-        @dq << token.as(Token)
-      end
-      self.next
-    end
-  end
-
   # A Lexer state. A state has a name and a list of rules.
   # The state machine has a state stack containing references
   # to states to decide which rules to apply.
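The `DelegatingLexer` removed in this hunk tokenizes with the language (template) lexer first, then re-tokenizes every `Other` token with the root lexer; that is how combinations like Jinja + HTML get both layers highlighted. A simplified Python sketch of the two-pass idea (plain tuples and callables, not the Crystal iterator API):

```python
def delegate_tokenize(text, language_tokenize, root_tokenize):
    # First pass: the language (template) lexer sees the whole text.
    # Second pass: anything it classified as "Other" is handed to the
    # root (host-language) lexer; all other tokens pass through as-is.
    out = []
    for ttype, value in language_tokenize(text):
        if ttype == "Other":
            out.extend(root_tokenize(value))
        else:
            out.append((ttype, value))
    return out
```

The real implementation streams tokens through a deque instead of building a list, but the delegation logic is the same.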
12 src/main.cr
@@ -20,8 +20,7 @@ Usage:
 
 Options:
   -f <formatter>    Format to use (html, terminal, json)
   -t <theme>        Theme to use, see --list-themes [default: default-dark]
-  -l <lexer>        Lexer (language) to use, see --list-lexers. Use more than
-                    one lexer with "+" (e.g. jinja+yaml) [default: autodetect]
+  -l <lexer>        Lexer (language) to use, see --list-lexers [default: autodetect]
   -o <output>       Output file. Default is stdout.
   --standalone      Generate a standalone HTML file, which includes
                     all style information. If not given, it will generate just
@@ -86,12 +85,13 @@ if options["-f"]
   lexer = Tartrazine.lexer(name: options["-l"].as(String), filename: options["FILE"].as(String))
 
   input = File.open(options["FILE"].as(String)).gets_to_end
+  output = formatter.format(input, lexer)
 
   if options["-o"].nil?
-    outf = STDOUT
+    puts output
   else
-    outf = File.open(options["-o"].as(String), "w")
+    File.open(options["-o"].as(String), "w") do |outf|
+      outf << output
+    end
   end
-  formatter.format(input, lexer, outf)
-  outf.close
 end