Compare commits

11 Commits

Author SHA1 Message Date
0e7dafe711 Updated README 2024-08-24 22:33:24 -03:00
082241eb0f Load lexer by mimetype 2024-08-24 22:20:38 -03:00
df88047ca8 v0.6.1 2024-08-24 21:45:57 -03:00
5a3b50d7a3 Integrate heuristics into lexer selection 2024-08-24 21:39:39 -03:00
a5926af518 Comments 2024-08-24 20:53:14 -03:00
fc9f834bc8 Make it work again 2024-08-24 20:09:29 -03:00
58fd42d936 Rebase to main 2024-08-24 19:59:05 -03:00
5a88a51f3e Implement heuristics from linguist 2024-08-24 19:55:56 -03:00
fd7c6fa4b3 Sort of working? 2024-08-24 19:55:56 -03:00
6264bfc754 Beginning deserialization of data 2024-08-24 19:55:56 -03:00
38196d6e96 Rst lexer 2024-08-24 19:49:02 -03:00
13 changed files with 1191 additions and 104 deletions

@ -2,36 +2,11 @@
 Tartrazine is a library to syntax-highlight code. It is
 a port of [Pygments](https://pygments.org/) to
-[Crystal](https://crystal-lang.org/). Kind of.
+[Crystal](https://crystal-lang.org/).
-The CLI tool can be used to highlight many things in many styles.
+It also provides a CLI tool which can be used to highlight many things in many styles.
-# A port of what? Why "kind of"?
-Pygments is a staple of the Python ecosystem, and it's great.
-It lets you highlight code in many languages, and it has many
-themes. Chroma is "Pygments for Go", it's actually a port of
-Pygments to Go, and it's great too.
-I wanted that in Crystal, so I started this project. But I did
-not read much of the Pygments code. Or much of Chroma's.
-Chroma has taken most of the Pygments lexers and turned them into
-XML descriptions. What I did was take those XML files from Chroma
-and a pile of test cases from Pygments, and I slapped them together
-until the tests passed and my code produced the same output as
-Chroma. Think of it as *extreme TDD*.
-Currently the pass rate for tests in the supported languages
-is `96.8%`, which is *not bad for a couple days hacking*.
-This only covers the RegexLexers, which are the most common ones,
-but it means the supported languages are a subset of Chroma's, which
-is a subset of Pygments'.
-Currently Tartrazine supports ... 248 languages.
-It has 331 themes (63 from Chroma, the rest are base16 themes via
+Currently Tartrazine supports 247 languages, and it has 331 themes (63 from Chroma, the rest are base16 themes via
 [Sixteen](https://github.com/ralsina/sixteen)
 ## Installation
@ -58,7 +33,7 @@ $ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers -f terminal
 Generate a standalone HTML file from a C source file with the syntax highlighted:
 ```shell
-$ tartrazine whatever.c -l c -t catppuccin-macchiato --line-numbers \
+$ tartrazine whatever.c -t catppuccin-macchiato --line-numbers \
 --standalone -f html -o whatever.html
 ```
@ -87,3 +62,29 @@ puts formatter.format(File.read(ARGV[0]), lexer)
 ## Contributors
 - [Roberto Alsina](https://github.com/ralsina) - creator and maintainer
+## A port of what? Why "kind of"?
+Pygments is a staple of the Python ecosystem, and it's great.
+It lets you highlight code in many languages, and it has many
+themes. Chroma is "Pygments for Go", it's actually a port of
+Pygments to Go, and it's great too.
+I wanted that in Crystal, so I started this project. But I did
+not read much of the Pygments code. Or much of Chroma's.
+Chroma has taken most of the Pygments lexers and turned them into
+XML descriptions. What I did was take those XML files from Chroma
+and a pile of test cases from Pygments, and I slapped them together
+until the tests passed and my code produced the same output as
+Chroma. Think of it as [*extreme TDD*](https://ralsina.me/weblog/posts/tartrazine-reimplementing-pygments.html).
+Currently the pass rate for tests in the supported languages
+is `96.8%`, which is *not bad for a couple days hacking*.
+This only covers the RegexLexers, which are the most common ones,
+and DelegatingLexers (useful for things like template languages),
+but it means the supported languages are a subset of Chroma's, which
+is a subset of Pygments'.
+Then performance was bad, so I hacked and hacked and made it
+significantly [faster than Chroma](https://ralsina.me/weblog/posts/a-tale-of-optimization.html), which is fun.

@ -8,6 +8,8 @@
 * ✅ Implement lexer loader that respects aliases
 * ✅ Implement lexer loader by file extension
 * ✅ Add --line-numbers to terminal formatter
 * Implement lexer loader by mime type
 * ✅ Implement Delegating lexers
-* Add RstLexer maybe others?
+* ✅ Add RstLexer
+* Add Mako template lexer
+* ✅ Implement heuristic lexer detection
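
The TODO entry about loading lexers by mime type lines up with data the lexer XML files already carry: each `<config>` block can list `<alias>`, `<filename>` and `<mime_type>` entries. The following is only a rough sketch of such a lookup, written in Python rather than Crystal, and it is not Tartrazine's actual loader. The file names and the helpers `build_mime_index` and `lexer_file_for_mime` are hypothetical, and the XML snippets are trimmed to the `<config>` part.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory copies of lexer definitions, trimmed to <config>.
# The file names are made up for this example.
LEXER_SOURCES = {
    "twig.xml": """<lexer><config>
        <name>Twig</name>
        <alias>twig</alias>
        <mime_type>application/x-twig</mime_type>
    </config><rules/></lexer>""",
    "groff.xml": """<lexer><config>
        <name>Groff</name>
        <alias>groff</alias>
        <alias>roff</alias>
        <filename>*.[1-9]</filename>
    </config><rules/></lexer>""",
}


def build_mime_index(sources):
    """Map every <mime_type> found in a <config> block to its source file."""
    index = {}
    for filename, text in sources.items():
        config = ET.fromstring(text).find("config")
        if config is None:
            continue
        for node in config.findall("mime_type"):
            mime = (node.text or "").strip().lower()
            if mime:
                index[mime] = filename
    return index


def lexer_file_for_mime(index, mimetype):
    """Look up a lexer file, ignoring parameters such as '; charset=utf-8'."""
    key = mimetype.split(";", 1)[0].strip().lower()
    return index.get(key)


MIME_INDEX = build_mime_index(LEXER_SOURCES)
```

Lexers without a `<mime_type>` entry simply never appear in the index, so a caller would still need the alias, file extension, or heuristic lookups as fallbacks.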

22
lexers/LICENSE-heuristics Normal file

@ -0,0 +1,22 @@
Copyright (c) 2017 GitHub, Inc.
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

@ -1,47 +0,0 @@
<lexer>
<config>
<name>Twig</name>
<alias>twig</alias>
<mime_type>application/x-twig</mime_type>
<dot_all>true</dot_all>
</config>
<rules>
<state name="root">
<rule pattern="[^{]+"><token type="Other"/></rule>
<rule pattern="\{\{"><token type="CommentPreproc"/><push state="var"/></rule>
<rule pattern="\{\#.*?\#\}"><token type="Comment"/></rule>
<rule pattern="(\{%)(-?\s*)(raw)(\s*-?)(%\})(.*?)(\{%)(-?\s*)(endraw)(\s*-?)(%\})"><bygroups><token type="CommentPreproc"/><token type="Text"/><token type="Keyword"/><token type="Text"/><token type="CommentPreproc"/><token type="Other"/><token type="CommentPreproc"/><token type="Text"/><token type="Keyword"/><token type="Text"/><token type="CommentPreproc"/></bygroups></rule>
<rule pattern="(\{%)(-?\s*)(verbatim)(\s*-?)(%\})(.*?)(\{%)(-?\s*)(endverbatim)(\s*-?)(%\})"><bygroups><token type="CommentPreproc"/><token type="Text"/><token type="Keyword"/><token type="Text"/><token type="CommentPreproc"/><token type="Other"/><token type="CommentPreproc"/><token type="Text"/><token type="Keyword"/><token type="Text"/><token type="CommentPreproc"/></bygroups></rule>
<rule pattern="(\{%)(-?\s*)(filter)(\s+)((?:[\\_a-z]|[^\x00-\x7f])(?:[\\\w-]|[^\x00-\x7f])*)"><bygroups><token type="CommentPreproc"/><token type="Text"/><token type="Keyword"/><token type="Text"/><token type="NameFunction"/></bygroups><push state="tag"/></rule>
<rule pattern="(\{%)(-?\s*)([a-zA-Z_]\w*)"><bygroups><token type="CommentPreproc"/><token type="Text"/><token type="Keyword"/></bygroups><push state="tag"/></rule>
<rule pattern="\{"><token type="Other"/></rule>
</state>
<state name="varnames">
<rule pattern="(\|)(\s*)((?:[\\_a-z]|[^\x00-\x7f])(?:[\\\w-]|[^\x00-\x7f])*)"><bygroups><token type="Operator"/><token type="Text"/><token type="NameFunction"/></bygroups></rule>
<rule pattern="(is)(\s+)(not)?(\s*)((?:[\\_a-z]|[^\x00-\x7f])(?:[\\\w-]|[^\x00-\x7f])*)"><bygroups><token type="Keyword"/><token type="Text"/><token type="Keyword"/><token type="Text"/><token type="NameFunction"/></bygroups></rule>
<rule pattern="(?i)(true|false|none|null)\b"><token type="KeywordPseudo"/></rule>
<rule pattern="(in|not|and|b-and|or|b-or|b-xor|isif|elseif|else|importconstant|defined|divisibleby|empty|even|iterable|odd|sameasmatches|starts\s+with|ends\s+with)\b"><token type="Keyword"/></rule>
<rule pattern="(loop|block|parent)\b"><token type="NameBuiltin"/></rule>
<rule pattern="(?:[\\_a-z]|[^\x00-\x7f])(?:[\\\w-]|[^\x00-\x7f])*"><token type="NameVariable"/></rule>
<rule pattern="\.(?:[\\_a-z]|[^\x00-\x7f])(?:[\\\w-]|[^\x00-\x7f])*"><token type="NameVariable"/></rule>
<rule pattern="\.[0-9]+"><token type="LiteralNumber"/></rule>
<rule pattern=":?&quot;(\\\\|\\[^\\]|[^&quot;\\])*&quot;"><token type="LiteralStringDouble"/></rule>
<rule pattern=":?&#x27;(\\\\|\\[^\\]|[^&#x27;\\])*&#x27;"><token type="LiteralStringSingle"/></rule>
<rule pattern="([{}()\[\]+\-*/,:~%]|\.\.|\?|:|\*\*|\/\/|!=|[&gt;&lt;=]=?)"><token type="Operator"/></rule>
<rule pattern="[0-9](\.[0-9]*)?(eE[+-][0-9])?[flFLdD]?|0[xX][0-9a-fA-F]+[Ll]?"><token type="LiteralNumber"/></rule>
</state>
<state name="var">
<rule pattern="\s+"><token type="Text"/></rule>
<rule pattern="(-?)(\}\})"><bygroups><token type="Text"/><token type="CommentPreproc"/></bygroups><pop depth="1"/></rule>
<rule><include state="varnames"/></rule>
</state>
<state name="tag">
<rule pattern="\s+"><token type="Text"/></rule>
<rule pattern="(-?)(%\})"><bygroups><token type="Text"/><token type="CommentPreproc"/></bygroups><pop depth="1"/></rule>
<rule><include state="varnames"/></rule>
<rule pattern="."><token type="Punctuation"/></rule>
</state>
</rules>
</lexer>

@ -3,6 +3,7 @@
 <name>Groff</name>
 <alias>groff</alias>
 <alias>nroff</alias>
+<alias>roff</alias>
 <alias>man</alias>
 <filename>*.[1-9]</filename>
 <filename>*.1p</filename>
@ -87,4 +88,4 @@
 </rule>
 </state>
 </rules>
-</lexer>
+</lexer>
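
The header comment of `lexers/heuristics.yml`, the next file in this diff, describes how its rules are evaluated: rules for a matching extension are tried in order, the first match wins, `and` requires every sub-rule to match, `negative_pattern` checks for absence, and a rule without a pattern always matches. The sketch below illustrates those semantics in Python. It is not the Crystal code from this change, and the function names are hypothetical. The rules are a hand-transcribed subset of the file. The patterns are written for Ruby, and Python's `re` does not accept all of them (for example `\z` or `\g<version>`), so only simple patterns are used here.

```python
import re

# Hand-transcribed subset of lexers/heuristics.yml (blocks without named patterns).
DISAMBIGUATIONS = [
    {"extensions": [".cs"], "rules": [
        {"language": "Smalltalk", "pattern": r"![\w\s]+methodsFor: "},
        {"language": "C#",
         "pattern": r"^\s*(using\s+[A-Z][\s\w.]+;|namespace\s*[\w\.]+\s*(\{|;)|\/\/)"},
    ]},
    {"extensions": [".m4"], "rules": [
        {"language": "M4Sugar", "pattern": [r"AC_DEFUN|AC_PREREQ|AC_INIT", r"^_?m4_"]},
        {"language": "M4"},
    ]},
    {"extensions": [".ms"], "rules": [
        {"language": "Roff", "pattern": r"^[.'][A-Za-z]{2}(\s|$)"},
        {"language": "Unix Assembly", "and": [
            {"negative_pattern": r"/\*"},
            {"pattern": r"^\s*\.(?:include\s|globa?l\s|[A-Za-z][_A-Za-z0-9]*:)"},
        ]},
        {"language": "MAXScript"},
    ]},
]


def _regex(pattern):
    """Compile a pattern; a list of strings is merged into a single union."""
    if isinstance(pattern, list):
        pattern = "|".join("(?:%s)" % p for p in pattern)
    # In Ruby, ^ and $ always match at line boundaries, hence MULTILINE.
    return re.compile(pattern, re.MULTILINE)


def rule_matches(rule, content, named_patterns):
    """Evaluate one rule (or sub-rule of an `and` block) against the content."""
    if "and" in rule:
        return all(rule_matches(sub, content, named_patterns) for sub in rule["and"])
    if "named_pattern" in rule:
        return bool(_regex(named_patterns[rule["named_pattern"]]).search(content))
    if "pattern" in rule:
        return bool(_regex(rule["pattern"]).search(content))
    if "negative_pattern" in rule:
        return not _regex(rule["negative_pattern"]).search(content)
    return True  # no pattern at all: the rule always matches


def disambiguate(extension, content, disambiguations=DISAMBIGUATIONS, named_patterns=None):
    """Return the language of the first matching rule, or None if none matches."""
    named_patterns = named_patterns or {}
    for block in disambiguations:
        if extension in block["extensions"]:
            for rule in block["rules"]:
                if rule_matches(rule, content, named_patterns):
                    return rule["language"]
            return None
    return None
```

The `.ms` block shows why order matters: a file containing `/*` fails the `negative_pattern` inside the `and` block, so it falls through to the pattern-less `MAXScript` rule at the end.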

913
lexers/heuristics.yml Normal file

@ -0,0 +1,913 @@
# A collection of simple regexp-based rules that can be applied to content
# to disambiguate languages with the same file extension.
#
# There are two top-level keys: disambiguations and named_patterns.
#
# disambiguations - a list of disambiguation rules, one for each
# extension or group of extensions.
# extensions - an array of file extensions that this block applies to.
# rules - list of rules that are applied in order to the content
# of a file with a matching extension. Rules are evaluated
# until one of them matches. If none matches, no language
# is returned.
# language - Language to be returned if the rule matches.
# pattern - Ruby-compatible regular expression that makes the rule
# match. If no pattern is specified, the rule always matches.
# Pattern can be a string with a single regular expression
# or an array of strings that will be merged in a single
# regular expression (with union).
# and - An and block merges multiple rules and checks that all of
# them must match.
# negative_pattern - Same as pattern, but checks for absence of matches.
# named_pattern - A pattern can be reused by specifying it in the
# named_patterns section and referencing it here by its
# key.
# named_patterns - Key-value map of reusable named patterns.
#
# Please keep this list alphabetized.
#
---
disambiguations:
- extensions: ['.1', '.2', '.3', '.4', '.5', '.6', '.7', '.8', '.9']
rules:
- language: man
and:
- named_pattern: mdoc-date
- named_pattern: mdoc-title
- named_pattern: mdoc-heading
- language: man
and:
- named_pattern: man-title
- named_pattern: man-heading
- language: Roff
pattern: '^\.(?:[A-Za-z]{2}(?:\s|$)|\\")'
- extensions: ['.1in', '.1m', '.1x', '.3in', '.3m', '.3p', '.3pm', '.3qt', '.3x', '.man', '.mdoc']
rules:
- language: man
and:
- named_pattern: mdoc-date
- named_pattern: mdoc-title
- named_pattern: mdoc-heading
- language: man
and:
- named_pattern: man-title
- named_pattern: man-heading
- language: Roff
- extensions: ['.al']
rules:
# AL pattern source from https://github.com/microsoft/AL/blob/master/grammar/alsyntax.tmlanguage - keyword.other.applicationobject.al
- language: AL
and:
- pattern: '\b(?i:(CODEUNIT|PAGE|PAGEEXTENSION|PAGECUSTOMIZATION|DOTNET|ENUM|ENUMEXTENSION|VALUE|QUERY|REPORT|TABLE|TABLEEXTENSION|XMLPORT|PROFILE|CONTROLADDIN|REPORTEXTENSION|INTERFACE|PERMISSIONSET|PERMISSIONSETEXTENSION|ENTITLEMENT))\b'
# Open-ended fallback to Perl AutoLoader
- language: Perl
- extensions: ['.app']
rules:
- language: Erlang
pattern: '^\{\s*(?:application|''application'')\s*,\s*(?:[a-z]+[\w@]*|''[^'']+'')\s*,\s*\[(?:.|[\r\n])*\]\s*\}\.[ \t]*$'
- extensions: ['.as']
rules:
- language: ActionScript
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:\{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)*)?\)))'
- extensions: ['.asc']
rules:
- language: Public Key
pattern: '^(----[- ]BEGIN|ssh-(rsa|dss)) '
- language: AsciiDoc
pattern: '^[=-]+\s|\{\{[A-Za-z]'
- language: AGS Script
pattern: '^(\/\/.+|((import|export)\s+)?(function|int|float|char)\s+((room|repeatedly|on|game)_)?([A-Za-z]+[A-Za-z_0-9]+)\s*[;\(])'
- extensions: ['.asm']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- extensions: ['.asy']
rules:
- language: LTspice Symbol
pattern: '^SymbolType[ \t]'
- language: Asymptote
- extensions: ['.bas']
rules:
- language: FreeBasic
pattern: '^[ \t]*#(?i)(?:define|endif|endmacro|ifn?def|include|lang|macro)(?:$|\s)'
- language: BASIC
pattern: '\A\s*\d'
- language: VBA
and:
- named_pattern: vb-module
- named_pattern: vba
- language: Visual Basic 6.0
named_pattern: vb-module
- extensions: ['.bb']
rules:
- language: BlitzBasic
pattern: '(<^\s*; |End Function)'
- language: BitBake
pattern: '^(# |include|require|inherit)\b'
- language: Clojure
pattern: '\((def|defn|defmacro|let)\s'
- extensions: ['.bf']
rules:
- language: Beef
pattern: '(?-m)^\s*using\s+(System|Beefy)(\.(.*))?;\s*$'
- language: HyPhy
pattern:
- '(?-m)^\s*#include\s+".*";\s*$'
- '\sfprintf\s*\('
- language: Brainfuck
pattern: '(>\+>|>\+<)'
- extensions: ['.bi']
rules:
- language: FreeBasic
pattern: '^[ \t]*#(?i)(?:define|endif|endmacro|ifn?def|if|include|lang|macro)(?:$|\s)'
- extensions: ['.bs']
rules:
- language: Bikeshed
pattern: '^(?i:<pre\s+class)\s*=\s*(''|\"|\b)metadata\b\1[^>\r\n]*>'
- language: BrighterScript
pattern:
- (?i:^\s*(?=^sub\s)(?:sub\s*\w+\(.*?\))|(?::\s*sub\(.*?\))$)
- (?i:^\s*(end\ssub)$)
- (?i:^\s*(?=^function\s)(?:function\s*\w+\(.*?\)\s*as\s*\w*)|(?::\s*function\(.*?\)\s*as\s*\w*)$)
- (?i:^\s*(end\sfunction)$)
- language: Bluespec BH
pattern: '^package\s+[A-Za-z_][A-Za-z0-9_'']*(?:\s*\(|\s+where)'
- extensions: ['.builds']
rules:
- language: XML
pattern: '^(\s*)(?i:<Project|<Import|<Property|<?xml|xmlns)'
- extensions: ['.ch']
rules:
- language: xBase
pattern: '^\s*#\s*(?i:if|ifdef|ifndef|define|command|xcommand|translate|xtranslate|include|pragma|undef)\b'
- extensions: ['.cl']
rules:
- language: Common Lisp
pattern: '^\s*\((?i:defun|in-package|defpackage) '
- language: Cool
pattern: '^class'
- language: OpenCL
pattern: '\/\* |\/\/ |^\}'
- extensions: ['.cls']
rules:
- language: Visual Basic 6.0
and:
- named_pattern: vb-class
- pattern: '^\s*BEGIN(?:\r?\n|\r)\s*MultiUse\s*=.*(?:\r?\n|\r)\s*Persistable\s*='
- language: VBA
named_pattern: vb-class
- language: TeX
pattern: '^\s*\\(?:NeedsTeXFormat|ProvidesClass)\{'
- language: ObjectScript
pattern: '^Class\s'
- extensions: ['.cmp']
rules:
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- extensions: ['.cs']
rules:
- language: Smalltalk
pattern: '![\w\s]+methodsFor: '
- language: 'C#'
pattern: '^\s*(using\s+[A-Z][\s\w.]+;|namespace\s*[\w\.]+\s*(\{|;)|\/\/)'
- extensions: ['.csc']
rules:
- language: GSC
named_pattern: gsc
- extensions: ['.csl']
rules:
- language: XML
pattern: '(?i:^\s*(<\?xml|xmlns))'
- language: Kusto
pattern: '(^\|\s*(where|extend|project|limit|summarize))|(^\.\w+)'
- extensions: ['.d']
rules:
- language: D
# see http://dlang.org/spec/grammar
# ModuleDeclaration | ImportDeclaration | FuncDeclaration | unittest
pattern: '^module\s+[\w.]*\s*;|import\s+[\w\s,.:]*;|\w+\s+\w+\s*\(.*\)(?:\(.*\))?\s*\{[^}]*\}|unittest\s*(?:\(.*\))?\s*\{[^}]*\}'
- language: DTrace
# see http://dtrace.org/guide/chp-prog.html, http://dtrace.org/guide/chp-profile.html, http://dtrace.org/guide/chp-opt.html
pattern: '^(\w+:\w*:\w*:\w*|BEGIN|END|provider\s+|(tick|profile)-\w+\s+\{[^}]*\}|#pragma\s+D\s+(option|attributes|depends_on)\s|#pragma\s+ident\s)'
- language: Makefile
# path/target : dependency \
# target : \
# : dependency
# path/file.ext1 : some/path/../file.ext2
pattern: '([\/\\].*:\s+.*\s\\$|: \\$|^[ %]:|^[\w\s\/\\.]+\w+\.\w+\s*:\s+[\w\s\/\\.]+\w+\.\w+)'
- extensions: ['.dsp']
rules:
- language: Microsoft Developer Studio Project
pattern: '# Microsoft Developer Studio Generated Build File'
- language: Faust
pattern: '\bprocess\s*[(=]|\b(library|import)\s*\(\s*"|\bdeclare\s+(name|version|author|copyright|license)\s+"'
- extensions: ['.e']
rules:
- language: E
pattern:
- '^\s*(def|var)\s+(.+):='
- '^\s*(def|to)\s+(\w+)(\(.+\))?\s+\{'
- '^\s*(when)\s+(\(.+\))\s+->\s+\{'
- language: Eiffel
pattern:
- '^\s*\w+\s*(?:,\s*\w+)*[:]\s*\w+\s'
- '^\s*\w+\s*(?:\(\s*\w+[:][^)]+\))?(?:[:]\s*\w+)?(?:--.+\s+)*\s+(?:do|local)\s'
- '^\s*(?:across|deferred|elseif|ensure|feature|from|inherit|inspect|invariant|note|once|require|undefine|variant|when)\s*$'
- language: Euphoria
named_pattern: euphoria
- extensions: ['.ecl']
rules:
- language: ECLiPSe
pattern: '^[^#]+:-'
- language: ECL
pattern: ':='
- extensions: ['.es']
rules:
- language: Erlang
pattern: '^\s*(?:%%|main\s*\(.*?\)\s*->)'
- language: JavaScript
pattern: '\/\/|("|'')use strict\1|export\s+default\s|\/\*(?:.|[\r\n])*?\*\/'
- extensions: ['.ex']
rules:
- language: Elixir
pattern:
- '^\s*@moduledoc\s'
- '^\s*(?:cond|import|quote|unless)\s'
- '^\s*def(?:exception|impl|macro|module|protocol)[(\s]'
- language: Euphoria
named_pattern: euphoria
- extensions: ['.f']
rules:
- language: Forth
pattern: '^: '
- language: Filebench WML
pattern: 'flowop'
- language: Fortran
named_pattern: fortran
- extensions: ['.for']
rules:
- language: Forth
pattern: '^: '
- language: Fortran
named_pattern: fortran
- extensions: ['.fr']
rules:
- language: Forth
pattern: '^(: |also |new-device|previous )'
- language: Frege
pattern: '^\s*(import|module|package|data|type) '
- language: Text
- extensions: ['.frm']
rules:
- language: VBA
and:
- named_pattern: vb-form
- pattern: '^\s*Begin\s+\{[0-9A-Z\-]*\}\s?'
- language: Visual Basic 6.0
and:
- named_pattern: vb-form
- pattern: '^\s*Begin\s+VB\.Form\s+'
- extensions: ['.fs']
rules:
- language: Forth
pattern: '^(: |new-device)'
- language: 'F#'
pattern: '^\s*(#light|import|let|module|namespace|open|type)'
- language: GLSL
pattern: '^\s*(#version|precision|uniform|varying|vec[234])'
- language: Filterscript
pattern: '#include|#pragma\s+(rs|version)|__attribute__'
- extensions: ['.ftl']
rules:
- language: FreeMarker
pattern: '^(?:<|[a-zA-Z-][a-zA-Z0-9_-]+[ \t]+\w)|\$\{\w+[^\r\n]*?\}|^[ \t]*(?:<#--.*?-->|<#([a-z]+)(?=\s|>)[^>]*>.*?</#\1>|\[#--.*?--\]|\[#([a-z]+)(?=\s|\])[^\]]*\].*?\[#\2\])'
- language: Fluent
pattern: '^-?[a-zA-Z][a-zA-Z0-9_-]* *=|\{\$-?[a-zA-Z][-\w]*(?:\.[a-zA-Z][-\w]*)?\}'
- extensions: ['.g']
rules:
- language: GAP
pattern: '\s*(Declare|BindGlobal|KeyDependentOperation|Install(Method|GlobalFunction)|SetPackageInfo)'
- language: G-code
pattern: '^[MG][0-9]+(?:\r?\n|\r)'
- extensions: ['.gd']
rules:
- language: GAP
pattern: '\s*(Declare|BindGlobal|KeyDependentOperation)'
- language: GDScript
pattern: '\s*(extends|var|const|enum|func|class|signal|tool|yield|assert|onready)'
- extensions: ['.gml']
rules:
- language: XML
pattern: '(?i:^\s*(<\?xml|xmlns))'
- language: Graph Modeling Language
pattern: '(?i:^\s*(graph|node)\s+\[$)'
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*$'
- language: Game Maker Language
- extensions: ['.gs']
rules:
- language: GLSL
pattern: '^#version\s+[0-9]+\b'
- language: Gosu
pattern: '^uses (java|gw)\.'
- language: Genie
pattern: '^\[indent=[0-9]+\]'
- extensions: ['.gsc']
rules:
- language: GSC
named_pattern: gsc
- extensions: ['.gsh']
rules:
- language: GSC
named_pattern: gsc
- extensions: ['.gts']
rules:
- language: Gerber Image
pattern: '^G0.'
- language: Glimmer TS
negative_pattern: '^G0.'
- extensions: ['.h']
rules:
- language: Objective-C
named_pattern: objectivec
- language: C++
named_pattern: cpp
- language: C
- extensions: ['.hh']
rules:
- language: Hack
pattern: '<\?hh'
- extensions: ['.html']
rules:
- language: Ecmarkup
pattern: '<emu-(?:alg|annex|biblio|clause|eqn|example|figure|gann|gmod|gprose|grammar|intro|not-ref|note|nt|prodref|production|rhs|table|t|xref)(?:$|\s|>)'
- language: HTML
- extensions: ['.i']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- language: SWIG
pattern: '^[ \t]*%[a-z_]+\b|^%[{}]$'
- extensions: ['.ice']
rules:
- language: JSON
pattern: '\A\s*[{\[]'
- language: Slice
- extensions: ['.inc']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- language: PHP
pattern: '^<\?(?:php)?'
- language: SourcePawn
pattern:
- '^public\s+(?:SharedPlugin(?:\s+|:)__pl_\w+\s*=(?:\s*\{)?|(?:void\s+)?__pl_\w+_SetNTVOptional\(\)(?:\s*\{)?)'
- '^methodmap\s+\w+\s+<\s+\w+'
- '^\s*MarkNativeAsOptional\s*\('
- language: NASL
pattern:
- '^\s*include\s*\(\s*(?:"|'')[\\/\w\-\.:\s]+\.(?:nasl|inc)\s*(?:"|'')\s*\)\s*;'
- '^\s*(?:global|local)_var\s+(?:\w+(?:\s*=\s*[\w\-"'']+)?\s*)(?:,\s*\w+(?:\s*=\s*[\w\-"'']+)?\s*)*+\s*;'
- '^\s*namespace\s+\w+\s*\{'
- '^\s*object\s+\w+\s*(?:extends\s+\w+(?:::\w+)?)?\s*\{'
- '^\s*(?:public\s+|private\s+|\s*)function\s+\w+\s*\([\w\s,]*\)\s*\{'
- language: POV-Ray SDL
pattern: '^\s*#(declare|local|macro|while)\s'
- language: Pascal
pattern:
- '(?i:^\s*\{\$(?:mode|ifdef|undef|define)[ ]+[a-z0-9_]+\})'
- '^\s*end[.;]\s*$'
- language: BitBake
pattern: '^inherit(\s+[\w.-]+)+\s*$'
- extensions: ['.json']
rules:
- language: OASv2-json
pattern: '"swagger":\s?"2.[0-9.]+"'
- language: OASv3-json
pattern: '"openapi":\s?"3.[0-9.]+"'
- language: JSON
- extensions: ['.l']
rules:
- language: Common Lisp
pattern: '\(def(un|macro)\s'
- language: Lex
pattern: '^(%[%{}]xs|<.*>)'
- language: Roff
pattern: '^\.[A-Za-z]{2}(\s|$)'
- language: PicoLisp
pattern: '^\((de|class|rel|code|data|must)\s'
- extensions: ['.lean']
rules:
- language: Lean
pattern: '^import [a-z]'
- language: Lean 4
pattern: '^import [A-Z]'
- extensions: ['.ls']
rules:
- language: LoomScript
pattern: '^\s*package\s*[\w\.\/\*\s]*\s*\{'
- language: LiveScript
- extensions: ['.lsp', '.lisp']
rules:
- language: Common Lisp
pattern: '^\s*\((?i:defun|in-package|defpackage) '
- language: NewLisp
pattern: '^\s*\(define '
- extensions: ['.m']
rules:
- language: Objective-C
named_pattern: objectivec
- language: Mercury
pattern: ':- module'
- language: MUF
pattern: '^: '
- language: M
pattern: '^\s*;'
- language: Mathematica
and:
- pattern: '\(\*'
- pattern: '\*\)$'
- language: MATLAB
pattern: '^\s*%'
- language: Limbo
pattern: '^\w+\s*:\s*module\s*\{'
- extensions: ['.m4']
rules:
- language: M4Sugar
pattern:
- 'AC_DEFUN|AC_PREREQ|AC_INIT'
- '^_?m4_'
- language: 'M4'
- extensions: ['.mask']
rules:
- language: Unity3D Asset
pattern: 'tag:unity3d.com'
- extensions: ['.mc']
rules:
- language: Win32 Message File
pattern: '(?i)^[ \t]*(?>\/\*\s*)?MessageId=|^\.$'
- language: M4
pattern: '^dnl|^divert\((?:-?\d+)?\)|^\w+\(`[^\r\n]*?''[),]'
- language: Monkey C
pattern: '\b(?:using|module|function|class|var)\s+\w'
- extensions: ['.md']
rules:
- language: Markdown
pattern:
- '(^[-A-Za-z0-9=#!\*\[|>])|<\/'
- '\A\z'
- language: GCC Machine Description
pattern: '^(;;|\(define_)'
- language: Markdown
- extensions: ['.ml']
rules:
- language: OCaml
pattern: '(^\s*module)|let rec |match\s+(\S+\s)+with'
- language: Standard ML
pattern: '=> |case\s+(\S+\s)+of'
- extensions: ['.mod']
rules:
- language: XML
pattern: '<!ENTITY '
- language: NMODL
pattern: '\b(NEURON|INITIAL|UNITS)\b'
- language: Modula-2
pattern: '^\s*(?i:MODULE|END) [\w\.]+;'
- language: [Linux Kernel Module, AMPL]
- extensions: ['.mojo']
rules:
- language: Mojo
pattern: '^\s*(alias|def|from|fn|import|struct|trait)\s'
- language: XML
pattern: '^\s*<\?xml'
- extensions: ['.ms']
rules:
- language: Roff
pattern: '^[.''][A-Za-z]{2}(\s|$)'
- language: Unix Assembly
and:
- negative_pattern: '/\*'
- pattern: '^\s*\.(?:include\s|globa?l\s|[A-Za-z][_A-Za-z0-9]*:)'
- language: MAXScript
- extensions: ['.n']
rules:
- language: Roff
pattern: '^[.'']'
- language: Nemerle
pattern: '^(module|namespace|using)\s'
- extensions: ['.ncl']
rules:
- language: XML
pattern: '^\s*<\?xml\s+version'
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- language: Text
pattern: 'THE_TITLE'
- extensions: ['.nl']
rules:
- language: NL
pattern: '^(b|g)[0-9]+ '
- language: NewLisp
- extensions: ['.nu']
rules:
- language: Nushell
pattern: '^\s*(import|export|module|def|let|let-env) '
- language: Nu
- extensions: ['.odin']
rules:
- language: Object Data Instance Notation
pattern: '(?:^|<)\s*[A-Za-z0-9_]+\s*=\s*<'
- language: Odin
pattern: 'package\s+\w+|\b(?:im|ex)port\s*"[\w:./]+"|\w+\s*::\s*(?:proc|struct)\s*\(|^\s*//\s'
- extensions: ['.p']
rules:
- language: Gnuplot
pattern:
- '^s?plot\b'
- '^set\s+(term|terminal|out|output|[xy]tics|[xy]label|[xy]range|style)\b'
- language: OpenEdge ABL
- extensions: ['.php']
rules:
- language: Hack
pattern: '<\?hh'
- language: PHP
pattern: '<\?[^h]'
- extensions: ['.pkl']
rules:
- language: Pkl
pattern:
- '^\s*(module|import|amends|extends|local|const|fixed|abstract|open|class|typealias|@\w+)\b'
- '^\s*[a-zA-Z0-9_$]+\s*(=|{|:)|^\s*`[^`]+`\s*(=|{|:)|for\s*\(|when\s*\('
- language: Pickle
- extensions: ['.pl']
rules:
- language: Prolog
pattern: '^[^#]*:-'
- language: Perl
and:
- negative_pattern: '^\s*use\s+v6\b'
- named_pattern: perl
- language: Raku
named_pattern: raku
- extensions: ['.plist']
rules:
- language: XML Property List
pattern: '^\s*(?:<\?xml\s|<!DOCTYPE\s+plist|<plist(?:\s+version\s*=\s*(["''])\d+(?:\.\d+)?\1)?\s*>\s*$)'
- language: OpenStep Property List
- extensions: ['.plt']
rules:
- language: Prolog
pattern: '^\s*:-'
- extensions: ['.pm']
rules:
- language: Perl
and:
- negative_pattern: '^\s*use\s+v6\b'
- named_pattern: perl
- language: Raku
named_pattern: raku
- language: X PixMap
pattern: '^\s*\/\* XPM \*\/'
- extensions: ['.pod']
rules:
- language: Pod 6
pattern: '^[\s&&[^\r\n]]*=(comment|begin pod|begin para|item\d+)'
- language: Pod
- extensions: ['.pp']
rules:
- language: Pascal
pattern: '^\s*end[.;]'
- language: Puppet
pattern: '^\s+\w+\s+=>\s'
- extensions: ['.pro']
rules:
- language: Proguard
pattern: '^-(include\b.*\.pro$|keep\b|keepclassmembers\b|keepattributes\b)'
- language: Prolog
pattern: '^[^\[#]+:-'
- language: INI
pattern: 'last_client='
- language: QMake
and:
- pattern: HEADERS
- pattern: SOURCES
- language: IDL
pattern: '^\s*(?i:function|pro|compile_opt) \w[ \w,:]*$'
- extensions: ['.properties']
rules:
- language: INI
and:
- named_pattern: key_equals_value
- pattern: '^[;\[]'
- language: Java Properties
and:
- named_pattern: key_equals_value
- pattern: '^[#!]'
- language: INI
named_pattern: key_equals_value
- language: Java Properties
pattern: '^[^#!][^:]*:'
- extensions: ['.q']
rules:
- language: q
pattern: '((?i:[A-Z.][\w.]*:\{)|^\\(cd?|d|l|p|ts?) )'
- language: HiveQL
pattern: '(?i:SELECT\s+[\w*,]+\s+FROM|(CREATE|ALTER|DROP)\s(DATABASE|SCHEMA|TABLE))'
- extensions: ['.qs']
rules:
- language: Q#
pattern: '^((\/{2,3})?\s*(namespace|operation)\b)'
- language: Qt Script
pattern: '(\w+\.prototype\.\w+|===|\bvar\b)'
- extensions: ['.r']
rules:
- language: Rebol
pattern: '(?i:\bRebol\b)'
- language: Rez
pattern: '(#include\s+["<](Types\.r|Carbon\/Carbon\.r)[">])|((resource|data|type)\s+''[A-Za-z0-9]{4}''\s+((\(.*\)\s+){0,1}){)'
- language: R
pattern: '<-|^\s*#'
- extensions: ['.re']
rules:
- language: Reason
pattern:
- '^\s*module\s+type\s'
- '^\s*(?:include|open)\s+\w+\s*;\s*$'
- '^\s*let\s+(?:module\s\w+\s*=\s*\{|\w+:\s+.*=.*;\s*$)'
- language: C++
pattern:
- '^\s*#(?:(?:if|ifdef|define|pragma)\s+\w|\s*include\s+<[^>]+>)'
- '^\s*template\s*<'
- extensions: ['.res']
rules:
- language: ReScript
pattern:
- '^\s*(let|module|type)\s+\w*\s+=\s+'
- '^\s*(?:include|open)\s+\w+\s*$'
- extensions: ['.rno']
rules:
- language: RUNOFF
pattern: '(?i:^\.!|^\f|\f$|^\.end lit(?:eral)?\b|^\.[a-zA-Z].*?;\.[a-zA-Z](?:[; \t])|\^\*[^\s*][^*]*\\\*(?=$|\s)|^\.c;[ \t]*\w+)'
- language: Roff
pattern: '^\.\\" '
- extensions: ['.rpy']
rules:
- language: Python
pattern: '^(import|from|class|def)\s'
- language: "Ren'Py"
- extensions: ['.rs']
rules:
- language: Rust
pattern: '^(use |fn |mod |pub |macro_rules|impl|#!?\[)'
- language: RenderScript
pattern: '#include|#pragma\s+(rs|version)|__attribute__'
- language: XML
pattern: '^\s*<\?xml'
- extensions: ['.s']
rules:
- language: Motorola 68K Assembly
named_pattern: m68k
- extensions: ['.sc']
rules:
- language: SuperCollider
pattern: '(?i:\^(this|super)\.|^\s*~\w+\s*=\.)'
- language: Scala
pattern: '(^\s*import (scala|java)\.|^\s*class\b)'
- extensions: ['.scd']
rules:
- language: SuperCollider
pattern: '(?i:\^(this|super)\.|^\s*(~\w+\s*=\.|SynthDef\b))'
- language: Markdown
# Markdown syntax for scdoc
pattern: '^#+\s+(NAME|SYNOPSIS|DESCRIPTION)'
- extensions: ['.sol']
rules:
- language: Solidity
pattern: '\bpragma\s+solidity\b|\b(?:abstract\s+)?contract\s+(?!\d)[a-zA-Z0-9$_]+(?:\s+is\s+(?:[a-zA-Z0-9$_][^\{]*?)?)?\s*\{'
- language: Gerber Image
pattern: '^[DGMT][0-9]{2}\*(?:\r?\n|\r)'
- extensions: ['.sql']
rules:
# Postgres
- language: PLpgSQL
pattern: '(?i:^\\i\b|AS\s+\$\$|LANGUAGE\s+''?plpgsql''?|BEGIN(\s+WORK)?\s*;)'
# IBM db2
- language: SQLPL
pattern: '(?i:ALTER\s+MODULE|MODE\s+DB2SQL|\bSYS(CAT|PROC)\.|ASSOCIATE\s+RESULT\s+SET|\bEND!\s*$)'
# Oracle
- language: PLSQL
pattern: '(?i:\$\$PLSQL_|XMLTYPE|systimestamp|\.nextval|CONNECT\s+BY|AUTHID\s+(DEFINER|CURRENT_USER)|constructor\W+function)'
# T-SQL
- language: TSQL
pattern: '(?i:^\s*GO\b|BEGIN(\s+TRY|\s+CATCH)|OUTPUT\s+INSERTED|DECLARE\s+@|\[dbo\])'
- language: SQL
- extensions: ['.srt']
rules:
- language: SubRip Text
pattern: '^(\d{2}:\d{2}:\d{2},\d{3})\s*(-->)\s*(\d{2}:\d{2}:\d{2},\d{3})$'
- extensions: ['.st']
rules:
- language: StringTemplate
pattern: '\$\w+[($]|(.)!\s*.+?\s*!\1|<!\s*.+?\s*!>|\[!\s*.+?\s*!\]|\{!\s*.+?\s*!\}'
- language: Smalltalk
pattern: '\A\s*[\[{(^"''\w#]|[a-zA-Z_]\w*\s*:=\s*[a-zA-Z_]\w*|class\s*>>\s*[a-zA-Z_]\w*|^[a-zA-Z_]\w*\s+[a-zA-Z_]\w*:|^Class\s*\{|if(?:True|False):\s*\['
- extensions: ['.star']
rules:
- language: STAR
pattern: '^loop_\s*$'
- language: Starlark
- extensions: ['.stl']
rules:
- language: STL
pattern: '\A\s*solid(?:$|\s)[\s\S]*^endsolid(?:$|\s)'
- extensions: ['.sw']
rules:
- language: Sway
pattern: '^\s*(?:(?:abi|dep|fn|impl|mod|pub|trait)\s|#\[)'
- language: XML
pattern: '^\s*<\?xml\s+version'
- extensions: ['.t']
rules:
- language: Perl
and:
- negative_pattern: '^\s*use\s+v6\b'
- named_pattern: perl
- language: Raku
pattern: '^\s*(?:use\s+v6\b|\bmodule\b|\bmy\s+class\b)'
- language: Turing
pattern: '^\s*%[ \t]+|^\s*var\s+\w+(\s*:\s*\w+)?\s*:=\s*\w+'
- extensions: ['.tag']
rules:
- language: Java Server Pages
pattern: '<%[@!=\s]?\s*(taglib|tag|include|attribute|variable)\s'
- extensions: ['.tlv']
rules:
- language: TL-Verilog
pattern: '^\\.{0,10}TLV_version'
- extensions: ['.toc']
rules:
- language: World of Warcraft Addon Data
pattern: '^## |@no-lib-strip@'
- language: TeX
pattern: '^\\(contentsline|defcounter|beamer|boolfalse)'
- extensions: ['.ts']
rules:
- language: XML
pattern: '<TS\b'
- language: TypeScript
- extensions: ['.tst']
rules:
- language: GAP
pattern: 'gap> '
# Heads up - we don't usually write heuristics like this (with no regex match)
- language: Scilab
- extensions: ['.tsx']
rules:
- language: TSX
pattern: '^\s*(import.+(from\s+|require\()[''"]react|\/\/\/\s*<reference\s)'
- language: XML
pattern: '(?i:^\s*<\?xml\s+version)'
- extensions: ['.txt']
rules:
# The following RegExp is simply a collapsed and simplified form of the
# VIM_MODELINE pattern in `./lib/linguist/strategy/modeline.rb`.
- language: Vim Help File
pattern: '(?:(?:^|[ \t])(?:vi|Vi(?=m))(?:m[<=>]?[0-9]+|m)?|[ \t]ex)(?=:(?=[ \t]*set?[ \t][^\r\n:]+:)|:(?![ \t]*set?[ \t]))(?:(?:[ \t]*:[ \t]*|[ \t])\w*(?:[ \t]*=(?:[^\\\s]|\\.)*)?)*[ \t:](?:filetype|ft|syntax)[ \t]*=(help)(?=$|\s|:)'
- language: Adblock Filter List
pattern: |-
(?x)\A
\[
(?<version>
(?:
[Aa]d[Bb]lock
(?:[ \t][Pp]lus)?
|
u[Bb]lock
(?:[ \t][Oo]rigin)?
|
[Aa]d[Gg]uard
)
(?:[ \t] \d+(?:\.\d+)*+)?
)
(?:
[ \t]?;[ \t]?
\g<version>
)*+
\]
# HACK: This is a contrived use of heuristics needed to address
# an unusual edge-case. See https://git.io/JULye for discussion.
- language: Text
- extensions: ['.typ']
rules:
- language: Typst
pattern: '^#(import|show|let|set)'
- language: XML
- extensions: ['.url']
rules:
- language: INI
pattern: '^\[InternetShortcut\](?:\r?\n|\r)(?>[^\s\[][^\r\n]*(?:\r?\n|\r))*URL='
- extensions: ['.v']
rules:
- language: Coq
pattern: '(?:^|\s)(?:Proof|Qed)\.(?:$|\s)|(?:^|\s)Require[ \t]+(Import|Export)\s'
- language: Verilog
pattern: '^[ \t]*module\s+[^\s()]+\s+\#?\(|^[ \t]*`(?:define|ifdef|ifndef|include|timescale)|^[ \t]*always[ \t]+@|^[ \t]*initial[ \t]+(begin|@)'
- language: V
pattern: '\$(?:if|else)[ \t]|^[ \t]*fn\s+[^\s()]+\(.*?\).*?\{|^[ \t]*for\s*\{'
- extensions: ['.vba']
rules:
- language: Vim Script
pattern: '^UseVimball'
- language: VBA
- extensions: ['.w']
rules:
- language: OpenEdge ABL
pattern: '&ANALYZE-SUSPEND _UIB-CODE-BLOCK _CUSTOM _DEFINITIONS'
- language: CWeb
pattern: '^@(<|\w+\.)'
- extensions: ['.x']
rules:
- language: DirectX 3D File
pattern: '^xof 030(2|3)(?:txt|bin|tzip|bzip)\b'
- language: RPC
pattern: '\b(program|version)\s+\w+\s*\{|\bunion\s+\w+\s+switch\s*\('
- language: Logos
pattern: '^%(end|ctor|hook|group)\b'
- language: Linker Script
pattern: 'OUTPUT_ARCH\(|OUTPUT_FORMAT\(|SECTIONS'
- extensions: ['.yaml', '.yml']
rules:
- language: MiniYAML
pattern: '^\t+.*?[^\s:].*?:'
negative_pattern: '---'
- language: OASv2-yaml
pattern: 'swagger:\s?''?"?2.[0-9.]+''?"?'
- language: OASv3-yaml
pattern: 'openapi:\s?''?"?3.[0-9.]+''?"?'
- language: YAML
- extensions: ['.yy']
rules:
- language: JSON
pattern: '\"modelName\"\:\s*\"GM'
- language: Yacc
named_patterns:
cpp:
- '^\s*#\s*include <(cstdint|string|vector|map|list|array|bitset|queue|stack|forward_list|unordered_map|unordered_set|(i|o|io)stream)>'
- '^\s*template\s*<'
- '^[ \t]*(try|constexpr)'
- '^[ \t]*catch\s*\('
- '^[ \t]*(class|(using[ \t]+)?namespace)\s+\w+'
- '^[ \t]*(private|public|protected):$'
- '__has_cpp_attribute|__cplusplus >'
- 'std::\w+'
euphoria:
- '^\s*namespace\s'
- '^\s*(?:public\s+)?include\s'
- '^\s*(?:(?:public|export|global)\s+)?(?:atom|constant|enum|function|integer|object|procedure|sequence|type)\s'
fortran: '^(?i:[c*][^abd-z]| (subroutine|program|end|data)\s|\s*!)'
gsc:
- '^\s*#\s*(?:using|insert|include|define|namespace)[ \t]+\w'
- '^\s*(?>(?:autoexec|private)\s+){0,2}function\s+(?>(?:autoexec|private)\s+){0,2}\w+\s*\('
- '\b(?:level|self)[ \t]+thread[ \t]+(?:\[\[[ \t]*(?>\w+\.)*\w+[ \t]*\]\]|\w+)[ \t]*\([^\r\n\)]*\)[ \t]*;'
- '^[ \t]*#[ \t]*(?:precache|using_animtree)[ \t]*\('
key_equals_value: '^[^#!;][^=]*='
m68k:
- '(?im)\bmoveq(?:\.l)?\s+#(?:\$-?[0-9a-f]{1,3}|%[0-1]{1,8}|-?[0-9]{1,3}),\s*d[0-7]\b'
- '(?im)^\s*move(?:\.[bwl])?\s+(?:sr|usp),\s*[^\s]+'
- '(?im)^\s*move\.[bwl]\s+.*\b[ad]\d'
- '(?im)^\s*movem\.[bwl]\b'
- '(?im)^\s*move[mp](?:\.[wl])?\b'
- '(?im)^\s*btst\b'
- '(?im)^\s*dbra\b'
man-heading: '^[.''][ \t]*SH +(?:[^"\s]+|"[^"\s]+)'
man-title: '^[.''][ \t]*TH +(?:[^"\s]+|"[^"]+") +"?(?:[1-9]|@[^\s@]+@)'
mdoc-date: '^[.''][ \t]*Dd +(?:[^"\s]+|"[^"]+")'
mdoc-heading: '^[.''][ \t]*Sh +(?:[^"\s]|"[^"]+")'
mdoc-title: '^[.''][ \t]*Dt +(?:[^"\s]+|"[^"]+") +"?(?:[1-9]|@[^\s@]+@)'
objectivec: '^\s*(@(interface|class|protocol|property|end|synchronised|selector|implementation)\b|#import\s+.+\.h[">])'
perl:
- '\buse\s+(?:strict\b|v?5\b)'
- '^\s*use\s+(?:constant|overload)\b'
- '^\s*(?:\*|(?:our\s*)?@)EXPORT\s*='
- '^\s*package\s+[^\W\d]\w*(?:::\w+)*\s*(?:[;{]|\sv?\d)'
- '[\s$][^\W\d]\w*(?::\w+)*->[a-zA-Z_\[({]'
raku: '^\s*(?:use\s+v6\b|\bmodule\b|\b(?:my\s+)?class\b)'
vb-class: '^[ ]*VERSION [0-9]\.[0-9] CLASS'
vb-form: '^[ ]*VERSION [0-9]\.[0-9]{2}'
vb-module: '^[ ]*Attribute VB_Name = '
vba:
- '\b(?:VBA|[vV]ba)(?:\b|[0-9A-Z_])'
# VBA7 new 64-bit features
- '^[ ]*(?:Public|Private)? Declare PtrSafe (?:Sub|Function)\b'
- '^[ ]*#If Win64\b'
- '^[ ]*(?:Dim|Const) [0-9a-zA-Z_]*[ ]*As Long(?:Ptr|Long)\b'
# Top module declarations unique to VBA
- '^[ ]*Option (?:Private Module|Compare (?:Database|Text|Binary))\b'
# General VBA libraries and objects
- '(?: |\()(?:Access|Excel|Outlook|PowerPoint|Visio|Word|VBIDE)\.\w'
- '\b(?:(?:Active)?VBProjects?|VBComponents?|Application\.(?:VBE|ScreenUpdating))\b'
# AutoCAD, Outlook, PowerPoint and Word objects
- '\b(?:ThisDrawing|AcadObject|Active(?:Explorer|Inspector|Window\.Presentation|Presentation|Document)|Selection\.(?:Find|Paragraphs))\b'
# Excel objects
- '\b(?:(?:This|Active)?Workbooks?|Worksheets?|Active(?:Sheet|Chart|Cell)|WorksheetFunction)\b'
- '\b(?:Range\(".*|Cells\([0-9a-zA-Z_]*, (?:[0-9a-zA-Z_]*|"[a-zA-Z]{1,3}"))\)'
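As a rough illustration of how the rule keys in this file combine, the sketch below evaluates the `.t` Perl rule (an `and` of a `negative_pattern` and the `perl` named pattern) in Python. The function names, the dictionary layout, and the use of `re.MULTILINE` (so that `^` anchors at line starts) are assumptions made for the illustration, not Linguist's or Tartrazine's implementation, and only two of the `perl` patterns are copied.

```python
import re

# A subset of the named patterns listed above; a name maps to one or more regexes.
NAMED_PATTERNS = {
    "perl": [
        r"\buse\s+(?:strict\b|v?5\b)",
        r"^\s*use\s+(?:constant|overload)\b",
    ],
}

# The `.t` Perl rule from the table, written as a plain dict (hypothetical layout).
PERL_RULE = {
    "language": "Perl",
    "and": [
        {"negative_pattern": r"^\s*use\s+v6\b"},
        {"named_pattern": "perl"},
    ],
}


def as_list(value):
    """Accept a single regex or a list of regexes."""
    return [value] if isinstance(value, str) else list(value)


def any_match(patterns, content):
    # Assumption: `^` anchors at line starts, hence re.MULTILINE.
    return any(re.search(p, content, re.MULTILINE) for p in as_list(patterns))


def rule_matches(rule, content, named_patterns):
    """Evaluate one rule; a rule with no condition keys always matches.

    Simplification: if a rule carries several condition keys, only the
    first one checked here is used.
    """
    if "and" in rule:
        return all(rule_matches(r, content, named_patterns) for r in rule["and"])
    if "pattern" in rule:
        return any_match(rule["pattern"], content)
    if "negative_pattern" in rule:
        return not any_match(rule["negative_pattern"], content)
    if "named_pattern" in rule:
        return any_match(named_patterns[rule["named_pattern"]], content)
    return True
```

A rule that only names a language, such as the trailing `SQL` entry for `.sql`, falls through to the final `return True` and so acts as the default for its extension.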

lexers/rst.xml (new file, 76 lines)

@ -0,0 +1,76 @@
<lexer>
<config>
<name>reStructuredText</name>
<alias>restructuredtext</alias>
<alias>rst</alias>
<alias>rest</alias>
<filename>*.rst</filename>
<filename>*.rest</filename>
<mime_type>text/x-rst</mime_type>
<mime_type>text/prs.fallenstein.rst</mime_type>
</config>
<rules>
<state name="root">
<rule pattern="^(=+|-+|`+|:+|\.+|\&#x27;+|&quot;+|~+|\^+|_+|\*+|\++|#+)([ \t]*\n)(.+)(\n)(\1)(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\S.*)(\n)(={3,}|-{3,}|`{3,}|:{3,}|\.{3,}|\&#x27;{3,}|&quot;{3,}|~{3,}|\^{3,}|_{3,}|\*{3,}|\+{3,}|#{3,})(\n)"><bygroups><token type="GenericHeading"/><token type="Text"/><token type="GenericHeading"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\s*)([-*+])( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)([0-9#ivxlcmIVXLCM]+\.)( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)(\(?[0-9#ivxlcmIVXLCM]+\))( .+\n(?:\1 .+\n)*)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)([A-Z]+\.)( .+\n(?:\1 .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)(\(?[A-Za-z]+\))( .+\n(?:\1 .+\n)+)"><bygroups><token type="Text"/><token type="LiteralNumber"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^(\s*)(\|)( .+\n(?:\| .+\n)*)"><bygroups><token type="Text"/><token type="Operator"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^( *\.\.)(\s*)((?:source)?code(?:-block)?)(::)([ \t]*)([^\n]+)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\8.*)?\n)+)">
<bygroups>
<token type="Punctuation"/>
<token type="Text"/>
<token type="OperatorWord"/>
<token type="Punctuation"/>
<token type="Text"/>
<token type="Keyword"/>
<token type="Text"/>
<token type="Text"/>
<UsingByGroup lexer="6" content="9,10,11"/>
</bygroups>
</rule>
<rule pattern="^( *\.\.)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))">
<bygroups>
<token type="Punctuation"/>
<token type="Text"/>
<token type="OperatorWord"/>
<token type="Punctuation"/>
<token type="Text"/>
<usingself state="inline"/>
</bygroups>
</rule>
<rule pattern="^( *\.\.)(\s*)(_(?:[^:\\]|\\.)+:)(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^( *\.\.)(\s*)(\[.+\])(.*?)$"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^( *\.\.)(\s*)(\|.+\|)(\s*)([\w:-]+?)(::)(?:([ \t]*)(.*))"><bygroups><token type="Punctuation"/><token type="Text"/><token type="NameTag"/><token type="Text"/><token type="OperatorWord"/><token type="Punctuation"/><token type="Text"/><usingself state="inline"/></bygroups></rule>
<rule pattern="^ *\.\..*(\n( +.*\n|\n)+)?"><token type="Comment"/></rule>
<rule pattern="^( *)(:(?:\\\\|\\:|[^:\n])+:(?=\s))([ \t]*)"><bygroups><token type="Text"/><token type="NameClass"/><token type="Text"/></bygroups></rule>
<rule pattern="^(\S.*(?&lt;!::)\n)((?:(?: +.*)\n)+)"><bygroups><usingself state="inline"/><usingself state="inline"/></bygroups></rule>
<rule pattern="(::)(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\3.*)?\n)+)"><bygroups><token type="LiteralStringEscape"/><token type="Text"/><token type="LiteralString"/><token type="LiteralString"/><token type="Text"/><token type="LiteralString"/></bygroups></rule>
<rule><include state="inline"/></rule>
</state>
<state name="inline">
<rule pattern="\\."><token type="Text"/></rule>
<rule pattern="``"><token type="LiteralString"/><push state="literal"/></rule>
<rule pattern="(`.+?)(&lt;.+?&gt;)(`__?)"><bygroups><token type="LiteralString"/><token type="LiteralStringInterpol"/><token type="LiteralString"/></bygroups></rule>
<rule pattern="`.+?`__?"><token type="LiteralString"/></rule>
<rule pattern="(`.+?`)(:[a-zA-Z0-9:-]+?:)?"><bygroups><token type="NameVariable"/><token type="NameAttribute"/></bygroups></rule>
<rule pattern="(:[a-zA-Z0-9:-]+?:)(`.+?`)"><bygroups><token type="NameAttribute"/><token type="NameVariable"/></bygroups></rule>
<rule pattern="\*\*.+?\*\*"><token type="GenericStrong"/></rule>
<rule pattern="\*.+?\*"><token type="GenericEmph"/></rule>
<rule pattern="\[.*?\]_"><token type="LiteralString"/></rule>
<rule pattern="&lt;.+?&gt;"><token type="NameTag"/></rule>
<rule pattern="[^\\\n\[*`:]+"><token type="Text"/></rule>
<rule pattern="."><token type="Text"/></rule>
</state>
<state name="literal">
<rule pattern="[^`]+"><token type="LiteralString"/></rule>
<rule pattern="``((?=$)|(?=[-/:.,; \n\x00 &#x27;&quot;\)\]\}&gt;’”»!\?]))"><token type="LiteralString"/><pop depth="1"/></rule>
<rule pattern="`"><token type="LiteralString"/></rule>
</state>
</rules>
</lexer>
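To make the first `root` rule of this lexer more concrete, here is a small Python sketch that applies the same regular expression (with the XML entities decoded) to an overlined heading and pairs each capture group with the token type from `<bygroups>`. The helper name `heading_tokens` and the choice of `re.MULTILINE` are assumptions for the illustration; how Tartrazine compiles and applies the pattern may differ.

```python
import re

# The overline/title/underline rule from the lexer, XML entities decoded.
# Group 5 is a backreference, so the underline must repeat the overline exactly.
HEADING_WITH_OVERLINE = re.compile(
    r"""^(=+|-+|`+|:+|\.+|\'+|"+|~+|\^+|_+|\*+|\++|#+)([ \t]*\n)(.+)(\n)(\1)(\n)""",
    re.MULTILINE,
)

# Token types from the rule's <bygroups>, one per capture group.
TOKEN_TYPES = [
    "GenericHeading", "Text", "GenericHeading",
    "Text", "GenericHeading", "Text",
]


def heading_tokens(text):
    """Return (token_type, value) pairs for an overlined heading, or None."""
    m = HEADING_WITH_OVERLINE.match(text)
    if m is None:
        return None
    return list(zip(TOKEN_TYPES, m.groups()))
```

Because of the `\1` backreference, a heading whose underline is shorter than its overline is not matched by this rule and is left to the later rules.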


@ -52,6 +52,6 @@ with open("src/constants/lexers.cr", "w") as f:
  f.write(" LEXERS_BY_FILENAME = {\n")
  for k in sorted(lexer_by_filename.keys()):
  v = lexer_by_filename[k]
- f.write(f'"{k}" => {str(list(v)).replace("'", "\"")}, \n')
+ f.write(f'"{k}" => {str(sorted(list(v))).replace("'", "\"")}, \n')
  f.write("}\n")
  f.write("end\n")
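The effect of sorting each candidate list can be shown with a short sketch. Assuming the values are sets (their iteration order is not guaranteed), sorting before writing makes the generated Crystal table stable between runs. The helper name `crystal_entry` is hypothetical, and the formatting is simplified compared to the script.

```python
def crystal_entry(glob, lexers):
    """Format one LEXERS_BY_FILENAME line with the lexer names sorted."""
    names = ", ".join(f'"{name}"' for name in sorted(lexers))
    return f'"{glob}" => [{names}],'


# The same set always produces the same line, whatever its internal order.
line = crystal_entry("*.h", {"objective-c", "c", "c++"})
```

This matches the reordering visible in the regenerated constants, where for example the `*.h` entry becomes `["c", "c++", "objective-c"]`.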


@ -1,5 +1,5 @@
  name: tartrazine
- version: 0.6.0
+ version: 0.6.1
  authors:
  - Roberto Alsina <roberto.alsina@gmail.com>


@ -23,7 +23,7 @@ module Tartrazine
  struct Action
  property actions : Array(Action) = [] of Action
- @content_index : Int32 = 0
+ @content_index : Array(Int32) = [] of Int32
  @depth : Int32 = 0
  @lexer_index : Int32 = 0
  @lexer_name : String = ""
@ -67,7 +67,7 @@ module Tartrazine
  }.map &.content
  when ActionType::Usingbygroup
  @lexer_index = xml["lexer"].to_i
- @content_index = xml["content"].to_i
+ @content_index = xml["content"].split(",").map(&.to_i)
  end
  end
@ -143,8 +143,12 @@ module Tartrazine
  when ActionType::Usingbygroup
  # Shunt to content-specified lexer
  return [] of Token if match.empty?
+ content = ""
+ @content_index.each do |i|
+ content += String.new(match[i].value)
+ end
  Tartrazine.lexer(String.new(match[@lexer_index].value)).tokenizer(
- String.new(match[@content_index].value),
+ content,
  secondary: true).to_a
  else
  raise Exception.new("Unknown action type: #{@type}")
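The change above lets `content` name several capture groups, as in `<UsingByGroup lexer="6" content="9,10,11"/>` from the reStructuredText lexer. The Python sketch below reproduces the idea with the code-block rule from that lexer: one group names the language and the listed groups are concatenated to form the text handed to that language's lexer. The function name `using_by_group` and the `re.MULTILINE` flag are assumptions for the illustration.

```python
import re

# The code-block directive rule from lexers/rst.xml.
CODE_BLOCK = re.compile(
    r"^( *\.\.)(\s*)((?:source)?code(?:-block)?)(::)([ \t]*)([^\n]+)"
    r"(\n[ \t]*\n)([ \t]+)(.*)(\n)((?:(?:\8.*)?\n)+)",
    re.MULTILINE,
)


def using_by_group(match, lexer_attr, content_attr):
    """Return (lexer_name, content) as selected by the XML attributes."""
    lexer_name = match.group(int(lexer_attr))
    # "9,10,11" -> [9, 10, 11], then join those groups in order.
    indices = [int(i) for i in content_attr.split(",")]
    content = "".join(match.group(i) for i in indices)
    return lexer_name, content


sample = ".. code-block:: python\n\n    print(1)\n    print(2)\n"
result = using_by_group(CODE_BLOCK.match(sample), "6", "9,10,11")
```

With a single index, only the first line of the block (group 9) would reach the nested lexer; groups 10 and 11 carry the newline and the remaining lines.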


@ -324,10 +324,14 @@ module Tartrazine
  "reg" => "reg",
  "registry" => "reg",
  "rego" => "rego",
+ "rest" => "rst",
+ "restructuredtext" => "rst",
  "rexx" => "rexx",
  "rkt" => "racket",
+ "roff" => "groff",
  "rpmspec" => "rpm_spec",
  "rs" => "rust",
+ "rst" => "rst",
  "ruby" => "ruby",
  "rust" => "rust",
  "s" => "r",
@ -395,7 +399,7 @@ module Tartrazine
  "turing" => "turing",
  "turtle" => "turtle",
  "tv" => "tradingview",
- "twig" => "TwigLexer",
+ "twig" => "twig",
  "typescript" => "typescript",
  "typoscript" => "typoscript",
  "typoscriptcssdata" => "typoscriptcssdata",
@ -467,7 +471,7 @@ module Tartrazine
  "application/x-fennel" => "fennel",
  "application/x-fish" => "fish",
  "application/x-forth" => "forth",
- "application/x-gdscript" => "gdscript3",
+ "application/x-gdscript" => "gdscript",
  "application/x-hcl" => "hcl",
  "application/x-hy" => "hy",
  "application/x-javascript" => "javascript",
@ -500,7 +504,7 @@ module Tartrazine
  "application/x-thrift" => "thrift",
  "application/x-troff" => "groff",
  "application/x-turtle" => "turtle",
- "application/x-twig" => "TwigLexer",
+ "application/x-twig" => "twig",
  "application/x-vue" => "vue",
  "application/x.ucode" => "ucode",
  "application/xhtml+xml" => "html",
@ -527,6 +531,7 @@ module Tartrazine
  "text/odin" => "odin",
  "text/org" => "org_mode",
  "text/plain" => "plaintext",
+ "text/prs.fallenstein.rst" => "rst",
  "text/rust" => "rust",
  "text/s" => "r",
  "text/s-plus" => "r",
@ -589,7 +594,7 @@ module Tartrazine
  "text/x-fortran" => "fortran",
  "text/x-fsharp" => "fsharp",
  "text/x-gas" => "gas",
- "text/x-gdscript" => "gdscript3",
+ "text/x-gdscript" => "gdscript",
  "text/x-gherkin" => "gherkin",
  "text/x-gleam" => "gleam",
  "text/x-glslsrc" => "glsl",
@ -657,6 +662,7 @@ module Tartrazine
  "text/x-reasonml" => "reasonml",
  "text/x-rexx" => "rexx",
  "text/x-rpm-spec" => "rpm_spec",
+ "text/x-rst" => "rst",
  "text/x-ruby" => "ruby",
  "text/x-rust" => "rust",
  "text/x-sas" => "sas",
@ -726,7 +732,7 @@ module Tartrazine
  "*.aql" => ["arangodb_aql"],
  "*.arexx" => ["rexx"],
  "*.as" => ["actionscript", "actionscript_3"],
- "*.asm" => ["nasm", "z80_assembly", "tasm"],
+ "*.asm" => ["nasm", "tasm", "z80_assembly"],
  "*.au3" => ["autoit"],
  "*.automount" => ["systemd"],
  "*.aux" => ["tex"],
@ -734,7 +740,7 @@ module Tartrazine
  "*.awk" => ["awk"],
  "*.b" => ["brainfuck"],
  "*.bal" => ["ballerina"],
- "*.bas" => ["vb_net", "qbasic"],
+ "*.bas" => ["qbasic", "vb_net"],
  "*.bash" => ["bash"],
  "*.bat" => ["batchfile"],
  "*.batch" => ["psl"],
@ -834,7 +840,7 @@ module Tartrazine
  "*.fx" => ["hlsl"],
  "*.fxh" => ["hlsl"],
  "*.fzn" => ["minizinc"],
- "*.gd" => ["gdscript3", "gdscript"],
+ "*.gd" => ["gdscript", "gdscript3"],
  "*.gemspec" => ["ruby"],
  "*.geo" => ["glsl"],
  "*.gleam" => ["gleam"],
@ -844,7 +850,7 @@ module Tartrazine
  "*.graphql" => ["graphql"],
  "*.graphqls" => ["graphql"],
  "*.groovy" => ["groovy"],
- "*.h" => ["objective-c", "c", "c++"],
+ "*.h" => ["c", "c++", "objective-c"],
  "*.h++" => ["c++"],
  "*.ha" => ["hare"],
  "*.handlebars" => ["handlebars"],
@ -852,7 +858,7 @@ module Tartrazine
  "*.hc" => ["holyc"],
  "*.hc.z" => ["holyc"],
  "*.hcl" => ["hcl"],
- "*.hh" => ["holyc", "c++"],
+ "*.hh" => ["c++", "holyc"],
  "*.hlb" => ["hlb"],
  "*.hlsl" => ["hlsl"],
  "*.hlsli" => ["hlsl"],
@ -867,7 +873,7 @@ module Tartrazine
  "*.idc" => ["c"],
  "*.idr" => ["idris"],
  "*.ijs" => ["j"],
- "*.inc" => ["php", "sourcepawn", "objectpascal", "povray"],
+ "*.inc" => ["objectpascal", "php", "povray", "sourcepawn"],
  "*.inf" => ["ini"],
  "*.ini" => ["ini"],
  "*.ino" => ["arduino"],
@ -893,13 +899,13 @@ module Tartrazine
  "*.lpk" => ["objectpascal"],
  "*.lpr" => ["objectpascal"],
  "*.lua" => ["lua"],
- "*.m" => ["mason", "mathematica", "matlab", "octave", "objective-c"],
+ "*.m" => ["mason", "mathematica", "matlab", "objective-c", "octave"],
  "*.ma" => ["mathematica"],
  "*.mak" => ["makefile"],
  "*.man" => ["groff"],
  "*.mao" => ["mako"],
  "*.markdown" => ["markdown"],
- "*.mc" => ["monkeyc", "mason"],
+ "*.mc" => ["mason", "monkeyc"],
  "*.mcfunction" => ["mcfunction"],
  "*.md" => ["markdown"],
  "*.metal" => ["metal"],
@ -948,7 +954,7 @@ module Tartrazine
  "*.php" => ["php"],
  "*.php[345]" => ["php"],
  "*.pig" => ["pig"],
- "*.pl" => ["prolog", "perl"],
+ "*.pl" => ["perl", "prolog"],
  "*.plc" => ["plutus_core"],
  "*.plot" => ["gnuplot"],
  "*.plt" => ["gnuplot"],
@ -992,6 +998,7 @@ module Tartrazine
  "*.reg" => ["reg"],
  "*.rego" => ["rego"],
  "*.rei" => ["reasonml"],
+ "*.rest" => ["rst"],
  "*.rex" => ["rexx"],
  "*.rexx" => ["rexx"],
  "*.rkt" => ["racket"],
@ -1001,9 +1008,10 @@ module Tartrazine
  "*.rs" => ["rust"],
  "*.rs.in" => ["rust"],
  "*.rss" => ["xml"],
+ "*.rst" => ["rst"],
  "*.rvt" => ["tcl"],
  "*.rx" => ["rexx"],
- "*.s" => ["r", "armasm", "gas"],
+ "*.s" => ["armasm", "gas", "r"],
  "*.sage" => ["python"],
  "*.sas" => ["sas"],
  "*.sass" => ["sass"],
@ -1016,7 +1024,7 @@ module Tartrazine
  "*.scope" => ["systemd"],
  "*.scss" => ["scss"],
  "*.sed" => ["sed"],
- "*.service" => ["systemd", "ini"],
+ "*.service" => ["ini", "systemd"],
  "*.sh" => ["bash"],
  "*.sh-session" => ["bash_session"],
  "*.sieve" => ["sieve"],
@ -1026,7 +1034,7 @@ module Tartrazine
  "*.smali" => ["smali"],
  "*.sml" => ["standard_ml"],
  "*.snobol" => ["snobol"],
- "*.socket" => ["systemd", "ini"],
+ "*.socket" => ["ini", "systemd"],
  "*.sol" => ["solidity"],
  "*.sp" => ["sourcepawn"],
  "*.sparql" => ["sparql"],
@ -1061,7 +1069,7 @@ module Tartrazine
  "*.tpl" => ["smarty"],
  "*.tpp" => ["c++"],
  "*.trig" => ["psl"],
- "*.ts" => ["typoscript", "typescript"],
+ "*.ts" => ["typescript", "typoscript"],
  "*.tst" => ["scilab"],
  "*.tsx" => ["typescript"],
  "*.ttl" => ["turtle"],
@ -1097,7 +1105,7 @@ module Tartrazine
  "*.xml" => ["xml"],
  "*.xsd" => ["xml"],
  "*.xsl" => ["xml"],
- "*.xslt" => ["xml", "html"],
+ "*.xslt" => ["html", "xml"],
  "*.yaml" => ["yaml"],
  "*.yang" => ["yang"],
  "*.yml" => ["yaml"],

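The tables above map aliases, MIME types, and filename globs to lexer names. A minimal Python sketch of the two lookups that matter for this change follows, using a handful of entries copied from the tables. The function names and the use of `fnmatch` for glob matching are assumptions; Tartrazine's own matching code is not shown in this diff.

```python
import fnmatch
import os

# A few entries copied from the generated tables.
LEXERS_BY_MIMETYPE = {
    "text/x-rst": "rst",
    "text/prs.fallenstein.rst": "rst",
    "text/x-gdscript": "gdscript",
}
LEXERS_BY_FILENAME = {
    "*.rst": ["rst"],
    "*.rest": ["rst"],
    "*.h": ["c", "c++", "objective-c"],
    "*.pl": ["perl", "prolog"],
}


def candidates_for_filename(filename):
    """Collect every lexer whose glob matches the file's base name."""
    base = os.path.basename(filename)
    found = set()
    for glob, lexers in LEXERS_BY_FILENAME.items():
        if fnmatch.fnmatchcase(base, glob):
            found.update(lexers)
    return sorted(found)


def lexer_for_mimetype(mimetype):
    """Return the lexer name for a MIME type, or None if it is unknown."""
    return LEXERS_BY_MIMETYPE.get(mimetype)
```

An extension such as `.h` yields several candidates, which is the case the content heuristics are meant to resolve.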
src/heuristics.cr (new file, 81 lines)

@ -0,0 +1,81 @@
require "yaml"
# Use linguist's heuristics to disambiguate between languages
# This is *shamelessly* stolen from https://github.com/github-linguist/linguist
# and ported to Crystal. Deepest thanks to the authors of Linguist
# for licensing it liberally.
#
# Consider this code (c) 2017 GitHub, Inc. even if I wrote it.
module Linguist
class Heuristic
include YAML::Serializable
property disambiguations : Array(Disambiguation)
property named_patterns : Hash(String, String | Array(String))
# Run the heuristics on the given filename and content
def run(filename, content)
ext = File.extname filename
disambiguation = disambiguations.find do |item|
item.extensions.includes? ext
end
disambiguation.try &.run(content, named_patterns)
end
end
class Disambiguation
include YAML::Serializable
property extensions : Array(String)
property rules : Array(LangRule)
def run(content, named_patterns)
rules.each do |rule|
if rule.match(content, named_patterns)
return rule.language
end
end
nil
end
end
class LangRule
include YAML::Serializable
property pattern : (String | Array(String))?
property negative_pattern : (String | Array(String))?
property named_pattern : String?
property and : Array(LangRule)?
property language : String | Array(String)?
# ameba:disable Metrics/CyclomaticComplexity
def match(content, named_patterns)
# This rule matches without conditions
return true if !pattern && !negative_pattern && !named_pattern && !and
if pattern
p_arr = [] of String
p_arr << pattern.as(String) if pattern.is_a? String
p_arr = pattern.as(Array(String)) if pattern.is_a? Array(String)
return true if p_arr.any? { |pat| ::Regex.new(pat).matches?(content) }
end
if negative_pattern
p_arr = [] of String
p_arr << negative_pattern.as(String) if negative_pattern.is_a? String
p_arr = negative_pattern.as(Array(String)) if negative_pattern.is_a? Array(String)
return true if p_arr.none? { |pat| ::Regex.new(pat).matches?(content) }
end
if named_pattern
p_arr = [] of String
if named_patterns[named_pattern].is_a? String
p_arr << named_patterns[named_pattern].as(String)
else
p_arr = named_patterns[named_pattern].as(Array(String))
end
result = p_arr.any? { |pat| ::Regex.new(pat).matches?(content) }
end
if and
result = and.as(Array(LangRule)).all?(&.match(content, named_patterns))
end
result
end
end
end
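The control flow of `Heuristic#run` and `Disambiguation#run` above can be summarised in a few lines of Python: pick the first disambiguation entry whose extension list contains the file's extension, then return the language of the first rule that matches. The sketch below handles only plain `pattern` rules and unconditional rules, uses two entries copied from the heuristics file, and assumes `re.MULTILINE` so that `^` anchors at line starts. The names `DISAMBIGUATIONS` and `guess_language` are hypothetical.

```python
import os
import re

# Two entries copied from heuristics.yml, as plain Python data.
DISAMBIGUATIONS = [
    {"extensions": [".ts"], "rules": [
        {"language": "XML", "pattern": r"<TS\b"},
        {"language": "TypeScript"},  # no condition: acts as the default
    ]},
    {"extensions": [".vba"], "rules": [
        {"language": "Vim Script", "pattern": r"^UseVimball"},
        {"language": "VBA"},
    ]},
]


def guess_language(filename, content):
    """Return the first matching language for the extension, or None."""
    ext = os.path.splitext(filename)[1]
    for entry in DISAMBIGUATIONS:
        if ext not in entry["extensions"]:
            continue
        # Only the first entry listing this extension is consulted.
        for rule in entry["rules"]:
            pattern = rule.get("pattern")
            if pattern is None or re.search(pattern, content, re.MULTILINE):
                return rule["language"]
        return None
    return None
```

Rule order matters: the unconditional rule must come last, otherwise it would shadow the more specific patterns.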


@ -9,13 +9,21 @@ module Tartrazine
  # Get the lexer object for a language name
  # FIXME: support mimetypes
- def self.lexer(name : String? = nil, filename : String? = nil) : BaseLexer
+ def self.lexer(name : String? = nil, filename : String? = nil, mimetype : String? = nil) : BaseLexer
  return lexer_by_name(name) if name && name != "autodetect"
  return lexer_by_filename(filename) if filename
+ return lexer_by_mimetype(mimetype) if mimetype
  Lexer.from_xml(LexerFiles.get("/#{LEXERS_BY_NAME["plaintext"]}.xml").gets_to_end)
  end
+ private def self.lexer_by_mimetype(mimetype : String) : BaseLexer
+ lexer_file_name = LEXERS_BY_MIMETYPE.fetch(mimetype, nil)
+ raise Exception.new("Unknown mimetype: #{mimetype}") if lexer_file_name.nil?
+ Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
+ end
  private def self.lexer_by_name(name : String) : BaseLexer
  lexer_file_name = LEXERS_BY_NAME.fetch(name.downcase, nil)
  return create_delegating_lexer(name) if lexer_file_name.nil? && name.includes? "+"
@ -36,12 +44,30 @@ module Tartrazine
  when 1
  lexer_file_name = candidates.first
  else
- raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}")
+ lexer_file_name = self.lexer_by_content(filename)
+ begin
+ return self.lexer(lexer_file_name)
+ rescue ex : Exception
+ raise Exception.new("Multiple lexers match the filename: #{candidates.to_a.join(", ")}, heuristics suggest #{lexer_file_name} but there is no matching lexer.")
+ end
  end
  Lexer.from_xml(LexerFiles.get("/#{lexer_file_name}.xml").gets_to_end)
  end
+ private def self.lexer_by_content(fname : String) : String?
+ h = Linguist::Heuristic.from_yaml(LexerFiles.get("/heuristics.yml").gets_to_end)
+ result = h.run(fname, File.read(fname))
+ case result
+ when Nil
+ raise Exception.new "No lexer found for #{fname}"
+ when String
+ result.as(String)
+ when Array(String)
+ result.first
+ end
+ end
  private def self.create_delegating_lexer(name : String) : BaseLexer
  language, root = name.split("+", 2)
  language_lexer = lexer(language)