Sync with linguist 7.12.1

Lauris BH
2020-11-15 14:32:56 +02:00
parent 0fb4b8a768
commit 289ac3d9f0
28 changed files with 29756 additions and 28772 deletions


@ -150,7 +150,7 @@ macOS and linux platforms. Windows support is planned under [src-d/enry#150](htt
## Divergences from Linguist
The `enry` library is based on the data from `github/linguist` version **v7.11.1**.
The `enry` library is based on the data from `github/linguist` version **v7.12.1**.
When parsing [linguist/samples](https://github.com/github/linguist/tree/master/samples), the following `enry` results differ from Linguist's:
@ -164,6 +164,8 @@ Parsing [linguist/samples](https://github.com/github/linguist/tree/master/sample
- [Heuristics for ".inc" extension](https://github.com/github/linguist/blob/f0e2d0d7f1ce600b2a5acccaef6b149c87d8b99c/lib/linguist/heuristics.yml#L222) in NASL could not be parsed, because the RE2 regexp engine does not support possessive quantifiers.
- [Heuristics for ".as" extension](https://github.com/github/linguist/blob/223c00bb80eff04788e29010f98c5778993d2b2a/lib/linguist/heuristics.yml#L67) in ActionScript could not be parsed, because the RE2 regexp engine does not support positive lookahead.
- As of [Linguist v5.3.2](https://github.com/github/linguist/releases/tag/v5.3.2), Linguist uses a [flex-based scanner in C for tokenization](https://github.com/github/linguist/pull/3846). Enry still uses the regex-based [extract_token](https://github.com/github/linguist/pull/3846/files#diff-d5179df0b71620e3fac4535cd1368d15L60) algorithm. See [#193](https://github.com/src-d/enry/issues/193).
- The Bayesian classifier can't distinguish "SQL" from "PLpgSQL". See [#194](https://github.com/src-d/enry/issues/194).
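To illustrate the two RE2 limitations listed above, here is a minimal sketch using Go's standard `regexp` package, which implements RE2 syntax. The three patterns are hypothetical stand-ins chosen for illustration; they are not the actual NASL or ActionScript heuristics from `heuristics.yml`, and `compileErr` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileErr returns the error Go's RE2-based regexp package reports for
// pattern, or nil if the pattern is accepted.
func compileErr(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

func main() {
	// Hypothetical example patterns, not taken from Linguist's heuristics.
	patterns := []string{
		`^\s*#include`,  // plain RE2 syntax: accepted
		`\w++`,          // possessive quantifier: rejected by RE2
		`package(?=\s)`, // positive lookahead: rejected by RE2
	}
	for _, p := range patterns {
		if err := compileErr(p); err != nil {
			fmt.Printf("%q rejected: %v\n", p, err)
		} else {
			fmt.Printf("%q accepted\n", p)
		}
	}
}
```

Go reports the possessive quantifier as an invalid nested repetition operator and the lookahead as unsupported Perl syntax, so such heuristics have to be rewritten or skipped rather than compiled as-is.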