Commit Graph

109 Commits

Author SHA1 Message Date
Luke Francl
4bde6c61a1 remove obsolete TODO 2021-10-11 14:06:29 -07:00
Luke Francl
b248b21349 Expose LanguageInfo with all Linguist data
As discussed in https://github.com/go-enry/go-enry/issues/54, this provides an
API for accessing a LanguageInfo struct which is populated with all the data
from the Linguist YAML source file. Functions are provided to access the
LanguageInfo by name or ID.

The other top-level functions like GetLanguageExtensions, GetLanguageGroup, etc.
could in principle be implemented using this structure, which would simplify the
code generation. But that would be a big change so I didn't do any of that.
Perhaps in the next major version something like that would make sense.
2021-10-11 13:32:29 -07:00
Lauris BH
0affa3ccca Update to Linguist v7.16.1 2021-09-25 23:57:50 +03:00
Luke Francl
dfb8041dcc Update generated code for Linguist 7.14.0 2021-04-26 09:36:25 -07:00
Luke Francl
eb043e80a8 Add GetLanguageID function
The Linguist-defined language IDs are important to our use case because they are
used as database identifiers. This adds a new generator to extract the language
IDs into a map and uses that to implement GetLanguageID.

Because one language has the ID 0, there is no way to tell if a language name is
found or not. If desired, we could add this by returning (string, bool) from
GetLanguageID. But none of the other functions that take language names do this,
so I didn't want to introduce it here.
2021-04-13 11:49:21 -07:00
Lauris BH
c40b34c351 Sync with Liguist v7.13.0 2021-03-07 18:02:04 +02:00
Lauris BH
497e2f85d3 Sync with github/linguist version v7.12.2 2021-01-17 14:10:38 +02:00
Lauris BH
289ac3d9f0 Sync with linguist 7.12.1 2020-11-15 14:32:56 +02:00
Lauris BH
bc76dd38b0 sync to the latest github/linguist v7.11.1 2020-10-12 12:32:48 +03:00
Lauris BH
7c562a6c34 sync to the latest github/linguist v7.11.0 2020-09-17 10:34:41 +03:00
Máximo Cuadros
29bc0a181b
data: replace substring package with regex package 2020-04-15 17:27:48 +02:00
Lauris BH
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
Lauris BH
9030d3671b sync to the latest github/linguist v7.9.0 2020-03-30 01:25:57 +03:00
Alexander Bezzubov
3ea961e5ab
generator: change-detector tests on EOL-dependant sample
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
Alexander Bezzubov
9be0211f04
generator: skip symlinks on *nix and win
As Git on win does not support symlinks [1], we have to hard-code
the paths to fils under ./samples/ in Linguist codebase that are
known to be a symlink.

 1. https://github.com/git-for-windows/git/wiki/Symbolic-Links

TestPlan:
 - go test ./internal/code-generator/generator -run Test_GeneratorTestSuite

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
Alexander Bezzubov
78eee0cf7e
generator: flag to debug building of bayesian classifier
It seems that reading ./samples/ from Linguist consumes
a different number of files from filesystem on different OSes.

This change adds ENRY_DEBUG env var to print some debug output
about calculations of token stats from  samples.

TestPlan:
 - ENRY_DEBUG=1 go test -v ./internal/code-generator/generator \
	-run Test_GeneratorTestSuite -testify.m TestGenerationFiles

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 19:35:49 +02:00
Alexander
b78e4423f0
generator: drop platform-specific separator
Co-Authored-By: Lauris BH <lauris@nix.lv>
2020-03-25 19:27:46 +01:00
Alexander Bezzubov
3a5f4b2db1
generator: mode debug output in case of failure
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-25 14:20:26 +01:00
Alexander Bezzubov
b0f94ad693
generator: CLI tool fix to support win paths
On Win `make code-generate` produces unreasonable
Bayesian classifier weights from Linguist samples
silently, failing only the final classification tests.

TestPlan:
 - go test ./internal/code-generator/... \
    -run Test_GeneratorTestSuite -testify.m TestGenerationFiles

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-25 14:00:24 +01:00
Alexander Bezzubov
78d8f43a88
tokenizer: hide flex-based impl, avoid build failures on win
TestPlan:
 - go test -run TestTokenize ./internal/tokenizer
 - go test -tags flex -run TestTokenize ./internal/tokenizer
   (shold fail as default fixtures are from regex-based tokenizer)
2020-03-19 19:58:48 +01:00
Alexander Bezzubov
1ab8148c10
test: fix platform-depenent paths in tests
Test Plan:
 - go test ./internal/code-generator/... -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
2020-03-19 19:47:22 +01:00
Alexander Bezzubov
e32a70a784
tokenizer: fix a bug and regenerate the code \w latest Go
See https://github.com/bzz/enry/pull/4 for details.

Test Plan:
 - go test ./...
2020-03-19 19:08:21 +01:00
Máximo Cuadros
84efad7693
*: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
Alexander Bezzubov
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
Lauris Bukšis-Haberkorns
4e3e15e80d
Sync to linguist v7.5.1
Signed-off-by: Lauris BH <lauris@nix.lv>
2019-08-06 17:18:01 +03:00
Lauris Bukšis-Haberkorns
2f5526ddba
Improve detection of unsupported regexp syntax
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-08-05 22:24:03 +03:00
Lauris Bukšis-Haberkorns
25b29ebdc4 Implement getting color code for languages
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-07-19 23:59:46 +03:00
Alexander Bezzubov
f3ceaa6330
token: refactor & simplify test fixtures
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-08 22:17:32 +02:00
Alexander Bezzubov
a724a2f841
token: test case for regexp + non-valid UTF8
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-07 13:46:36 +02:00
Alexander Bezzubov
8bdc830833
token: new test case with Unicode replacement
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-17 19:28:06 +02:00
Alexander Bezzubov
278eaf1c22
tokenizer: move flex-based to modules
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-17 13:54:34 +02:00
Alexander
ae43e1a91f
Merge pull request #219 from bzz/go-mod
Introduce Go modules
2019-04-17 13:39:55 +02:00
Alexander Bezzubov
7e136bade8
test: don't export tokenizer fixtures
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
6c7b91cb91
doc: improve API doc on review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
ada6f15c93
address review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
7929933eb5
tokenizer: cleanup & attributions
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
8756fbdcb4
refactor to build tags
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
553399ed76
tokenizer: port flex-based C impl from linguist
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
6a5f37e9e2
modules: prepare for v2 release
- update go.mod \w v2
 - update all import paths

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
Alexander Bezzubov
20c6d2845a
build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
Alexander Bezzubov
85d5906b2b
address review feedback - tixing a fypo
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-11 21:36:29 +02:00
Alexander Bezzubov
41478262f3
fix verb mismatch in a format string
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-11 15:28:49 +02:00
Alexander Bezzubov
bdb5603f28
Address code review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-08 16:07:10 +02:00
Alexander Bezzubov
b2b61c2a8c
gen: refactoring, renaming vars for readability
This does not change the logic of the generatro
but only renames/moves some vars for readability

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-03 15:40:23 +02:00
M. J. Fromberger
3a6d42b39a
doc: fix spelling
Co-Authored-By: bzz <bzz@users.noreply.github.com>
2019-02-21 09:33:17 +01:00
Alexander Bezzubov
baefa18475
gen: compare generated code to gold ignoring whitespaces
Reason is that gofmt can change between versions e.g
see https://go-review.googlesource.com/c/go/+/122295/
and this would avoid breaking tests and edit wars

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-02-20 23:22:02 +01:00
Alexander Bezzubov
c8e0f75132
test: make gen test output less verbose
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-02-20 23:22:02 +01:00
Alexander
3499750785
Sync to linguist 7.2.0: heuristics.yml support (#189)
Sync \w Github Linguist v7.2.0

Includes new way of handling `heuristics.yml` and
all `./data/*` re-generated using Github Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.

 - many new languages
 - better vendoring detection
 - update doc on update&known issues.
2019-02-14 12:47:45 +01:00
M. J. Fromberger
5245079744 Apply suggestions from review.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:44 -08:00
M. J. Fromberger
dabb41527f Apply suggestions from review.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:42 -08:00