Alexander Bezzubov
b0f94ad693
generator: CLI tool fix to support win paths
On Windows, `make code-generate` silently produces unreasonable
Bayesian classifier weights from the Linguist samples, failing
only the final classification tests.
TestPlan:
- go test ./internal/code-generator/... \
-run Test_GeneratorTestSuite -testify.m TestGenerationFiles
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-25 14:00:24 +01:00
Alexander Bezzubov
78d8f43a88
tokenizer: hide flex-based impl, avoid build failures on win
TestPlan:
- go test -run TestTokenize ./internal/tokenizer
- go test -tags flex -run TestTokenize ./internal/tokenizer
(should fail, as the default fixtures come from the regex-based tokenizer)
2020-03-19 19:58:48 +01:00
Alexander Bezzubov
1ab8148c10
test: fix platform-dependent paths in tests
Test Plan:
- go test ./internal/code-generator/... -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
2020-03-19 19:47:22 +01:00
Alexander Bezzubov
e32a70a784
tokenizer: fix a bug and regenerate the code with the latest Go
See https://github.com/bzz/enry/pull/4 for details.
Test Plan:
- go test ./...
2020-03-19 19:08:21 +01:00
Máximo Cuadros
84efad7693
*: module rename to go-enry/go-enry/v4
2020-03-19 17:31:29 +01:00
Alexander Bezzubov
bc5e031cee
Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
Lauris Bukšis-Haberkorns
4e3e15e80d
Sync to linguist v7.5.1
Signed-off-by: Lauris BH <lauris@nix.lv>
2019-08-06 17:18:01 +03:00
Lauris Bukšis-Haberkorns
2f5526ddba
Improve detection of unsupported regexp syntax
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-08-05 22:24:03 +03:00
Lauris Bukšis-Haberkorns
25b29ebdc4
Implement getting color code for languages
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-07-19 23:59:46 +03:00
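Linguist's `languages.yml` assigns many languages a `color`. Below is a minimal sketch of a generated color lookup. It assumes a map keyed by language name and a gray fallback. `languageColors`, `defaultColor` and `getColor` are hypothetical names, and the two entries are illustrative rather than copied from enry's generated data.

```go
package main

import "fmt"

// languageColors is a hypothetical excerpt of a generated map from a
// language name to its color (illustrative values).
var languageColors = map[string]string{
	"Go":   "#00ADD8",
	"Ruby": "#701516",
}

// defaultColor is an assumed fallback for languages without a color.
const defaultColor = "#cccccc"

// getColor returns the color for language, or defaultColor if none is known.
func getColor(language string) string {
	if c, ok := languageColors[language]; ok {
		return c
	}
	return defaultColor
}

func main() {
	fmt.Println(getColor("Go"))             // #00ADD8
	fmt.Println(getColor("NoSuchLanguage")) // #cccccc
}
```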
Alexander Bezzubov
f3ceaa6330
token: refactor & simplify test fixtures
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-08 22:17:32 +02:00
Alexander Bezzubov
a724a2f841
token: test case for regexp + non-valid UTF8
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-07 13:46:36 +02:00
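The commit only names the test case. Go's `regexp` package treats each byte of an invalid UTF-8 sequence as `utf8.RuneError` (U+FFFD). A tokenizer regexp such as `\w+` therefore splits around invalid bytes instead of failing. The pattern is illustrative, not necessarily one enry uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// word is an illustrative token pattern.
var word = regexp.MustCompile(`\w+`)

func main() {
	// "\xff" is never valid UTF-8. regexp decodes it as U+FFFD, which \w
	// does not match, so it acts as a token separator.
	input := []byte("foo\xffbar")
	for _, m := range word.FindAll(input, -1) {
		fmt.Printf("%q\n", m) // "foo", then "bar"
	}
}
```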
Alexander Bezzubov
8bdc830833
token: new test case with Unicode replacement
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-17 19:28:06 +02:00
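Invalid bytes are commonly normalized to the Unicode replacement character U+FFFD before tokens are compared. Below is a small sketch using the standard library's `strings.ToValidUTF8`, which has been available since Go 1.13 and so postdates this commit. Whether enry's fixture does exactly this is an assumption, and `normalizeUTF8` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeUTF8 replaces each run of invalid UTF-8 bytes with one U+FFFD.
func normalizeUTF8(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	// A Latin-1 "é" (0xE9) followed by a space is not valid UTF-8.
	fmt.Println(normalizeUTF8("caf\xe9 ok"))
}
```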
Alexander Bezzubov
278eaf1c22
tokenizer: move flex-based to modules
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-17 13:54:34 +02:00
Alexander
ae43e1a91f
Merge pull request #219 from bzz/go-mod
Introduce Go modules
2019-04-17 13:39:55 +02:00
Alexander Bezzubov
7e136bade8
test: don't export tokenizer fixtures
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
6c7b91cb91
doc: improve API doc on review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
ada6f15c93
address review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
7929933eb5
tokenizer: cleanup & attributions
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
8756fbdcb4
refactor to build tags
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
553399ed76
tokenizer: port flex-based C impl from linguist
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
6a5f37e9e2
modules: prepare for v2 release
- update go.mod with v2
- update all import paths
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
Alexander Bezzubov
20c6d2845a
build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
Alexander Bezzubov
85d5906b2b
address review feedback - tixing a fypo
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-11 21:36:29 +02:00
Alexander Bezzubov
41478262f3
fix verb mismatch in a format string
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-11 15:28:49 +02:00
Alexander Bezzubov
bdb5603f28
Address code review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-08 16:07:10 +02:00
Alexander Bezzubov
b2b61c2a8c
gen: refactoring, renaming vars for readability
This does not change the logic of the generator;
it only renames/moves some vars for readability
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-03 15:40:23 +02:00
M. J. Fromberger
3a6d42b39a
doc: fix spelling
Co-Authored-By: bzz <bzz@users.noreply.github.com>
2019-02-21 09:33:17 +01:00
Alexander Bezzubov
baefa18475
gen: compare generated code to gold ignoring whitespaces
The reason is that gofmt output can change between Go versions
(see e.g. https://go-review.googlesource.com/c/go/+/122295/),
and ignoring whitespace avoids breaking tests and edit wars.
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-02-20 23:22:02 +01:00
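Below is a minimal sketch of a whitespace-insensitive gold comparison. The helper names are hypothetical, and the actual test may normalize differently, for example by collapsing rather than removing whitespace.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripWhitespace drops every Unicode whitespace rune from s.
func stripWhitespace(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return -1 // strings.Map drops runes mapped to a negative value
		}
		return r
	}, s)
}

// equalIgnoringWhitespace reports whether generated and gold differ only in
// whitespace. It also equates "a b" with "ab", which is acceptable for
// gofmt-style alignment changes but not as a general code comparison.
func equalIgnoringWhitespace(generated, gold string) bool {
	return stripWhitespace(generated) == stripWhitespace(gold)
}

func main() {
	gen := "var x = map[string]int{\n\t\"a\":   1,\n}\n"
	gold := "var x = map[string]int{\n\t\"a\": 1,\n}"
	fmt.Println(equalIgnoringWhitespace(gen, gold)) // true
}
```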
Alexander Bezzubov
c8e0f75132
test: make gen test output less verbose
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-02-20 23:22:02 +01:00
Alexander
3499750785
Sync to linguist 7.2.0: heuristics.yml support (#189)
Sync with GitHub Linguist v7.2.0.
Includes the new way of handling `heuristics.yml`, with
all `./data/*` re-generated using the GitHub Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.
- many new languages
- better vendoring detection
- update docs on the update process and known issues.
2019-02-14 12:47:45 +01:00
M. J. Fromberger
5245079744
Apply suggestions from review.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:44 -08:00
M. J. Fromberger
dabb41527f
Apply suggestions from review.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:42 -08:00
M. J. Fromberger
4027b494b3
Add documentation comments to package tokenizer.
Although this package is internal, it still exports an API and deserves some
comments. Serves in partial satisfaction of #195.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:18:52 -08:00
M. J. Fromberger
7d277b11de
Copy the tokenizer input to avoid modifying the caller's copy.
Addresses #196. Several of the tokenizer's processing steps wind up editing the
source, and we don't want those changes to be observed by the caller, which may
use the source for other purposes afterward.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:12:33 -08:00
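Below is a minimal sketch of the copy-before-mutate fix described above. `lowerASCIIInPlace` is a hypothetical stand-in for any tokenizer step that edits its buffer in place, and `tokenize` is simplified to whitespace splitting. Neither is enry's actual code.

```go
package main

import (
	"bytes"
	"fmt"
)

// lowerASCIIInPlace is a hypothetical tokenizer step that edits buf in place.
func lowerASCIIInPlace(buf []byte) {
	for i, b := range buf {
		if 'A' <= b && b <= 'Z' {
			buf[i] = b + ('a' - 'A')
		}
	}
}

// tokenize works on a private copy of content, so in-place steps like
// lowerASCIIInPlace are never observed through the caller's slice.
func tokenize(content []byte) [][]byte {
	buf := make([]byte, len(content))
	copy(buf, content)
	lowerASCIIInPlace(buf)
	return bytes.Fields(buf)
}

func main() {
	src := []byte("Foo Bar")
	fmt.Printf("%q\n", tokenize(src))
	fmt.Println(string(src)) // Foo Bar (unchanged)
}
```

Copying costs one allocation per call. The alternative, making every step allocate its own output, is harder to keep correct as steps are added.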
M. J. Fromberger
169060e1cd
Add a test that tokenization does not modify the input.
At present this test fails, since the tokenizer replaces text in shared slices
of the input. A subsequent commit will fix that.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:03:09 -08:00
Antonio Jesus Navarro Perez
15bb13117f
Refactor Oniguruma integration
Instead of using a command to change imports before the build, use a build tag to produce the correct binary.
This makes it easier for applications to compile enry with oniguruma.
Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>
2018-08-29 18:01:13 +03:00
Denys Smirnov
7eafe024af
write a canonical header for machine-generated files
Signed-off-by: Denys Smirnov <denys@sourced.tech>
2018-04-30 12:57:39 +03:00
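Go's convention for machine-generated files, as described in `go help generate`, requires a line matching `^// Code generated .* DO NOT EDIT\.$` before the first non-comment, non-blank text. The exact header enry writes is not shown here. Below is a simplified checker that assumes only `//` comments precede the package clause. `/* */` blocks are not handled, and `isGenerated` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// generatedHeader is the pattern given by `go help generate`.
var generatedHeader = regexp.MustCompile(`^// Code generated .* DO NOT EDIT\.$`)

// isGenerated reports whether the header appears before the first
// non-comment, non-blank line. Simplification: only // comments count.
func isGenerated(src string) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if generatedHeader.MatchString(line) {
			return true
		}
		trimmed := strings.TrimSpace(line)
		if trimmed != "" && !strings.HasPrefix(trimmed, "//") {
			return false // reached code without seeing the header
		}
	}
	return false
}

func main() {
	src := "// Code generated by a template. DO NOT EDIT.\n\npackage data\n"
	fmt.Println(isGenerated(src)) // true
}
```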
Zeger-Jan van de Weg
7923b86ebd
Rename onigumura to oniguruma
This change names the dependency as it is actually called. The link to the
package was already correct, and all other references I could find with
git grep were renamed.
Signed-off-by: Zeger-Jan van de Weg <git@zjvandeweg.nl>
2018-03-28 21:34:54 +02:00
Alfredo Beaumont
ce5adee8ab
Merge pull request #113 from vmarkovtsev/master
Use rubex for faster regular expressions
2017-10-26 18:03:43 +02:00
Vadim Markovtsev
a66154b7eb
Make tokenizer regexps work under rubex
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-10-26 17:04:31 +02:00
Vadim Markovtsev
09d6add804
Fix review
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-10-26 17:02:58 +02:00
Vadim Markovtsev
c97a180da5
Fix review suggestions
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-10-26 15:51:02 +02:00
Vadim Markovtsev
250519bb51
Add the external test linguist dir from env var
This allows using a cached linguist directory instead of cloning it, and speeds up the tests by about 10s on my local machine.
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-09-28 23:51:38 +02:00
David Paz
52d7ccd6cf
Updated mimeType.gold and regenerated mimeType.go
2017-07-19 10:18:18 +02:00
David Paz
b2fe3f69ce
Added mimeType.gold
2017-07-18 12:47:19 +02:00
David Paz
ea819f58c2
Renamed mime to mimeType
2017-07-18 12:46:29 +02:00
David Paz
632422db69
Added pending untracked files
2017-07-18 12:46:29 +02:00
David Paz
125c802582
Now generates mime file
2017-07-18 12:46:29 +02:00
Manuel Carmona
2045abfa41
use of gopkg.in/toqueteos/substring.v1 in content.go to improve GetLanguagesByContent performance
2017-07-13 08:21:09 +02:00
David Paz
3f2248084e
Moved commit.go to data directory
2017-06-28 11:22:42 +02:00
David Paz
7e827e47ef
moved generated data to data subpackage
2017-06-28 08:31:11 +02:00
Manuel Carmona
b7d4be5fdd
the commit that tests run against is fixed
renamed tmpLinguist to repoLinguist and SimpleLinguistTestSuite to EnryTestSuit in common_test.go
changed the receiver name for TestSuites to 's'
fixed comments
2017-06-26 15:35:53 +02:00
David Paz
17a6f3dc89
Changed commit ref to .git/HEAD
2017-06-19 11:20:24 +02:00
Manuel Carmona
beda5b73e7
changed signatures for strategies
2017-06-15 10:07:23 +02:00
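The commit does not show the new signatures. The sketch below assumes a strategy of the form `func(filename string, content []byte, candidates []string) []string`, which is our understanding of the shape enry's public `Strategy` type later took. Each strategy narrows the previous candidates, and a single result wins. `byExtension`, `byContent` and their rules are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// Strategy narrows a list of candidate languages (assumed shape).
type Strategy func(filename string, content []byte, candidates []string) []string

// byExtension is a hypothetical strategy with a tiny extension table.
func byExtension(filename string, _ []byte, _ []string) []string {
	switch filepath.Ext(filename) {
	case ".go":
		return []string{"Go"}
	case ".h":
		return []string{"C", "C++", "Objective-C"}
	}
	return nil
}

// byContent is a hypothetical heuristic that only recognizes Objective-C.
func byContent(_ string, content []byte, candidates []string) []string {
	if bytes.Contains(content, []byte("@interface")) {
		return []string{"Objective-C"}
	}
	return candidates
}

// getLanguage runs strategies in order and stops at the first unambiguous
// answer; otherwise it falls back to the first remaining candidate.
func getLanguage(filename string, content []byte, strategies []Strategy) string {
	var candidates []string
	for _, s := range strategies {
		langs := s(filename, content, candidates)
		if len(langs) == 1 {
			return langs[0]
		}
		if len(langs) > 0 {
			candidates = langs
		}
	}
	if len(candidates) > 0 {
		return candidates[0]
	}
	return ""
}

func main() {
	strategies := []Strategy{byExtension, byContent}
	fmt.Println(getLanguage("main.go", nil, strategies))                   // Go
	fmt.Println(getLanguage("foo.h", []byte("@interface Foo"), strategies)) // Objective-C
	fmt.Println(getLanguage("foo.h", []byte("int x;"), strategies))         // C
}
```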
Manuel Carmona
1fc8cf7a5d
changes to improve detection accuracy
2017-06-15 10:07:22 +02:00
Manuel Carmona
ba53e10c7b
renamed package and cli to enry
2017-06-13 14:18:23 +02:00
Máximo Cuadros
3a470f617c
project renamed to enry
2017-06-08 09:27:27 +02:00
Manuel Carmona
0d5dff1979
API changes, ready for version 2
2017-06-06 11:30:23 +02:00
Manuel Carmona
5b304524d1
Rearranged code
2017-06-02 09:33:55 +02:00
Manuel Carmona
f8b8f7f5c4
Added classifier to the sequence of strategies
2017-05-30 09:07:58 +02:00
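As the last strategy, the classifier picks among the remaining candidates using token statistics from the samples. Below is a minimal multinomial naive Bayes sketch with add-one smoothing that assumes pre-tokenized samples. `model`, `train` and `classify` are hypothetical names, not enry's API. In enry the statistics come from the generated frequencies data rather than from training at runtime.

```go
package main

import (
	"fmt"
	"math"
)

// model holds per-language token statistics from training samples.
type model struct {
	tokens  map[string]map[string]int // language -> token -> count
	total   map[string]int            // language -> number of tokens
	samples map[string]int            // language -> number of samples
	vocab   map[string]bool           // all distinct tokens
}

// train counts tokens in pre-tokenized samples grouped by language.
func train(data map[string][][]string) *model {
	m := &model{
		tokens:  map[string]map[string]int{},
		total:   map[string]int{},
		samples: map[string]int{},
		vocab:   map[string]bool{},
	}
	for lang, docs := range data {
		m.tokens[lang] = map[string]int{}
		for _, doc := range docs {
			m.samples[lang]++
			for _, tok := range doc {
				m.tokens[lang][tok]++
				m.total[lang]++
				m.vocab[tok] = true
			}
		}
	}
	return m
}

// classify returns the candidate maximizing
// log P(lang) + sum over tokens of log P(tok | lang), with add-one smoothing.
func (m *model) classify(tokens, candidates []string) string {
	n := 0
	for _, c := range m.samples {
		n += c
	}
	if n == 0 {
		return ""
	}
	v := float64(len(m.vocab))
	best, bestScore := "", math.Inf(-1)
	for _, lang := range candidates {
		score := math.Log(float64(m.samples[lang]) / float64(n))
		for _, tok := range tokens {
			c := float64(m.tokens[lang][tok])
			score += math.Log((c + 1) / (float64(m.total[lang]) + v))
		}
		if score > bestScore {
			best, bestScore = lang, score
		}
	}
	return best
}

func main() {
	m := train(map[string][][]string{
		"Go":     {{"package", "func", "main"}},
		"Python": {{"def", "import", "self"}},
	})
	fmt.Println(m.classify([]string{"func", "package"}, []string{"Go", "Python"})) // Go
}
```

Log-probabilities are summed instead of multiplying raw probabilities, which would underflow on long files.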
Manuel Carmona
fcf30a07c8
Added frequencies.go generation
2017-05-29 12:19:37 +02:00
Manuel Carmona
45314b4903
Added everything necessary for the GetLanguageByAlias functionality to work
2017-05-08 11:34:00 +02:00
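Below is a minimal sketch of an alias lookup backed by a generated map. The entries and the case-insensitive normalization are assumptions for illustration. The lowercase name `getLanguageByAlias` is hypothetical and mirrors, but is not, enry's exported function.

```go
package main

import (
	"fmt"
	"strings"
)

// languagesByAlias is a hypothetical excerpt of a generated alias table.
var languagesByAlias = map[string]string{
	"go":     "Go",
	"golang": "Go",
	"c++":    "C++",
	"cpp":    "C++",
}

// getLanguageByAlias looks up an alias case-insensitively (an assumed
// normalization) and reports whether it was found.
func getLanguageByAlias(alias string) (string, bool) {
	lang, ok := languagesByAlias[strings.ToLower(strings.TrimSpace(alias))]
	return lang, ok
}

func main() {
	fmt.Println(getLanguageByAlias("Golang")) // Go true
}
```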
Manuel Carmona
6f3ad6d30d
separated GetLanguageType and the languagesType map into different files for better file generation
2017-05-03 12:17:54 +02:00
Manuel Carmona
cbf44205e0
fixed GetLanguageType to return Unknown when language is not found in languagesType map
2017-05-03 10:48:28 +02:00
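Below is a minimal sketch of the `Type`/`Unknown` behavior described above. It assumes the constants are declared with `iota` and `Unknown` first, so it is the zero value. The ordering and map entries are assumptions, using Linguist's four type names.

```go
package main

import "fmt"

// Type classifies a language, mirroring Linguist's type field.
type Type int

// Unknown is the zero value, so a missing map entry already yields it.
const (
	Unknown Type = iota
	Data
	Programming
	Markup
	Prose
)

// languagesType is a hypothetical excerpt of the generated map.
var languagesType = map[string]Type{
	"Go":       Programming,
	"JSON":     Data,
	"Markdown": Prose,
}

// getLanguageType returns Unknown for languages not in the map.
func getLanguageType(language string) Type {
	if t, ok := languagesType[language]; ok {
		return t
	}
	return Unknown
}

func main() {
	fmt.Println(getLanguageType("Go") == Programming)        // true
	fmt.Println(getLanguageType("NoSuchLanguage") == Unknown) // true
}
```

Because `Unknown` is the zero value, `languagesType[language]` alone would return it. The explicit `ok` check just makes the intent visible.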
Manuel Carmona
664afe48d4
fixed the value returned by GetLanguageByContent when there is no function matcher for the extension
2017-05-03 10:37:34 +02:00
Manuel Carmona
28dc452853
added some corner cases to content.go generation tests
2017-04-27 17:32:42 +02:00
Manuel Carmona
63d4d9bf24
removed templates from test_files directory to use templates from assets directory in tests
2017-04-27 17:32:42 +02:00
Manuel Carmona
f63a25d794
renamed everything related to the extension strategy to reference it
2017-04-27 17:32:42 +02:00
Manuel Carmona
645bdd7331
added filenames_map.go generation
languagesByFilename is now a map[string]string
2017-04-27 17:30:57 +02:00
Manuel Carmona
f45efec5fb
GetLanguageType and Type constants have comments now
type.go comments generated from type.go.tmpl
2017-04-27 16:40:28 +02:00
Manuel Carmona
c6d74bca66
added shebang functionality
fixed autogenerated comment
changed constant types names
GetLanguageByShebang doesn't print errors
languageInfo struct changed to have only the necessary fields
GetLanguageByShebang has a comment now
2017-04-27 16:40:08 +02:00
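Below is a minimal sketch of shebang detection. It reads the interpreter from a `#!` first line, follows `/usr/bin/env`, and looks it up in an interpreter table. The table is a hypothetical excerpt in the spirit of `interpreters_map.go`. Version stripping (e.g. `python3` to `python`) and `env` flags such as `-S` are deliberately not handled.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

// languagesByInterpreter is a hypothetical interpreter table.
var languagesByInterpreter = map[string][]string{
	"python3": {"Python"},
	"sh":      {"Shell"},
	"perl":    {"Perl"},
}

// interpreterFromShebang returns the interpreter named on a "#!" first line,
// following "/usr/bin/env"; it returns "" if there is no shebang.
func interpreterFromShebang(content []byte) string {
	if !bytes.HasPrefix(content, []byte("#!")) {
		return ""
	}
	line := content[2:]
	if i := bytes.IndexByte(line, '\n'); i >= 0 {
		line = line[:i]
	}
	fields := strings.Fields(string(line)) // also trims a trailing \r
	if len(fields) == 0 {
		return ""
	}
	interp := path.Base(fields[0]) // shebang paths are always slash-separated
	if interp == "env" && len(fields) > 1 {
		interp = fields[1]
	}
	return interp
}

// getLanguagesByShebang maps the interpreter to candidate languages.
func getLanguagesByShebang(content []byte) []string {
	return languagesByInterpreter[interpreterFromShebang(content)]
}

func main() {
	fmt.Println(getLanguagesByShebang([]byte("#!/usr/bin/env python3\nprint(1)\n"))) // [Python]
}
```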
Manuel Carmona
2644a7c8da
added interpreters_map.go generation
fixed Interpreters comment
2017-04-27 16:39:54 +02:00
Manuel Carmona
6ddbb79af0
changed generator_test.go to use only TestFromFile
modified *.test.yml to contain only necessary information
fixed white spaces
remove duplicated file languages.test.tmpl
2017-04-27 16:39:36 +02:00
Manuel Carmona
1bf555bc4c
changed getAlphabeticalOrderedKeys to use sort.Strings
2017-04-27 16:35:23 +02:00
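Sorting map keys keeps generated files byte-for-byte stable across runs, because Go randomizes map iteration order. Below is a minimal sketch of such a helper. The function name comes from the commit, but the map type in its signature is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// getAlphabeticalOrderedKeys returns the keys of m in sorted order so that
// generated code does not depend on map iteration order.
func getAlphabeticalOrderedKeys(m map[string][]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string][]string{"Ruby": nil, "C": nil, "Go": nil}
	fmt.Println(getAlphabeticalOrderedKeys(m)) // [C Go Ruby]
}
```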
Manuel Carmona
c08b85120d
created 'type Type int' for type.go generation
2017-04-17 12:08:54 +02:00
Manuel Carmona
b277944b2a
fixed constant iotas
2017-04-17 12:00:50 +02:00
Manuel Carmona
25e835f5fd
slice of languages arranged in alphabetical order
2017-04-17 11:55:29 +02:00
Manuel Carmona
9a9968dca0
added comments to constants
2017-04-17 11:55:29 +02:00
Manuel Carmona
ef39403555
added type.go generation
2017-04-17 11:55:29 +02:00
Manuel Carmona
5d61ca93d8
changed langs.go to unmarshal on a languageInfo struct
2017-04-17 11:55:29 +02:00
Manuel Carmona
ca3ae587f3
added documentation_matchers.go generation
2017-04-17 11:52:11 +02:00
Manuel Carmona
65996506ae
fixed Vendor function's comment
2017-04-10 10:32:54 +02:00
Manuel Carmona
30772e4ea0
changed executeVendorTemplate's parameter names
2017-04-10 10:27:44 +02:00
Manuel Carmona
f175c2d20b
changed Vendor function's comment and parameter names
2017-04-10 10:25:52 +02:00
Manuel Carmona
eaf473743b
changed function name executeUtilsTemplate to executeVendorTemplate
2017-04-10 10:20:38 +02:00
Manuel Carmona
e998b0ff2e
regexps for vendored files and directories are generated in vendor_matchers.go
2017-04-07 09:27:40 +02:00
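Below is a minimal sketch of matching paths against vendor regexps. The patterns are illustrative entries in the style of Linguist's `vendor.yml`, not the generated list. Joining them into one alternation compiled once is a design choice assumed here; the generated code may instead keep a slice of compiled regexps.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// vendorPatterns are illustrative, vendor.yml-style patterns.
var vendorPatterns = []string{
	`(^|/)node_modules/`,
	`(^|/)vendor/`,
	`(^|/)\.git/`,
}

// vendorMatcher combines the patterns into one alternation compiled once;
// "|" has the lowest precedence, so each pattern stays self-contained.
var vendorMatcher = regexp.MustCompile(strings.Join(vendorPatterns, "|"))

// isVendor reports whether a slash-separated path is vendored.
func isVendor(path string) bool {
	return vendorMatcher.MatchString(path)
}

func main() {
	fmt.Println(isVendor("node_modules/left-pad/index.js")) // true
	fmt.Println(isVendor("src/main.go"))                    // false
}
```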
Manuel Carmona
13e7886a02
Added utils.go generation
2017-04-06 17:31:17 +02:00
Máximo Cuadros
3a2a62baad
move srcd.works to gopkg.in
2017-04-05 18:26:58 +02:00
Manuel Carmona
03c71a9b93
move content.go generation to internal
2017-04-05 18:15:27 +02:00
Manuel Carmona
ba22a0a243
content generator
2017-04-05 18:09:14 +02:00
Manuel Carmona
665b7475e3
code generation move to internal/code-generator
2017-04-05 17:49:58 +02:00