Alexander Bezzubov
78d8f43a88
tokenizer: hide flex-based impl, avoid build failures on win
...
Test Plan:
- go test -run TestTokenize ./internal/tokenizer
- go test -tags flex -run TestTokenize ./internal/tokenizer
(should fail, as the default fixtures are from the regex-based tokenizer)
2020-03-19 19:58:48 +01:00
Alexander Bezzubov
e32a70a784
tokenizer: fix a bug and regenerate the code w/ latest Go
...
See https://github.com/bzz/enry/pull/4 for details.
Test Plan:
- go test ./...
2020-03-19 19:08:21 +01:00
Alexander Bezzubov
f3ceaa6330
token: refactor & simplify test fixtures
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-08 22:17:32 +02:00
Alexander Bezzubov
a724a2f841
token: test case for regexp + non-valid UTF8
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-07 13:46:36 +02:00
Alexander Bezzubov
8bdc830833
token: new test case with Unicode replacement
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-17 19:28:06 +02:00
Alexander Bezzubov
7e136bade8
test: don't export tokenizer fixtures
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
ada6f15c93
address review feedback
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
7929933eb5
tokenizer: cleanup & attributions
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
8756fbdcb4
refactor to build tags
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
553399ed76
tokenizer: port flex-based C impl from linguist
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
M. J. Fromberger
169060e1cd
Add a test that tokenization does not modify the input.
...
At present this test fails, since the tokenizer replaces text in shared slices
of the input. A subsequent commit will fix that.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:03:09 -08:00
Manuel Carmona
1fc8cf7a5d
changes to improve detection accuracy
2017-06-15 10:07:22 +02:00
Manuel Carmona
fcf30a07c8
Added frequencies.go generation
2017-05-29 12:19:37 +02:00