Alexander Bezzubov
ada6f15c93
address review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
7929933eb5
tokenizer: cleanup & attributions
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
8756fbdcb4
refactor to build tags
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
553399ed76
tokenizer: port flex-based C impl from linguist
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
M. J. Fromberger
169060e1cd
Add a test that tokenization does not modify the input.
At present this test fails, since the tokenizer replaces text in shared slices
of the input. A subsequent commit will fix that.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:03:09 -08:00
Manuel Carmona
1fc8cf7a5d
changes to improve detection accuracy
2017-06-15 10:07:22 +02:00
Manuel Carmona
fcf30a07c8
Added frequencies.go generation
2017-05-29 12:19:37 +02:00