Commit Graph

17 Commits

Author SHA1 Message Date
7e136bade8 test: don't export tokenizer fixtures
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
6c7b91cb91 doc: improve API doc on review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
ada6f15c93 address review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
7929933eb5 tokenizer: cleanup & attributions
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
8756fbdcb4 refactor to build tags
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
553399ed76 tokenizer: port flex-based C impl from linguist
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
5245079744 Apply suggestions from review.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:44 -08:00
dabb41527f Apply suggestions from review.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:42 -08:00
4027b494b3 Add documentation comments to package tokenizer.
Although this package is internal, it still exports an API and deserves some
comments. Serves in partial satisfaction of #195.

Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:18:52 -08:00
7d277b11de Copy the tokenizer input to avoid modifying the caller's copy.
Addresses #196. Several of the tokenizer's processing steps wind up editing the
source, and we don't want those changes to be observed by the caller, which may
use the source for other purposes afterward.

Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:12:33 -08:00
169060e1cd Add a test that tokenization does not modify the input.
At present this test fails, since the tokenizer replaces text in shared slices
of the input. A subsequent commit will fix that.

Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:03:09 -08:00
15bb13117f Refactor Oniguruma integration
Instead of use a command to change imports before build, using a build tag to generate the correct binary.

This will allow applications to compile enry using oniguruma with less troubles.

Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>
2018-08-29 18:01:13 +03:00
7923b86ebd Rename onigumura to oniguruma
This change names the dependency like its called. The link to the
package was correct, but all other references were renamed where I could
find time with git grep.

Signed-off-by: Zeger-Jan van de Weg <git@zjvandeweg.nl>
2018-03-28 21:34:54 +02:00
a66154b7eb Make tokenizer regexps work under rubex
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-10-26 17:04:31 +02:00
1fc8cf7a5d changes to improve detection accuracy 2017-06-15 10:07:22 +02:00
5b304524d1 Rearranged code 2017-06-02 09:33:55 +02:00
fcf30a07c8 Added frequencies.go generation 2017-05-29 12:19:37 +02:00