Máximo Cuadros
84efad7693
*: module rename to go-enry/go-enry/v4
2020-03-19 17:31:29 +01:00
Alexander Bezzubov
bc5e031cee
Drop src-d org ref except for issues
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
Alexander Bezzubov
f3ceaa6330
token: refactor & simplify test fixtures
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-08 22:17:32 +02:00
Alexander Bezzubov
a724a2f841
token: test case for regexp + non-valid UTF8
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-05-07 13:46:36 +02:00
Alexander Bezzubov
8bdc830833
token: new test case with Unicode replacement
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-17 19:28:06 +02:00
Alexander Bezzubov
278eaf1c22
tokenizer: move flex-based to modules
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-17 13:54:34 +02:00
Alexander
ae43e1a91f
Merge pull request #219 from bzz/go-mod
...
Introduce Go modules
2019-04-17 13:39:55 +02:00
Alexander Bezzubov
7e136bade8
test: don't export tokenizer fixtures
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
6c7b91cb91
doc: improve API doc on review feedback
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
ada6f15c93
address review feedback
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-16 19:38:48 +02:00
Alexander Bezzubov
7929933eb5
tokenizer: cleanup & attributions
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
8756fbdcb4
refactor to build tags
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
553399ed76
tokenizer: port flex-based C impl from linguist
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:38:16 +02:00
Alexander Bezzubov
6a5f37e9e2
modules: prepare for v2 release
...
- update go.mod w/ v2
- update all import paths
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
Alexander Bezzubov
20c6d2845a
build: gopkg.in -> github.com imports
...
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
M. J. Fromberger
5245079744
Apply suggestions from review.
...
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:44 -08:00
M. J. Fromberger
dabb41527f
Apply suggestions from review.
...
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:28:42 -08:00
M. J. Fromberger
4027b494b3
Add documentation comments to package tokenizer.
...
Although this package is internal, it still exports an API and deserves some
comments. Serves in partial satisfaction of #195.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 11:18:52 -08:00
M. J. Fromberger
7d277b11de
Copy the tokenizer input to avoid modifying the caller's copy.
...
Addresses #196. Several of the tokenizer's processing steps wind up editing the
source, and we don't want those changes to be observed by the caller, which may
use the source for other purposes afterward.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:12:33 -08:00
M. J. Fromberger
169060e1cd
Add a test that tokenization does not modify the input.
...
At present this test fails, since the tokenizer replaces text in shared slices
of the input. A subsequent commit will fix that.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-01-29 10:03:09 -08:00
Antonio Jesus Navarro Perez
15bb13117f
Refactor Oniguruma integration
...
Instead of using a command to change imports before the build, use a build tag to generate the correct binary.
This will allow applications to compile enry using oniguruma with less trouble.
Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>
2018-08-29 18:01:13 +03:00
Zeger-Jan van de Weg
7923b86ebd
Rename onigumura to oniguruma
...
This change names the dependency like it's called. The link to the
package was correct, but all other references were renamed where I could
find time with git grep.
Signed-off-by: Zeger-Jan van de Weg <git@zjvandeweg.nl>
2018-03-28 21:34:54 +02:00
Vadim Markovtsev
a66154b7eb
Make tokenizer regexps work under rubex
...
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-10-26 17:04:31 +02:00
Manuel Carmona
1fc8cf7a5d
changes to improve detection accuracy
2017-06-15 10:07:22 +02:00
Manuel Carmona
5b304524d1
Rearranged code
2017-06-02 09:33:55 +02:00
Manuel Carmona
fcf30a07c8
Added frequencies.go generation
2017-05-29 12:19:37 +02:00