tartrazine

mirror of https://github.com/ralsina/tartrazine.git synced 2025-08-24 05:52:08 +00:00

Author	SHA1	Message	Date
Alexander Bezzubov	78eee0cf7e	generator: flag to debug building of bayesian classifier It seems that reading ./samples/ from Linguist consumes a different number of files from filesystem on different OSes. This change adds ENRY_DEBUG env var to print some debug output about calculations of token stats from samples. TestPlan: - ENRY_DEBUG=1 go test -v ./internal/code-generator/generator \ -run Test_GeneratorTestSuite -testify.m TestGenerationFiles Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2020-03-29 19:35:49 +02:00
Alexander	b78e4423f0	generator: drop platform-specific separator Co-Authored-By: Lauris BH <lauris@nix.lv>	2020-03-25 19:27:46 +01:00
Alexander Bezzubov	3a5f4b2db1	generator: more debug output in case of failure Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2020-03-25 14:20:26 +01:00
Alexander Bezzubov	b0f94ad693	generator: CLI tool fix to support win paths On Win `make code-generate` produces unreasonable Bayesian classifier weights from Linguist samples silently, failing only the final classification tests. TestPlan: - go test ./internal/code-generator/... \ -run Test_GeneratorTestSuite -testify.m TestGenerationFiles Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2020-03-25 14:00:24 +01:00
Alexander Bezzubov	78d8f43a88	tokenizer: hide flex-based impl, avoid build failures on win TestPlan: - go test -run TestTokenize ./internal/tokenizer - go test -tags flex -run TestTokenize ./internal/tokenizer (should fail as default fixtures are from regex-based tokenizer)	2020-03-19 19:58:48 +01:00
Alexander Bezzubov	1ab8148c10	test: fix platform-dependent paths in tests Test Plan: - go test ./internal/code-generator/... -run Test_GeneratorTestSuite -testify.m TestGenerationFiles	2020-03-19 19:47:22 +01:00
Alexander Bezzubov	e32a70a784	tokenizer: fix a bug and regenerate the code \w latest Go See https://github.com/bzz/enry/pull/4 for details. Test Plan: - go test ./...	2020-03-19 19:08:21 +01:00
Máximo Cuadros	84efad7693	*: module rename to go-enry/go-enry/v4	2020-03-19 17:31:29 +01:00
Alexander Bezzubov	bc5e031cee	Drop src-d org ref except for issues Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2020-03-19 14:04:36 +01:00
Lauris Bukšis-Haberkorns	4e3e15e80d	Sync to linguist v7.5.1 Signed-off-by: Lauris BH <lauris@nix.lv>	2019-08-06 17:18:01 +03:00
Lauris Bukšis-Haberkorns	2f5526ddba	Improve detection of unsupported regexp syntax Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>	2019-08-05 22:24:03 +03:00
Lauris Bukšis-Haberkorns	25b29ebdc4	Implement getting color code for languages Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>	2019-07-19 23:59:46 +03:00
Alexander Bezzubov	f3ceaa6330	token: refactor & simplify test fixtures Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-05-08 22:17:32 +02:00
Alexander Bezzubov	a724a2f841	token: test case for regexp + non-valid UTF8 Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-05-07 13:46:36 +02:00
Alexander Bezzubov	8bdc830833	token: new test case with Unicode replacement Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-17 19:28:06 +02:00
Alexander Bezzubov	278eaf1c22	tokenizer: move flex-based to modules Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-17 13:54:34 +02:00
Alexander	ae43e1a91f	Merge pull request #219 from bzz/go-mod Introduce Go modules	2019-04-17 13:39:55 +02:00
Alexander Bezzubov	7e136bade8	test: don't export tokenizer fixtures Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-16 19:38:48 +02:00
Alexander Bezzubov	6c7b91cb91	doc: improve API doc on review feedback Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-16 19:38:48 +02:00
Alexander Bezzubov	ada6f15c93	address review feedback Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-16 19:38:48 +02:00
Alexander Bezzubov	7929933eb5	tokenizer: cleanup & attributions Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-14 21:38:16 +02:00
Alexander Bezzubov	8756fbdcb4	refactor to build tags Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-14 21:38:16 +02:00
Alexander Bezzubov	553399ed76	tokenizer: port flex-based C impl from linguist Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-14 21:38:16 +02:00
Alexander Bezzubov	6a5f37e9e2	modules: prepare for v2 release - update go.mod \w v2 - update all import paths Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-14 21:28:12 +02:00
Alexander Bezzubov	20c6d2845a	build: gopkg.in -> github.com imports Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-12 11:49:16 +02:00
Alexander Bezzubov	85d5906b2b	address review feedback - tixing a fypo Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-11 21:36:29 +02:00
Alexander Bezzubov	41478262f3	fix verb mismatch in a format string Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-11 15:28:49 +02:00
Alexander Bezzubov	bdb5603f28	Address code review feedback Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-08 16:07:10 +02:00
Alexander Bezzubov	b2b61c2a8c	gen: refactoring, renaming vars for readability This does not change the logic of the generator but only renames/moves some vars for readability Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-04-03 15:40:23 +02:00
M. J. Fromberger	3a6d42b39a	doc: fix spelling Co-Authored-By: bzz <bzz@users.noreply.github.com>	2019-02-21 09:33:17 +01:00
Alexander Bezzubov	baefa18475	gen: compare generated code to gold ignoring whitespaces Reason is that gofmt can change between versions e.g see https://go-review.googlesource.com/c/go/+/122295/ and this would avoid breaking tests and edit wars Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-02-20 23:22:02 +01:00
Alexander Bezzubov	c8e0f75132	test: make gen test output less verbose Signed-off-by: Alexander Bezzubov <bzz@apache.org>	2019-02-20 23:22:02 +01:00
Alexander	3499750785	Sync to linguist 7.2.0: heuristics.yml support (#189) Sync \w Github Linguist v7.2.0 Includes new way of handling `heuristics.yml` and all `./data/*` re-generated using Github Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0) release tag. - many new languages - better vendoring detection - update doc on update & known issues.	2019-02-14 12:47:45 +01:00
M. J. Fromberger	5245079744	Apply suggestions from review. Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	2019-01-29 11:28:44 -08:00
M. J. Fromberger	dabb41527f	Apply suggestions from review. Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	2019-01-29 11:28:42 -08:00
M. J. Fromberger	4027b494b3	Add documentation comments to package tokenizer. Although this package is internal, it still exports an API and deserves some comments. Serves in partial satisfaction of #195. Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	2019-01-29 11:18:52 -08:00
M. J. Fromberger	7d277b11de	Copy the tokenizer input to avoid modifying the caller's copy. Addresses #196. Several of the tokenizer's processing steps wind up editing the source, and we don't want those changes to be observed by the caller, which may use the source for other purposes afterward. Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	2019-01-29 10:12:33 -08:00
M. J. Fromberger	169060e1cd	Add a test that tokenization does not modify the input. At present this test fails, since the tokenizer replaces text in shared slices of the input. A subsequent commit will fix that. Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	2019-01-29 10:03:09 -08:00
Antonio Jesus Navarro Perez	15bb13117f	Refactor Oniguruma integration Instead of using a command to change imports before the build, use a build tag to generate the correct binary. This will allow applications to compile enry using oniguruma with less trouble. Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>	2018-08-29 18:01:13 +03:00
Denys Smirnov	7eafe024af	write a canonical header for machine-generated files Signed-off-by: Denys Smirnov <denys@sourced.tech>	2018-04-30 12:57:39 +03:00
Zeger-Jan van de Weg	7923b86ebd	Rename onigumura to oniguruma This change names the dependency like it's called. The link to the package was correct, but all other references were renamed where I could find time with git grep. Signed-off-by: Zeger-Jan van de Weg <git@zjvandeweg.nl>	2018-03-28 21:34:54 +02:00
Alfredo Beaumont	ce5adee8ab	Merge pull request #113 from vmarkovtsev/master Use rubex for faster regular expressions	2017-10-26 18:03:43 +02:00
Vadim Markovtsev	a66154b7eb	Make tokenizer regexps work under rubex Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>	2017-10-26 17:04:31 +02:00
Vadim Markovtsev	09d6add804	Fix review Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>	2017-10-26 17:02:58 +02:00
Vadim Markovtsev	c97a180da5	Fix review suggestions Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>	2017-10-26 15:51:02 +02:00
Vadim Markovtsev	250519bb51	Add the external test linguist dir from env var This allows using a cached directory with linguist instead of cloning and speeds up the tests by -10s on my local machine. Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>	2017-09-28 23:51:38 +02:00
David Paz	52d7ccd6cf	Updated mimeType.gold and regenerated mimeType.go	2017-07-19 10:18:18 +02:00
David Paz	b2fe3f69ce	Added mymeType.gold	2017-07-18 12:47:19 +02:00
David Paz	ea819f58c2	Renamed mime to mimeType	2017-07-18 12:46:29 +02:00
David Paz	632422db69	Added pending untracked files	2017-07-18 12:46:29 +02:00

94 Commits