Commit Graph

483 Commits

Author SHA1 Message Date
e98983b3f9 ci: add Python tests profile (w/o gopy)
Signed-off-by: Alexander Bezzubov <alexander.bezzubov@jetbrains.com>
2020-08-12 15:23:01 +02:00
328c16f948 py: use readme as PyPI description
Signed-off-by: Alexander Bezzubov <alexander.bezzubov@jetbrains.com>
2020-08-12 15:22:55 +02:00
7ee65cc9d0 doc: upd build instructions
Signed-off-by: Alexander Bezzubov <alexander.bezzubov@jetbrains.com>
2020-08-12 15:22:50 +02:00
5d58b1aaaf Merge pull request #29 from vsmaxim/master
python: cover the rest of python bindings from shared library, add tests, add docstrings for API
2020-08-12 14:35:58 +02:00
59f0f17834 Remove unneeded todos 2020-08-11 00:29:33 +03:00
08bc9bca0e Cover the rest of python bindings from shared library, add tests, add docstrings, add setup.py. 2020-08-11 00:12:43 +03:00
dc6fc02209 Merge pull request #24 from erizocosmico/fix/bail-out-if-not-enough-lines
data: bail out in some cases if there aren't enough lines
2020-05-28 16:45:10 +02:00
78696c2272 data: bail out in some cases if there aren't enough lines
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 13:39:59 +02:00
2880ccae4a Merge pull request #23 from erizocosmico/fix/get-first-line
data: fix getting the first line for empty content
2020-05-28 11:52:49 +02:00
79398a925d data: fix getting the first line for empty content
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 11:28:49 +02:00
e1f1b57a84 Merge pull request #22 from erizocosmico/feature/generated
implement IsGenerated helper to filter out generated files
2020-05-28 10:34:37 +02:00
8ff885a3a8 implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, it cannot
be autogenerated directly from linguist like the other logic in enry, so it's
translated by hand.

There are three different types of matchers in this implementation:
- By extension, which marks a file as generated based only on the
  extension. These are the fastest matchers, so they're done first.
- By file name, which matches patterns against the filename. These
  are performed in second place. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which go into the content and try
  to identify if they're generated or not based on the content. Unlike
  linguist, we try to only read the content we need and not split it
  all unless it's necessary and use byte functions instead of regexps
  as much as possible.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
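The three matcher stages described in the commit message above could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `isGeneratedSketch`, `generatedExts`, `firstLine`, and the specific rules (the two extensions, `Gopkg.lock`, the `.pb.go` suffix, the "Code generated by" marker) are hypothetical stand-ins for the much larger rule set in linguist's generated.rb.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// generatedExts is a tiny, hypothetical subset of extensions treated as
// generated; the real rule set is the one in linguist's generated.rb.
var generatedExts = map[string]bool{
	".nib":             true,
	".xcworkspacedata": true,
}

// firstLine returns the content up to the first newline without
// splitting the whole input into lines. Empty content yields an empty line.
func firstLine(content []byte) []byte {
	if i := bytes.IndexByte(content, '\n'); i >= 0 {
		return content[:i]
	}
	return content
}

// isGeneratedSketch runs the cheapest checks first and only looks at
// the content when the path alone is not conclusive.
func isGeneratedSketch(path string, content []byte) bool {
	// 1. Extension matchers: a single map lookup.
	if generatedExts[strings.ToLower(filepath.Ext(path))] {
		return true
	}

	// 2. File name matchers: plain string functions instead of regexps.
	name := filepath.Base(path)
	if name == "Gopkg.lock" || strings.HasSuffix(name, ".pb.go") {
		return true
	}

	// 3. Content matchers: read only what is needed (here, the first
	// line) and use byte functions instead of regexps.
	return bytes.Contains(firstLine(content), []byte("Code generated by"))
}

func main() {
	fmt.Println(isGeneratedSketch("ui/Main.nib", nil))
	fmt.Println(isGeneratedSketch("vendor/Gopkg.lock", nil))
	fmt.Println(isGeneratedSketch("api.go", []byte("// Code generated by tool. DO NOT EDIT.\npackage api\n")))
	fmt.Println(isGeneratedSketch("main.go", []byte("package main\n")))
}
```

Ordering the stages from cheapest to most expensive means most files are classified without their content being inspected at all.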
bda45fdc8e go.mod: update go-oniguruma v1.2.1 2020-05-06 21:42:07 +02:00
4b468762b6 Merge pull request #13 from go-enry/python-wrapper
Python: API to expose highest-level enry.GetLanguage
2020-04-24 20:57:37 +02:00
35575d0a3e py: expose highest-level enry.language()
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-04-24 20:51:46 +02:00
1d23012ae6 Merge pull request #15 from go-enry/ci-libonig5
ci: update libonig5 version
2020-04-24 19:24:34 +02:00
aa46bf4a37 ci: update libonig5 version 2020-04-24 19:18:27 +02:00
9fba3da45f Merge pull request #12 from mcuadros/oniguruma
data: replace substring package with regex package
2020-04-15 19:58:28 +02:00
29bc0a181b data: replace substring package with regex package 2020-04-15 17:27:48 +02:00
b34576bd71 Merge pull request #10 from mcuadros/is-test
IsTest function for top 10 languages
2020-04-06 16:29:11 +02:00
b851ee83ad IsTest function for top 10 languages 2020-04-06 16:23:48 +02:00
4fb0b4cc5e doc: add coloring to the ToC
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-31 11:20:06 +02:00
6c06680bef Merge pull request #2 from lafriks-fork/feat/lang_groups
Return group color if language has none
2020-03-31 11:17:52 +02:00
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
cfaa7a1711 Merge pull request #8 from go-enry/update-docs
Documentation update
2020-03-30 22:53:10 +02:00
6a09a2a684 doc: update badges
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-30 19:42:03 +02:00
64d02e5441 doc: re-structure README by use case, update links
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-30 19:31:53 +02:00
c0806d20c8 Merge pull request #3 from lafriks-fork/feat/sync_linguist_7_9_0
sync to the latest github/linguist v7.9.0
2020-03-30 12:28:03 +02:00
9030d3671b sync to the latest github/linguist v7.9.0 2020-03-30 01:25:57 +03:00
fa1c6f39b5 Merge pull request #7 from go-enry/win-support
Code generator Win support
2020-03-29 23:31:10 +02:00
172486906a ci: force git to use LF on win to pass tests on linguist samples
This mitigates the problem that tokenizer uses regex
that matches platform-specific line endings

TestPlan:
 - go test ./internal/code-generator/generator \
	-run Test_GeneratorTestSuite -testify.m TestTokenizerOnATS

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
3ea961e5ab generator: change-detector tests on EOL-dependent sample
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
9be0211f04 generator: skip symlinks on *nix and win
As Git on win does not support symlinks [1], we have to hard-code
the paths to files under ./samples/ in Linguist codebase that are
known to be a symlink.

 1. https://github.com/git-for-windows/git/wiki/Symbolic-Links

TestPlan:
 - go test ./internal/code-generator/generator -run Test_GeneratorTestSuite

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
9c082eb2d4 ci: add ENRY_DEBUG flag
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 19:35:49 +02:00
78eee0cf7e generator: flag to debug building of bayesian classifier
It seems that reading ./samples/ from Linguist consumes
a different number of files from filesystem on different OSes.

This change adds ENRY_DEBUG env var to print some debug output
about calculations of token stats from samples.

TestPlan:
 - ENRY_DEBUG=1 go test -v ./internal/code-generator/generator \
	-run Test_GeneratorTestSuite -testify.m TestGenerationFiles

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 19:35:49 +02:00
b78e4423f0 generator: drop platform-specific separator
Co-Authored-By: Lauris BH <lauris@nix.lv>
2020-03-25 19:27:46 +01:00
3a5f4b2db1 generator: more debug output in case of failure
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-25 14:20:26 +01:00
b0f94ad693 generator: CLI tool fix to support win paths
On Win `make code-generate` produces unreasonable
Bayesian classifier weights from Linguist samples
silently, failing only the final classification tests.

TestPlan:
 - go test ./internal/code-generator/... \
    -run Test_GeneratorTestSuite -testify.m TestGenerationFiles

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-25 14:00:24 +01:00
78d8f43a88 tokenizer: hide flex-based impl, avoid build failures on win
TestPlan:
 - go test -run TestTokenize ./internal/tokenizer
 - go test -tags flex -run TestTokenize ./internal/tokenizer
   (should fail as default fixtures are from regex-based tokenizer)
2020-03-19 19:58:48 +01:00
1ab8148c10 test: fix platform-dependent paths in tests
Test Plan:
 - go test ./internal/code-generator/... -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
2020-03-19 19:47:22 +01:00
e32a70a784 tokenizer: fix a bug and regenerate the code w/ latest Go
See https://github.com/bzz/enry/pull/4 for details.

Test Plan:
 - go test ./...
2020-03-19 19:08:21 +01:00
e08125d7ee ci: add oniguruma tests 2020-03-19 17:43:40 +01:00
98cd5cf5e8 ci: based on github actions 2020-03-19 17:38:22 +01:00
84efad7693 *: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
697929e149 Merge pull request #248 from bzz/go-api-surface
go: reduce API surface
2019-10-29 19:13:48 +01:00
4d5ca8b9a6 Merge pull request #247 from bzz/doc-update
doc: cleanup and simplify
2019-10-29 18:27:02 +01:00
aa40f75657 go doc: minor improvements and clarifications
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:26:19 +01:00
fa097f4ed4 go: remove Classifier from API
Further reduces the public API surface by
hiding the unused Classifier API for providing
pre-trained classifier weights.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:20:33 +01:00
3f0c4e182b go: reduce API surface
Don't export defaultClassifier

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:14:43 +01:00