Commit Graph

472 Commits

Author SHA1 Message Date
Miguel Molina
8ff885a3a8
implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, it cannot
be autogenerated directly from linguist like other logic in enry, so it's
translated by hand.

There are three different types of matchers in this implementation:
- By extension, which mark files as generated based only on the extension. These
  are the fastest matchers, so they're done first.
- By file name, which match patterns against the filename. These
  are performed second. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which go into the content and try
  to identify if they're generated or not based on the content. Unlike
  linguist, we try to only read the content we need and not split it
  all unless it's necessary and use byte functions instead of regexps
  as much as possible.
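
The ordering described above can be sketched roughly as follows. This is a minimal illustration only, not enry's actual implementation: the helper name isGeneratedSketch, the generatedExts set, and the specific extension, file-name, and content rules are all assumptions chosen to show the three stages.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// generatedExts is an illustrative set, not enry's real list.
var generatedExts = map[string]bool{".nib": true, ".xcworkspacedata": true}

// isGeneratedSketch is a hypothetical helper showing the three-stage order:
// extension first, then file name, then content.
func isGeneratedSketch(path string, content []byte) bool {
	// Stage 1: extension lookup, the cheapest check.
	if generatedExts[strings.ToLower(filepath.Ext(path))] {
		return true
	}

	// Stage 2: file-name checks using string functions instead of regexps.
	name := filepath.Base(path)
	if name == "package-lock.json" || strings.HasSuffix(name, ".min.js") {
		return true
	}

	// Stage 3: content checks using byte functions, looking only at the
	// first line rather than splitting the whole file.
	first := content
	if i := bytes.IndexByte(content, '\n'); i >= 0 {
		first = content[:i]
	}
	return bytes.Contains(first, []byte("Code generated")) &&
		bytes.Contains(first, []byte("DO NOT EDIT"))
}

func main() {
	fmt.Println(isGeneratedSketch("web/app.min.js", nil))
	fmt.Println(isGeneratedSketch("main.go", []byte("package main\n")))
}
```

Running the cheap checks first means the content is only inspected when neither the extension nor the file name settles the question.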

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
Máximo Cuadros
bda45fdc8e
go.mod: update go-oniguruma v1.2.1 2020-05-06 21:42:07 +02:00
Alexander
4b468762b6
Merge pull request #13 from go-enry/python-wrapper
Python: API to expose highest-level enry.GetLanguage
2020-04-24 20:57:37 +02:00
Alexander Bezzubov
35575d0a3e
py: expose highest-level enry.language()
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-04-24 20:51:46 +02:00
Máximo Cuadros
1d23012ae6
Merge pull request #15 from go-enry/ci-libonig5
ci: update libonig5 version
2020-04-24 19:24:34 +02:00
Máximo Cuadros
aa46bf4a37
ci: update libonig5 version 2020-04-24 19:18:27 +02:00
Máximo Cuadros
9fba3da45f
Merge pull request #12 from mcuadros/oniguruma
data: replace substring package with regex package
2020-04-15 19:58:28 +02:00
Máximo Cuadros
29bc0a181b
data: replace substring package with regex package 2020-04-15 17:27:48 +02:00
Máximo Cuadros
b34576bd71
Merge pull request #10 from mcuadros/is-test
IsTest function for top 10 languages
2020-04-06 16:29:11 +02:00
Máximo Cuadros
b851ee83ad
IsTest function for top 10 languages 2020-04-06 16:23:48 +02:00
Alexander Bezzubov
4fb0b4cc5e
doc: add coloring to the ToC
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-31 11:20:06 +02:00
Alexander
6c06680bef
Merge pull request #2 from lafriks-fork/feat/lang_groups
Return group color if language has none
2020-03-31 11:17:52 +02:00
Lauris BH
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
Alexander
cfaa7a1711
Merge pull request #8 from go-enry/update-docs
Documentation update
2020-03-30 22:53:10 +02:00
Alexander Bezzubov
6a09a2a684
doc: update badges
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-30 19:42:03 +02:00
Alexander Bezzubov
64d02e5441
doc: re-structure README by use case, update links
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-30 19:31:53 +02:00
Alexander
c0806d20c8
Merge pull request #3 from lafriks-fork/feat/sync_linguist_7_9_0
sync to the latest github/linguist v7.9.0
2020-03-30 12:28:03 +02:00
Lauris BH
9030d3671b sync to the latest github/linguist v7.9.0 2020-03-30 01:25:57 +03:00
Alexander
fa1c6f39b5
Merge pull request #7 from go-enry/win-support
Code generator Win support
2020-03-29 23:31:10 +02:00
Alexander Bezzubov
172486906a
ci: force git to use LF on win to pass tests on linguist samples
This mitigates the problem that tokenizer uses regex
that matches platform-specific line endings
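
The underlying problem can be illustrated with a small sketch. It is not the enry tokenizer: the regex, the lines helper, and the normalizeEOL helper are assumptions made for illustration. The commit itself addresses the issue at the Git level, while the code-level normalization shown here is only one possible alternative.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// lineRe matches one line at a time; '.' also matches '\r', so CRLF input
// yields lines with a trailing carriage return.
var lineRe = regexp.MustCompile(`(?m)^.*$`)

// lines is a hypothetical stand-in for an EOL-sensitive, regex-based step.
func lines(content []byte) []string {
	var out []string
	for _, m := range lineRe.FindAll(content, -1) {
		out = append(out, string(m))
	}
	return out
}

// normalizeEOL converts Windows line endings to LF.
func normalizeEOL(content []byte) []byte {
	return bytes.ReplaceAll(content, []byte("\r\n"), []byte("\n"))
}

func main() {
	crlf := []byte("a\r\nb")
	fmt.Printf("%q\n", lines(crlf))               // raw CRLF input
	fmt.Printf("%q\n", lines(normalizeEOL(crlf))) // after normalization
}
```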

TestPlan:
 - go test ./internal/code-generator/generator \
	-run Test_GeneratorTestSuite -testify.m TestTokenizerOnATS

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
Alexander Bezzubov
3ea961e5ab
generator: change-detector tests on EOL-dependent sample
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
Alexander Bezzubov
9be0211f04
generator: skip symlinks on *nix and win
As Git on win does not support symlinks [1], we have to hard-code
the paths to files under ./samples/ in Linguist codebase that are
known to be symlinks.

 1. https://github.com/git-for-windows/git/wiki/Symbolic-Links

TestPlan:
 - go test ./internal/code-generator/generator -run Test_GeneratorTestSuite

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 23:23:56 +02:00
Alexander Bezzubov
9c082eb2d4
ci: add ENRY_DEBUG flag
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 19:35:49 +02:00
Alexander Bezzubov
78eee0cf7e
generator: flag to debug building of bayesian classifier
It seems that reading ./samples/ from Linguist consumes
a different number of files from filesystem on different OSes.

This change adds ENRY_DEBUG env var to print some debug output
about calculations of token stats from samples.

TestPlan:
 - ENRY_DEBUG=1 go test -v ./internal/code-generator/generator \
	-run Test_GeneratorTestSuite -testify.m TestGenerationFiles

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-29 19:35:49 +02:00
Alexander
b78e4423f0
generator: drop platform-specific separator
Co-Authored-By: Lauris BH <lauris@nix.lv>
2020-03-25 19:27:46 +01:00
Alexander Bezzubov
3a5f4b2db1
generator: more debug output in case of failure
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-25 14:20:26 +01:00
Alexander Bezzubov
b0f94ad693
generator: CLI tool fix to support win paths
On Win `make code-generate` produces unreasonable
Bayesian classifier weights from Linguist samples
silently, failing only the final classification tests.

TestPlan:
 - go test ./internal/code-generator/... \
    -run Test_GeneratorTestSuite -testify.m TestGenerationFiles

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-25 14:00:24 +01:00
Alexander Bezzubov
78d8f43a88
tokenizer: hide flex-based impl, avoid build failures on win
TestPlan:
 - go test -run TestTokenize ./internal/tokenizer
 - go test -tags flex -run TestTokenize ./internal/tokenizer
   (should fail as default fixtures are from regex-based tokenizer)
2020-03-19 19:58:48 +01:00
Alexander Bezzubov
1ab8148c10
test: fix platform-dependent paths in tests
Test Plan:
 - go test ./internal/code-generator/... -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
2020-03-19 19:47:22 +01:00
Alexander Bezzubov
e32a70a784
tokenizer: fix a bug and regenerate the code with latest Go
See https://github.com/bzz/enry/pull/4 for details.

Test Plan:
 - go test ./...
2020-03-19 19:08:21 +01:00
Máximo Cuadros
e08125d7ee
ci: add oniguruma tests 2020-03-19 17:43:40 +01:00
Máximo Cuadros
98cd5cf5e8
ci: based on github actions 2020-03-19 17:38:22 +01:00
Máximo Cuadros
84efad7693
*: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
Alexander Bezzubov
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
Alexander
697929e149
Merge pull request #248 from bzz/go-api-surface
go: reduce API surface
2019-10-29 19:13:48 +01:00
Alexander
4d5ca8b9a6
Merge pull request #247 from bzz/doc-update
doc: cleanup and simplify
2019-10-29 18:27:02 +01:00
Alexander Bezzubov
aa40f75657
go doc: minor improvements and clarifications
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:26:19 +01:00
Alexander Bezzubov
fa097f4ed4
go: remove Classifier from API
Further reduces the public API surface by
hiding the unused Classifier API for providing
pre-trained classifier weights.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:20:33 +01:00
Alexander Bezzubov
3f0c4e182b
go: reduce API surface
Don't export defaultClassifier

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:14:43 +01:00
Alexander Bezzubov
c7272bd4f1
address review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:11:35 +01:00
Alexander Bezzubov
324cb1d7c9
doc: cleanup and simplify
Make it shorter and more structured, update ToC,
remove ref from links, etc.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 15:49:16 +01:00
Alexander
a4c166cc04
Merge pull request #245 from bzz/initial-cffi-python-bindings
Initial cffi bindings for python
2019-10-28 14:07:29 +01:00
Alexander
31878fe4e1
Merge pull request #242 from creachadair/maint
Update MAINTAINERS, add CODEOWNERS.
2019-10-14 19:55:55 +02:00
Alexander Bezzubov
6cf5bf2ca4
python: expose is_vendor()
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-14 19:38:33 +02:00
Alexander Bezzubov
be583cad06
python: add dependencies
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-14 19:38:33 +02:00
Alexander Bezzubov
cff9c07009
python: expose language_by_filename()
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-14 19:38:33 +02:00
Alexander Bezzubov
ee7a0f1139
python: initial impl of bindings using cFFI
A PoC that exposes a single function
`enry.language_by_extension()` and a small
number of helpers to deal with string
conversion between Go<->C<->Python.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-14 19:38:33 +02:00
M. J. Fromberger
6a6a3cc26e
Merge pull request #244 from creachadair/installdoc
docs: Update CLI installation instructions.
2019-10-07 09:44:23 -07:00
M. J. Fromberger
bf29b9a924 Use conditional composition instead of sequential.
Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-10-07 09:39:07 -07:00
M. J. Fromberger
7763fcde19 docs: Update CLI installation instructions.
Fixes #243. The default behaviour for `go get` has changed slightly and we now
need to either provide a module context or disable modules for installation to
work correctly.

Also remove a now-obsolete reference to the source{d} engine CLI.

Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
2019-10-07 08:34:50 -07:00