Commit Graph

26 Commits

Author SHA1 Message Date
Alex Bezzubov
95f30f0db4 IsVendor: refactor RE collation optimization
The same optimization still happens during package initialization
at runtime, but an effort was made to make it more transparent and
self-documented.

Both the test & the benchmark were updated.

The old version (useful for benchmarking) was:

```go
func IsVendor(filename string) bool {
	for _, matcher := range data.VendorMatchers {
		if matcher.MatchString(filename) {
			return true
		}
	}
	return false
}
```
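For comparison, a minimal sketch of the collated form: the individual patterns are joined into one alternation and compiled once at package initialization, so each call runs a single match. The pattern list and the names `vendorPatterns`, `collate`, and `isVendorCollated` are assumptions made for illustration; they are not the identifiers or the patterns used in go-enry.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// vendorPatterns is a hypothetical stand-in for the generated matcher list.
var vendorPatterns = []string{
	`(^|/)vendor/`,
	`(^|/)node_modules/`,
	`\.min\.js$`,
}

// collate joins the patterns into a single alternation. Each pattern is
// wrapped in its own non-capturing group so that its anchors stay scoped
// to that pattern.
func collate(patterns []string) *regexp.Regexp {
	groups := make([]string, len(patterns))
	for i, p := range patterns {
		groups[i] = "(?:" + p + ")"
	}
	return regexp.MustCompile(strings.Join(groups, "|"))
}

// vendorRE is compiled once, when the package is initialized.
var vendorRE = collate(vendorPatterns)

// isVendorCollated reports whether filename matches any of the patterns,
// using one regex match instead of a loop over matchers.
func isVendorCollated(filename string) bool {
	return vendorRE.MatchString(filename)
}

func main() {
	for _, f := range []string{"vendor/a.go", "src/main.go", "static/app.min.js"} {
		fmt.Println(f, isVendorCollated(f))
	}
}
```

Whether a single large alternation is faster than the loop depends on the pattern set, which is why the loop version is worth keeping around for the benchmark.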

Test plan:
 * go test -run ^TestIsVendor$ github.com/go-enry/go-enry/v2
2022-10-23 15:19:53 +02:00
Clark Boylan
16a5ff8e22 Fix IsVendor() regex generation
The regex generation introduced by
20726a1de3
had a subtle but important bug in it. In particular, the ^ and ^|/
prefixes were applied only to the first regex in each OR'd grouping,
not to every regex in the grouping.

We fix this by wrapping each of those OR'd groups in a new non-capturing
group to ensure the prefix applies to every entry in the OR'd listing.

Additionally we add three new quick tests of IsVendor to guard against
this regression in the future.
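
The precedence issue can be reproduced in isolation. In the sketch below the two patterns are made up for illustration and are not taken from the enry matcher data; they only show that alternation binds more loosely than the `^` prefix unless the alternatives are grouped.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Buggy form: the ^ anchor applies only to the first alternative,
	// so "deps/" may match anywhere in the path.
	buggy = regexp.MustCompile(`^cache/|deps/`)

	// Fixed form: a non-capturing group makes the anchor apply to
	// every alternative.
	fixed = regexp.MustCompile(`^(?:cache/|deps/)`)
)

func main() {
	path := "src/deps/readme.md"
	fmt.Println(buggy.MatchString(path)) // true: "deps/" matched mid-path
	fmt.Println(fixed.MatchString(path)) // false: path does not start with either prefix
}
```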

This fixes https://github.com/go-enry/go-enry/issues/135
2022-08-25 16:22:09 -07:00
silverwind
b1bf2238b3
Add poetry.lock to generated files
`poetry.lock` is a file generated by the Python Poetry package manager;
see https://python-poetry.org/docs/basic-usage/ for reference.
2022-03-17 15:29:07 +01:00
Lauris BH
0affa3ccca Update to Linguist v7.16.1 2021-09-25 23:57:50 +03:00
6543
d2d4c32d4d
Extend & simplify the test for IsVendor (#45) 2021-04-22 22:24:27 +02:00
Miguel Molina
8ff885a3a8
implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, it cannot
be autogenerated directly from linguist like other logic in enry, so it's
translated by hand.

There are three different types of matchers in this implementation:
- By extension, which mark a file as generated based only on the extension.
  These are the fastest matchers, so they're done first.
- By file name, which matches patterns against the filename. These
  are performed in second place. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which go into the content and try
  to identify if they're generated or not based on the content. Unlike
  linguist, we try to only read the content we need and not split it
  all unless it's necessary and use byte functions instead of regexps
  as much as possible.
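
The ordering described above (cheapest check first, content inspected last and only as far as needed) can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: the function name `isGeneratedSketch`, the extension set, and the specific filename and content rules are assumptions chosen to keep the example short.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

// generatedExtensions is a hypothetical, heavily reduced extension set.
var generatedExtensions = map[string]bool{
	".nib":     true,
	".pbxproj": true,
}

// isGeneratedSketch applies three tiers of checks, from cheapest to most
// expensive, and returns as soon as one of them matches.
func isGeneratedSketch(filename string, content []byte) bool {
	// 1. By extension: a single map lookup.
	if generatedExtensions[strings.ToLower(path.Ext(filename))] {
		return true
	}

	// 2. By file name: plain string functions instead of regexps.
	base := path.Base(filename)
	if base == "poetry.lock" || strings.HasSuffix(base, ".min.js") {
		return true
	}

	// 3. By content: look only at the first line rather than splitting
	// the whole file, and use byte functions instead of regexps.
	firstLine := content
	if i := bytes.IndexByte(content, '\n'); i >= 0 {
		firstLine = content[:i]
	}
	return bytes.Contains(firstLine, []byte("Code generated")) &&
		bytes.Contains(firstLine, []byte("DO NOT EDIT"))
}

func main() {
	src := []byte("// Code generated by protoc-gen-go. DO NOT EDIT.\npackage api\n")
	fmt.Println(isGeneratedSketch("api/api.pb.go", src))
	fmt.Println(isGeneratedSketch("main.go", []byte("package main\n")))
}
```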

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
Máximo Cuadros
b851ee83ad
IsTest function for top 10 languages 2020-04-06 16:23:48 +02:00
Lauris BH
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
Lauris Bukšis-Haberkorns
25b29ebdc4 Implement getting color code for languages
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-07-19 23:59:46 +03:00
Alexander
13d3d66d37
refactoring: remove un-used code, add go doc, fix ci (#199)
Refactoring, consisting of
 - remove unused method `isAuxiliaryLanguage` and `FileCountList`
   in order to reduce public API surfaces (go/java)
 - add GoDoc to public APIs
 - ci: java profile use latest go src
  It also now mimics https://docs.travis-ci.com/user/languages/go/#go-import-path
  for non-go build image, as code relies on internal imports.

TEST PLAN:
 - make test
2019-02-05 22:54:14 +01:00
Alexander
ef50154395
Maintenance: batch of minor changes (#183)
* exclude build artifacts from git
* build: simplify building by using src-d/ci
* bench: simplify&fix shell runners
* build: simplify benchmarks* targets
* test: remove dependency on single test suite
* doc: rel image link + linguist cli difference highlight
* suggestions from code review
* bench: add fail fast to all shell runners

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2018-12-27 11:55:34 +01:00
Davor Kapsa
66fda9af16 add TestGetMimeType
Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>
2018-10-03 22:12:00 +03:00
Davor Kapsa
f9c6cbabb1 add TestIsImage
Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>
2018-10-03 22:12:00 +03:00
Davor Kapsa
039c4b9628 add TestIsAuxiliaryLanguage
Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>
2018-10-03 22:12:00 +03:00
Huitse Tai
887bc6a4be make IsDotFile not treat '.' as true
Signed-off-by: Huitse Tai <geb.1989@gmail.com>
2017-10-18 12:18:52 +08:00
Sunny
b84e338f9e cli: sort the results
Adds `FileCount` and `FileCountList` types for storing languages and their
file counts, which can be sorted by count.

Signed-off-by: Sunny <me@darkowlzz.space>
2017-09-30 18:52:30 +05:30
Manuel Carmona
8d91dc7be8 added benchmarks and scripts to run, parse and plot them
moved benchmark/run-slow-benchmarks.sh's content to Makefile
2017-07-13 08:21:09 +02:00
Manuel Carmona
b7d4be5fdd the commit that tests run against is now fixed
renamed tmpLinguist to repoLinguist and SimpleLinguistTestSuite to EnryTestSuit in common_test.go

changed receiver's name for TestSuites to 's'

fixed comments
2017-06-26 15:35:53 +02:00
Manuel Carmona
beda5b73e7 changed signatures for strategies 2017-06-15 10:07:23 +02:00
Manuel Carmona
ba53e10c7b renamed package and cli to enry 2017-06-13 14:18:23 +02:00
Manuel Carmona
5b304524d1 Rearranged code 2017-06-02 09:33:55 +02:00
Manuel Carmona
5d61ca93d8 changed langs.go to unmarshal on a languageInfo struct 2017-04-17 11:55:29 +02:00
Manuel Carmona
ca3ae587f3 added documentation_matchers.go generation 2017-04-17 11:52:11 +02:00
Manuel Carmona
ef19999fe8 removed vendorRegexp and benchmarks 2017-04-10 12:40:52 +02:00
Manuel Carmona
5e13b984c9 changed TestIsVendor 2017-04-06 18:04:47 +02:00
Máximo Cuadros
bead3a606f tests 2016-07-13 22:21:18 +02:00