Commit Graph

28 Commits

Author SHA1 Message Date
20726a1de3 Make IsVendor quicker
Although iterating across the regexps is quicker than naively concatenating them,
it is still quite slow.

This PR proposes a slightly cleverer solution.

First, instead of simply concatenating the regexps with capturing groups, this PR
uses non-capturing groups. This speeds up the regexp processing.

Secondly, we split the regexps into three groups: those that must match at the start,
those that match at the start or at a path segment boundary, and the rest. This makes
a considerable speed improvement.

Thirdly, the regexps are sorted within those groups, which also speeds things up.

All in all, for a non-vendored file this makes IsVendor around twice as fast.

Signed-off-by: Andrew Thornton <art27@cantab.net>
2021-04-23 10:18:28 +01:00
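
The grouping described in commit 20726a1de3 can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the pattern list is a small hypothetical sample, and the helper names `alternation` and `buildVendorRegexp` are invented here. It assumes segment patterns are recognizable by a leading `(^|/)` and start-only patterns by a leading `^`.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// patterns is a hypothetical sample; the real list is generated from linguist data.
var patterns = []string{
	`^vendor/`,
	`(^|/)node_modules/`,
	`(^|/)bower_components/`,
	`^Godeps/`,
	`\.min\.js$`,
}

// alternation wraps each pattern in a non-capturing group and joins them with "|".
func alternation(ps []string) string {
	wrapped := make([]string, len(ps))
	for i, p := range ps {
		wrapped[i] = "(?:" + p + ")"
	}
	return "(?:" + strings.Join(wrapped, "|") + ")"
}

// buildVendorRegexp splits the patterns into three groups (start-only,
// start-or-segment, everything else), sorts each group, and compiles a single
// regexp in which the shared anchor is written once per group.
// It assumes at least one pattern is supplied.
func buildVendorRegexp(ps []string) *regexp.Regexp {
	var start, segment, rest []string
	for _, p := range ps {
		switch {
		case strings.HasPrefix(p, "(^|/)"):
			segment = append(segment, strings.TrimPrefix(p, "(^|/)"))
		case strings.HasPrefix(p, "^"):
			start = append(start, strings.TrimPrefix(p, "^"))
		default:
			rest = append(rest, p)
		}
	}
	sort.Strings(start)
	sort.Strings(segment)
	sort.Strings(rest)

	var parts []string
	if len(start) > 0 {
		parts = append(parts, "^"+alternation(start))
	}
	if len(segment) > 0 {
		parts = append(parts, "(?:^|/)"+alternation(segment))
	}
	if len(rest) > 0 {
		parts = append(parts, alternation(rest))
	}
	return regexp.MustCompile(strings.Join(parts, "|"))
}

func main() {
	re := buildVendorRegexp(patterns)
	for _, p := range []string{"vendor/a.go", "src/node_modules/x.js", "src/main.go"} {
		fmt.Println(p, re.MatchString(p))
	}
}
```

Factoring the anchor out of each group means the engine tests `^` or `(?:^|/)` once per group instead of once per pattern, which is one plausible reason the grouping helps.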
8ff885a3a8 implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, they cannot
be autogenerated directly from linguist like other logic in enry, so they
are translated by hand.

There are three different types of matchers in this implementation:
- By extension, which mark a file as generated based only on its extension.
  These are the fastest matchers, so they're done first.
- By file name, which match patterns against the filename. These
  are performed second. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which look at the content and try
  to identify whether the file is generated. Unlike linguist, we try to
  read only the content we need, avoid splitting it all unless necessary,
  and use byte functions instead of regexps as much as possible.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
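
The three matcher tiers described in commit 8ff885a3a8 could be ordered roughly as in the sketch below. It is only an illustration under stated assumptions: the rules shown (the extension set, the file-name checks, and the first-line marker) are hypothetical samples rather than the translated linguist rules, and the function names are invented here.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

// generatedExtensions is a hypothetical sample of extension-only rules.
var generatedExtensions = map[string]bool{
	".nib":             true,
	".xcworkspacedata": true,
}

// isGeneratedByName uses plain string functions instead of regexps.
// The two rules here are illustrative assumptions.
func isGeneratedByName(name string) bool {
	base := path.Base(name)
	return base == "Gopkg.lock" || strings.HasSuffix(base, ".pb.go")
}

// isGeneratedByContent inspects only the first line, located with
// bytes.IndexByte, so the whole content is never split into lines.
func isGeneratedByContent(content []byte) bool {
	first := content
	if i := bytes.IndexByte(content, '\n'); i >= 0 {
		first = content[:i]
	}
	return bytes.Contains(first, []byte("Code generated"))
}

// isGenerated runs the cheapest matchers first and reads content last.
func isGenerated(name string, content []byte) bool {
	if generatedExtensions[strings.ToLower(path.Ext(name))] {
		return true
	}
	if isGeneratedByName(name) {
		return true
	}
	return isGeneratedByContent(content)
}

func main() {
	src := []byte("// Code generated by tool. DO NOT EDIT.\npackage main\n")
	fmt.Println(isGenerated("ui/Main.nib", nil))
	fmt.Println(isGenerated("api/service.pb.go", nil))
	fmt.Println(isGenerated("main.go", src))
}
```

Ordering the checks this way means most generated files are classified without touching their content at all.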
29bc0a181b data: replace substring package with regex package 2020-04-15 17:27:48 +02:00
b851ee83ad IsTest function for top 10 languages 2020-04-06 16:23:48 +02:00
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
84efad7693 *: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
25b29ebdc4 Implement getting color code for languages
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-07-19 23:59:46 +03:00
6a5f37e9e2 modules: prepare for v2 release
- update go.mod with v2
- update all import paths

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
20c6d2845a build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
13d3d66d37 refactoring: remove un-used code, add go doc, fix ci (#199)
Refactoring, consisting of
 - remove the unused method `isAuxiliaryLanguage` and the type `FileCountList`
   in order to reduce the public API surface (go/java)
 - add GoDoc to public APIs
 - ci: the java profile uses the latest go src
   It also now mimics https://docs.travis-ci.com/user/languages/go/#go-import-path
   for the non-go build image, as the code relies on internal imports.

TEST PLAN:
 - make test
2019-02-05 22:54:14 +01:00
a786f6175e patch IsDotFile function compatible with both '.' and '..'
Signed-off-by: Huitse Tai <geb.1989@gmail.com>
2017-10-18 12:50:33 +08:00
887bc6a4be make IsDotFile not return true for '.'
Signed-off-by: Huitse Tai <geb.1989@gmail.com>
2017-10-18 12:18:52 +08:00
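
One reading of commits 887bc6a4be and a786f6175e is that neither `.` nor `..` should be reported as a dotfile. The sketch below illustrates that reading only; the lowercase name `isDotFile` and the exact handling are assumptions, not the code from those commits.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isDotFile reports whether the last element of p starts with a dot,
// excluding the special directory entries "." and "..".
// This is a hypothetical sketch of the behavior, not the library function.
func isDotFile(p string) bool {
	base := path.Base(path.Clean(p))
	return strings.HasPrefix(base, ".") && base != "." && base != ".."
}

func main() {
	for _, p := range []string{".gitignore", "dir/.env", ".", "..", "main.go"} {
		fmt.Println(p, isDotFile(p))
	}
}
```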
b84e338f9e cli: sort the results
Adds `FileCount` and `FileCountList` types for storing languages and their
file counts, which can be sorted by count.

Signed-off-by: Sunny <me@darkowlzz.space>
2017-09-30 18:52:30 +05:30
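
A sortable count list like the one named in commit b84e338f9e can be built on `sort.Interface`. The sketch below is an assumption about the shape of those types: the field names `Name` and `Count` and the helper `sortedCounts` are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// FileCount pairs a language name with the number of files detected for it.
// The field names are assumptions made for this sketch.
type FileCount struct {
	Name  string
	Count int
}

// FileCountList implements sort.Interface, ordering entries by Count.
type FileCountList []FileCount

func (l FileCountList) Len() int           { return len(l) }
func (l FileCountList) Less(i, j int) bool { return l[i].Count < l[j].Count }
func (l FileCountList) Swap(i, j int)      { l[i], l[j] = l[j], l[i] }

// sortedCounts is a hypothetical helper that converts a map of counts into
// a list sorted from the largest count to the smallest.
func sortedCounts(counts map[string]int) FileCountList {
	list := make(FileCountList, 0, len(counts))
	for name, n := range counts {
		list = append(list, FileCount{Name: name, Count: n})
	}
	sort.Sort(sort.Reverse(list))
	return list
}

func main() {
	for _, fc := range sortedCounts(map[string]int{"Go": 12, "Python": 3, "Shell": 7}) {
		fmt.Printf("%s\t%d\n", fc.Name, fc.Count)
	}
}
```

Entries with equal counts may appear in any relative order here, since map iteration order is unspecified and `sort.Sort` is not stable.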
bd402a063c Fixed output text 2017-07-18 12:54:54 +02:00
b2fe3f69ce Added mymeType.gold 2017-07-18 12:47:19 +02:00
25e12e9c03 Returns text/plain when mime is undefined 2017-07-18 12:46:29 +02:00
125c802582 Now generates mime file 2017-07-18 12:46:29 +02:00
7e827e47ef moved generated data to data subpackage 2017-06-28 08:31:11 +02:00
beda5b73e7 changed signatures for strategies 2017-06-15 10:07:23 +02:00
ba53e10c7b renamed package and cli to enry 2017-06-13 14:18:23 +02:00
0d5dff1979 changes in the API, ready to version 2 2017-06-06 11:30:23 +02:00
5b304524d1 Rearranged code 2017-06-02 09:33:55 +02:00
ca3ae587f3 added documentation_matchers.go generation 2017-04-17 11:52:11 +02:00
e998b0ff2e regexp for vendored files and directories are generated in vendor_matchers.go 2017-04-07 09:27:40 +02:00
13e7886a02 Added utils.go generation 2017-04-06 17:31:17 +02:00
2bbd7ec440 unified GetLanguage function 2016-07-18 16:20:12 +02:00
bead3a606f tests 2016-07-13 22:21:18 +02:00