47 Commits

Author SHA1 Message Date
Luke Francl
eb043e80a8 Add GetLanguageID function
The Linguist-defined language IDs are important to our use case because they are
used as database identifiers. This adds a new generator to extract the language
IDs into a map and uses that to implement GetLanguageID.

Because one language has the ID 0, there is no way to tell if a language name is
found or not. If desired, we could add this by returning (int, bool) from
GetLanguageID. But none of the other functions that take language names do this,
so I didn't want to introduce it here.
2021-04-13 11:49:21 -07:00
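
The ambiguity this commit message describes comes from Go returning the zero value on a failed map lookup. Below is a minimal sketch of that behavior, not the code from the commit: the `languageIDs` map, its entries, and both function names are hypothetical stand-ins for the generated data and the real API.

```go
package main

import "fmt"

// languageIDs is a hypothetical stand-in for the generated map;
// the names and IDs are illustrative only.
var languageIDs = map[string]int{
	"LangZero": 0,
	"LangOne":  1,
}

// getLanguageID mirrors the single-value shape described in the commit
// message: a missing name and the language with ID 0 both yield 0.
func getLanguageID(name string) int {
	return languageIDs[name]
}

// lookupLanguageID is a hypothetical two-value variant: the bool
// reports whether the name was present in the map.
func lookupLanguageID(name string) (int, bool) {
	id, ok := languageIDs[name]
	return id, ok
}

func main() {
	// Both calls return 0, so the caller cannot tell them apart.
	fmt.Println(getLanguageID("LangZero"), getLanguageID("Unknown"))

	// The comma-ok form makes the miss explicit.
	id, ok := lookupLanguageID("Unknown")
	fmt.Println(id, ok)
}
```

The two-value form costs callers an extra return value, which is the API inconsistency the message gives as the reason for not adopting it.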
Lauris BH
0596fda1a4
Fix strategy order 2020-11-26 13:56:25 +02:00
Lauris BH
6d8f15af5b Add XML strategy 2020-11-15 15:43:37 +02:00
Lauris BH
cb353b4b05 Add support for Roff man pages filenames 2020-10-12 12:18:57 +03:00
Miguel Molina
8ff885a3a8
implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, it cannot
be autogenerated directly from linguist like the other logic in enry, so it's
translated by hand.

There are three different types of matchers in this implementation:
- By extension, which mark a file as generated based only on its extension.
  These are the fastest matchers, so they're done first.
- By file name, which match patterns against the filename. These
  run second. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which go into the content and try
  to identify if they're generated or not based on the content. Unlike
  linguist, we try to only read the content we need and not split it
  all unless it's necessary and use byte functions instead of regexps
  as much as possible.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
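
The three tiers described in this commit message can be sketched as follows. This is an illustration of the cheapest-first ordering only, not the hand-translated code from the commit: the function name `isGeneratedSketch` and the specific rules (extensions, file names, and the `DO NOT EDIT` marker) are assumptions chosen for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// generatedExtensions is an illustrative set, not the real rule list.
var generatedExtensions = map[string]struct{}{
	".nib":      {},
	".pbxproj":  {},
}

// isGeneratedSketch runs the cheapest checks first and only looks at
// the content when the path alone is inconclusive.
func isGeneratedSketch(path string, content []byte) bool {
	// 1. Extension: a single map lookup.
	ext := strings.ToLower(filepath.Ext(path))
	if _, ok := generatedExtensions[ext]; ok {
		return true
	}

	// 2. File name: plain string functions instead of regexps.
	base := filepath.Base(path)
	if base == "package-lock.json" || strings.HasSuffix(base, ".min.js") {
		return true
	}

	// 3. Content: inspect only the first line, using byte functions,
	// rather than splitting the whole file into lines.
	firstLine := content
	if i := bytes.IndexByte(content, '\n'); i >= 0 {
		firstLine = content[:i]
	}
	return bytes.Contains(firstLine, []byte("DO NOT EDIT"))
}

func main() {
	fmt.Println(isGeneratedSketch("ui/Main.nib", nil))
	fmt.Println(isGeneratedSketch("web/package-lock.json", nil))
	fmt.Println(isGeneratedSketch("x.go", []byte("// Code generated. DO NOT EDIT.\npackage x\n")))
	fmt.Println(isGeneratedSketch("x.go", []byte("package x\n")))
}
```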
Lauris BH
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
Máximo Cuadros
84efad7693
*: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
Alexander Bezzubov
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
Alexander Bezzubov
aa40f75657
go doc: minor improvements and clarifications
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:26:19 +01:00
Alexander Bezzubov
fa097f4ed4
go: remove Classifier from API
Further reduces the public API surface by
hiding the unused Classifier API for providing
pre-trained classifier weights.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:20:33 +01:00
Alexander Bezzubov
3f0c4e182b
go: reduce API surface
Don't export defaultClassifier

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:14:43 +01:00
Lauris Bukšis-Haberkorns
a4cf6d2ef1
If osascript is called with the -l argument, it could be a different language, so do not rely on it
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-08-05 22:28:51 +03:00
Alexander Bezzubov
6a5f37e9e2
modules: prepare for v2 release
 - update go.mod with v2
 - update all import paths

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
Alexander Bezzubov
20c6d2845a
build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
Alexander Bezzubov
bdb5603f28
Address code review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-08 16:07:10 +02:00
Alexander Bezzubov
df01124e18
doc: better wording in API godoc
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-03 16:07:14 +02:00
kuba--
5adfee5761
Do not return empty lang.
It's better to return any potential candidate than nothing.

Signed-off-by: kuba-- <kuba@sourced.tech>
2019-03-14 14:08:19 +01:00
Alexander
3499750785
Sync to linguist 7.2.0: heuristics.yml support (#189)
Sync with GitHub Linguist v7.2.0

Includes a new way of handling `heuristics.yml` and
all `./data/*` re-generated using GitHub Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.

 - many new languages
 - better vendoring detection
 - update doc on update & known issues.
2019-02-14 12:47:45 +01:00
Alexander
13d3d66d37
refactoring: remove un-used code, add go doc, fix ci (#199)
Refactoring, consisting of
 - remove unused methods `isAuxiliaryLanguage` and `FileCountList`
   in order to reduce the public API surface (go/java)
 - add GoDoc to public APIs
 - ci: java profile use latest go src
  It also now mimics https://docs.travis-ci.com/user/languages/go/#go-import-path
  for non-go build image, as code relies on internal imports.

TEST PLAN:
 - make test
2019-02-05 22:54:14 +01:00
Antonio Jesus Navarro Perez
15bb13117f Refactor Oniguruma integration
Instead of using a command to change imports before the build, use a build tag to generate the correct binary.

This will allow applications to compile enry with oniguruma with less trouble.

Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>
2018-08-29 18:01:13 +03:00
Denys Smirnov
8da8516ac1 clarify GetLanguages usage
Signed-off-by: Denys Smirnov <denys@sourced.tech>
2018-07-31 00:24:20 +03:00
Alfredo Beaumont
c590beb039 common: Return nil on empty filenames
Signed-off-by: Alfredo Beaumont <alfredo.beaumont@gmail.com>
2017-12-07 16:45:19 +01:00
Manuel Carmona
5dad184af0 check for empty content in getHeaderAndFooter function
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
2017-11-08 14:41:31 +01:00
Alfredo Beaumont
c9122aad9f common: Use underscore as parameter name for unused parameters 2017-08-08 14:58:01 +02:00
Alfredo Beaumont
0990c2868d common: Add filename parameter to GetLanguageByContent function
Internal code needs the filename to select a matcher, so without
a filename no language will ever be found.
2017-08-08 11:58:02 +02:00
Alexander Bezzubov
3303cf7824 Fix 🐛 on file starting with single shebang 2017-07-25 10:37:11 +02:00
Manuel Carmona
d8798c2dd9 binary files are returned as OtherLanguage by GetLanguage 2017-07-04 11:38:43 +02:00
David Paz
78df505715 Deleted common.go file 2017-06-28 11:04:51 +02:00
David Paz
7e827e47ef moved generated data to data subpackage 2017-06-28 08:31:11 +02:00
Manuel Carmona
b7d4be5fdd commit that tests run against is fixed
renamed tmpLinguist to repoLinguist and SimpleLinguistTestSuite to EnryTestSuit in common_test.go

changed receiver's name for TestSuites to 's'

fixed comments
2017-06-26 15:35:53 +02:00
Manuel Carmona
bea1bc3af8 split GetLanguage into GetLanguage and GetLanguages 2017-06-15 13:02:59 +02:00
Manuel Carmona
beda5b73e7 changed signatures for strategies 2017-06-15 10:07:23 +02:00
Manuel Carmona
1fc8cf7a5d changes to improve detection accuracy 2017-06-15 10:07:22 +02:00
Manuel Carmona
ba53e10c7b renamed package and cli to enry 2017-06-13 14:18:23 +02:00
Manuel Carmona
0d5dff1979 changes in the API, ready for version 2 2017-06-06 11:30:23 +02:00
Manuel Carmona
5b304524d1 Rearranged code 2017-06-02 09:33:55 +02:00
Manuel Carmona
f8b8f7f5c4 Added classifier to the sequence of strategies 2017-05-30 09:07:58 +02:00
Manuel Carmona
3d867abac3 Added modeline strategy 2017-05-11 10:09:02 +02:00
Manuel Carmona
df60eab1ad added language detection by filename strategy 2017-04-27 17:32:39 +02:00
Manuel Carmona
1b8d51419d GetLanguage follows strategies shebang, extension, content 2017-04-27 16:40:23 +02:00
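
The ordering named in this commit subject (shebang, then extension, then content) can be illustrated with a simple chain in which the first strategy to return a result wins. This is a sketch under stated assumptions, not the code from the commit: the `strategy` type, the `detect` function, and the toy rules inside each strategy are all hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// strategy is a hypothetical signature: it returns a language name,
// or "" when it cannot decide.
type strategy func(filename string, content []byte) string

// byShebang looks only at the start of the content (toy rule).
func byShebang(_ string, content []byte) string {
	if bytes.HasPrefix(content, []byte("#!/usr/bin/env python")) {
		return "Python"
	}
	return ""
}

// byExtension looks only at the file name (toy rule).
func byExtension(filename string, _ []byte) string {
	switch filepath.Ext(filename) {
	case ".go":
		return "Go"
	case ".py":
		return "Python"
	}
	return ""
}

// byContent searches the content for a telltale token (toy rule).
func byContent(_ string, content []byte) string {
	if bytes.Contains(content, []byte("package main")) {
		return "Go"
	}
	return ""
}

// detect applies the strategies in order and returns the first
// non-empty answer, or "" if none of them can decide.
func detect(filename string, content []byte) string {
	for _, s := range []strategy{byShebang, byExtension, byContent} {
		if lang := s(filename, content); lang != "" {
			return lang
		}
	}
	return ""
}

func main() {
	fmt.Println(detect("script", []byte("#!/usr/bin/env python\nprint(1)\n")))
	fmt.Println(detect("main.go", nil))
	fmt.Println(detect("noext", []byte("package main\n")))
}
```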
Miguel Molina
74733eac19 Make ExtensionsByLanguage public 2016-08-02 10:38:14 +02:00
Máximo Cuadros
2bbd7ec440 unified GetLanguage function 2016-07-18 16:20:12 +02:00
Máximo Cuadros
ec9c23e411 content disambiguate logic 2016-07-16 23:38:23 +02:00
Máximo Cuadros
52986d00fc new by content heuristics, more 2016-07-14 18:12:12 +02:00
Máximo Cuadros
b1a3085e44 new by content heuristics 2016-07-14 15:14:32 +02:00
Máximo Cuadros
947a0d3d44 latest linguist patterns 36ba3783443275525fff7b72b633a3bccfb132cb 2016-07-14 00:08:09 +02:00
Máximo Cuadros
6bc52531e7 code from domain 2016-07-13 19:05:09 +02:00