55 Commits

Author SHA1 Message Date
Luke Francl
57c5940dbe
Update common.go
Thanks @lafriks for catching this!

Co-authored-by: Lauris BH <lauris@nix.lv>
2021-10-12 16:20:14 -07:00
Luke Francl
6212f1fcb4 Remove name -> LanguageInfo mapping per code review
The GetLanguageInfo method is now implemented in terms of GetLanguageInfoByID.
This is possible because you can use GetLanguageID to get the ID for a language.
2021-10-12 13:29:39 -07:00
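The lookup chain described in commit 6212f1fcb4 (name to ID, then ID to info) can be sketched roughly as follows. This is a minimal illustration, not the library's code: the `languageInfo` struct, its fields, the two maps, and the lowercase function names are all hypothetical stand-ins, and the IDs are sample values.

```go
package main

import "fmt"

// languageInfo is a hypothetical, trimmed-down stand-in for a LanguageInfo struct.
type languageInfo struct {
	Name       string
	ID         int
	Extensions []string
}

// Sample data for illustration only; a real table would be generated
// from the Linguist YAML source file.
var (
	idsByName = map[string]int{"Go": 132, "Ruby": 326}
	infoByID  = map[int]languageInfo{
		132: {Name: "Go", ID: 132, Extensions: []string{".go"}},
		326: {Name: "Ruby", ID: 326, Extensions: []string{".rb"}},
	}
)

// languageInfoByID is the single place that holds the data lookup.
func languageInfoByID(id int) (languageInfo, error) {
	info, ok := infoByID[id]
	if !ok {
		return languageInfo{}, fmt.Errorf("language ID %d not found", id)
	}
	return info, nil
}

// languageInfoByName resolves the name to an ID and delegates to
// languageInfoByID, so no separate name -> info mapping is needed.
func languageInfoByName(name string) (languageInfo, error) {
	id, ok := idsByName[name]
	if !ok {
		return languageInfo{}, fmt.Errorf("language %q not found", name)
	}
	return languageInfoByID(id)
}

func main() {
	info, err := languageInfoByName("Go")
	fmt.Println(info.Name, info.ID, info.Extensions, err)

	_, err = languageInfoByName("NoSuchLanguage")
	fmt.Println(err)
}
```

Keeping only the name-to-ID and ID-to-info tables means each piece of language data is stored once, which is presumably why the separate name mapping could be dropped.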
Luke Francl
b248b21349 Expose LanguageInfo with all Linguist data
As discussed in https://github.com/go-enry/go-enry/issues/54, this provides an
API for accessing a LanguageInfo struct which is populated with all the data
from the Linguist YAML source file. Functions are provided to access the
LanguageInfo by name or ID.

The other top-level functions like GetLanguageExtensions, GetLanguageGroup, etc.
could in principle be implemented using this structure, which would simplify the
code generation. But that would be a big change so I didn't do any of that.
Perhaps in the next major version something like that would make sense.
2021-10-11 13:32:29 -07:00
Lauris BH
4686615d9e Improve shebang parsing to detect correct interpreter 2021-09-25 19:24:44 +03:00
Michael Rykov
58f8dccbcf Fixed GetLanguagesByShebang for paths with “env” 2021-06-19 00:49:05 +08:00
Alex
0a9864e6ec
Merge pull request #46 from look/look/add-language-id
Add GetLanguageID function
2021-04-24 08:32:32 +02:00
Luke Francl
cabfdaffc0 Update GetLanguageID to return a found boolean per code review 2021-04-22 16:55:42 -07:00
Luke Francl
bf7167fc44 Rewrite GetLanguages to work like Linguist.detect
Prior to this change, GetLanguages collected all candidate languages from each
strategy to pass to the next strategy (without de-duplicating them). Linguist
only passes the previous strategy's candidates to the next strategy. Also, the
old implementation would overwrite languages with nil if a strategy returned
that, so you could go from multiple languages to no language.

See the Ruby code for details: aad49acc06/lib/linguist.rb (L14-L49)

This addresses https://github.com/src-d/enry/issues/207 because GetLanguages
should not return all candidates detected, otherwise it would work differently
than Linguist.
2021-04-13 12:04:47 -07:00
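The chaining behaviour described in commit bf7167fc44 can be sketched as below. This is an illustrative reading of the commit message, not the actual implementation: the `Strategy` signature, the `detect` function, and the three toy strategies are assumptions made up for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Strategy is an assumed signature: it receives the candidates produced by
// the previous strategy and returns its own, possibly narrower, list.
type Strategy func(filename string, content []byte, candidates []string) []string

// detect runs the strategies in order. Each strategy only sees the previous
// result; an empty result never overwrites earlier candidates, and a single
// candidate ends the search early.
func detect(filename string, content []byte, strategies []Strategy) []string {
	var languages []string
	for _, strategy := range strategies {
		candidates := strategy(filename, content, languages)
		switch {
		case len(candidates) == 1:
			return candidates
		case len(candidates) > 1:
			languages = candidates
		}
		// Zero candidates: keep what the earlier strategies found.
	}
	return languages
}

// byExtension is a toy strategy: ".h" files are ambiguous.
func byExtension(filename string, _ []byte, _ []string) []string {
	if strings.HasSuffix(filename, ".h") {
		return []string{"C", "C++", "Objective-C"}
	}
	return nil
}

// noOpinion is a toy strategy that never finds anything.
func noOpinion(string, []byte, []string) []string { return nil }

// byContent is a toy strategy that narrows the candidates to C++ when the
// content mentions the std namespace.
func byContent(_ string, content []byte, candidates []string) []string {
	for _, c := range candidates {
		if c == "C++" && bytes.Contains(content, []byte("std::")) {
			return []string{c}
		}
	}
	return nil
}

func main() {
	strategies := []Strategy{byExtension, noOpinion, byContent}
	fmt.Println(detect("vector.h", []byte("std::vector<int> v;"), strategies))
	fmt.Println(detect("plain.h", []byte("int x;"), strategies))
}
```

In the second call no strategy narrows the list, so the three extension candidates survive instead of being replaced by an empty result.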
Luke Francl
eb043e80a8 Add GetLanguageID function
The Linguist-defined language IDs are important to our use case because they are
used as database identifiers. This adds a new generator to extract the language
IDs into a map and uses that to implement GetLanguageID.

Because one language has the ID 0, there is no way to tell whether a language
name was found or not. If desired, we could add this by returning (int, bool)
from GetLanguageID. But none of the other functions that take language names do
this, so I didn't want to introduce it here.
2021-04-13 11:49:21 -07:00
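The ambiguity around ID 0 mentioned in commit eb043e80a8 comes from Go's zero value for missing map keys. A small sketch, assuming a hypothetical `languageIDs` map with sample values and two made-up lookup helpers:

```go
package main

import "fmt"

// languageIDs holds sample values for illustration; the real map would be
// generated from Linguist data.
var languageIDs = map[string]int{
	"1C Enterprise": 0,
	"Go":            132,
}

// idOnly returns just the ID. A missing name yields the int zero value,
// which is indistinguishable from a language whose ID really is 0.
func idOnly(name string) int {
	return languageIDs[name]
}

// idAndFound uses the comma-ok form so callers can tell the two cases apart.
func idAndFound(name string) (int, bool) {
	id, ok := languageIDs[name]
	return id, ok
}

func main() {
	fmt.Println(idOnly("1C Enterprise"), idOnly("NoSuchLanguage"))

	id, found := idAndFound("1C Enterprise")
	fmt.Println(id, found)

	id, found = idAndFound("NoSuchLanguage")
	fmt.Println(id, found)
}
```

Returning the extra boolean is the approach a later commit in this history (cabfdaffc0) adopts.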
Lauris BH
0596fda1a4
Fix strategy order 2020-11-26 13:56:25 +02:00
Lauris BH
6d8f15af5b Add XML strategy 2020-11-15 15:43:37 +02:00
Lauris BH
cb353b4b05 Add support for Roff man pages filenames 2020-10-12 12:18:57 +03:00
Miguel Molina
8ff885a3a8
implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, it cannot
be autogenerated directly from linguist like other logic in enry, so it's
translated by hand.

There are three different types of matchers in this implementation:
- By extension, which mark a file as generated based only on the extension.
  These are the fastest matchers, so they're done first.
- By file name, which match patterns against the filename. These
  are performed second. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which go into the content and try
  to identify whether the file is generated based on the content. Unlike
  linguist, we try to only read the content we need and not split it
  all unless it's necessary, and use byte functions instead of regexps
  as much as possible.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
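The three matcher tiers described in commit 8ff885a3a8 (extension first, then file name, then content) could look roughly like this. It is only a sketch of the ordering idea: the `isGenerated` function here, the extension set, the file name rules, and the content rule are hypothetical examples, not the rules translated from generated.rb.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// generatedExtensions is an illustrative set of extensions treated as generated.
var generatedExtensions = map[string]struct{}{
	".nib":             {},
	".xcworkspacedata": {},
	".xcuserstate":     {},
}

// isGenerated applies the cheapest checks first and touches the content
// only when the extension and file name checks are inconclusive.
func isGenerated(filename string, content []byte) bool {
	// Tier 1: extension lookup, a single map access.
	ext := strings.ToLower(filepath.Ext(filename))
	if _, ok := generatedExtensions[ext]; ok {
		return true
	}

	// Tier 2: file name rules using plain string functions, no regexps.
	base := filepath.Base(filename)
	if base == "Gopkg.lock" || strings.HasSuffix(base, ".pb.go") {
		return true
	}

	// Tier 3: content rule using byte functions. Only the first line is
	// inspected, so the content is never split into all of its lines.
	firstLine := content
	if i := bytes.IndexByte(content, '\n'); i >= 0 {
		firstLine = content[:i]
	}
	return bytes.HasPrefix(firstLine, []byte("// Code generated ")) &&
		bytes.HasSuffix(firstLine, []byte(" DO NOT EDIT."))
}

func main() {
	fmt.Println(isGenerated("ui/Main.nib", nil))
	fmt.Println(isGenerated("Gopkg.lock", nil))
	fmt.Println(isGenerated("zz.go", []byte("// Code generated by tool. DO NOT EDIT.\npackage zz\n")))
	fmt.Println(isGenerated("main.go", []byte("package main\n")))
}
```

Ordering the tiers by cost means most files are classified without their content ever being scanned.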
Lauris BH
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
Máximo Cuadros
84efad7693
*: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
Alexander Bezzubov
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
Alexander Bezzubov
aa40f75657
go doc: minor improvements and clarifications
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:26:19 +01:00
Alexander Bezzubov
fa097f4ed4
go: remove Classifier from API
Further reduces the public API surface by
hiding the unused Classifier API for providing
pre-trained classifier weights.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:20:33 +01:00
Alexander Bezzubov
3f0c4e182b
go: reduce API surface
Don't export defaultClassifier

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:14:43 +01:00
Lauris Bukšis-Haberkorns
a4cf6d2ef1
If osascript is called with the -l argument, it could be a different language, so do not rely on it
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-08-05 22:28:51 +03:00
Alexander Bezzubov
6a5f37e9e2
modules: prepare for v2 release
 - update go.mod with v2
 - update all import paths

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
Alexander Bezzubov
20c6d2845a
build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
Alexander Bezzubov
bdb5603f28
Address code review feedback
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-08 16:07:10 +02:00
Alexander Bezzubov
df01124e18
doc: better wording in API godoc
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-03 16:07:14 +02:00
kuba--
5adfee5761
Do not return empty lang.
It's better to return any potential candidate than nothing.

Signed-off-by: kuba-- <kuba@sourced.tech>
2019-03-14 14:08:19 +01:00
Alexander
3499750785
Sync to linguist 7.2.0: heuristics.yml support (#189)
Sync with GitHub Linguist v7.2.0

Includes the new way of handling `heuristics.yml` and
all `./data/*` re-generated using the GitHub Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.

 - many new languages
 - better vendoring detection
 - update docs on updating and known issues.
2019-02-14 12:47:45 +01:00
Alexander
13d3d66d37
refactoring: remove un-used code, add go doc, fix ci (#199)
Refactoring, consisting of
 - remove unused method `isAuxiliaryLanguage` and `FileCountList`
   in order to reduce public API surfaces (go/java)
 - add GoDoc to public APIs
 - ci: java profile use latest go src
  It also now mimics https://docs.travis-ci.com/user/languages/go/#go-import-path
  for non-go build image, as code relies on internal imports.

TEST PLAN:
 - make test
2019-02-05 22:54:14 +01:00
Antonio Jesus Navarro Perez
15bb13117f Refactor Oniguruma integration
Instead of using a command to change imports before the build, use a build tag to generate the correct binary.

This will allow applications to compile enry with oniguruma with less trouble.

Signed-off-by: Antonio Jesus Navarro Perez <antnavper@gmail.com>
2018-08-29 18:01:13 +03:00
Denys Smirnov
8da8516ac1 clarify GetLanguages usage
Signed-off-by: Denys Smirnov <denys@sourced.tech>
2018-07-31 00:24:20 +03:00
Alfredo Beaumont
c590beb039 common: Return nil on empty filenames
Signed-off-by: Alfredo Beaumont <alfredo.beaumont@gmail.com>
2017-12-07 16:45:19 +01:00
Manuel Carmona
5dad184af0 check for empty content in getHeaderAndFooter function
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
2017-11-08 14:41:31 +01:00
Alfredo Beaumont
c9122aad9f common: Use underscore as parameter name for unused parameters 2017-08-08 14:58:01 +02:00
Alfredo Beaumont
0990c2868d common: Add filename parameter to GetLanguageByContent function
Internal code needs the filename to select a matcher, so without
a filename no language would ever be found.
2017-08-08 11:58:02 +02:00
Alexander Bezzubov
3303cf7824 Fix 🐛 on file starting with single shebang 2017-07-25 10:37:11 +02:00
Manuel Carmona
d8798c2dd9 binary files are returned as OtherLanguage by GetLanguage 2017-07-04 11:38:43 +02:00
David Paz
78df505715 Deleted common.go file 2017-06-28 11:04:51 +02:00
David Paz
7e827e47ef moved generated data to data subpackage 2017-06-28 08:31:11 +02:00
Manuel Carmona
b7d4be5fdd the commit against which tests run is now fixed
renamed tmpLinguist to repoLinguist and SimpleLinguistTestSuite to EnryTestSuit in common_test.go

changed receiver's name for TestSuites to 's'

fixed comments
2017-06-26 15:35:53 +02:00
Manuel Carmona
bea1bc3af8 split GetLanguage into GetLanguage and GetLanguages 2017-06-15 13:02:59 +02:00
Manuel Carmona
beda5b73e7 changed signatures for strategies 2017-06-15 10:07:23 +02:00
Manuel Carmona
1fc8cf7a5d changes to improve detection accuracy 2017-06-15 10:07:22 +02:00
Manuel Carmona
ba53e10c7b renamed package and cli to enry 2017-06-13 14:18:23 +02:00
Manuel Carmona
0d5dff1979 changes in the API, ready to version 2 2017-06-06 11:30:23 +02:00
Manuel Carmona
5b304524d1 Rearranged code 2017-06-02 09:33:55 +02:00
Manuel Carmona
f8b8f7f5c4 Added classifier to the sequence of strategies 2017-05-30 09:07:58 +02:00
Manuel Carmona
3d867abac3 Added modeline strategy 2017-05-11 10:09:02 +02:00
Manuel Carmona
df60eab1ad added language detection by filename strategy 2017-04-27 17:32:39 +02:00
Manuel Carmona
1b8d51419d GetLanguage follows strategies shebang, extension, content 2017-04-27 16:40:23 +02:00
Miguel Molina
74733eac19 Make ExtensionsByLanguage public 2016-08-02 10:38:14 +02:00
Máximo Cuadros
2bbd7ec440 unified GetLanguage function 2016-07-18 16:20:12 +02:00