Commit Graph

38 Commits

Author SHA1 Message Date
6212f1fcb4 Remove name -> LanguageInfo mapping per code review
The GetLanguageInfo method is now implemented in terms of GetLanguageInfoByID.
This is possible because you can use GetLanguageID to get the ID for a language.
2021-10-12 13:29:39 -07:00
b248b21349 Expose LanguageInfo with all Linguist data
As discussed in https://github.com/go-enry/go-enry/issues/54, this provides an
API for accessing a LanguageInfo struct which is populated with all the data
from the Linguist YAML source file. Functions are provided to access the
LanguageInfo by name or ID.

The other top-level functions like GetLanguageExtensions, GetLanguageGroup, etc.
could in principle be implemented using this structure, which would simplify the
code generation. But that would be a big change so I didn't do any of that.
Perhaps in the next major version something like that would make sense.
2021-10-11 13:32:29 -07:00
0affa3ccca Update to Linguist v7.16.1 2021-09-25 23:57:50 +03:00
dfb8041dcc Update generated code for Linguist 7.14.0 2021-04-26 09:36:25 -07:00
eb043e80a8 Add GetLanguageID function
The Linguist-defined language IDs are important to our use case because they are
used as database identifiers. This adds a new generator to extract the language
IDs into a map and uses that to implement GetLanguageID.

Because one language has the ID 0, there is no way to tell if a language name is
found or not. If desired, we could add this by returning (string, bool) from
GetLanguageID. But none of the other functions that take language names do this,
so I didn't want to introduce it here.
2021-04-13 11:49:21 -07:00
c40b34c351 Sync with Liguist v7.13.0 2021-03-07 18:02:04 +02:00
497e2f85d3 Sync with github/linguist version v7.12.2 2021-01-17 14:10:38 +02:00
289ac3d9f0 Sync with linguist 7.12.1 2020-11-15 14:32:56 +02:00
bc76dd38b0 sync to the latest github/linguist v7.11.1 2020-10-12 12:32:48 +03:00
7c562a6c34 sync to the latest github/linguist v7.11.0 2020-09-17 10:34:41 +03:00
78696c2272 data: bailout in some cases if there arent enough lines
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 13:39:59 +02:00
79398a925d data: fix getting the first line for empty content
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 11:28:49 +02:00
8ff885a3a8 implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, it cannot
be autogenerated directly from linguist like other logics in enry, so it's
translated by hand.

There are three different types of matchers in this implementation:
- By extension, which mark as generated based only in the extension. These
  are the fastest matchers, so they're done first.
- By file name, which matches patterns against the filename. These
  are performed in second place. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which go into the content and try
  to identify if they're generated or not based on the content. Unlike
  linguist, we try to only read the content we need and not split it
  all unless it's necessary and use byte functions instead of regexps
  as much as possible.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
29bc0a181b data: replace substring package with regex package 2020-04-15 17:27:48 +02:00
b851ee83ad IsTest function for top 10 languages 2020-04-06 16:23:48 +02:00
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
9030d3671b sync to the latest github/linguist v7.9.0 2020-03-30 01:25:57 +03:00
e32a70a784 tokenizer: fix a bug and regenerate the code \w latest Go
See https://github.com/bzz/enry/pull/4 for details.

Test Plan:
 - go test ./...
2020-03-19 19:08:21 +01:00
84efad7693 *: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
4e3e15e80d Sync to linguist v7.5.1
Signed-off-by: Lauris BH <lauris@nix.lv>
2019-08-06 17:18:01 +03:00
25b29ebdc4 Implement getting color code for languages
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-07-19 23:59:46 +03:00
6a5f37e9e2 modules: prepare for v2 release
- update go.mod \w v2
 - update all import paths

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
20c6d2845a build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
3499750785 Sync to linguist 7.2.0: heuristics.yml support (#189)
Sync \w Github Linguist v7.2.0

Includes new way of handling `heuristics.yml` and
all `./data/*` re-generated using Github Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.

 - many new languages
 - better vendoring detection
 - update doc on update&known issues.
2019-02-14 12:47:45 +01:00
7eafe024af write a canonical header for machine-generated files
Signed-off-by: Denys Smirnov <denys@sourced.tech>
2018-04-30 12:57:39 +03:00
da7dbb7211 synchronized data/ directory
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
2017-11-08 14:41:31 +01:00
fd25e6d201 data: Sync with linguist
Signed-off-by: Alfredo Beaumont <alfredo.beaumont@gmail.com>
2017-10-04 16:26:56 +02:00
4d42cb06d3 Merge pull request #73 from dpaz/issue72
Cli to analyze a single file
2017-07-20 13:40:56 +02:00
52d7ccd6cf Updated mimeType.gold and regenerated mimeType.go 2017-07-19 10:18:18 +02:00
b2fe3f69ce Added mymeType.gold 2017-07-18 12:47:19 +02:00
ea819f58c2 Renamed mime to mimeType 2017-07-18 12:46:29 +02:00
25e12e9c03 Returns text/plain when mime it's undefined 2017-07-18 12:46:29 +02:00
12e09ae1d0 upadated data 2017-07-18 09:01:42 +02:00
2045abfa41 use of gopkg.in/toqueteos/substring.v1 in content.go to improve GetLanguagesByContent performance 2017-07-13 08:21:09 +02:00
a05b5ee202 regenerated to the last commit of linguist 2017-06-28 11:24:20 +02:00
3f2248084e Moved commit.go to data directory 2017-06-28 11:22:42 +02:00
7e827e47ef moved generated data to data subpackage 2017-06-28 08:31:11 +02:00