Commit Graph

34 Commits

Author SHA1 Message Date
Luke Francl
eb043e80a8 Add GetLanguageID function
The Linguist-defined language IDs are important to our use case because they are
used as database identifiers. This adds a new generator to extract the language
IDs into a map and uses that to implement GetLanguageID.

Because one language has the ID 0, there is no way to tell if a language name is
found or not. If desired, we could add this by returning (string, bool) from
GetLanguageID. But none of the other functions that take language names do this,
so I didn't want to introduce it here.
2021-04-13 11:49:21 -07:00
Lauris BH
c40b34c351 Sync with Liguist v7.13.0 2021-03-07 18:02:04 +02:00
Lauris BH
497e2f85d3 Sync with github/linguist version v7.12.2 2021-01-17 14:10:38 +02:00
Lauris BH
289ac3d9f0 Sync with linguist 7.12.1 2020-11-15 14:32:56 +02:00
Lauris BH
bc76dd38b0 sync to the latest github/linguist v7.11.1 2020-10-12 12:32:48 +03:00
Lauris BH
7c562a6c34 sync to the latest github/linguist v7.11.0 2020-09-17 10:34:41 +03:00
Miguel Molina
78696c2272
data: bailout in some cases if there arent enough lines
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 13:39:59 +02:00
Miguel Molina
79398a925d
data: fix getting the first line for empty content
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 11:28:49 +02:00
Miguel Molina
8ff885a3a8
implement IsGenerated helper to filter out generated files
Closes #17

Implements the IsGenerated helper function to filter out generated
files using the rules and matchers in:
- https://github.com/github/linguist/blob/master/lib/linguist/generated.rb

Since the vast majority of matchers have very different logic, it cannot
be autogenerated directly from linguist like other logics in enry, so it's
translated by hand.

There are three different types of matchers in this implementation:
- By extension, which mark as generated based only in the extension. These
  are the fastest matchers, so they're done first.
- By file name, which matches patterns against the filename. These
  are performed in second place. Unlike linguist, we try to use string
  functions instead of regexps as much as possible.
- Finally, the rest of the matchers, which go into the content and try
  to identify if they're generated or not based on the content. Unlike
  linguist, we try to only read the content we need and not split it
  all unless it's necessary and use byte functions instead of regexps
  as much as possible.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
2020-05-28 08:55:13 +02:00
Máximo Cuadros
29bc0a181b
data: replace substring package with regex package 2020-04-15 17:27:48 +02:00
Máximo Cuadros
b851ee83ad
IsTest function for top 10 languages 2020-04-06 16:23:48 +02:00
Lauris BH
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
Lauris BH
9030d3671b sync to the latest github/linguist v7.9.0 2020-03-30 01:25:57 +03:00
Alexander Bezzubov
e32a70a784
tokenizer: fix a bug and regenerate the code \w latest Go
See https://github.com/bzz/enry/pull/4 for details.

Test Plan:
 - go test ./...
2020-03-19 19:08:21 +01:00
Máximo Cuadros
84efad7693
*: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
Alexander Bezzubov
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
Lauris Bukšis-Haberkorns
4e3e15e80d
Sync to linguist v7.5.1
Signed-off-by: Lauris BH <lauris@nix.lv>
2019-08-06 17:18:01 +03:00
Lauris Bukšis-Haberkorns
25b29ebdc4 Implement getting color code for languages
Signed-off-by: Lauris Bukšis-Haberkorns <lauris@nix.lv>
2019-07-19 23:59:46 +03:00
Alexander Bezzubov
6a5f37e9e2
modules: prepare for v2 release
- update go.mod \w v2
 - update all import paths

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
Alexander Bezzubov
20c6d2845a
build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
Alexander
3499750785
Sync to linguist 7.2.0: heuristics.yml support (#189)
Sync \w Github Linguist v7.2.0

Includes new way of handling `heuristics.yml` and
all `./data/*` re-generated using Github Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.

 - many new languages
 - better vendoring detection
 - update doc on update&known issues.
2019-02-14 12:47:45 +01:00
Denys Smirnov
7eafe024af write a canonical header for machine-generated files
Signed-off-by: Denys Smirnov <denys@sourced.tech>
2018-04-30 12:57:39 +03:00
Manuel Carmona
da7dbb7211 synchronized data/ directory
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
2017-11-08 14:41:31 +01:00
Alfredo Beaumont
fd25e6d201 data: Sync with linguist
Signed-off-by: Alfredo Beaumont <alfredo.beaumont@gmail.com>
2017-10-04 16:26:56 +02:00
Alfredo Beaumont
4d42cb06d3 Merge pull request #73 from dpaz/issue72
Cli to analyze a single file
2017-07-20 13:40:56 +02:00
David Paz
52d7ccd6cf Updated mimeType.gold and regenerated mimeType.go 2017-07-19 10:18:18 +02:00
David Paz
b2fe3f69ce Added mymeType.gold 2017-07-18 12:47:19 +02:00
David Paz
ea819f58c2 Renamed mime to mimeType 2017-07-18 12:46:29 +02:00
David Paz
25e12e9c03 Returns text/plain when mime it's undefined 2017-07-18 12:46:29 +02:00
Manuel Carmona
12e09ae1d0 upadated data 2017-07-18 09:01:42 +02:00
Manuel Carmona
2045abfa41 use of gopkg.in/toqueteos/substring.v1 in content.go to improve GetLanguagesByContent performance 2017-07-13 08:21:09 +02:00
David Paz
a05b5ee202 regenerated to the last commit of linguist 2017-06-28 11:24:20 +02:00
David Paz
3f2248084e Moved commit.go to data directory 2017-06-28 11:22:42 +02:00
David Paz
7e827e47ef moved generated data to data subpackage 2017-06-28 08:31:11 +02:00