Commit Graph

41 Commits

Author SHA1 Message Date
b248b21349 Expose LanguageInfo with all Linguist data
As discussed in https://github.com/go-enry/go-enry/issues/54, this provides an
API for accessing a LanguageInfo struct which is populated with all the data
from the Linguist YAML source file. Functions are provided to access the
LanguageInfo by name or ID.

The other top-level functions like GetLanguageExtensions, GetLanguageGroup, etc.
could in principle be implemented using this structure, which would simplify the
code generation. But that would be a big change so I didn't do any of that.
Perhaps in the next major version something like that would make sense.
2021-10-11 13:32:29 -07:00
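
The lookup-by-name-or-ID idea described in b248b21349 might look roughly like the sketch below. It is a minimal illustration, not the library's actual code: the trimmed-down `LanguageInfo` fields, the `Index` type, its methods, and the sample data (including the placeholder IDs) are all assumptions made for this example. The `Extensions` method hints at how a narrower accessor such as GetLanguageExtensions could in principle be derived from the same structure.

```go
package main

import "fmt"

// LanguageInfo is a hypothetical, trimmed-down record. The struct described
// in the commit carries every field from the Linguist YAML source.
type LanguageInfo struct {
	Name       string
	LanguageID int
	Extensions []string
}

// sampleLanguages is placeholder data for illustration only; the IDs are
// not the real Linguist values.
var sampleLanguages = []LanguageInfo{
	{Name: "Go", LanguageID: 1, Extensions: []string{".go"}},
	{Name: "Python", LanguageID: 2, Extensions: []string{".py", ".pyw"}},
}

// Index (hypothetical) holds two lookup tables built from one slice of records.
type Index struct {
	byName map[string]LanguageInfo
	byID   map[int]LanguageInfo
}

// NewIndex builds both lookup tables in a single pass over the records.
func NewIndex(langs []LanguageInfo) Index {
	idx := Index{
		byName: make(map[string]LanguageInfo, len(langs)),
		byID:   make(map[int]LanguageInfo, len(langs)),
	}
	for _, l := range langs {
		idx.byName[l.Name] = l
		idx.byID[l.LanguageID] = l
	}
	return idx
}

// ByName returns the record for a language name and whether it was found.
func (i Index) ByName(name string) (LanguageInfo, bool) {
	l, ok := i.byName[name]
	return l, ok
}

// ByID returns the record for a language ID and whether it was found.
func (i Index) ByID(id int) (LanguageInfo, bool) {
	l, ok := i.byID[id]
	return l, ok
}

// Extensions derives a narrower accessor from the full record. It returns
// nil when the name is unknown.
func (i Index) Extensions(name string) []string {
	return i.byName[name].Extensions
}

func main() {
	idx := NewIndex(sampleLanguages)
	if info, ok := idx.ByID(2); ok {
		fmt.Println(info.Name, info.Extensions)
	}
	fmt.Println(idx.Extensions("Go"))
}
```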
0affa3ccca Update to Linguist v7.16.1 2021-09-25 23:57:50 +03:00
4686615d9e Improve shebang parsing to detect correct interpreter 2021-09-25 19:24:44 +03:00
58f8dccbcf Fixed GetLanguagesByShebang for paths with “env” 2021-06-19 00:49:05 +08:00
0a9864e6ec Merge pull request #46 from look/look/add-language-id
Add GetLanguageID function
2021-04-24 08:32:32 +02:00
cabfdaffc0 Update GetLanguageID to return a found boolean per code review 2021-04-22 16:55:42 -07:00
bf7167fc44 Rewrite GetLanguages to work like Linguist.detect
Prior to this change, GetLanguages collected all candidate languages from each
strategy to pass to the next strategy (without de-duplicating them). Linguist
only uses the previous strategy's candidates for the next strategy. Also,
GetLanguages would overwrite languages with nil if a strategy returned that, so
you could go from multiple languages to no language.

See the Ruby code for details: aad49acc06/lib/linguist.rb (L14-L49)

This addresses https://github.com/src-d/enry/issues/207 because GetLanguages
should not return all candidates detected, otherwise it would work differently
than Linguist.
2021-04-13 12:04:47 -07:00
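
The candidate-passing behaviour described in bf7167fc44 can be sketched as follows. This is a simplified reading of the Linguist-style loop, not the actual GetLanguages code: the `Strategy` signature, the `Detect` function, and the three toy strategies are hypothetical names introduced for this example. Each strategy sees only the previous strategy's candidates, a single candidate ends the search, and an empty result leaves the earlier candidates in place instead of replacing them with nil.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Strategy is a hypothetical signature: it receives the candidates left by
// the previous strategy and returns its own (possibly empty) candidates.
type Strategy func(filename string, content []byte, candidates []string) []string

// Detect runs the strategies in order, narrowing the candidate list.
func Detect(filename string, content []byte, strategies []Strategy) []string {
	var languages []string
	for _, strategy := range strategies {
		candidates := strategy(filename, content, languages)
		switch {
		case len(candidates) == 1:
			// A single answer is final; stop early.
			return candidates
		case len(candidates) > 1:
			// Only these candidates are handed to the next strategy.
			languages = candidates
		}
		// No candidates: keep the previous ones and try the next strategy.
	}
	return languages
}

// byExtension is a toy strategy: ".h" files are ambiguous.
func byExtension(filename string, _ []byte, _ []string) []string {
	if strings.HasSuffix(filename, ".h") {
		return []string{"C", "C++", "Objective-C"}
	}
	return nil
}

// noOpinion is a toy strategy that never finds anything.
func noOpinion(string, []byte, []string) []string { return nil }

// byContent is a toy strategy: it picks C++ from the incoming candidates
// when the content contains "class ".
func byContent(_ string, content []byte, candidates []string) []string {
	if !bytes.Contains(content, []byte("class ")) {
		return nil
	}
	for _, c := range candidates {
		if c == "C++" {
			return []string{c}
		}
	}
	return nil
}

func main() {
	strategies := []Strategy{byExtension, noOpinion, byContent}
	fmt.Println(Detect("widget.h", []byte("class Widget {};"), strategies))
	fmt.Println(Detect("widget.h", []byte("int x;"), strategies))
}
```

In the second call no strategy narrows the list, so the three candidates from the extension strategy survive rather than collapsing to no language.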
eb043e80a8 Add GetLanguageID function
The Linguist-defined language IDs are important to our use case because they are
used as database identifiers. This adds a new generator to extract the language
IDs into a map and uses that to implement GetLanguageID.

Because one language has the ID 0, there is no way to tell if a language name is
found or not. If desired, we could add this by returning (string, bool) from
GetLanguageID. But none of the other functions that take language names do this,
so I didn't want to introduce it here.
2021-04-13 11:49:21 -07:00
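
The ambiguity noted in eb043e80a8, and the found boolean later added in cabfdaffc0, can be illustrated with Go's comma-ok map lookup. This is only a sketch under stated assumptions: the map contents, the language names, and both function names are hypothetical, and the ID is assumed to be an integer.

```go
package main

import "fmt"

// languageIDs is placeholder data; the names and IDs are made up. The only
// point is that one valid entry has the ID 0.
var languageIDs = map[string]int{
	"LangZero": 0,
	"LangOne":  1,
}

// idOnly returns just the ID. A missing name yields the zero value 0,
// which is indistinguishable from the valid ID of "LangZero".
func idOnly(name string) int {
	return languageIDs[name]
}

// idWithFound also reports whether the name was present, which removes
// the ambiguity.
func idWithFound(name string) (int, bool) {
	id, ok := languageIDs[name]
	return id, ok
}

func main() {
	fmt.Println(idOnly("LangZero"), idOnly("Unknown"))
	fmt.Println(idWithFound("LangZero"))
	fmt.Println(idWithFound("Unknown"))
}
```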
323d739170 Fix test 2021-03-07 18:34:08 +02:00
6d8f15af5b Add XML strategy 2020-11-15 15:43:37 +02:00
cb353b4b05 Add support for Roff man pages filenames 2020-10-12 12:18:57 +03:00
7c562a6c34 sync to the latest github/linguist v7.11.0 2020-09-17 10:34:41 +03:00
97a26011a9 Return group color if language has none 2020-03-31 09:30:27 +03:00
1ab8148c10 test: fix platform-dependent paths in tests
Test Plan:
 - go test ./internal/code-generator/... -run Test_GeneratorTestSuite -testify.m TestGenerationFiles
2020-03-19 19:47:22 +01:00
84efad7693 *: module rename to go-enry/go-enry/v4 2020-03-19 17:31:29 +01:00
bc5e031cee Drop src-d org ref except for issues
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2020-03-19 14:04:36 +01:00
fa097f4ed4 go: remove Classifier from API
Further reduces the public API surface by
hiding the unused Classifier API for providing
pre-trained classifier weights.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:20:33 +01:00
3f0c4e182b go: reduce API surface
Don't export defaultClassifier

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-10-29 18:14:43 +01:00
6a5f37e9e2 modules: prepare for v2 release
- update go.mod w/ v2
- update all import paths

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-14 21:28:12 +02:00
20c6d2845a build: gopkg.in -> github.com imports
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2019-04-12 11:49:16 +02:00
5adfee5761 Do not return empty lang.
It's better to return any potential candidate than nothing.

Signed-off-by: kuba-- <kuba@sourced.tech>
2019-03-14 14:08:19 +01:00
3499750785 Sync to linguist 7.2.0: heuristics.yml support (#189)
Sync w/ GitHub Linguist v7.2.0

Includes new way of handling `heuristics.yml` and
all `./data/*` re-generated using Github Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.

 - many new languages
 - better vendoring detection
 - update doc on update&known issues.
2019-02-14 12:47:45 +01:00
ef50154395 Maintenance: batch of minor changes (#183)
* exclude build artifacts from git
* build: simplify building by using src-d/ci
* bench: simplify&fix shell runners
* build: simplify benchmarks* targets
* test: remove dependency on single test suite
* doc: rel image link + linguist cli difference highlight
* suggestions from code review
* bench: add fail fast to all shell runners

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
2018-12-27 11:55:34 +01:00
a7cfa65953 tests: Add testcases for empty filenames
Signed-off-by: Alfredo Beaumont <alfredo.beaumont@gmail.com>
2017-12-07 16:42:02 +01:00
8ddce8bc4b Added cases with nil and empty content to TestGetLanguagesByModeline
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
2017-11-08 14:41:31 +01:00
f6649550f0 fixed test for GetLanguagesByShebang function
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
2017-11-08 14:41:31 +01:00
c97a180da5 Fix review suggestions
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-10-26 15:51:02 +02:00
250519bb51 Add the external test linguist dir from env var
This allows using a cached directory with linguist instead of cloning, and speeds up the tests by about 10s on my local machine.

Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
2017-09-28 23:51:38 +02:00
3303cf7824 Fix 🐛 on file starting with single shebang 2017-07-25 10:37:11 +02:00
510c430fd0 fixed some tests that were not using a temp-linguist-repo 2017-07-18 13:31:34 +02:00
d8798c2dd9 binary files are returned as OtherLanguage by GetLanguage 2017-07-04 11:38:43 +02:00
3f2248084e Moved commit.go to data directory 2017-06-28 11:22:42 +02:00
b7d4be5fdd commit against tests run is fixed
renamed tmpLinguist to repoLinguist and SimpleLinguistTestSuite to EnryTestSuit in common_test.go

changed receiver's name for TestSuites to 's'

fixed comments
2017-06-26 15:35:53 +02:00
bea1bc3af8 split GetLanguage into GetLanguage and GetLanguages 2017-06-15 13:02:59 +02:00
beda5b73e7 changed signatures for strategies 2017-06-15 10:07:23 +02:00
5f0e92b1a8 changed test LinguistCorpus to use GetLanguage and fail if not assert 2017-06-15 10:07:23 +02:00
ba53e10c7b renamed package and cli to enry 2017-06-13 14:18:23 +02:00
0d5dff1979 changes in the API, ready for version 2 2017-06-06 11:30:23 +02:00
5b304524d1 Rearranged code 2017-06-02 09:33:55 +02:00
2bbd7ec440 unified GetLanguage function 2016-07-18 16:20:12 +02:00
bead3a606f tests 2016-07-13 22:21:18 +02:00