The Linguist-defined language IDs are important to our use case because they are
used as database identifiers. This adds a new generator to extract the language
IDs into a map and uses that to implement GetLanguageID.
Because one language has the ID 0, there is no way to tell if a language name is
found or not. If desired, we could add this by returning (string, bool) from
GetLanguageID. But none of the other functions that take language names do this,
so I didn't want to introduce it here.
On Win `make code-generate` produces unreasonable
Bayesian classifier weights from Linguist samples
silently, failing only the final classification tests.
TestPlan:
- go test ./internal/code-generator/... \
-run Test_GeneratorTestSuite -testify.m TestGenerationFiles
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Sync \w Github Linguist v7.2.0
Includes new way of handling `heuristics.yml` and
all `./data/*` re-generated using Github Linguist [v7.2.0](https://github.com/github/linguist/releases/tag/v7.2.0)
release tag.
- many new languages
- better vendoring detection
- update doc on update&known issues.
renamed tmpLinguist to repoLinguist and SimpleLinguistTestSuite to EnryTestSuit in common_test.go
changed receiver's name for TestSuites to 's'
fixed comments