# enry [![GoDoc](https://godoc.org/gopkg.in/src-d/enry.v1?status.svg)](https://godoc.org/gopkg.in/src-d/enry.v1) [![Build Status](https://travis-ci.org/src-d/enry.svg?branch=master)](https://travis-ci.org/src-d/enry) [![codecov](https://codecov.io/gh/src-d/enry/branch/master/graph/badge.svg)](https://codecov.io/gh/src-d/enry)

File programming language detector and toolbox to ignore binary or vendored files. *enry* started as a port to _Go_ of the original [linguist](https://github.com/github/linguist) _Ruby_ library and has *2x better performance*.

Installation
------------

The recommended way to install enry:

```
go get gopkg.in/src-d/enry.v1/...
```

To build enry's CLI, run:

    make build-cli

This generates a binary called `enry` in the project's root directory. You can move this binary anywhere in your `PATH`.

Examples
------------

```go
lang, safe := enry.GetLanguageByExtension("foo.go")
fmt.Println(lang)
// result: Go

// the second argument is the file content (left empty in this snippet)
lang, safe = enry.GetLanguageByContent("foo.m", "")
fmt.Println(lang)
// result: Matlab

lang, safe = enry.GetLanguageByContent("bar.m", "")
fmt.Println(lang)
// result: Objective-C

// all strategies together
lang = enry.GetLanguage("foo.cpp", "")
```

Note that the returned boolean value `safe` is true if only one possible language was detected, and false otherwise.

To get a list of possible languages for a given file, use the plural version of the detecting functions.
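
To make the relationship between the `safe` flag and the plural functions concrete, here is a minimal, self-contained sketch. It does not use enry: `candidatesByExt`, `languagesByExtension` and `languageByExtension` are hypothetical names invented for this illustration, and returning the first candidate from the singular form is an assumption, not a description of enry's internals.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// candidatesByExt is a tiny, hypothetical lookup table used only for this
// illustration; it is not enry's data.
var candidatesByExt = map[string][]string{
	".go": {"Go"},
	".h":  {"C++", "C"},
	".m":  {"Matlab", "Objective-C"},
}

// languagesByExtension mimics a plural function: it returns every candidate
// language for the file's extension (nil when the extension is unknown).
func languagesByExtension(filename string) []string {
	return candidatesByExt[filepath.Ext(filename)]
}

// languageByExtension mimics a singular function: it returns the first
// candidate (an assumption of this sketch) and reports safe == true only
// when that candidate is the sole one.
func languageByExtension(filename string) (lang string, safe bool) {
	langs := languagesByExtension(filename)
	if len(langs) == 0 {
		return "", false
	}
	return langs[0], len(langs) == 1
}

func main() {
	for _, name := range []string{"foo.go", "foo.h", "foo.txt"} {
		lang, safe := languageByExtension(name)
		fmt.Printf("%s: lang=%q safe=%v candidates=%v\n",
			name, lang, safe, languagesByExtension(name))
	}
}
```

When `safe` is false, a caller can fall back to the plural form and inspect all candidates. The plural functions in enry itself are called the same way:
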
```go
langs := enry.GetLanguages("foo.h", "")
// result: []string{"C++", "C"}

langs = enry.GetLanguagesByExtension("foo.asc", "", nil)
// result: []string{"AGS Script", "AsciiDoc", "Public Key"}

langs = enry.GetLanguagesByFilename("Gemfile", "", []string{})
// result: []string{"Ruby"}
```

CLI
------------

You can use enry as a command,

```bash
$ enry --help
enry, A simple (and faster) implementation of github/linguist
usage: enry
       enry [--json] [--breakdown]
       enry [--json] [--breakdown]
```

and it will return an output similar to *linguist*'s output,

```bash
$ enry
11.11%	Gnuplot
22.22%	Ruby
55.56%	Shell
11.11%	Go
```

but not only the output; its flags are also the same as *linguist*'s,

```bash
$ enry --breakdown
11.11%	Gnuplot
22.22%	Ruby
55.56%	Shell
11.11%	Go

Gnuplot
plot-histogram.gp

Ruby
linguist-samples.rb
linguist-total.rb

Shell
parse.sh
plot-histogram.sh
run-benchmark.sh
run-slow-benchmark.sh
run.sh

Go
parser/main.go
```

even the JSON flag,

```bash
$ enry --json
{"Gnuplot":["plot-histogram.gp"],"Go":["parser/main.go"],"Ruby":["linguist-samples.rb","linguist-total.rb"],"Shell":["parse.sh","plot-histogram.sh","run-benchmark.sh","run-slow-benchmark.sh","run.sh"]}
```

Note that even though enry's CLI is compatible with linguist's, its main point is that, contrary to linguist, **_enry doesn't need a git repository to work!_**

Development
------------

*enry* re-uses parts of the original [linguist](https://github.com/github/linguist) to generate internal data structures.
In order to update to the latest upstream and generate the necessary code, you must run:

    go generate

We update enry when linguist's master branch changes any of the following files:

* [languages.yml](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml)
* [heuristics.rb](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb)
* [vendor.yml](https://github.com/github/linguist/blob/master/lib/linguist/vendor.yml)
* [documentation.yml](https://github.com/github/linguist/blob/master/lib/linguist/documentation.yml)

For the moment we don't have any established procedure to detect changes in the linguist project automatically and regenerate the code, so we update the generated code as needed, without any specific criteria.

If you want to update *enry* because of changes in linguist, you can run the *go generate* command and open a pull request that only contains the changes in generated files (those files in the subdirectory [data](https://github.com/src-d/enry/tree/master/data)).

To run the tests:

    make test

Divergences from linguist
------------

Using [linguist/samples](https://github.com/github/linguist/tree/master/samples) as a test set, the following issues were found:

* with [hello.ms](https://github.com/github/linguist/blob/master/samples/Unix%20Assembly/hello.ms) we can't detect the language (Unix Assembly) because we don't have a matcher in contentMatchers (content.go) for Unix Assembly. Linguist uses this [regexp](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb#L300) in its code, `elsif /(?