enry

File programming language detector and toolbox to ignore binary or vendored files. *enry* started as a port to _Go_ of the original [linguist](https://github.com/github/linguist) _Ruby_ library and improves its *performance by 100x*.

Installation
------------

The recommended way to install enry is:

```
go get gopkg.in/src-d/enry.v1/...
```

To build enry's CLI, run

    make build-cli

This generates a binary called `enry` in the project's root directory. You can move this binary anywhere in your `PATH`.

Examples
--------

```go
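// "<...-code>" placeholders stand for the actual content of each file
lang, safe := enry.GetLanguageByExtension("foo.go")
fmt.Println(lang) // result: Go

// the content decides between languages that share the ".m" extension
lang, safe = enry.GetLanguageByContent("foo.m", []byte("<matlab-code>"))
fmt.Println(lang) // result: Matlab

lang, safe = enry.GetLanguageByContent("bar.m", []byte("<objective-c-code>"))
fmt.Println(lang) // result: Objective-C

// all strategies together
lang = enry.GetLanguage("foo.cpp", []byte("<cpp-code>"))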
```

Note that the returned boolean value `safe` is `true` if only one possible language was detected and `false` otherwise.

To get a list of possible languages for a given file, you can use the plural version of the detecting functions.

```go
// "<cpp-code>" stands for the actual content of the file
langs := enry.GetLanguages("foo.h", []byte("<cpp-code>"))
// result: []string{"C++", "C"}

langs = enry.GetLanguagesByExtension("foo.asc", nil, nil)
// result: []string{"AGS Script", "AsciiDoc", "Public Key"}

langs = enry.GetLanguagesByFilename("Gemfile", nil, []string{})
// result: []string{"Ruby"}
```
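The relationship between the plural functions and the `safe` flag can be illustrated with a small standalone sketch. The helper `firstAndSafe` is hypothetical and is not part of enry's API; it only assumes, as described above, that `safe` is true exactly when a single candidate language is found.

```go
package main

import "fmt"

// firstAndSafe is a hypothetical helper (not part of enry's API). It reduces a
// list of candidate languages to one language plus a "safe" flag.
func firstAndSafe(candidates []string) (lang string, safe bool) {
	if len(candidates) == 0 {
		return "", false // nothing was detected
	}
	// The first candidate wins; the guess is safe only if it was the only one.
	return candidates[0], len(candidates) == 1
}

func main() {
	// Candidate lists taken from the examples above, plus an empty one.
	for _, candidates := range [][]string{
		{"Ruby"},
		{"C++", "C"},
		{},
	} {
		lang, safe := firstAndSafe(candidates)
		fmt.Printf("lang=%q safe=%t\n", lang, safe)
	}
}
```

When `safe` is false, a caller may prefer to inspect the full candidate list instead of trusting the first entry.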

CLI
-------------

You can use enry as a command,

```bash
$ enry --help
  enry, A simple (and faster) implementation of github/linguist
  usage: enry <path>
         enry <path> [--json] [--breakdown]
         enry [--json] [--breakdown]
```

and it will return an output similar to *linguist*'s output,

```bash
$ enry
11.11%  Gnuplot
22.22%  Ruby
55.56%  Shell
11.11%  Go
```

not only its output but also its flags match *linguist*'s,

```bash
$ enry --breakdown
11.11%  Gnuplot
22.22%  Ruby
55.56%  Shell
11.11%  Go

Gnuplot
plot-histogram.gp

Ruby
linguist-samples.rb
linguist-total.rb

Shell
parse.sh
plot-histogram.sh
run-benchmark.sh
run-slow-benchmark.sh
run.sh

Go
parser/main.go
```

even the JSON flag,

```bash
$ enry --json
{"Gnuplot":["plot-histogram.gp"],"Go":["parser/main.go"],"Ruby":["linguist-samples.rb","linguist-total.rb"],"Shell":["parse.sh","plot-histogram.sh","run-benchmark.sh","run-slow-benchmark.sh","run.sh"]}
```
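The JSON output is easy to consume from other programs. The following is a minimal sketch, not part of enry, that decodes the object shown above and recomputes the percentage summary. It assumes the percentages are shares of the file count, which is what the sample numbers suggest (5 of 9 files for Shell); the function name `shares` is made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// shares is a hypothetical helper. It decodes a "language -> files" JSON
// object and returns the percentage of files that belongs to each language.
func shares(raw []byte) (map[string]float64, error) {
	var byLang map[string][]string
	if err := json.Unmarshal(raw, &byLang); err != nil {
		return nil, err
	}
	total := 0
	for _, files := range byLang {
		total += len(files)
	}
	out := make(map[string]float64, len(byLang))
	if total == 0 {
		return out, nil // avoid dividing by zero on empty input
	}
	for lang, files := range byLang {
		out[lang] = 100 * float64(len(files)) / float64(total)
	}
	return out, nil
}

func main() {
	// JSON copied from the `enry --json` example above.
	raw := []byte(`{"Gnuplot":["plot-histogram.gp"],"Go":["parser/main.go"],"Ruby":["linguist-samples.rb","linguist-total.rb"],"Shell":["parse.sh","plot-histogram.sh","run-benchmark.sh","run-slow-benchmark.sh","run.sh"]}`)
	pct, err := shares(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// Sort language names so the output order is stable.
	langs := make([]string, 0, len(pct))
	for lang := range pct {
		langs = append(langs, lang)
	}
	sort.Strings(langs)
	for _, lang := range langs {
		fmt.Printf("%.2f%%\t%s\n", pct[lang], lang)
	}
}
```

Go map iteration order is not defined, so the sketch sorts the language names alphabetically; that order differs from the order in the CLI example.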

Note that even though enry's CLI is compatible with linguist's, its main advantage is that, unlike linguist, **_enry doesn't need a git repository to work!_**

Development
-----------

*enry* re-uses parts of the original [linguist](https://github.com/github/linguist), especially the data in `languages.yml`, to generate internal data structures. To update to the latest upstream version, run

    make clean code-generate

To run the tests

    make test

Divergences from linguist
--------------------------

Running the tests against [linguist/samples](https://github.com/github/linguist/tree/master/samples) revealed the following issues:

* with [hello.ms](https://github.com/github/linguist/blob/master/samples/Unix%20Assembly/hello.ms) we can't detect the language (Unix Assembly) because we don't have a matcher in contentMatchers (content.go) for Unix Assembly. Linguist uses this [regexp](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb#L300) in its code, `elsif /(?