From 6b2c55abb20ece2f485b35b5c3e6b87294ca2ea0 Mon Sep 17 00:00:00 2001
From: Juanjo Alvarez
Date: Wed, 4 Oct 2017 17:09:58 +0200
Subject: [PATCH 1/2] Grammar and style changes.

Signed-off-by: Juanjo Alvarez
---
 README.md | 61 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 976fcbb..4044028 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,8 @@ To build enry's CLI you must run
 make build-cli
-it generates a binary in the project's root directory called `enry`. You can move this binary to anywhere in your `PATH`.
+this will generate a binary in the project's root directory called `enry`. You can
+then move this binary to anywhere in your `PATH`.
 Examples
@@ -40,7 +41,7 @@ lang := enry.GetLanguage("foo.cpp", []byte(""))
 // result: C++ true
 ```
-Note the returned boolean value "safe" is set either to true, if there is only one possible language detected or, to false otherwise.
+Note the returned boolean value `safe` is set either to `true`, if there is only one possible language detected, or to `false` otherwise.
 To get a list of possible languages for a given file, you can use the plural version of the detecting functions.
@@ -71,7 +72,7 @@ $ enry --help
 enry [-version]
 ```
-and it will return an output similar to *linguist*'s output,
+and it'll return an output similar to *linguist*'s output,
 ```bash
 $ enry
@@ -81,7 +82,7 @@ $ enry
 11.11% Go
 ```
-but not only the output, also its flags are the same as *linguist*'s ones,
+but not only the output; its flags are also the same as *linguist*'s ones,
 ```bash
 $ enry --breakdown
@@ -115,7 +116,7 @@ $ enry --json
 {"Gnuplot":["plot-histogram.gp"],"Go":["parser/main.go"],"Ruby":["linguist-samples.rb","linguist-total.rb"],"Shell":["parse.sh","plot-histogram.sh","run-benchmark.sh","run-slow-benchmark.sh","run.sh"]}
 ```
-Note that even if enry's CLI is compatible with linguist's, its main point is that, contrary to linguist, **_enry doesn't need a git repository to work!_**
+Note that even if enry's CLI is compatible with linguist's its main point is that, contrary to linguist, **_enry doesn't need a git repository to work!_**
 Java bindings
 ------------
@@ -125,21 +126,24 @@ Generated Java binidings using a C shared library + JNI are located under [`java
 Development
 ------------
-*enry* re-uses parts of original [linguist](https://github.com/github/linguist) to generate internal data structures. In order to update to latest upstream and generate the necessary code you must run:
+*enry* re-uses parts of original [linguist](https://github.com/github/linguist) to generate internal data structures.
+In order to update to the latest upstream and generate the necessary code you must run:
 go generate
-We update enry due to changes in linguist's master branch related to the following files:
+We update enry when changes are done in linguist's master branch on the following files:
 * [languages.yml](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml)
 * [heuristics.rb](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb)
 * [vendor.yml](https://github.com/github/linguist/blob/master/lib/linguist/vendor.yml)
 * [documentation.yml](https://github.com/github/linguist/blob/master/lib/linguist/documentation.yml)
-For the moment we don't have any procedure established to detect changes in the linguist project automatically and regenerate the code. So we are updating the generated code as needed, without any specific criteria.
+Currently we don't have any procedure established to automatically detect changes in the linguist project and regenerate the code.
+So we update the generated code as needed, without any specific criteria.
-If you want update *enry* because of changes in linguist, you can run the *go generate* command and do a pull request that only contains the changes in generated files (those files in the subdirectory [data](data)).
+If you want to update *enry* because of changes in linguist, you can run the *go
+generate* command and do a pull request that only contains the changes in
+generated files (those files in the subdirectory [data](data)).
-To run the tests
+To run the tests,
 make test
@@ -147,46 +151,57 @@ To run the tests
 Divergences from linguist
 ------------
-Using [linguist/samples](https://github.com/github/linguist/tree/master/samples) as a set against run tests the following issues were found:
-* with [hello.ms](https://github.com/github/linguist/blob/master/samples/Unix%20Assembly/hello.ms) we can't detect the language (Unix Assembly) because we don't have a matcher in contentMatchers (content.go) for Unix Assembly. Linguist uses this [regexp](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb#L300) in its code,
+Using [linguist/samples](https://github.com/github/linguist/tree/master/samples)
+as a set for the tests, the following issues were found:
+
+* With [hello.ms](https://github.com/github/linguist/blob/master/samples/Unix%20Assembly/hello.ms) we can't detect the language (Unix Assembly) because we don't have a matcher in contentMatchers (content.go) for Unix Assembly. Linguist uses this [regexp](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb#L300) in its code,
 `elsif /(?
Date: Wed, 4 Oct 2017 17:18:38 +0200
Subject: [PATCH 2/2] tmp

Signed-off-by: Juanjo Alvarez
---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 4044028..2a80d33 100644
--- a/README.md
+++ b/README.md
@@ -16,8 +16,7 @@ To build enry's CLI you must run
 make build-cli
-this will generate a binary in the project's root directory called `enry`. You can
-then move this binary to anywhere in your `PATH`.
+this will generate a binary in the project's root directory called `enry`. You can then move this binary to anywhere in your `PATH`.
 Examples
@@ -41,7 +40,7 @@ lang := enry.GetLanguage("foo.cpp", []byte(""))
 // result: C++ true
 ```
-Note the returned boolean value `safe` is set either to `true`, if there is only one possible language detected, or to `false` otherwise.
+Note that the returned boolean value `safe` is set either to `true`, if there is only one possible language detected, or to `false` otherwise.
 To get a list of possible languages for a given file, you can use the plural version of the detecting functions.
@@ -116,7 +115,7 @@ $ enry --json
 {"Gnuplot":["plot-histogram.gp"],"Go":["parser/main.go"],"Ruby":["linguist-samples.rb","linguist-total.rb"],"Shell":["parse.sh","plot-histogram.sh","run-benchmark.sh","run-slow-benchmark.sh","run.sh"]}
 ```
-Note that even if enry's CLI is compatible with linguist's its main point is that, contrary to linguist, **_enry doesn't need a git repository to work!_**
+Note that even if enry's CLI is compatible with linguist's, its main point is that **_enry doesn't need a git repository to work!_**
 Java bindings
 ------------
@@ -131,6 +130,7 @@ Development
 go generate
 We update enry when changes are done in linguist's master branch on the following files:
+
 * [languages.yml](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml)
 * [heuristics.rb](https://github.com/github/linguist/blob/master/lib/linguist/heuristics.rb)
 * [vendor.yml](https://github.com/github/linguist/blob/master/lib/linguist/vendor.yml)
@@ -172,14 +172,14 @@ Benchmarks
 Enry's language detection has been compared with Linguist's one. In order to do that, linguist's project directory [*linguist/samples*](https://github.com/github/linguist/tree/master/samples) was used as a set of files to run benchmarks against.
-The following results were obtained:
+We got these results:
 ![histogram](https://raw.githubusercontent.com/src-d/enry/master/benchmarks/histogram/distribution.png)
 The histogram represents the number of files for which spent time in language detection was in the range of the time interval indicated in the x axis.
-So you can see that most of the files were detected quicker in enry.
+So you can see that most of the files were detected quickly in enry.
We found some few cases where enry turns slower than linguist. This is due to Golang's regexp engine being slower than Ruby's, which uses the [oniguruma](https://github.com/kkos/oniguruma) library, written in C.
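
The `safe` flag that the patched README text describes (`true` only when a single candidate language is detected) can be sketched in plain Go. This is a minimal illustration and not enry's implementation: the `candidatesByExt` table and the `detectByExtension` helper are hypothetical names invented for this example, and the table is hand-written, whereas enry generates its data from linguist's files.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// candidatesByExt is a hypothetical, hand-written lookup table used only
// for illustration; it is not enry's generated data.
var candidatesByExt = map[string][]string{
	".cpp": {"C++"},
	".go":  {"Go"},
	".h":   {"C", "C++", "Objective-C"},
}

// detectByExtension is a hypothetical helper. It returns the first
// candidate for the file's extension, plus a safe flag that is true
// only when exactly one candidate exists.
func detectByExtension(filename string) (lang string, safe bool) {
	candidates := candidatesByExt[filepath.Ext(filename)]
	if len(candidates) == 0 {
		return "", false
	}
	return candidates[0], len(candidates) == 1
}

func main() {
	for _, name := range []string{"foo.cpp", "foo.h", "foo.unknown"} {
		lang, safe := detectByExtension(name)
		fmt.Printf("%-12s lang=%q safe=%v\n", name, lang, safe)
	}
}
```

Returning the first candidate for an ambiguous extension is an arbitrary choice made for this sketch; a caller that gets `safe == false` would normally fall back to a content-based strategy.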