Add documentation comments to package tokenizer.

Although this package is internal, it still exports an API and deserves some
comments. Serves in partial satisfaction of #195.

Signed-off-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
M. J. Fromberger, 2019-01-29 10:41:06 -08:00
parent 260dcfe002
commit 4027b494b3


@@ -1,3 +1,6 @@
+// Package tokenizer implements file tokenization used by the enry file
+// classifier. This package is an implementation detail of enry and should not
+// be imported by other packages.
 package tokenizer

 import (
@@ -8,6 +11,9 @@ import (

 const byteLimit = 100000

+// Tokenize returns classification tokens from content. The tokens returned
+// should match what the Linguist library returns. At most the first 100KB of
+// content are tokenized.
 func Tokenize(content []byte) []string {
 	if len(content) > byteLimit {
 		content = content[:byteLimit]
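The new doc comment documents the 100KB cap that the guard above enforces. A minimal, self-contained sketch of that truncation behavior (the `truncate` helper here is illustrative, not part of the package's API):

```go
package main

import "fmt"

// byteLimit mirrors the cap used by the tokenizer: only the first
// 100000 bytes of content are considered.
const byteLimit = 100000

// truncate demonstrates the guard at the top of Tokenize: inputs
// longer than byteLimit are sliced down before tokenization.
func truncate(content []byte) []byte {
	if len(content) > byteLimit {
		content = content[:byteLimit]
	}
	return content
}

func main() {
	small := make([]byte, 10)
	large := make([]byte, 250000)
	fmt.Println(len(truncate(small))) // 10 — under the limit, unchanged
	fmt.Println(len(truncate(large))) // 100000 — capped at byteLimit
}
```

Since slicing a `[]byte` shares the underlying array, the cap costs no allocation; oversized files simply have their tail ignored by classification.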