Helpers
yooper edited this page Oct 28, 2017 · 5 revisions
Helpers simplify common text analysis tasks. Use tokenize to split text into tokens:
$tokens = tokenize($text);

To use a different tokenizer, pass the tokenizer's class name as the second argument:
$tokens = tokenize($text, \TextAnalysis\Tokenizers\PennTreeBankTokenizer::class);

The default tokenizer is \TextAnalysis\Tokenizers\GeneralTokenizer::class. Some tokenizers require parameters to be set upon instantiation.
By default, normalize_tokens lowercases every token with strtolower. To customize normalization, pass a callable as the second argument, either a function name as a string or an anonymous function; it is applied to each token with array_map.
$normalizedTokens = normalize_tokens($tokens);
$normalizedTokens = normalize_tokens($tokens, 'mb_strtolower');
$normalizedTokens = normalize_tokens($tokens, function ($token) { return mb_strtoupper($token); });

The call to freq_dist returns a FreqDist instance.
$freqDist = freq_dist(tokenize($text));

By default, ngrams generates bigrams.
$bigrams = ngrams($tokens);

To customize the n-grams, pass the n-gram size and a delimiter:
// create trigrams with a pipe delimiter between each word
$trigrams = ngrams($tokens, 3, '|');
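
To make the behavior of these helpers concrete, here is a minimal plain-PHP sketch that mimics normalization, frequency counting, and n-gram generation without the library. The functions sketch_normalize and sketch_ngrams are hypothetical stand-ins, not the library's implementation, and the sketch assumes each n-gram is returned as a single string joined by the delimiter. The frequency count uses PHP's built-in array_count_values rather than a FreqDist instance.

```php
<?php
// Plain-PHP stand-ins for illustration only; the sketch_* names are
// hypothetical and are NOT part of the TextAnalysis library.

/** Apply a callable to every token; defaults to strtolower. */
function sketch_normalize(array $tokens, ?callable $normalizer = null): array
{
    return array_map($normalizer ?? 'strtolower', $tokens);
}

/** Build n-grams by sliding a window of $n tokens, joined by $delimiter. */
function sketch_ngrams(array $tokens, int $n = 2, string $delimiter = ' '): array
{
    $ngrams = [];
    $last = count($tokens) - $n; // index of the last window start
    for ($i = 0; $i <= $last; $i++) {
        $ngrams[] = implode($delimiter, array_slice($tokens, $i, $n));
    }
    return $ngrams;
}

// Lowercase every token (the default normalizer).
$tokens = sketch_normalize(['The', 'quick', 'brown', 'fox', 'THE']);

// Count how often each token occurs, most frequent first.
$counts = array_count_values($tokens);
arsort($counts);

// Bigrams by default, then trigrams with a pipe delimiter.
$bigrams  = sketch_ngrams($tokens);
$trigrams = sketch_ngrams($tokens, 3, '|');
// $trigrams is ['the|quick|brown', 'quick|brown|fox', 'brown|fox|the']
```

An input with fewer than n tokens yields an empty array in this sketch; whether the library's ngrams helper behaves the same way is not confirmed here.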