soulprogrammer01/TypeTokenRatio---Reckoner

TypeTokenRatio---Reckoner

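An informational post showing the usability of the Type-Token Ratio (TTR) measure as an introduction to Natural Language Processing (NLP). The TTR of a text is the number of distinct words (types) divided by the total number of words (tokens).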

How to run:
python TTR_Rec.py

Let's start:

STEP 1: Import the dependencies.

For this script, we are using the fantastic NLP library NLTK.

To install NLTK, type this in your terminal:

pip install nltk

Then import nltk (aliased as nlp) and Python's regular-expression module, re:

import nltk as nlp 
import re 

STEP 2: Declare a string containing the text whose TTR we want to calculate.

document="""Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written -- referred to as natural language. It is a component of artificial intelligence!"""

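STEP 3: Replace every non-word character (punctuation and other special characters) with a space using this regex. Letters, digits and underscores are kept.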

document= re.sub(r'[^\w]', ' ', document)
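To see what this substitution does, here is a small sketch on a made-up sentence (the `sample` string is an assumption for illustration, not part of the script). It uses the same pattern as the step above and shows one side effect worth knowing about: contractions are split at the apostrophe.

```python
import re

# Hypothetical input, chosen to include punctuation, a contraction and hyphens.
sample = "Hello, world! It's 2-for-1 (really)."

# Same pattern as in STEP 3: every character that is not a letter,
# digit or underscore is replaced by a single space.
cleaned = re.sub(r'[^\w]', ' ', sample)

# The string keeps its length, because each character maps to one character.
print(len(cleaned) == len(sample))

# Splitting on whitespace shows the surviving words.
# Note that "It's" becomes "It" and "s".
print(cleaned.split())
```

Because "It's" turns into two tokens, texts with many contractions will report slightly more tokens than a reader would count by eye.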

STEP 4: Convert the document to lower case, so that "It" and "it" count as the same type.

document=document.lower()

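Then tokenize the document to generate a list of words. word_tokenize relies on NLTK's Punkt tokenizer data; if it raises a LookupError, download the data once with nlp.download('punkt') (recent NLTK releases ask for 'punkt_tab' instead, and the error message names the resource it needs).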

tokens=nlp.word_tokenize(document)

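STEP 5: Group the tokens, count how often each one occurs, and store the counts in a dict-like object called types. NLTK's FreqDist, a subclass of collections.Counter, does this in one call.

types=nlp.FreqDist(tokens)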
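The grouping step can be illustrated without NLTK by using the standard library's `collections.Counter`, which is assumed here as a stand-in so the snippet runs on its own. The five-word token list is made up for the example.

```python
from collections import Counter

# Hypothetical token list: "the" appears twice, every other word once.
tokens = ["the", "cat", "saw", "the", "dog"]

# Counter maps each distinct token (type) to its number of occurrences.
types = Counter(tokens)

print(types["the"])  # occurrences of "the"
print(len(types))    # number of distinct types
print(len(tokens))   # total number of tokens
```

The number of keys in `types` is the type count and the length of `tokens` is the token count, which are the two quantities TTR is built from.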

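And finally, find the TTR by dividing the number of distinct types (the length of the dict types) by the number of tokens (the length of the list tokens). Multiplying by 100 expresses it as a percentage.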

TTR= (len(types)/len(tokens))*100
print(TTR)
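And it is that simple! You can now use this measure to compare the lexical diversity of texts. Keep in mind that TTR tends to fall as a text gets longer, because common words repeat, so it is most meaningful when comparing texts of similar length.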
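The steps above can be collected into one function. This is a minimal sketch, not the contents of TTR_Rec.py: the name `type_token_ratio` is hypothetical, and a plain whitespace split is assumed in place of `nlp.word_tokenize` so the example runs with the standard library only. The two tokenizers can differ on some words, so results may not match the NLTK version exactly.

```python
import re
from collections import Counter


def type_token_ratio(text):
    """Return the TTR of `text` as a percentage (0.0 for empty input)."""
    # Replace non-word characters with spaces, then lower-case.
    cleaned = re.sub(r'[^\w]', ' ', text).lower()
    # Assumption: whitespace split as a stand-in for nlp.word_tokenize.
    tokens = cleaned.split()
    if not tokens:
        return 0.0  # avoid dividing by zero on empty text
    types = Counter(tokens)
    return 100 * len(types) / len(tokens)


# 4 distinct words out of 5 tokens
print(type_token_ratio("The cat saw the dog."))  # 80.0

# 1 distinct word out of 4 tokens
print(type_token_ratio("the the the the"))       # 25.0
```

The guard for empty input is a design choice of this sketch; the original script would raise a ZeroDivisionError on a document with no tokens.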
