Remove (or make optional) nltk dependency #33

Open
IsaacHaze opened this issue Nov 19, 2013 · 6 comments

Comments

@IsaacHaze
Contributor

Currently we pull in nltk for doing:

  • tokenizing text with a regexp
  • generating ngrams from a list
  • PunktSentenceTokenizer

The first two points are easy to replace; the last one can (should?) be made optional (it is only needed if you're dealing with documents and want to split them into sentences).
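To illustrate why the first two points are easy, here is a minimal sketch using only the standard library. The names `simple_regexp_tokenize` and `simple_ngrams` and the `\w+` pattern in the usage note are hypothetical, chosen for illustration; they are not the code or the pattern semanticizer actually uses.

```python
import re


def simple_regexp_tokenize(text, pattern):
    """Return every non-overlapping match of `pattern` in `text`.

    Hypothetical stand-in for nltk's regexp_tokenize. Using finditer with
    group(0) returns the full match even if the pattern has capturing groups.
    """
    return [match.group(0) for match in re.finditer(pattern, text)]


def simple_ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as a list of tuples.

    Hypothetical stand-in for nltk.util.ngrams. Returns an empty list
    when there are fewer than n tokens.
    """
    tokens = list(tokens)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `simple_ngrams(simple_regexp_tokenize("New York is big", r"\w+"), 2)` yields the pairs `("New", "York")`, `("York", "is")` and `("is", "big")`. Note that this returns a list, whereas some nltk versions return a generator from `ngrams`.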

@IsaacHaze
Contributor Author

isaac@u024529 [master] git grep nltk
README.md:   * nltk
semanticizer/processors/semanticize.py:from nltk import regexp_tokenize
semanticizer/processors/semanticize.py:from nltk.util import ngrams as nltk_ngrams
semanticizer/processors/semanticize.py:                for ngram in nltk_ngrams(token_list, n):
semanticizer/processors/semanticizer.py:from nltk.tokenize.punkt import PunktSentenceTokenizer
isaac@u024529 [master] 

@dodijk
Contributor

dodijk commented Nov 19, 2013

Yes, would be nice to remove this dependency, but the last point is indeed not that easy to remove. We do want to support longer documents, so we need some kind of sentence splitting built in. Unless of course @larsmans comes up with his fancy new super fast matching algorithm...
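One way sentence splitting could stay built in while nltk becomes optional is a guarded import with a fallback. This is only a sketch: `split_sentences` is a hypothetical helper, and treating the whole text as a single sentence when nltk is missing is an assumption, not something decided in this thread.

```python
try:
    # Only needed when documents should be split into sentences.
    from nltk.tokenize.punkt import PunktSentenceTokenizer
except ImportError:
    # nltk is not installed; sentence splitting is disabled.
    PunktSentenceTokenizer = None


def split_sentences(text):
    """Split `text` into sentences if nltk is available.

    Hypothetical helper. Without nltk, the whole text is returned as a
    single "sentence" (assumed fallback behaviour).
    """
    if PunktSentenceTokenizer is None:
        return [text]
    return PunktSentenceTokenizer().tokenize(text)
```

With this shape, short inputs work without nltk, and installing nltk turns on splitting for longer documents.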

@IsaacHaze
Copy link
Contributor Author

He has a fancy super fast matching algorithm? I thought he did a Levenshtein implementation?

@dodijk
Contributor

dodijk commented Nov 19, 2013

He can multitask and said he wanted to do something about the matching today…

@IsaacHaze
Contributor Author

+1

@larsmans
Contributor

Did my big mouth speak for itself again?
