@flavioamieiro
Member

Adds worker to calculate the FreqDist of a corpus

This is the first draft of a worker that takes a corpus and creates an
analysis for it. This first attempt is a freqdist worker: it takes the
freqdist of each document in the corpus and condenses them into a new analysis,
the freqdist for the entire corpus.
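
To illustrate the idea of condensing per-document freqdists into a corpus-level one, here is a minimal sketch. It assumes each document's freqdist is stored as a list of (token, count) pairs; that format, the function name `merge_freqdists`, and the sample data are assumptions made for illustration, not the worker's actual code.

```python
from collections import Counter


def merge_freqdists(doc_freqdists):
    """Combine per-document (token, count) lists into one corpus-level list.

    The (token, count) pair format is an assumption for this sketch.
    """
    total = Counter()
    for freqdist in doc_freqdists:
        for token, count in freqdist:
            total[token] += count
    # Sort by descending count, breaking ties alphabetically by token.
    return sorted(total.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: two documents with their own freqdists.
docs = [
    [("the", 3), ("cat", 1)],
    [("the", 2), ("dog", 1)],
]
corpus_freqdist = merge_freqdists(docs)
```

Summing into a single `Counter` means each document's freqdist is read only once, which may help if the worker is currently doing more work than it needs to.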

This is a work in progress because I was mainly concerned with the groundwork
needed for this to work (especially the celery task). I did not pay any
attention to how the worker itself works (it's probably doing more work than it
needs to), and it probably needs more tests as well.
from utils import TaskTest


class TestCorpusFreqDistWorker(TaskTest):

Wouldn't it be better to test PyPLNCorpusTask separately from CorpusFreqDist? Then, if another subclass of PyPLNCorpusTask is created later, only the returned dict would need to be checked.

Also, is this hitting an actual mongo instance? If so, would you consider mocking the db methods?
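
As a rough sketch of what mocking the db methods could look like, the example below uses `unittest.mock.MagicMock` in place of a database handle. The classes `FakeCorpusTask` and `FakeCorpusFreqDist`, the `documents` and `corpora` collection names, and the document layout are all hypothetical stand-ins; they are not the real PyPLNCorpusTask or CorpusFreqDist code. The `find` and `update_one` names follow pymongo's collection methods, but here they are only attributes of the mock.

```python
from collections import Counter
from unittest import mock


class FakeCorpusTask:
    """Hypothetical stand-in for a corpus task base class."""

    def __init__(self, db):
        self.db = db

    def process(self, documents):
        raise NotImplementedError

    def run(self, corpus_id):
        # Fetch the corpus documents, delegate to the subclass, store the result.
        documents = list(self.db.documents.find({"corpus_id": corpus_id}))
        result = self.process(documents)
        self.db.corpora.update_one({"_id": corpus_id}, {"$set": result})
        return result


class FakeCorpusFreqDist(FakeCorpusTask):
    """Hypothetical subclass: only returns the dict to be stored."""

    def process(self, documents):
        total = Counter()
        for doc in documents:
            total.update(dict(doc["freqdist"]))
        ordered = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
        return {"freqdist": ordered}


# No mongo instance is needed: the mock records calls and returns canned data.
db = mock.MagicMock()
db.documents.find.return_value = [
    {"freqdist": [["the", 3], ["cat", 1]]},
    {"freqdist": [["the", 2]]},
]

result = FakeCorpusFreqDist(db).run("corpus-1")

# Base-class behaviour (db access) is checked through the mock...
db.documents.find.assert_called_once_with({"corpus_id": "corpus-1"})
db.corpora.update_one.assert_called_once_with({"_id": "corpus-1"}, {"$set": result})
# ...while the subclass only needs its returned dict checked.
assert result == {"freqdist": [("the", 5), ("cat", 1)]}
```

With this split, the fetch-and-store logic is asserted once against the mock, and a test for any new subclass would only need to call `process` and compare the returned dict.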

Member Author


You are right. I was testing both in the same test case (and not testing correctly). I separated the tests and I think it's better now.

Yes, it really is hitting an actual mongo instance. This is inherited from the old days when MongoDict was still part of our codebase, and it's one of the reasons our tests are slow. I would be very glad to mock everything and have better, more isolated, and quicker tests. I would probably need your help, though, @geron :)

…ests

Thanks @geron for pointing out that I was testing everything together

2 participants