Add a Bitdeli Badge to README #178

Merged
merged 92 commits into from
Jul 7, 2015

Conversation

bitdeli-chef
Contributor

Pull request made by @fccoelho at https://bitdeli.com

fccoelho and others added 30 commits February 22, 2013 17:07
For some reason, in some pypln installations the text extracted from the
document was not getting to the `PalavrasRaw` worker as unicode. This may be
due to previous errors during the decoding process that we fixed earlier. That
meant that, when we got a non-unicode string, python would try to decode it
using the default codec (ascii) in `text.encode(PALAVRAS_ENCODING)`. Since we
know the text came from mongodb, we can just decode it using utf-8 to make sure
we have a unicode object.
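A minimal sketch of the fix described above, written in Python 3 terms (where the byte string type is `bytes`). The value of `PALAVRAS_ENCODING` and both helper names are assumptions for illustration, not the worker's actual code:

```python
# Assumption: the real value of PALAVRAS_ENCODING is not shown in this thread.
PALAVRAS_ENCODING = "utf-8"


def ensure_unicode(text, source_encoding="utf-8"):
    """Return `text` as a unicode string, decoding byte strings explicitly."""
    if isinstance(text, bytes):
        # Decode with the known source encoding instead of relying on
        # an implicit default codec.
        return text.decode(source_encoding)
    return text


def encode_for_palavras(text):
    # Hypothetical helper: normalise to unicode first, then encode once.
    return ensure_unicode(text).encode(PALAVRAS_ENCODING)
```

Decoding explicitly at the boundary means the later `encode` call always starts from a unicode object, whichever form the stored text arrived in.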
In some installations (on our production server, for example)
`sys.getfilesystemencoding()` was not returning the correct encoding when the
worker was run by pypelinin's broker. This meant that, when sending text to
palavras stdin, we were using the wrong encoding, resulting in a
`UnicodeEncodeError`. This commit forces utf-8 as the encoding to be used when
communicating with the process and also adds some code to make sure we use the
correct encoding everywhere in the workers that depend on palavras.
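The idea of forcing an explicit encoding when talking to a child process can be sketched as follows. This is not the worker's code: `PROCESS_ENCODING`, `ECHO_COMMAND` and `run_with_explicit_encoding` are hypothetical names, and a small Python echo process stands in for the palavras binary:

```python
import subprocess
import sys

# Assumption: utf-8 is the encoding agreed with the child process.
PROCESS_ENCODING = "utf-8"

# Stand-in for the palavras binary: a child process that echoes stdin bytes.
ECHO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]


def run_with_explicit_encoding(command, text):
    """Send `text` to the child's stdin and decode its stdout with a fixed codec."""
    process = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # Encode explicitly rather than trusting sys.getfilesystemencoding().
    stdout, _ = process.communicate(text.encode(PROCESS_ENCODING))
    return stdout.decode(PROCESS_ENCODING)
```

Because the codec is a constant, the result no longer depends on the environment the broker happens to start the worker in.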
I'm sorry about so many commits, but in every one of them I thought I had fixed
the bug, only to be surprised after deploying. It is incredibly frustrating to
deal with bugs that only show up in production.
This will make it easier to know if palavras didn't run because it shouldn't
(if the document is not in Portuguese, for example) or if there was an error.
Feature/spellchecker 

Merging, as there seem to be no objections left.
One of the reasons it was failing was because the test expectations were not up
to date with the code. We have for a while been returning an empty string in
case we can't coerce the content to unicode.
This is really a work in progress. The idea is to write a few workers this way,
to extract common behaviour.
For some reason, using `apply` only worked after using `delay` once and
rerunning the tests.
This will keep the logic of getting the keys by the document id in
mongodict.
This commit makes sure we:

- Use CELERY_ALWAYS_EAGER to run the tests synchronously.
- Drop the collection between test cases.
- Delete the testing database after the entire test suite has run.
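The cleanup lifecycle in the list above could look roughly like the sketch below. It uses an in-memory `FakeDatabase` in place of MongoDB so it runs without a server. All class and collection names are hypothetical, and the class-level teardown only approximates "after the entire test suite":

```python
import unittest


class FakeDatabase:
    """In-memory stand-in for a MongoDB test database (illustration only)."""

    def __init__(self):
        self.collections = {}
        self.deleted = False

    def drop_collection(self, name):
        self.collections.pop(name, None)

    def delete(self):
        self.collections.clear()
        self.deleted = True


class TaskTestCase(unittest.TestCase):
    collection_name = "analysis"  # hypothetical collection name

    @classmethod
    def setUpClass(cls):
        # With Celery, this is where eager mode (CELERY_ALWAYS_EAGER)
        # would be enabled so that tasks run synchronously.
        cls.db = FakeDatabase()

    def setUp(self):
        self.collection = self.db.collections.setdefault(self.collection_name, [])

    def tearDown(self):
        # Drop the collection between test cases.
        self.db.drop_collection(self.collection_name)

    @classmethod
    def tearDownClass(cls):
        # Delete the testing database once the tests have finished.
        cls.db.delete()


class ExampleTests(TaskTestCase):
    def test_starts_empty(self):
        self.assertEqual(self.collection, [])
        self.collection.append({"_id": 1})

    def test_still_starts_empty(self):
        self.assertEqual(self.collection, [])
        self.collection.append({"_id": 2})
```

Each test writes a document, yet both see an empty collection at the start, which is the isolation the commit is after.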
This class encapsulates the logic of getting document data from mongo and
saving it back, leaving to the tasks themselves only the logic of processing
the data.
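One way to picture this split is the sketch below, assuming a dict keyed by document id in place of the mongo collection. `DocumentTask`, `TokenCounter` and the `process` hook are hypothetical names, not the class introduced by the commit:

```python
class DocumentTask:
    """Hypothetical base class: load a document, delegate processing, save the result."""

    def __init__(self, store):
        # `store` stands in for a MongoDB collection: a dict keyed by document id.
        self.store = store

    def run(self, document_id):
        document = self.store[document_id]
        result = self.process(document)
        # Merge the keys returned by the task back into the stored document.
        document.update(result)
        self.store[document_id] = document
        return document_id

    def process(self, document):
        raise NotImplementedError("subclasses implement only the processing step")


class TokenCounter(DocumentTask):
    """Example task: only the processing logic lives here."""

    def process(self, document):
        return {"token_count": len(document["text"].split())}
```

For example, `TokenCounter({"doc-1": {"text": "um dois três"}}).run("doc-1")` stores a `token_count` key next to `text` without the task touching the storage code.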
flavioamieiro and others added 28 commits May 7, 2015 16:26
fixed minor typos
pymongo.Connection was removed in pymongo>=3. Since we will probably remove
MongoDict as a dependency, we will use version 2.8.1 for now
…in a test

This happened whenever the 'test' index didn't exist in the elasticsearch
server.
This commit introduces a more specific index name. `test` is too generic, with
`test_pypln` we have less chance of stepping on someone else's toes.
Feature/elastic indexer seems fine to me. Merging.
The idea is to use this for the indexing pipeline. Since the document will be
stored in the elastic index anyways, it's better not to have it replicated.
Adds a worker that deletes a file from GridFS

Seems fine. merging
We should not index the original file contents for two reasons: 1) they are not
relevant to the search (the `text` attribute should include the relevant
content), and 2) they may be in a binary format that is not serializable.

Fixes #176
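A minimal sketch of dropping the raw file contents before indexing. The `contents` key name and `build_index_body` are assumptions for illustration. Only the `text` attribute is named in the commit message, and no Elasticsearch client call is shown:

```python
import json

# Assumption: hypothetical key holding the raw (possibly binary) uploaded file.
RAW_CONTENTS_KEY = "contents"


def build_index_body(document, excluded_keys=(RAW_CONTENTS_KEY,)):
    """Return a copy of the document without the raw file fields."""
    return {
        key: value
        for key, value in document.items()
        if key not in excluded_keys
    }


document = {"text": "conteúdo extraído", "contents": b"\x89PNG\r\n"}
# The filtered body serializes cleanly; the original document is left untouched.
serialized = json.dumps(build_index_body(document))
```

Calling `json.dumps(document)` directly would raise a `TypeError` here, since `bytes` values are not JSON serializable.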
Fixes ElasticIndexer for binary files

seems ok. Merging.
flavioamieiro added a commit that referenced this pull request Jul 7, 2015
Add a Bitdeli Badge to README
@flavioamieiro flavioamieiro merged commit c5732ec into NAMD:master Jul 7, 2015
3 participants