FeatureUnhasher does not support an input_type of dict #236

Open
manyu90 opened this issue Aug 8, 2017 · 6 comments


@manyu90

manyu90 commented Aug 8, 2017

The current implementation only supports hashers with input_type='string'. It would be nice to have a FeatureUnhasher that also accepts FeatureHashers with input_type='dict'.

@kmike
Contributor

kmike commented Aug 9, 2017

As I recall, we added FeatureUnhasher mainly to support HashingVectorizer, so we started with 'string'.

At first sight, adding input_type='dict' support is a matter of removing the exception, changing the way the _term_counts attribute is computed, and adding some tests.
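A minimal sketch of what the _term_counts change might look like, assuming that for 'dict' input each sample maps feature names to numeric values and that only the names need to be counted. The helper name `count_terms` is hypothetical and this is not the library's actual code:

```python
from collections import Counter


def count_terms(X, input_type="string"):
    """Count how often each feature name occurs across all samples in X.

    Hypothetical helper for illustration only.
    """
    if input_type not in ("string", "dict"):
        raise ValueError("unsupported input_type: %r" % (input_type,))

    counts = Counter()
    for sample in X:
        if input_type == "dict":
            # Assumption: a sample is a mapping of feature name -> numeric
            # value, so only the keys are counted as terms.
            terms = sample.keys()
        else:
            # A sample is an iterable of feature-name strings.
            terms = sample
        counts.update(terms)
    return counts
```

For example, `count_terms([["cat", "sat"], ["cat"]])` counts tokens, while `count_terms([{"age": 31.0, "height": 1.8}, {"age": 45.0}], input_type="dict")` counts dict keys. The resulting counts could then be used in the same way as for 'string' input.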

I don't have immediate plans to implement this feature, but it looks like a good problem for new contributors, so pull requests are welcome!

@mohdkashif93

Is it okay if I work on this issue, provided nobody else is already working on it?

@coderop2

coderop2 commented Mar 5, 2019

@kmike as I was going through the tests, I noticed there are no tests for FeatureUnhasher. Is that right?

@kmike
Contributor

kmike commented Mar 5, 2019

@coderop2 right; adding them can be a good first step. It is tested only indirectly, by testing InvertableHashingVectorizer which uses FeatureUnhasher internally.

@coderop2

coderop2 commented Mar 5, 2019 via email

@kmike
Contributor

kmike commented Mar 5, 2019

@coderop2 yes, this works. Alternatively, one can start by adding tests for the existing FeatureHasher, to get their feet wet; this would be a smaller change which can be merged separately.
