Home

Shujian2015 edited this page Feb 9, 2018 · 8 revisions
- Length of text
- Mean price of each category
- Mean price per brand / shipping flag
- Average of word embeddings: look up each word in Word2vec and take the mean of the vectors (paper, GitHub, Quora)
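A minimal sketch of the averaging idea. The lookup table here is a toy stand-in for a real Word2vec model, and the values are hypothetical:

```python
import numpy as np

# Toy lookup table standing in for a trained Word2vec model (hypothetical values).
embeddings = {
    "new":   np.array([0.1, 0.3]),
    "phone": np.array([0.5, 0.1]),
    "case":  np.array([0.2, 0.4]),
}

def average_embedding(text, table, dim=2):
    """Look up each token in the table and return the mean of the found vectors."""
    vecs = [table[w] for w in text.lower().split() if w in table]
    if not vecs:  # no known words -> zero vector
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

doc_vector = average_embedding("New phone case", embeddings)
```

Out-of-vocabulary words are simply skipped, so a text with no known words falls back to the zero vector.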
- Better way to remove stop words (cache the result)
- Reduce TF (TensorFlow) training time
- Strategy
  - Ridge: performance / computation-time trade-off
  - Ensemble averaging
  - Why Ridge is much better than other sklearn models
- Efficient Way to do TFIDF
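A sketch of the TFIDF-into-Ridge setup behind the notes above, on hypothetical toy data. Keeping the TFIDF matrix sparse end to end and using a sparse-friendly solver is a large part of why Ridge is so cheap here:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

# Tiny corpus standing in for the item descriptions (hypothetical data).
texts = ["red cotton shirt", "blue denim jeans", "red denim jacket", "cotton socks"]
log_prices = np.log1p([12.0, 30.0, 45.0, 5.0])  # train on log1p(price)

# Word + bigram TFIDF, kept as a sparse matrix; Ridge trains on it directly.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

model = Ridge(alpha=1.0, solver="sparse_cg")  # solver that handles sparse input well
model.fit(X, log_prices)
pred = np.expm1(model.predict(vectorizer.transform(["blue cotton shirt"])))
```

The `expm1` at the end undoes the `log1p` target transform to get a price back.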
- Using log price as the dependent variable. But be careful with the "without zero price" kernels: they also remove zero-price rows from the validation set, which makes the local CV score useless. If you want to remove zero prices, remove them inside the fold only, so the validation set still resembles the original dataset and your CV score tracks the LB.
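A sketch of that fold-level filtering, on hypothetical toy data: zero-price rows are dropped from the training split only, while the validation split keeps the original distribution:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy prices including the problematic zero-price rows (hypothetical data).
prices = np.array([0.0, 10.0, 3.0, 0.0, 25.0, 7.0, 12.0, 5.0])
X = np.arange(len(prices)).reshape(-1, 1)  # placeholder feature matrix

for train_idx, val_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    keep = prices[train_idx] > 0                # drop zero prices from TRAIN only
    X_tr = X[train_idx][keep]
    y_tr = np.log1p(prices[train_idx][keep])
    X_val = X[val_idx]                          # validation left untouched,
    y_val = np.log1p(prices[val_idx])           # so local CV resembles the LB
```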
- Wordbatch(TFIDF) vs WordSequence
- Best single model
- Wordbatch for preprocessing and modeling
- Surpass 0.40000
- LB shake up
- CNN or RNN: Best single model
- Rewrite the code
- Tune: dropout/FC layers
- Use averaged GloVe for TF
- Other features for TF: Quora solutions
  - No. 1: number of capital letters, question marks, etc.
  - No. 3: "We used TFIDF and LSA distances, word co-occurrence measures (pointwise mutual information), word matching measures, fuzzy word matching measures (edit distance, character ngram distances, etc.), LDA, word2vec distances, part of speech and named entity features, and some other minor features. These features were mostly recycled from a previous NLP competition, and were not nearly as essential in this competition."
  - No. 8 → a lot
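The "number of capital letters, question marks, etc." idea amounts to a handful of simple counters. A sketch (the feature names here are illustrative, not from any solution):

```python
def text_stats(text):
    """Hand-crafted count features in the spirit of the Quora solutions above."""
    return {
        "len_chars":    len(text),
        "len_words":    len(text.split()),
        "num_caps":     sum(c.isupper() for c in text),
        "num_question": text.count("?"),
        "num_exclaim":  text.count("!"),
        "num_digits":   sum(c.isdigit() for c in text),
    }

features = text_stats("Brand NEW iPhone case!! Only $5?")
```

These are cheap to compute per row and can be appended to the sparse text features.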
- Combine condition and shipping into one feature
- Concatenation of brand, item description, and product name
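Both combination ideas above can be sketched in a few lines; the function names and field values here are hypothetical:

```python
def combined_text(name, brand, description):
    """Concatenate text fields so a single vectorizer sees all of them;
    None / missing values become empty instead of a literal 'nan' token."""
    return " ".join(p for p in (name, brand, description) if p)

def condition_shipping(condition, shipping):
    """Cross item condition with the shipping flag into one categorical token."""
    return f"{condition}_{shipping}"

text = combined_text("Leather wallet", None, "Barely used, no scratches")
cross = condition_shipping(3, 1)
```

The crossed `condition_shipping` token can then be one-hot encoded like any other categorical.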