todos #1

Open
12 of 20 tasks
aobo-y opened this issue Nov 5, 2018 · 1 comment

Owner

aobo-y commented Nov 5, 2018

  • implement pre-trained word embeddings @aobo-y
  • South Park / The Simpsons data preprocessing @jiahao42
  • implement beam search when constructing responses @jiahao42
  • pretrain with movie data, then train with South Park data @ddddwy
  • personal embedding @ddddwy
  • implement BLEU @quq99
  • try Ocean @quq99
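A minimal, framework-agnostic sketch of beam search for constructing responses. It assumes the decoder can be wrapped as a hypothetical `step_fn(prefix)` that returns `{next_token: log_prob}`; `toy_step` is a made-up distribution for illustration, not the project's model.

```python
import math
from typing import Callable, Dict, List, Tuple

# A hypothesis is (token sequence, cumulative log-probability).
Hypothesis = Tuple[List[str], float]


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start_token: str,
    end_token: str,
    beam_width: int = 3,
    max_len: int = 10,
) -> List[Hypothesis]:
    """Return hypotheses sorted by cumulative log-probability, best first.

    step_fn(prefix) is a hypothetical stand-in for the decoder: it maps a
    token prefix to {next_token: log_prob}.
    """
    beams: List[Hypothesis] = [([start_token], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token, log_p in step_fn(tokens).items():
                candidates.append((tokens + [token], score + log_p))
        # Keep only the beam_width best; those ending in end_token are done.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Hypotheses still open at max_len are returned as well.
    finished.extend(beams)
    return sorted(finished, key=lambda h: h[1], reverse=True)


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical next-token distribution keyed on the last token only."""
    table = {
        "<s>": {"hi": math.log(0.6), "hello": math.log(0.4)},
        "hi": {"</s>": math.log(0.5), "there": math.log(0.5)},
        "hello": {"</s>": math.log(0.9), "there": math.log(0.1)},
        "there": {"</s>": math.log(1.0)},
    }
    return table[prefix[-1]]


if __name__ == "__main__":
    for tokens, score in beam_search(toy_step, "<s>", "</s>", beam_width=2):
        print(" ".join(tokens), round(math.exp(score), 2))
```

Note that greedy decoding would pick "hi" first (0.6), while the beam keeps "hello" alive and finds the higher-probability full response "hello </s>" (0.36). Picking one of the top returned hypotheses with `random.choice` would be one way to approach the lower-priority "return responses generated by beam search randomly" item.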

Lower priority:

  • support LSTM @ddddwy
  • auto-load the latest available checkpoint
  • randomly return one of the responses generated by beam search @jiahao42
  • deploy the bots to Telegram @jiahao42
  • handle contractions ('m, 've, 're, ...) in normalize
  • split train mode into pretrain & finetune to avoid constantly changing the config @jiahao42
  • decrease TEACHER_FORCING_RATIO after a number of iterations @aobo-y
  • train GloVe embeddings on our dataset
  • figure out a more appropriate LR (ours is 0.0001, much smaller than the default) & CLIP
  • make the bot say something proactively
  • save checkpoints frequently & auto-purge old ones when there are too many
  • combine the encoder and decoder into one seq2seq module
  • general data has lines of 3 parts (i.e. each line contains 2 \t)
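A possible sketch for the two checkpoint items (auto-loading the latest checkpoint, purging old ones). It assumes a hypothetical naming scheme `<iteration>_checkpoint.tar` in a single directory; the real project may name its files differently. Sorting on the parsed integer avoids the lexicographic trap where `900_checkpoint.tar` would sort after `4000_checkpoint.tar`.

```python
import os
import re
from typing import List, Optional, Tuple

# Hypothetical naming scheme: "<iteration>_checkpoint.tar", e.g. "4000_checkpoint.tar".
CKPT_PATTERN = re.compile(r"^(\d+)_checkpoint\.tar$")


def list_checkpoints(ckpt_dir: str) -> List[Tuple[int, str]]:
    """Return (iteration, path) pairs sorted numerically, oldest first.

    Files that do not match CKPT_PATTERN are ignored.
    """
    found = []
    for name in os.listdir(ckpt_dir):
        match = CKPT_PATTERN.match(name)
        if match:
            found.append((int(match.group(1)), os.path.join(ckpt_dir, name)))
    return sorted(found)


def latest_checkpoint(ckpt_dir: str) -> Optional[str]:
    """Path of the checkpoint with the highest iteration, or None if there is none."""
    ckpts = list_checkpoints(ckpt_dir)
    return ckpts[-1][1] if ckpts else None


def purge_checkpoints(ckpt_dir: str, keep: int = 5) -> List[str]:
    """Delete all but the `keep` newest checkpoints and return the deleted paths."""
    ckpts = list_checkpoints(ckpt_dir)
    # ckpts[:-keep] is empty when there are `keep` or fewer checkpoints.
    stale = ckpts[:-keep] if keep > 0 else ckpts
    for _, path in stale:
        os.remove(path)
    return [path for _, path in stale]
```

Calling `purge_checkpoints` right after each save keeps frequent checkpointing from filling the disk. The path from `latest_checkpoint` could then be handed to the loader (e.g. `torch.load`, if the project uses PyTorch, which is an assumption here).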
aobo-y assigned aobo-y, jiahao42, wyu-du and quq99 and unassigned aobo-y, jiahao42 and wyu-du Nov 6, 2018
Owner Author

aobo-y commented Nov 27, 2018

I tried 2 pretrained GloVe embeddings and filtered them with our corpus; the results are below:

glove.42B.300d.zip

size of the data: 49928
size of the embedding: 1917495
size of the filtered embedding: 45116

glove.6B.zip

size of the data: 49928
size of the embedding: 400001
size of the filtered embedding: 40724
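A minimal sketch of the filtering step described above, assuming the GloVe text format of one `word v1 ... vd` entry per line and a corpus vocabulary given as a set of strings (`filter_glove` and `coverage` are hypothetical helper names, not the project's code). If "size of the data" is the corpus vocabulary size, the counts above work out to roughly 90.4% coverage for glove.42B.300d (45116 / 49928) and 81.6% for glove.6B (40724 / 49928).

```python
from typing import Dict, Iterable, List, Set


def filter_glove(lines: Iterable[str], vocab: Set[str]) -> Dict[str, List[float]]:
    """Keep only the GloVe vectors whose word appears in the corpus vocabulary.

    Each GloVe text line is "<word> <v1> <v2> ... <vd>", separated by spaces.
    """
    filtered: Dict[str, List[float]] = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        if word in vocab:
            filtered[word] = [float(v) for v in values]
    return filtered


def coverage(filtered: Dict[str, List[float]], vocab: Set[str]) -> float:
    """Fraction of the corpus vocabulary that received a pretrained vector."""
    return len(filtered) / len(vocab) if vocab else 0.0
```

To use it on a real file, pass an open handle such as `open(path, encoding="utf-8")`, where `path` points to the extracted `.txt` file; iterating the handle streams lines, so the roughly 1.9M-entry 42B file is never held in memory as raw text, only the kept vectors are.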

Development

No branches or pull requests

4 participants