@Subhanshusethi
  1. Added a valid-indices check that, while loading the tokens file, ensures the number of captions matches the number of embeddings; mismatched entries are filtered out.
  2. Fine-tuning GPT-2 depends heavily on dataset quality. Removed single letters, special characters, and stop words (NLTK's default English list) to reduce the influence of connector words while training in the embedding space.
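The valid-indices check in point 1 could be sketched roughly as below. The helper name, the empty-caption filter, and the data layout are assumptions for illustration, not the PR's actual code:

```python
def filter_valid_pairs(captions, embeddings):
    """Keep only indices where a non-empty caption has a matching embedding.

    Hypothetical sketch of the valid-indices check: entries beyond the
    shorter of the two lists, or with empty captions, are dropped so that
    len(captions) == len(embeddings) after loading.
    """
    n = min(len(captions), len(embeddings))
    valid = [i for i in range(n) if captions[i]]
    return [captions[i] for i in valid], [embeddings[i] for i in valid]


# Example: four captions (one empty) but only three embeddings.
caps = ["a dog runs", "", "a red car", "a cat sleeps"]
embs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
caps_f, embs_f = filter_valid_pairs(caps, embs)
print(len(caps_f), len(embs_f))  # → 2 2
```

After filtering, the two lists are guaranteed to be index-aligned, so a caption at position `i` always corresponds to the embedding at position `i`.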

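The cleaning described in point 2 might look like the sketch below. A small hard-coded stop-word set stands in for NLTK's default English list (`nltk.corpus.stopwords.words("english")`, which needs a one-time `nltk.download("stopwords")`); the function name and regex are illustrative assumptions:

```python
import re

# Stand-in stop-word set; the PR uses NLTK's default English list.
STOP_WORDS = {"a", "an", "the", "is", "on", "in", "and", "of"}

def clean_caption(text):
    """Strip special characters, then drop single letters and stop words."""
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    tokens = [t for t in text.split() if len(t) > 1 and t not in STOP_WORDS]
    return " ".join(tokens)


print(clean_caption("A dog, & a cat... sit on the mat!"))  # → dog cat sit mat
```

Running each caption through a step like this before tokenization keeps the embedding space focused on content words rather than connectors and punctuation.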