Fix dataset index when iterating for tokenization #138

Merged
timothyngo merged 4 commits into dev from deadlock
Dec 22, 2025

Conversation

@timothyngo
Collaborator

@timothyngo timothyngo commented Dec 18, 2025

Closes KEM-44

There was a bug where the dataset index was computed with `len(self.datasets)` instead of `len(self.dataset_iters)`. As iterators were exhausted, the index did not reflect the reduced number of available iterators and could point past the end of `self.dataset_iters`. This PR fixes the index so that the remaining iterators are iterated through without out-of-bounds errors.
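
For readers without access to the diff, here is a minimal sketch of the kind of fix described above. It is not the code from this PR: the class name `RoundRobinSampler`, the `index` attribute, and the round-robin strategy are assumptions made for illustration. Only `self.datasets` and `self.dataset_iters` come from the description.

```python
class RoundRobinSampler:
    """Hypothetical stand-in for the class touched by this PR."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # One live iterator per dataset; this list shrinks as iterators run out.
        self.dataset_iters = [iter(d) for d in self.datasets]
        self.index = 0  # assumed name for the dataset index

    def __iter__(self):
        return self

    def __next__(self):
        while self.dataset_iters:
            # Wrap against the live iterator list. Using len(self.datasets)
            # here could yield an index past the end of self.dataset_iters
            # once any iterator has been removed.
            self.index %= len(self.dataset_iters)
            try:
                item = next(self.dataset_iters[self.index])
            except StopIteration:
                # Drop the exhausted iterator; the next one slides into this slot.
                del self.dataset_iters[self.index]
                continue
            self.index += 1
            return item
        raise StopIteration


if __name__ == "__main__":
    sampler = RoundRobinSampler([[1, 2, 3], ["a"], [10, 20]])
    print(list(sampler))  # [1, 'a', 10, 2, 20, 3]
```

In this sketch the modulo is taken against `len(self.dataset_iters)` on every step, so the index stays valid even when datasets of different lengths are exhausted at different times.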

@timothyngo timothyngo changed the title Add log Fix dataset index Dec 19, 2025
@timothyngo timothyngo changed the title Fix dataset index Fix dataset index when iterating for tokenization Dec 19, 2025
@timothyngo timothyngo marked this pull request as ready for review December 19, 2025 02:28
@timothyngo timothyngo merged commit 91435e1 into dev Dec 22, 2025
3 checks passed

2 participants