Fix dataset index when iterating for tokenization by timothyngo · Pull Request #138 · KempnerInstitute/tatm

timothyngo · 2025-12-18T20:26:32Z

The dataset index was computed with `len(self.datasets)` instead of `len(self.dataset_iters)`. As iterators were exhausted, the index did not reflect the reduced number of available iterators and could point past the end of `self.dataset_iters`. This PR computes the index from `len(self.dataset_iters)` so that the remaining iterators are cycled through without out-of-bounds errors.
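A minimal sketch of the pattern, not the tatm code itself: the class name `RoundRobinReader`, the `dataset_idx` attribute, and the removal of exhausted iterators with `del` are assumptions made for illustration. Only `self.datasets` and `self.dataset_iters` come from the description above.

```python
from typing import Iterable, Iterator, List


class RoundRobinReader:
    """Hypothetical stand-in for the class touched by this PR (not tatm code)."""

    def __init__(self, datasets: List[Iterable]):
        self.datasets = list(datasets)
        # One live iterator per dataset; this list shrinks as iterators run out.
        self.dataset_iters: List[Iterator] = [iter(d) for d in self.datasets]
        self.dataset_idx = 0  # assumed name for the round-robin cursor

    def __iter__(self):
        return self

    def __next__(self):
        while self.dataset_iters:
            # Wrap against the live iterator list. Wrapping against
            # len(self.datasets) can yield an index that no longer exists
            # once some iterators have been dropped.
            self.dataset_idx %= len(self.dataset_iters)
            try:
                item = next(self.dataset_iters[self.dataset_idx])
            except StopIteration:
                # Drop the exhausted iterator; the same index now refers
                # to the iterator that followed it, so do not advance.
                del self.dataset_iters[self.dataset_idx]
                continue
            self.dataset_idx += 1
            return item
        raise StopIteration


print(list(RoundRobinReader([[1, 2, 3], [10], [20, 21]])))
# prints [1, 10, 20, 2, 21, 3]
```

With three datasets of unequal length, the second iterator runs out first. From then on the cursor wraps at 2 rather than 3, which is the behaviour the fix is meant to restore.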

timothyngo added 4 commits December 18, 2025 15:22

Add log · 329d39f

More logs · 646c88d

More logs · 9e14250

Fix dataset idx and clear logs · f465ebb

timothyngo changed the title ~~Add log~~ Fix dataset index Dec 19, 2025

timothyngo changed the title ~~Fix dataset index~~ Fix dataset index when iterating for tokenization Dec 19, 2025

timothyngo marked this pull request as ready for review December 19, 2025 02:28

Naeemkh approved these changes Dec 22, 2025

timothyngo merged commit 91435e1 into dev Dec 22, 2025
3 checks passed

timothyngo merged 4 commits into dev from deadlock
