Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Init Implementation per_stream batching #438

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

schopra8
Copy link

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #434

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@schopra8 schopra8 requested a review from tchaton as a code owner December 22, 2024 01:07
@schopra8
Copy link
Author

@tchaton When is the CacheDataLoader used vs StreamingDataLoader in conjunction with a CombinedStreamingDataset?

I'm trying to send messages to iterators and process them within their worker loops -- when the iterator must change it's dataset index (i.e., on batch boundaries). I'm getting confused by the difference between _MultiProcessingDataLoaderIterPatch and _StreamingMultiProcessingDataLoaderIter. Seems like the former is used in CacheDataLoader while the latter is used in StreamingDataLoader.

@Borda Borda changed the title Init Implementation per_stream batching Init Implementation per_stream batching Jan 9, 2025
for batch in super().__iter__():
print("Fetched a batch ...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use loggier instead of print

Copy link

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 55.26316% with 17 lines in your changes missing coverage. Please review.

Project coverage is 78%. Comparing base (bd116a4) to head (32abd77).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #438   +/-   ##
===================================
- Coverage    78%    78%   -0%     
===================================
  Files        35     35           
  Lines      5106   5141   +35     
===================================
+ Hits       4006   4024   +18     
- Misses     1100   1117   +17     

Copy link

gitguardian bot commented Feb 8, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
5685611 Triggered Generic High Entropy Secret 7cf1b41 tests/streaming/test_resolver.py View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Advanced Batching Logic with CombinedStreamingDataset
3 participants