Skip to content

Conversation

deependujha
Copy link
Collaborator

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

  • stream.py
import litdata as ld
import lightning as L
from litdata.streaming.dataloader import LitDataLoader

if __name__ == "__main__":
	fabric = L.Fabric(accelerator="cpu",devices=4, strategy="ddp")
	fabric.launch()

	ds = ld.StreamingDataset("fast_data")
	dl = LitDataLoader(ds, batch_size = 4, shuffle=True)

	print("-"*100)
	for batch in dl:
		print(f"{batch=}")
  • output:
❯ python stream.py
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/4
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/4
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/4
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 4 processes
----------------------------------------------------------------------------------------------------

sys:1: UserWarning: A newer version of litdata is available (0.2.56). Please consider upgrading with `pip install -U litdata`. Not all functionalities of the platform can be guaranteed to work with the current version.
sys:1: UserWarning: A newer version of litdata is available (0.2.56). Please consider upgrading with `pip install -U litdata`. Not all functionalities of the platform can be guaranteed to work with the current version.
sys:1: UserWarning: A newer version of litdata is available (0.2.56). Please consider upgrading with `pip install -U litdata`. Not all functionalities of the platform can be guaranteed to work with the current version.
sys:1: UserWarning: A newer version of litdata is available (0.2.56). Please consider upgrading with `pip install -U litdata`. Not all functionalities of the platform can be guaranteed to work with the current version.
Worker chunks: [5, 6, 3, 9, 10]
Worker intervals: [[250, 250, 300, 300], [300, 300, 350, 350], [150, 150, 200, 200], [450, 450, 500, 500], [500, 500, 550, 550]]
----------------------------------------------------------------------------------------------------
batch={'data': 5}
batch={'data': 6}
batch={'data': 3}
batch={'data': 9}
batch={'data': 10}
Worker chunks: [1, 17, 2, 8, 15]
Worker intervals: [[50, 50, 100, 100], [850, 850, 900, 900], [100, 100, 150, 150], [400, 400, 450, 450], [750, 750, 800, 800]]
----------------------------------------------------------------------------------------------------
batch={'data': 1}
batch={'data': 17}
batch={'data': 2}
batch={'data': 8}
batch={'data': 15}
Worker chunks: [0, 14, 7, 11, 4]
Worker intervals: [[0, 0, 50, 50], [700, 700, 750, 750], [350, 350, 400, 400], [550, 550, 600, 600], [200, 200, 250, 250]]
----------------------------------------------------------------------------------------------------
batch={'data': 0}
batch={'data': 14}
batch={'data': 7}
batch={'data': 11}
batch={'data': 4}
Worker chunks: [12, 16, 13, 19, 18]
Worker intervals: [[600, 600, 650, 650], [800, 800, 850, 850], [650, 650, 700, 700], [950, 950, 1000, 1000], [900, 900, 950, 950]]
----------------------------------------------------------------------------------------------------
batch={'data': 12}
batch={'data': 16}
batch={'data': 13}
batch={'data': 19}
batch={'data': 18}

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant