How to make a custom LeRobotDataset with v2? #547

Open
alik-git opened this issue Dec 4, 2024 · 5 comments

Comments

@alik-git

alik-git commented Dec 4, 2024

Hi folks, thanks for the amazing open source work!

I am trying to make a custom dataset to use with the LeRobotDataset format.

The README says to copy the example scripts, which I've done, and I now have a working format script of my own. The relevant README passage:

If your dataset format is not supported, implement your own in `lerobot/common/datasets/push_dataset_to_hub/${raw_format}_format.py` by copying examples like [pusht_zarr](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/pusht_zarr_format.py), [umi_zarr](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/umi_zarr_format.py), [aloha_hdf5](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py), or [xarm_pkl](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/xarm_pkl_format.py).

But when it comes time to create the dataset, `push_dataset_to_hub.py` calls `LeRobotDataset.from_preloaded`, which is no longer supported in dataset v2:

lerobot_dataset = LeRobotDataset.from_preloaded(

So I'm wondering: what is the proper way to load your own custom local dataset?

Thank you in advance for your help!

@alik-git alik-git changed the title How to make a custom dataset with v2? How to make a custom LeRobotDataset with v2? Dec 4, 2024
@alik-git
Author

alik-git commented Dec 4, 2024

OK, so I've found a workaround for now: I initialize an empty dataset, add the frames to it, and can then load it after calling dataset.consolidate(). If this is a proper way to do it, please let me know and I'll make a PR with updates to the docs.

Otherwise please let me know what the right way to do this is. Thank you! I'll update this issue with my code once I've cleaned it up.

@Robert-hua

I encountered the same issue.

@taochenshh

@aliberts I also hit the same issue. The documentation on how to generate a custom dataset is out of date (the code no longer runs). Could you please update the instructions and the relevant scripts for custom dataset generation? Thanks.

@aliberts
Collaborator

Hey there,
Yes, all the push_to_hub scripts are deprecated in favor of the scripts in examples/port_datasets (just one for now).

Basically, you need to create a new empty dataset using LeRobotDataset.create(), then add individual frames using add_frame(), then save the added frames into an episode using save_episode() (which actually saves data).
Then at the end you need to call the consolidate() method to handle a few more things (we will try to get rid of this step in future iterations) before finally calling the push_to_hub() method.
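
The sequence above can be sketched as a small helper. This is only an illustration of the call order, not the project's own script. The helper name `port_episodes`, its parameters, and the frame dictionaries are hypothetical. Only the method names `add_frame()`, `save_episode()`, `consolidate()`, and `push_to_hub()` come from the description above. The arguments these methods accept may differ between lerobot versions, so the sketch takes any keyword arguments for `save_episode()` from the caller instead of assuming them.

```python
from typing import Any, Iterable, Mapping, Optional


def port_episodes(
    dataset: Any,
    episodes: Iterable[Iterable[Mapping[str, Any]]],
    save_episode_kwargs: Optional[Mapping[str, Any]] = None,
    push: bool = False,
) -> int:
    """Write episodes into an already-created, empty dataset object.

    `dataset` is assumed to expose add_frame(), save_episode(),
    consolidate() and push_to_hub(), as described in this thread.
    Returns the number of episodes saved.
    """
    # Keyword arguments for save_episode() are version-dependent,
    # so they are supplied by the caller (hypothetical parameter).
    kwargs = dict(save_episode_kwargs or {})
    num_saved = 0

    for frames in episodes:
        # 1. Buffer every frame of the current episode.
        for frame in frames:
            dataset.add_frame(dict(frame))
        # 2. Close the episode; this is the step that writes the data.
        dataset.save_episode(**kwargs)
        num_saved += 1

    # 3. Finalize the dataset once all episodes are saved.
    dataset.consolidate()

    # 4. Optionally upload the result.
    if push:
        dataset.push_to_hub()

    return num_saved
```

In real use, `dataset` would be the object returned by `LeRobotDataset.create()`. The arguments to `create()` and the keys each frame must contain depend on the installed version and on the features declared for the dataset, so check them against the script in examples/port_datasets.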

You can find more info about the changes in this new API in the PR (#461).

We will remove push_to_hub.py scripts in the future after adding more equivalent scripts like the one mentioned above in the examples section. Hope this helps!

@aliberts
Collaborator

Will update the README soon!
