[Data] Too many progress bars when using Data with Train #47735

Open
vladjohnson opened this issue Sep 18, 2024 · 2 comments
Labels: bug (Something that is supposed to be working; but isn't), data (Ray Data-related issues), P1 (Issue that should be fixed within a few weeks)

Comments

@vladjohnson

What happened + What you expected to happen

[Screenshot of the notebook output]

Hey guys, I'm looking for a way to fix this mess... tqdm is creating a bunch of progress bars, and the logs keep growing my notebook to a massive size. I've tried setting RAY_DATA_DISABLE_PROGRESS_BARS=1, but that did not help. How do I either turn off the progress bars or, ideally, make them work as they are supposed to (one single progress bar)?

Thanks

Versions / Dependencies

Ray version: 2.35.0
Python version: 3.11.9
OS: Ubuntu 20.04

Reproduction script

import ray
from ray.train.torch import TorchTrainer

# demo_train_loop_per_worker and train_ds are defined elsewhere in the notebook.
trainer = TorchTrainer(
    demo_train_loop_per_worker,
    train_loop_config={
        "experiment_name": "demo_experiment",
        "tracking_uri": "file:~/.cache/mlruns",
        "train_batch_size": 1000,
        "num_epochs": 100,
    },
    datasets={
        "train": train_ds,
    },
    scaling_config=ray.train.ScalingConfig(
        num_workers=1,
        use_gpu=True,
    ),
)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@vladjohnson added the bug and triage labels Sep 18, 2024
@scottjlee
Contributor

scottjlee commented Sep 18, 2024

Thanks for reporting the issue; this behavior is definitely not expected.

Setting RAY_DATA_DISABLE_PROGRESS_BARS=1 should definitely disable the progress bars.
My first thought is that you should pass this env var into the Ray runtime env. For example, if you are using ray.init(), you can pass it into env_vars (see the docs). This ensures all workers receive the env var and disables the progress bars properly.
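
A minimal sketch of that approach (this assumes you call ray.init() yourself at the top of the notebook; adjust if you connect to an existing cluster):

import ray

# Pass the env var through the runtime env so every Ray worker inherits it,
# not just the driver process.
ray.init(
    runtime_env={
        "env_vars": {"RAY_DATA_DISABLE_PROGRESS_BARS": "1"},
    }
)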

You can also explicitly disable progress bars through the DataContext:

ctx = ray.data.DataContext.get_current()
ctx.enable_progress_bars = False

If the above doesn't work, I also have a few other temporary fixes to suggest:

(1)

ctx = ray.data.DataContext.get_current()
ctx.use_ray_tqdm = False

This disables the special tqdm implementation for distributed settings, which Ray Data uses to manage progress bars across multiple workers.

(2) Another temporary workaround that might work:

ctx.enable_operator_progress_bars = False

This disables operator-level progress bars, so only the top-level global progress bar is shown. It won't resolve the issue completely, but it should at least reduce the output spam.

@scottjlee added the P1 and data labels and removed the triage label Sep 18, 2024
@scottjlee changed the title from "[Ray Train] Logging Hell in VS Code's Jupyter Editor" to "[Data] Logging Hell in VS Code's Jupyter Editor" Sep 18, 2024
@vladjohnson
Author

Thank you so much, @scottjlee! Highly appreciated

@richardliaw changed the title from "[Data] Logging Hell in VS Code's Jupyter Editor" to "[Data] Too many progress bars when using Data with Train" Oct 9, 2024