
[data] Split out long running scaling test #54045

Merged

omatthew98
Contributor

Why are these changes needed?

The test_arrow_block suite has become flaky. It often fails because it times out on test_arrow_batch_gt_2gb, a scaling test that checks whether the Arrow code path works with a single batch larger than 2 GB. This PR splits that test out into its own suite to see whether doing so reduces the likelihood of a timeout (the limit should be 180 s). If this does not work, the next step will be to run the test on a larger worker.
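The PR does not say why 2 GB is the threshold of interest. One plausible reason, assumed here rather than stated in the PR, is that Arrow's `string` and `binary` types store value offsets as signed 32-bit integers, so a single array of those types can address at most 2**31 - 1 bytes; anything larger needs the 64-bit-offset `large_string`/`large_binary` types or has to be split into chunks. The sketch below is plain Python arithmetic with no Arrow dependency. `needs_64bit_offsets` and `rows_to_exceed` are hypothetical helpers, not Ray or PyArrow APIs, and `GiB` is a local constant presumed to match `ray.data._internal.util.GiB`.

```python
# Largest byte offset a signed 32-bit Arrow offset buffer can hold.
INT32_MAX = 2**31 - 1
# Local constant; presumed equal to ray.data._internal.util.GiB (assumption).
GiB = 1024**3


def needs_64bit_offsets(column_bytes: int) -> bool:
    """Hypothetical helper: True if one string/binary array of this size
    cannot be addressed with 32-bit offsets."""
    return column_bytes > INT32_MAX


def rows_to_exceed(limit_bytes: int, bytes_per_row: int) -> int:
    """Hypothetical helper: smallest row count whose total payload is
    strictly greater than limit_bytes."""
    return limit_bytes // bytes_per_row + 1


# A 2 GiB column is exactly one byte past what 32-bit offsets can address.
print(needs_64bit_offsets(2 * GiB), rows_to_exceed(INT32_MAX, 1024 * 1024))
```

For example, at 1 MiB per row, 2048 rows is the first row count whose payload no longer fits behind 32-bit offsets, which is why a test in this area has to materialize a genuinely large batch and can run slowly.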

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@omatthew98 omatthew98 requested a review from a team as a code owner June 24, 2025 18:58
@omatthew98 omatthew98 added the go add ONLY when ready to merge, run all tests label Jun 24, 2025
@omatthew98 omatthew98 requested a review from bveeramani June 24, 2025 20:46
del batch
del ds
# Force GC to free up object store memory
gc.collect()
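The `del` / `gc.collect()` lines above follow a common pattern: drop the last Python references to large objects, then force a collection so that anything caught in a reference cycle is reclaimed immediately instead of whenever the cyclic collector next runs. The minimal sketch below shows only the underlying Python behaviour, with no Ray dependency; `Batch` and `release_with_gc` are hypothetical names, and whether the test's own objects actually sit in cycles is not stated in the PR.

```python
import gc
import weakref


class Batch:
    """Hypothetical stand-in for a large batch held by a test."""

    def __init__(self, num_bytes: int) -> None:
        self.payload = bytearray(num_bytes)
        # A self-reference creates a cycle, so reference counting alone
        # never drops this object's refcount to zero.
        self.self_ref = self


def release_with_gc(num_bytes: int) -> tuple[bool, bool]:
    """Return (alive after del, alive after gc.collect()) for a cyclic Batch."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from running mid-check
    try:
        batch = Batch(num_bytes)
        probe = weakref.ref(batch)  # observes the object without keeping it alive
        del batch
        alive_after_del = probe() is not None
        gc.collect()  # an explicit full collection still runs while disabled
        alive_after_gc = probe() is not None
        return alive_after_del, alive_after_gc
    finally:
        if was_enabled:
            gc.enable()


print(release_with_gc(1024 * 1024))  # prints (True, False)
```

`del` alone leaves the cyclic object alive; only the explicit `gc.collect()` frees it, which is the situation the "Force GC" comment guards against.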
Member

You need to add the if __name__ == "__main__" block for this test to actually run.
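The reviewer's point is that a test file executed as a plain script only defines its test_* functions and never calls them. Ray's test files conventionally end with `if __name__ == "__main__": sys.exit(pytest.main(["-v", __file__]))`, which hands the file to pytest when it is run directly (the assumption here is that the Bazel target for the new suite runs the file that way). The sketch below mirrors that shape without a pytest dependency: `run_module_tests` is a hypothetical, deliberately tiny stand-in for pytest's collection, and `test_split_out_example` is a placeholder, not the real test.

```python
def test_split_out_example():
    # Placeholder; the real module would hold test_arrow_batch_gt_2gb.
    assert sum([1, 2, 3]) == 6


def run_module_tests(namespace: dict) -> tuple[list[str], list[str]]:
    """Hypothetical stand-in for pytest collection: call every test_* callable."""
    passed, failed = [], []
    for name in sorted(namespace):
        obj = namespace[name]
        if name.startswith("test_") and callable(obj):
            try:
                obj()
                passed.append(name)
            except AssertionError:
                failed.append(name)
    return passed, failed


if __name__ == "__main__":
    # Ray's own test files put the following here instead:
    #     import sys
    #     import pytest
    #     sys.exit(pytest.main(["-v", __file__]))
    passed, failed = run_module_tests(dict(globals()))
    print(f"passed={passed} failed={failed}")
```

Without the guard, running this file defines both functions and exits silently; with it, the placeholder test is actually executed.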

from ray.data import DataContext
from ray.data._internal.util import GiB
from ray.data.tests.test_arrow_block import (
parquet_dataset_single_column_gt_2gb, # noqa: F401
Member

Also move this? I think it's only used by the one test.

Signed-off-by: Matthew Owen <[email protected]>
@bveeramani bveeramani enabled auto-merge (squash) June 25, 2025 18:04
@bveeramani bveeramani merged commit c081542 into ray-project:master Jun 25, 2025
6 checks passed