Cannot read local csv file with remote ray clusters #6634

Closed
SiRumCz opened this issue Oct 5, 2023 · 2 comments
Labels
question ❓ Questions about Modin Triage 🩹 Issues that need triage

Comments

@SiRumCz

SiRumCz commented Oct 5, 2023

I have the following environment variables set:

export MODIN_EXPERIMENTAL_GROUPBY=True
export MODIN_RAY_CLUSTER=True
export RAY_ADDRESS="ray://xx.xxx.xxx.xxx:10001"

When I run the code below, I get a FileNotFoundError:

fpath = "sample.csv"  # relative path; the file is in the same directory as the script

import pandas
import modin.pandas as mpd

df = pandas.read_csv(fpath)  # no error
# FileNotFoundError: [Errno 2] No such file or directory: <full path of the file>
# (the parent directory in the reported path is a path on the Ray node)
mdf = mpd.read_csv(fpath)

However, if I unset both MODIN_RAY_CLUSTER and RAY_ADDRESS, a local Ray instance is created and the file loads without an error. Could anyone confirm whether this is a bug or whether I am using it incorrectly? How can I load the file without first loading it into a local vanilla pandas DataFrame?

I also tried things like:

  1. uploading the file to the server and passing the absolute path of that file on the server; I still get FileNotFoundError
  2. unsetting the Ray env vars and initializing Ray inside the script with the address; I get the same error

I am using [email protected] because I need Python 3.8 support. The machine that runs the script is WSL2 on Windows 11, and my Ray cluster runs on Ubuntu 22.04 LTS.

@Garra1980
Collaborator

Yes, this is expected. Some details can be found in #3179, for example.

@SiRumCz
Author

SiRumCz commented Oct 6, 2023

@Garra1980 thank you. After making the file available to all workers (including my local machine), read_csv works. To be honest, though, I was expecting Modin to read my local file and distribute the DataFrame automatically. I wonder whether that could be added as a future feature.
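
For anyone who cannot place the file on every node, the fallback is the one the original post hoped to avoid: read the file on the client with vanilla pandas, then hand the result to the distributed frame constructor. Below is a minimal sketch of that idea. `load_local_csv` is a hypothetical helper, not part of Modin; it assumes the file fits in the client's memory and that `modin.pandas.DataFrame` accepts a pandas DataFrame as input.

```python
import os


def load_local_csv(path, frame_factory, reader=None):
    """Read a CSV on the client machine, then wrap it with frame_factory.

    Hypothetical helper for illustration only (not a Modin API).
    frame_factory: callable that turns the locally read data into the
        desired frame type, e.g. modin.pandas.DataFrame.
    reader: callable that reads the file; defaults to pandas.read_csv.
    """
    # Resolve against the client's working directory, so a relative
    # path means the same thing no matter where the workers run.
    local_path = os.path.abspath(path)
    if not os.path.exists(local_path):
        raise FileNotFoundError(f"No such file on the client: {local_path}")

    if reader is None:
        import pandas  # imported lazily; only needed for the default reader
        reader = pandas.read_csv

    # The read happens entirely on the client; frame_factory is then
    # responsible for moving the data to wherever it needs to live.
    return frame_factory(reader(local_path))
```

With the environment variables from the original post set, usage would look like `mdf = load_local_csv("sample.csv", mpd.DataFrame)`. The whole file passes through the client, so this gives up the parallel read that `mpd.read_csv` provides when every worker can see the file.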

@SiRumCz SiRumCz closed this as completed Oct 8, 2023