Custom Data Notebook: Spaces in file paths can cause issues with bash commands #137

Open
cdleong opened this issue Dec 12, 2020 · 3 comments

Comments

Contributor

cdleong commented Dec 12, 2020

For example, /content/drive/My Drive/masakhane/$src-$tgt-$tag can cause issues, but also the following situation caused an error for me:

source_file = f"/content/drive/My Drive/Research/Hani Machine Translation/hni_story_corpus/v2/hani_story_corpus_train.{source_language}"
target_file = f"/content/drive/My Drive/Research/Hani Machine Translation/hni_story_corpus/v2/hani_story_corpus_train.{target_language}"

# They should both have the same length.
! wc -l $source_file
! wc -l $target_file

Mitigations we could do:

"MyDrive" instead of "My Drive" helps

It seems you can simply switch from My Drive to MyDrive paths. That helps a lot, as long as there are no spaces elsewhere in the path. In my case there were: Hani Machine Translation was in the path to train.eng and train.hni.

Add quotes around bash variables

For example, use
! wc -l "$source_file" instead of ! wc -l $source_file

and

! head "$source_file"* instead of ! head $source_file*

Quoting doesn't completely solve the problem, though, and it gets complicated in some of the more complex cases later in the notebook, like

!cp -r joeynmt/models/${src}${tgt}_transformer/* "$gdrive_path/models/${src}${tgt}_transformer/"

or within the yaml file:

#load_model: "{gdrive_path}/models/{name}_transformer/1.ckpt" # if uncommented, load a pre-trained model from this checkpoint
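
One way to avoid hand-placing quotes might be to let Python do the quoting before the string reaches bash. The following is a minimal sketch, not something the notebook currently does: `quoted_command` is a hypothetical helper, and the path is a shortened stand-in for the real one. It relies only on `shlex.quote` from the standard library.

```python
import shlex


def quoted_command(*parts):
    """Join command parts into one string, shell-quoting each part.

    Hypothetical helper for illustration; not part of the notebook.
    """
    return " ".join(shlex.quote(str(part)) for part in parts)


# Shortened, made-up path that contains a space, as "My Drive" does.
source_file = "/content/drive/My Drive/corpus/train.hni"

cmd = quoted_command("wc", "-l", source_file)
print(cmd)
```

In a notebook cell the resulting string could then be passed to the shell, for example with `!{cmd}`. Note that this quotes whole arguments, so a glob such as the trailing `*` in the `head` example would need to be appended outside the quoted part.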

Warn the user about whitespace.

Add a section that checks all the paths for whitespace and warns the user. Maybe it would be easier if they just removed the spaces?
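
Such a check could look roughly like this. It is only a sketch: `warn_on_whitespace` is a hypothetical name, and the example paths are made up, so the real cell would need to be pointed at whatever path variables the notebook defines.

```python
import warnings


def warn_on_whitespace(paths):
    """Return the paths that contain whitespace, emitting a warning for each.

    Hypothetical helper for illustration; not part of the notebook.
    """
    offenders = [str(p) for p in paths if any(ch.isspace() for ch in str(p))]
    for path in offenders:
        warnings.warn(
            f"Path contains whitespace and may break bash commands: {path!r}. "
            "Consider renaming it or using a path without spaces."
        )
    return offenders


# Made-up example paths: one with spaces, one without.
paths_to_check = [
    "/content/drive/My Drive/Research/train.hni",
    "/content/drive/MyDrive/Research/train.eng",
]
offenders = warn_on_whitespace(paths_to_check)
print(offenders)
```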

Do all our file manipulations with Python

We could rewrite a lot of these to use pathlib
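
As an illustration, the `wc -l` check above might be replaced by something like the following. This is a sketch under assumptions: `count_lines` is a hypothetical helper, and the demo writes two tiny files into a temporary directory with spaces in its name rather than touching a real Drive. Unlike `wc -l`, it also counts a final line that lacks a trailing newline.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count the lines in a file without going through the shell.

    Hypothetical helper for illustration; not part of the notebook.
    """
    with Path(path).open("rb") as handle:
        return sum(1 for _ in handle)


# Demo in a temporary directory whose name contains spaces (made-up layout).
with tempfile.TemporaryDirectory() as tmp:
    corpus_dir = Path(tmp) / "My Drive" / "Hani Machine Translation"
    corpus_dir.mkdir(parents=True)

    source_file = corpus_dir / "train.hni"
    target_file = corpus_dir / "train.eng"
    source_file.write_text("a\nb\nc\n", encoding="utf-8")
    target_file.write_text("x\ny\nz\n", encoding="utf-8")

    source_count = count_lines(source_file)
    target_count = count_lines(target_file)

    # They should both have the same length.
    assert source_count == target_count
    print(source_count, target_count)
```

Because the path never passes through bash, spaces in it need no quoting at all.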

See also pjreddie/darknet#1672 and https://stackoverflow.com/questions/56640534/cannot-open-train-txt-with-white-space-my-drivehe

Originally posted this on masakhane-io/masakhane-community#25, whoops.

Contributor Author

cdleong commented Dec 12, 2020

In my case I simply took the spaces out, and that prevented any issues. As in, I used /content/drive/MyDrive/ instead of /content/drive/My Drive/, and also manually renamed my Hani Machine Translation folder to HaniMachineTranslation

I'm currently testing whether I can get the whole notebook to run with spaces left in the path, by adding quotation marks around variables.

Contributor Author

cdleong commented Dec 12, 2020

Ah, I think maybe I forgot that you can right-click the Drive name in Google Colab and rename it.

I think I changed my drive name to MyDrive and forgot I had done so.

Contributor Author

cdleong commented Dec 12, 2020

I will rename it again and see if it breaks.
