Assert len(splits) == 2, "Unable to process %s" % treebank #16
Comments
Well, it's hard to be angry over such a detailed and politely worded error message. The issue is that the script doesn't expect a complete path to the treebank. This works for me:
whereas you are giving I may look into making that a legal thing to do, since it seems like a natural enough option, but for now I suggest just setting your Similarly, I have no difficulty with the Tamil dataset:
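For anyone hitting the same assertion, here is a rough sketch of why a full path trips it while a bare treebank name does not. The function `split_treebank_name` is a hypothetical stand-in, not Stanza's actual code: only the `assert len(splits) == 2` line comes from the traceback in this issue, and splitting on "-" after dropping a leading "UD_" is an assumption about how `splits` is built. `treebank_name_from_path` is likewise a hypothetical helper that recovers the bare name from a Windows path.

```python
from pathlib import PureWindowsPath


def split_treebank_name(treebank):
    """Hypothetical stand-in for the check inside treebank_to_short_name.

    Assumption: the name is split on "-" into (language, corpus) after an
    optional "UD_" prefix is dropped. Only the assert is taken from the
    traceback reported in this issue.
    """
    name = treebank[3:] if treebank.startswith("UD_") else treebank
    splits = name.split("-")
    assert len(splits) == 2, "Unable to process %s" % treebank
    return splits


def treebank_name_from_path(path):
    """Return the last component of a Windows path, e.g. the treebank folder."""
    return PureWindowsPath(path).name


if __name__ == "__main__":
    full_path = r"E:\stanza_model_try\stanza-train\data\udbase\UD_English-TEST"

    # The bare name splits into exactly two parts.
    print(split_treebank_name("UD_English-TEST"))

    # The full path contains an extra "-" (in "stanza-train"), so under the
    # assumption above it splits into three parts and the assert fires.
    try:
        split_treebank_name(full_path)
    except AssertionError as err:
        print("AssertionError:", err)

    # Stripping the directories first gives a name the check accepts.
    print(split_treebank_name(treebank_name_from_path(full_path)))
```

Under these assumptions, passing only the folder name of the treebank, rather than the path leading to it, is what keeps the check happy.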
Thank you for your immediate response :)
I tried your suggestion above, but got another set of errors :(
I guess it has something to do with my paths. I will check them again, and any comments or suggestions from your side would be helpful. Thanks!
I don't know anything about mingw, but on regular windows, I would expect the path to be So for example, on my Windows machine with Java installed, I can do:
I'm really sorry for the delayed response.
I'm on Windows, and I'm using Git Bash to run those Linux commands on my PC. "MINGW64" in the terminal prompt just indicates that I'm using the Git Bash shell on a Windows system. I went through all the steps once again and checked my
But once I give the full path instead of UD_English-TEST, I get the error I mentioned in my first comment [https://github.com//issues/16#issue-1909166003]. So now it seems I'm stuck between these two errors! Is there anything I'm missing? Thanks in advance!
As I mentioned above, based on what I know of Windows paths, your |
Hey, thanks for your reply, it worked! Now I'm facing another issue when executing the next command,
Since the error is related to an access permission, I tried "Run as administrator", but that didn't work. I also searched for the error on Stack Overflow and checked the permissions on the required folder; all of them are already granted. The file mentioned in the error is created in the directory, but by the end it is no longer there. I have attached the images below for clarification. If there is anything you can help with, I would be grateful. I would also like to know if there are any alternative solutions. Sorry again for troubling you with this. Thanks!
It looks like your temp folder is not accessible, for whatever reason. I would encourage you not to use admin privileges to work around issues like this, but rather fix the underlying issue. One option would be to change your
Another would be to somehow make it accessible. I found this, but I'm not a Windows expert, so I encourage you to figure it out yourself: https://community.spiceworks.com/topic/2300942-windows-10-temp-folder-access-denied |
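As a quick way to check the "temp folder is not accessible" theory, a small probe like the one below can help. It is only a sketch: it assumes the failing step creates its temporary files through Python's standard `tempfile` module, which this thread does not confirm, and `temp_dir_is_writable` is a hypothetical helper name. What is certain is that `tempfile.gettempdir()` consults the `TMPDIR`, `TEMP`, and `TMP` environment variables before falling back to platform defaults.

```python
import tempfile


def temp_dir_is_writable(directory=None):
    """Return True if a temporary file can be created and written in directory.

    With no argument, probes the directory Python itself would pick,
    i.e. tempfile.gettempdir().
    """
    directory = directory or tempfile.gettempdir()
    try:
        # The file is removed automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"probe")
        return True
    except OSError:
        # Covers PermissionError, FileNotFoundError, and similar failures.
        return False


if __name__ == "__main__":
    print("Temp directory:", tempfile.gettempdir())
    print("Writable:", temp_dir_is_writable())
```

If the probe reports that the directory is not writable, pointing `TEMP` or `TMP` at a folder you own before running the command is one way to test whether the temp location is really the culprit.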
Hi,
I'm starting by using the data included in Stanza's packages to learn how Stanza works before attempting to train my own language model on different data.
When I run the command,
(base) E:\stanza_model_try>python -m stanza.utils.datasets.prepare_tokenizer_treebank E:\stanza_model_try\stanza-train\data\udbase\UD_English-TEST
the following error appears:
2023-09-22 20:10:11 INFO: Datasets program called with:
C:\Users\dell\anaconda3\lib\site-packages\stanza\utils\datasets\prepare_tokenizer_treebank.py E:\stanza_model_try\stanza-train\data\udbase\UD_English-TEST
Traceback (most recent call last):
  File "C:\Users\dell\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\dell\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\dell\anaconda3\lib\site-packages\stanza\utils\datasets\prepare_tokenizer_treebank.py", line 1217, in <module>
    main()
  File "C:\Users\dell\anaconda3\lib\site-packages\stanza\utils\datasets\prepare_tokenizer_treebank.py", line 1214, in main
    common.main(process_treebank, common.ModelType.TOKENIZER, add_specific_args)
  File "C:\Users\dell\anaconda3\lib\site-packages\stanza\utils\datasets\common.py", line 271, in main
    process_treebank(treebank, model_type, paths, args)
  File "C:\Users\dell\anaconda3\lib\site-packages\stanza\utils\datasets\prepare_tokenizer_treebank.py", line 1168, in process_treebank
    short_name = treebank_to_short_name(treebank)
  File "C:\Users\dell\anaconda3\lib\site-packages\stanza\models\common\constant.py", line 493, in treebank_to_short_name
    assert len(splits) == 2, "Unable to process %s" % treebank
AssertionError: Unable to process E:\stanza_model_try\stanza-train\data\udbase\UD_English-TEST
I looked through many issues related to this one and made changes, but nothing worked. I'm also a beginner with this. I'm trying to do the same for treebank-UD_Tamil-TTB, but got the exact same error.
  File "C:\Users\dell\anaconda3\lib\site-packages\stanza\models\common\constant.py", line 493, in treebank_to_short_name
    assert len(splits) == 2, "Unable to process %s" % treebank
AssertionError: Unable to process E:\stanza_model_try\stanza-train\data\udbase\UD_Tamil-TTB
Below is my config file:
Currently I'm not using NER or any wordvec, so I commented that part out.
Below is my directory...
So I need help to resolve this error and successfully train on my own data. Please forgive any silly mistakes, and sorry for opening this kind of issue again even though there are many closed issues for the same problem. Any suggestions or help would be appreciated.
Thanks :)