I want to train NER on google Colab as I have windows machine #10
Comments
There's a run_ner.py script which should work fine on Windows. If it
doesn't, please let us know and we'll try to help.
As for Colab, I have no idea what differences there would be.
Do you have a good dataset for Urdu NER?
|
Yes, I have a dataset. I am sharing the paper name and the dataset link; can you please guide me? The paper is "Urdu Named Entity Recognition: Corpus Generation and Deep Learning Applications", and according to the paper the dataset, MK-PUCIT, can be downloaded from https://www.dropbox.com/sh/1ivw7ykm2tugg94/AAB9t5wnN7FynESpo7TjJW8la .... Please let me know the steps: the format of the training file, where to keep it, and how to run the training. I have been using a Jupyter notebook; is that OK, or do you suggest something different? Thanks a lot. |
There's a bunch more information here:
https://stanfordnlp.github.io/stanza/training.html
|
I made files modeled on the ones given in stanza-train (en_sample.train.bio, en_sample.test.bio, en_sample.dev.bio) and kept them in the folder ner/training. I ran the command !python run_ner.py en_sample.dev.bio; the error it generated is given below: |
The directory should be data/ner or whatever NER_DATA_DIR is
…On Mon, Mar 14, 2022, 9:03 AM Fatima-Sajid ***@***.***> wrote:
I made files according to the files given in stanza-train,
en_sample.train.bio, en_sample.test.bio, en_sample.dev.bio, kept them in
the folder ner/training. I ran this command !python run_ner.py
en_sample.dev.bio, the error generated is given below:
2022-03-14 08:41:42 INFO: Training program called with:
run_ner.py en_sample.dev.bio
2022-03-14 08:41:42 DEBUG: en_sample.dev.bio: en_sample.dev.bio
2022-03-14 08:41:42 INFO: en_sample.dev.bio: saved_models/ner/en_sample.dev.bio_nertagger.pt does not exist, training new model
2022-03-14 08:41:42 WARNING: The data for en_sample.dev.bio is missing or incomplete. Attempting to rebuild...
2022-03-14 08:41:42 ERROR: Unable to build the data. Please correctly build the files in data/ner/en_sample.dev.bio.train.json, data/ner/en_sample.dev.bio.dev.json, data/ner/en_sample.dev.bio.test.json and then try again.
Traceback (most recent call last):
  File "run_ner.py", line 166, in <module>
    main()
  File "run_ner.py", line 163, in main
    common.main(run_treebank, "ner", "nertagger", add_ner_args)
  File "/usr/local/lib/python3.7/dist-packages/stanza/utils/training/common.py", line 106, in main
    temp_output_file.name, command_args, extra_args)
  File "run_ner.py", line 90, in run_treebank
    prepare_ner_dataset.main(short_name)
  File "/usr/local/lib/python3.7/dist-packages/stanza/utils/datasets/ner/prepare_ner_dataset.py", line 426, in main
    raise ValueError(f"dataset {dataset_name} currently not handled")
ValueError: dataset en_sample.dev.bio currently not handled
|
Thanks |
I don't think anything suggested using run_ner.py with the full path.
Earlier I meant that the files are supposed to go in $NER_DATA_DIR, or in
data/ner if $NER_DATA_DIR is not set, at which point you can use
run_ner.py en_sample
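To make the naming convention above concrete, here is a minimal Python sketch of the two ideas: the data directory comes from NER_DATA_DIR with data/ner as the fallback, and the argument to run_ner.py is a dataset short name rather than a file name or path. The helper names `ner_data_dir` and `dataset_short_name` are hypothetical and are not part of Stanza; only the environment variable, the data/ner default, and the en_sample file names come from this thread.

```python
import os
from pathlib import Path


def ner_data_dir() -> Path:
    """Return $NER_DATA_DIR if it is set, otherwise data/ner (relative to the current directory)."""
    return Path(os.environ.get("NER_DATA_DIR", "data/ner"))


def dataset_short_name(filename: str) -> str:
    """Drop any directory, the split, and the extension: 'en_sample.dev.bio' -> 'en_sample'."""
    return Path(filename).name.split(".")[0]


if __name__ == "__main__":
    # Hypothetical illustration: this is the name to pass, not the .bio file itself.
    print("data directory:", ner_data_dir())
    print("dataset name:", dataset_short_name("en_sample.dev.bio"))
```

Under this reading, both failed commands in the thread passed a file name (or a full path) where the short name en_sample was expected.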
…On Tue, Mar 15, 2022 at 1:08 AM Fatima-Sajid ***@***.***> wrote:
Thanks
but now the error is about the language code. I ran this command:
!python run_ner.py /usr/local/lib/python3.7/dist-packages/stanza/data/ner/en_sample.dev.bio
The error is:
2022-03-15 08:04:29 INFO: Training program called with:
run_ner.py /usr/local/lib/python3.7/dist-packages/stanza/data/ner/en_sample.dev.bio
Traceback (most recent call last):
  File "run_ner.py", line 166, in <module>
    main()
  File "run_ner.py", line 163, in main
    common.main(run_treebank, "ner", "nertagger", add_ner_args)
  File "/usr/local/lib/python3.7/dist-packages/stanza/utils/training/common.py", line 89, in main
    short_name = treebank_to_short_name(treebank)
  File "/usr/local/lib/python3.7/dist-packages/stanza/models/common/constant.py", line 180, in treebank_to_short_name
    raise ValueError("Unable to find language code for %s" % lang)
ValueError: Unable to find language code for /usr/local/lib/python3.7/dist
|
Kindly tell me what I am supposed to do. I understand you did not suggest using the full path, but I don't understand where the data/ner directory is. Do I have to make it myself? |
mkdir data
mkdir data/ner
put the data in that folder
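In a Colab or Jupyter cell the same three steps can be done from Python. This is only a sketch under assumptions: the function name `stage_bio_files` and the `source_dir` argument are hypothetical, and the file names follow the en_sample.train.bio / en_sample.dev.bio / en_sample.test.bio pattern mentioned earlier in this thread.

```python
import os
import shutil
from pathlib import Path


def stage_bio_files(source_dir: str, short_name: str = "en_sample") -> list:
    """Create the NER data directory and copy the train/dev/test .bio files into it."""
    # Same rule as described in the thread: $NER_DATA_DIR if set, else data/ner.
    target = Path(os.environ.get("NER_DATA_DIR", "data/ner"))
    target.mkdir(parents=True, exist_ok=True)  # equivalent of mkdir data; mkdir data/ner

    copied = []
    for split in ("train", "dev", "test"):
        src = Path(source_dir) / f"{short_name}.{split}.bio"
        if src.exists():
            shutil.copy(src, target / src.name)
            copied.append(target / src.name)
        else:
            # Report rather than fail, so a missing split is easy to spot.
            print(f"missing: {src}")
    return copied
```

For example, `stage_bio_files("ner/training")` would copy the files from the folder used earlier in the thread, after which the training command is run with the short name only (run_ner.py en_sample) from the directory that contains data/.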
|
I made the directory as you guided, but the problem still exists. I am pasting the commands and errors from the Jupyter notebook.
|
The training page, https://stanfordnlp.github.io/stanza/training.html, says: "The program will look for the .json files in the data/ner directory, which you may need to create if this is your first time training a Stanza NER model. You can change the expected path by setting the $NER_DATA_DIR environment variable." So that page says .json files are required, while the example files in the data directory of https://github.com/stanfordnlp/stanza-train have the .bio extension. |
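For readers wondering how the .bio and .json files relate, here is a rough Python sketch of a BIO-to-JSON conversion. It rests on two assumptions that should be checked against the linked training page: that each .bio line holds a token and its tag separated by whitespace with a blank line between sentences, and that the .json file is a list of sentences, each a list of objects with "text" and "ner" keys. The function name `bio_to_sentences` is hypothetical; this is not Stanza's own converter.

```python
import json


def bio_to_sentences(lines):
    """Group 'token tag' lines into sentences; a blank line ends a sentence."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # First column is taken as the token, last column as the NER tag (assumed layout).
        current.append({"text": parts[0], "ner": parts[-1]})
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = ["John\tB-PER", "lives\tO", "", "Lahore\tB-LOC"]
    # ensure_ascii=False keeps non-Latin text such as Urdu readable in the output.
    print(json.dumps(bio_to_sentences(sample), ensure_ascii=False, indent=2))
```

This is only meant to show why both extensions appear in the documentation: the .bio files are the human-editable input and the .json files are a converted form of the same sentences.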
https://github.com/stanfordnlp/stanza/blob/de44be871282e05f79f23f5f5e284aceb672726b/stanza/utils/training/run_ner.py
Can this link help me train using my dataset? If yes, how? |
If you're using the dev branch as suggested in the stanza-train README,
run_ner.py will convert the en_sample files for you.
|
Thanks. Kindly also look into this matter: instruction + error
|
"The directory should be data/ner or whatever NER_DATA_DIR is": I am still
confused. I did not find any data/ner directory in stanza. Am I supposed
to make it myself? I made one in the stanza folder but still get an error. Please guide me, thanks.
…On Mon, 14 Mar 2022, 9:26 PM John Bauer, ***@***.***> wrote:
The directory should be data/ner or whatever NER_DATA_DIR is
|
I simplified the instructions as much as I could a couple of months ago. Please go through them and tell me if that helps: https://stanfordnlp.github.io/stanza/new_language_ner.html |
Thanks a lot. I will go through the steps you have simplified and let you know.
Thanks again
…On Tue, Oct 11, 2022 at 9:01 PM John Bauer ***@***.***> wrote:
I simplified the instructions as much as I could a couple months ago.
Please go through that and tell me if it helps.
https://stanfordnlp.github.io/stanza/new_language_ner.html
|
Is it possible for you to guide me through the steps to train NER for Urdu on Colab, or to suggest another approach and explain its steps?