Is your feature request related to a problem? Please describe.
My team is working on removing PII from text data in Southeast Asian languages. When we use the PIIDeIdentifier on text in these languages, it raises the following error: ValueError: No matching recognizers were found to serve the request. It appears that only English is currently supported.
Describe the solution you'd like
It would be helpful if PII could be detected in Southeast Asian languages (e.g., Bahasa Indonesia, Thai, Vietnamese).
Describe alternatives you've considered
The underlying package is Presidio, which uses spaCy and Stanza NER models as part of its detection. Models that support some of the Southeast Asian languages are available in spaCy and Stanza, and they could be adapted for this use case.
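To make this alternative concrete, here is a minimal sketch of how Presidio could be pointed at Stanza models for these languages. It is not the PIIDeIdentifier implementation. The helper names `build_nlp_configuration` and `build_analyzer` are hypothetical, the language codes `id`, `vi`, and `th` are assumed to have Stanza models available, and the Stanza engine is assumed to be installed alongside `presidio-analyzer`. Whether the NER labels of a given model map onto useful PII entities would still need to be checked per language.

```python
from typing import Dict, List


def build_nlp_configuration(lang_codes: List[str], engine: str = "stanza") -> Dict:
    """Build the configuration dict that Presidio's NlpEngineProvider accepts.

    Assumption: for the Stanza engine, the model name is the language code.
    """
    return {
        "nlp_engine_name": engine,
        "models": [{"lang_code": code, "model_name": code} for code in lang_codes],
    }


def build_analyzer(lang_codes: List[str]):
    """Create an AnalyzerEngine that serves the given languages.

    Presidio is imported lazily so the configuration helper above can be
    used and tested without Presidio or the models installed.
    """
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    config = build_nlp_configuration(lang_codes)
    nlp_engine = NlpEngineProvider(nlp_configuration=config).create_engine()
    # supported_languages must list every language that will be requested,
    # otherwise no recognizers are registered for it.
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=list(lang_codes))
```

With the models downloaded, usage would look like `build_analyzer(["id", "vi", "th"]).analyze(text=..., language="id")`. Pattern-based recognizers for region-specific identifiers would likely still have to be added separately.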