Processing "de Meunier" doesn't recognize the prefix #121
Comments
I have not looked into the code to confirm, but I believe the parser treats a prefix as an ordinary name piece, rather than as a prefix, when the input has only two space-separated strings. That would help people whose first names clash with prefixes; I'd have to check whether the tests contain specific examples. I'm curious about your use case. Do you have data that occasionally includes only a last name, and you want the parser to tell you that it is indeed a last name?
Thanks for the quick answer! I believe my use case is what you describe. In this specific case, the text includes three versions of the "name" in different places: "Sergeant de Mesnil", "Walter de Mesnil" and "de Mesnil". After adding "sergeant" as a custom title, I get three different parsings.
On a side note: it would be neat if there were an explicit LAST_NAME_TITLE option for titles. This would be handy for military titles like General, Colonel, Major, etc., as well as most nobility titles outside of King/Queen and Lord/Lady. I think it sort of works out of the box, but I was surprised not to see it made explicit.
There is a set of titles, FIRST_NAME_TITLES, that, when followed by a single name, assume that name is a first name. (It does not appear to be exposed in the documentation, though.)
All other titles are handled by the rest of the normal parsing process, so the name that follows is assumed to be a last name because there is more than one name part. The set currently includes King/Queen but not Lady/Lord; maybe it should. The Wikipedia page suggests it could go either way: https://en.wikipedia.org/wiki/Lady
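To make the rule described above concrete, here is a minimal sketch in plain Python. It is not nameparser's code: the set contents and the classify helper are hypothetical, chosen only to show how a title followed by a single name could be routed to either the first or the last name.

```python
# Hypothetical illustration; not nameparser's implementation.
# Both sets are assumed examples, not the library's actual lists.
FIRST_NAME_TITLES = {"king", "queen", "sir"}
ALL_TITLES = FIRST_NAME_TITLES | {"sergeant", "general", "colonel", "major"}


def classify(full_name):
    """Split a 'Title Name' string into title/first/last parts."""
    pieces = full_name.split()
    result = {"title": "", "first": "", "last": ""}
    if len(pieces) == 2 and pieces[0].lower() in ALL_TITLES:
        title, name = pieces
        result["title"] = title
        # A first-name title claims the single name as a first name;
        # any other title leaves it to be treated as a last name.
        key = "first" if title.lower() in FIRST_NAME_TITLES else "last"
        result[key] = name
    # Any other input shape is outside the scope of this sketch.
    return result
```

Under these assumptions, "King Henry" would yield a first name and "Sergeant Mesnil" a last name, which is roughly what an explicit LAST_NAME_TITLE option would formalize.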
FYI, I have worked around the issue for now by manually checking and fixing the output after parsing, for the known prefix cases in my data: if human_name.first in ['de', 'st', 'st.', 'van']:
I think the default behaviour could (should?) be similar to the above: if the original is "prefix name", the output should be last = prefix + " " + name instead of first = prefix & last = name.
Thanks for the pointer on FIRST_NAME_TITLES. I'm using it now.
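For anyone wanting to reuse this workaround, a minimal sketch of the post-parse fix follows. The merge_prefix helper and the KNOWN_PREFIXES name are hypothetical; the prefix list is the one from the comment above, and the function works on plain strings so it makes no assumptions about the parser's internals.

```python
# Assumed prefix list, copied from the comment above; extend as needed.
KNOWN_PREFIXES = {"de", "st", "st.", "van"}


def merge_prefix(first, last, prefixes=KNOWN_PREFIXES):
    """Return (first, last) with a leading prefix moved into the last name."""
    # Only act when the "first name" is a known prefix and a last name exists.
    if first.lower() in prefixes and last:
        return "", f"{first} {last}"
    return first, last
```

It would be applied to the parsed result, for example as first, last = merge_prefix(human_name.first, human_name.last). Names whose real first name happens to match a prefix would be merged too, so this only suits data where that cannot occur.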
I'm running into something similar with my name, Patrick van der Leer. In Dutch we call the "van der" part a tussenvoegsel. Even "Patrick van Leer" gives me "van Leer" as the surname/last_name and nothing for the middle name. EDIT: Line 2069 in 8b73ff9
This was not what I was expecting. "van der" would be part of the full surname, yes, but I would set "van der" as a middle name or as a prefix of the surname.
I'm not sure if I am missing something, but if I run the parser on the string "de Mesnil", I expect it to give me either a first or a last name of "de Mesnil" (preferably the latter), given that "de" is a known prefix.
Instead I get a first name "de" and a last name "Mesnil".
That seems to contradict the documentation for prefixes: "Name pieces that appear before a last name. Prefixes join to the piece that follows them to make one new piece."
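Read literally, the documented rule could be sketched as below. This is an illustration of the expected behaviour only, not nameparser's implementation; join_prefixes and the sample PREFIXES set are hypothetical.

```python
# Hypothetical sketch of the documented rule; not nameparser's code.
PREFIXES = {"de", "van", "der"}  # assumed sample, not the library's list


def join_prefixes(pieces, prefixes=PREFIXES):
    """Join each run of prefixes to the piece that follows it."""
    joined, pending = [], []
    for piece in pieces:
        if piece.lower() in prefixes:
            pending.append(piece)  # hold prefixes until a non-prefix arrives
        else:
            joined.append(" ".join(pending + [piece]))
            pending = []
    joined.extend(pending)  # trailing prefixes have nothing to join to
    return joined
```

With this reading, the pieces "de" and "Mesnil" collapse into the single piece "de Mesnil" even when the input has only two strings, which is the behaviour the report expects.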