-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Noref more permissive gnd uri handling #114
Noref more permissive gnd uri handling #114
Conversation
idutils/utils.py
Outdated
@@ -81,8 +81,10 @@ | |||
https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier | |||
""" | |||
|
|||
gnd_resolver_url = "d-nb.info/gnd/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think gnd_resolver_url is not needed any more.
It's used at the beginning of the validate function, but I don't that's needed after this code change
Line 238 in 181b3db
def is_gnd(val): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its also used at
Line 240 in 26318e4
if val.startswith(gnd_resolver_url): |
idutils/utils.py
Outdated
gnd_regexp = re.compile( | ||
r"(gnd:|GND:)?(" | ||
rf"(gnd:|GND:|http://{re.escape(gnd_resolver_url)}|https://{re.escape(gnd_resolver_url)})?(" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use ? instead of repeating http and https like
Line 62 in 181b3db
r"(pmid:|https?://pubmed.ncbi.nlm.nih.gov/)?(\d+)/?$", flags=re.I |
I would also lean toward putting the url in the regex instead of having a variable
idutils/normalizers.py
Outdated
if val.startswith("http://" + gnd_resolver_url): | ||
val = val[len("http://" + gnd_resolver_url) :] | ||
elif val.startswith("https://" + gnd_resolver_url): | ||
val = val[len("https://" + gnd_resolver_url) :] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you just use the regex in the normalize function like
idutils/idutils/normalizers.py
Line 73 in 181b3db
def normalize_pmid(val): |
I updated the regex to match the one from https://www.wikidata.org/wiki/Property:P227 : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, Thanks!
idutils/validators.py
Outdated
if val.startswith("d-nb.info/gnd/"): | ||
val = val[len("d-nb.info/gnd/") :] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: is this actually needed now that the regex contains the URL? I feel this if ...
clause would only match identifiers without the http(s)
protocol in front, i.e. d-nb.info/gnd/12345
. Maybe this logic can be captured in the regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this can probably be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed that by chaning the regex accordingly, see https://regex101.com/r/C9VJpH/1
… remove additional check in validators
❤️ Thank you for your contribution!
Description
This PR aims at simplifying the process of adding a GND ID to a person in the creatibutors modal. Until now users could only paste the actual GND ID, not one of the GND URIs. When researching a GND URI the Website of the german national library looks like this:
So most people will just copy and paste this URI. When then saving or publishing the draft in RDM there will be a "Creators: No valid scheme recognized for identifier." error, which in our opinion is a major inconvenience.
This PR allows for pasting both
http://d-nb.info/gnd/<id>
andhttps://d-nb.info/gnd/<id>
in addition toGND:<id>
,gnd:<id>
and<id>
. The normalization step is the same as before.Checklist
Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge: