Method #1
Thoughts @dhimmel @greenelab? I have been drooling over the code in your Sci-Hub study since I found it (https://github.com/greenelab/scihub) :)
I am unclear whether you are counting outgoing or incoming citations from/to pledgers' works. It sounds like you're counting outgoing citations... i.e. how many citations pledgers made in the past X years. However, isn't it more meaningful to count how many citations from the past X years are incoming to works by pledgers? I.e. if pledgers were to have published fully OA, what percent of all citations would have been redirected to OA works? The tentative method looks good to me. You will need three things it seems:
The best current resource for 1 is Crossref, IMO. The best resource for 2 is the I4OC citations now available from Crossref (extracted versions are available from several locations now). I agree the best resource for 3 is ORCID. While overall ORCID coverage of authors is low, pledgers are more likely to have filled out their ORCID profiles completely. I4OC citations probably cover only about 50% of all citations, since some publishers like Elsevier & ACS do not share their references. However, depending on the exact formulation of your metric, it may be okay not to have every citation.
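As a quick way to see what a given publisher shares, you can check whether the public Crossref REST API exposes a `reference` list for one of their DOIs (a minimal sketch; an empty result may simply mean the references were not deposited or not made public):

```python
import requests

def open_references(doi):
    """Return the reference list Crossref exposes for a DOI.

    The 'reference' field is only present when the publisher has
    deposited references and made them public (I4OC-style), so an
    empty result may just mean the publisher does not share them.
    """
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    return resp.json()["message"].get("reference", [])

print(len(open_references("10.1016/j.neuron.2012.10.038")), "open references")
```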
Sorry for the lack of clarity! Yes the idea is to count how many times articles authored by pledgers were cited (rather than the number of times pledging authors cited other articles), as a proxy for the 'impact' of the community of pledgers. I'll change the description above to make this clearer.
I was thinking that we could just use the Crossref API, because it seems to include a citation count for articles (?) - e.g. if you punch in https://api.crossref.org/works/10.1016/j.neuron.2012.10.038 it spits out a field called "is-referenced-by-count" (equal to 523 for this DOI). But I'm unsure where these numbers come from (perhaps the I4OC database?), and how accurate they are, because they're different from the citation counts listed on publishers' websites (which presumably come from Scopus or WoS).
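For reference, pulling that field programmatically would look something like this (a minimal sketch against the public Crossref REST API; error handling kept to a minimum):

```python
import requests

def crossref_citation_count(doi):
    """Fetch Crossref's incoming-citation tally for a DOI via the
    /works endpoint. This is the 'is-referenced-by-count' field and
    will generally differ from Scopus/WoS numbers."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    return resp.json()["message"]["is-referenced-by-count"]

print(crossref_citation_count("10.1016/j.neuron.2012.10.038"))
```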
Agreed, the main thing is to ensure that the metric is a fair representation of the broader community, so if the citation counts are relatively accurate they should be good enough. I'm more concerned about low uptake of DOIs in some fields (e.g. 60% in the humanities; https://doi.org/10.1016/j.joi.2015.11.008), and how this might skew the results if we use the Crossref/DOI method.
I just came across this issue, and wanted to give you a heads-up on some potential problems (for which I unfortunately do not have solutions):
Point 2 is most problematic, I'd guess.
True, but you can use content negotiation (https://crosscite.org/docs.html) to throw any DOI at the https://doi.org resolver and get its metadata back.
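A minimal sketch of what that looks like in practice (requesting CSL JSON from the doi.org resolver via an Accept header, which works regardless of the registration agency):

```python
import requests

def doi_metadata(doi):
    """Resolve a DOI to CSL JSON metadata via content negotiation,
    regardless of which registration agency issued it."""
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

print(doi_metadata("10.1016/j.neuron.2012.10.038").get("title"))
```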
AFAIK they are from internal Crossref data, not from I4OC. Yes, I'd expect them to be different from WoS and Scopus. There's also the OpenURL Crossref service. Scraping Google Scholar is tough; I've seen blog posts describing how it can be done, but I think it's pretty painful since they attempt to block any automated usage. It seems like matching pledgers to the authors of papers might be difficult when there is no ORCID, yes?
Yes, this is the primary limitation of the above method - authors would need to keep their ORCID profile up to date (but note that to calculate the % of support we only need pledging authors, and not all authors, to be up to date). But it seems to me that whatever alternative method we might adopt (including scraping Google Scholar), there would still need to be some kind of verification process (e.g. automated emails like what ResearchGate uses) to check that the publications we attribute to pledging authors are actually theirs, or else we might trigger thresholds based on false data. So rather than building new lists of publications just for this project, I figured it would be simpler and less error-prone to just integrate with ORCID and request that anyone who pledges keep their profile up to date. On the plus side, pledging authors are more likely to already have an ORCID, and we can also make the case that it's good for your career to keep your ORCID profile up to date. @sckott @Vinnl it seems to me that pledging authors keeping their profiles current would (mostly?) resolve the noted problems - would you agree? Is there a better approach, do you think?
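To make the ORCID integration concrete, here's a rough sketch of pulling the DOIs from a pledger's public ORCID record (assuming the v3.0 public API and its current JSON layout; no authentication is needed for public data):

```python
import requests

def pledger_dois(orcid_id):
    """Collect the DOIs listed on a pledger's public ORCID record,
    using the public ORCID API's /works endpoint."""
    resp = requests.get(
        f"https://pub.orcid.org/v3.0/{orcid_id}/works",
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    dois = set()
    for group in resp.json().get("group", []):
        for ext_id in group.get("external-ids", {}).get("external-id", []):
            if ext_id.get("external-id-type") == "doi":
                dois.add(ext_id.get("external-id-value", "").lower())
    return dois
```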
Indeed, another limitation. I found this thread by @dhimmel useful here: greenelab/crossref#3 - which references a study showing that ~99% of Wikipedia citations reference articles with a Crossref DOI. If these stats are comparable for the scholarly literature (not sure if anyone has done a study on this?), we should be fine, because we're only really interested in articles that actually get cited.
Wow, this is cool! As I understand it, content negotiation can reveal metadata from Crossref, DataCite, and mEDRA (currently). Do you know if those other services also track citations? If so, we could just use this approach to access citation counts from all DOI services, rather than restricting ourselves to Crossref...
That definitely does not sound like the way to go, no.
I agree that that's probably the most viable approach, though I'm not sure how realistic it is - still, it's probably the most realistic of all the options. Do keep in mind, though, that people who want to artificially inflate support numbers could simply add the most-cited DOIs to their ORCID profile. I'm not sure whether that would really happen, but it's at least good not to let it take you by surprise if it does.
Unfortunately, as far as I know they don't. But I guess you're right - just limiting yourself to Crossref DOIs and using the Crossref API is probably fine, if you consider it to be a sample of support. (Depending, of course, on citation data in the Crossref API being relatively complete.)
Good point. I just checked, and ORCID allowed me to add a highly cited DOI to my record, despite none of the authors having my name. So you're right, this could allow people to game the system, but we should be able to solve it with a simple check to ensure that the pledger's name is contained in the work's author list.
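Something like the following could serve as that check (a sketch only, matching family names against the Crossref author list; homonyms and name variants would still need fuzzier matching or manual review):

```python
import requests

def name_in_author_list(doi, family_name):
    """Check whether a claimed family name appears among the Crossref
    author records for a DOI. A deliberately simple sanity check, not
    full author disambiguation."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    authors = resp.json()["message"].get("author", [])
    target = family_name.strip().lower()
    return any(a.get("family", "").strip().lower() == target for a in authors)

# e.g. reject a claimed work if this returns False ("Doe" is a hypothetical pledger surname)
print(name_in_author_list("10.1016/j.neuron.2012.10.038", "Doe"))
```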
Just an update to the proposed method: I am now planning to use Dimensions (https://www.dimensions.ai) rather than Science-Metrix to distinguish research fields (i.e. step 2), because it classifies articles at the article level (using machine learning... cool :)) rather than at the journal level (which would complicate things for multidisciplinary journals and/or authors).
For the record, I'm closing this issue and porting any future discussions to the discussion repository, so that this repo can be used exclusively for platform code-related issues.
See README.md for background on the problem and goals of this project. This issue is for discussion / planning of the best way to measure 'support' in the academic community for a particular pledge/campaign. 'Support' will be quantified as the proportion of citations that reference articles (or other research outputs) produced by pledgers in the last X years (controlling for time since publication).
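For concreteness, here is a minimal sketch of that proportion, assuming we already have a set of DOIs attributed to pledgers and per-article counts of citations received in the last X years (both hypothetical inputs; gathering them, and controlling for time since publication, is what the tentative method below is about):

```python
def support_share(citation_counts, pledger_dois):
    """Fraction of recent citations that point at pledgers' works.

    citation_counts: dict mapping DOI -> citations received in the
                     last X years (hypothetical input).
    pledger_dois:    set of DOIs attributed to pledging authors.
    """
    total = sum(citation_counts.values())
    to_pledgers = sum(n for doi, n in citation_counts.items() if doi in pledger_dois)
    return to_pledgers / total if total else 0.0
```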
Tentative method:
Limitation: fewer than 90% of papers have a DOI, and coverage varies widely by field (it is particularly low in the humanities). Alternatively, we could get better coverage by scraping Google Scholar (e.g. https://github.com/alberto-martin/googlescholar), but this could lead to more errors and make it harder to disambiguate authors (unless we require members to keep their GS profile up to date?).
Another limitation: me - I'm totally new to bibliometrics - so I'm looking for feedback on the method and, ideally, help developing the platform! (: