Error in computing Local Citations with Scopus Input #467

jwhwaal · 2024-05-30T08:47:25Z

I have noticed an error when computing Local Citations with a Scopus input file. The problem does not occur with a Web of Science input file. The histNetwork function contains a filter that removes false positives based on the PP field. However, not all papers have a PP value (page numbers). For example, Journal of Cleaner Production uses only an article identifier: Van der Waal, Johannes WH, and Thomas Thijssens. "Corporate involvement in sustainable development goals: Exploring the territory." Journal of Cleaner Production 252 (2020): 119625.
It is here:

    CR <- CR %>%
      dplyr::filter(!is.na(PY), (substr(CR$PP,1,1) %in% 0:9))

or here:

    CR <- CR %>%
      left_join(M_merge, join_by("PY", "AU"), relationship = "many-to-many") %>%
      dplyr::filter(!is.na(Included)) %>%
      group_by(PY,AU) %>%
      mutate(toRemove = ifelse(!is.na(PP.y) & PP.x!=PP.y, TRUE,FALSE)) %>% # to remove FALSE POSITIVE
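To make the effect of the first filter concrete, here is a minimal base R sketch on a toy table. The column names follow the quoted snippet; the rows are invented for illustration, and the "tolerant" variant is only a hypothetical alternative, not code from bibliometrix.

```r
# Toy stand-in for the parsed reference table (values invented for illustration).
CR <- data.frame(
  AU = c("VAN DER WAAL J", "SMITH A", "JONES B"),
  PY = c(2020, 2019, NA),
  PP = c(NA, "12-34", "5-9"),
  stringsAsFactors = FALSE
)

# Same condition as the quoted filter: PY present and PP starting with a digit.
# substr() of a missing PP is NA, and NA %in% 0:9 evaluates to FALSE.
starts_with_digit <- substr(CR$PP, 1, 1) %in% 0:9
strict <- CR[!is.na(CR$PY) & starts_with_digit, ]

# Hypothetical tolerant variant: a missing PP no longer disqualifies the row.
tolerant <- CR[!is.na(CR$PY) & (is.na(CR$PP) | starts_with_digit), ]
```

With the strict condition, the reference without page numbers is dropped before any matching takes place. The tolerant variant keeps it, at the cost of letting more potential false positives through.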

massimoaria · 2024-06-06T10:35:14Z

Unfortunately, Scopus has changed the way it stores references and, to date, there is no way to identify them uniquely (the reference string does not include the DOI).
Using the PP field is just one 'good enough' strategy we have chosen to adopt. It does lead to errors in some cases, but a different strategy would produce errors of its own.

We are of course open to accepting proposals on alternative strategies for identifying local citations in Scopus.

jwhwaal · 2024-06-06T13:32:29Z

Indeed, the reference does not include the DOI and so lacks a unique key. I have coded a solution that works quite well but is not fast: extract the title from each reference (assuming it is the longest string, which is most often the case), then compute the similarity (cosine or Levenshtein) with the TI fields of the local corpus. I got very good results on my corpus. To make it faster and to avoid problems with truncated titles (which I had in one instance), the similarity matching could be run on truncated titles (e.g. the first 100 characters).
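The approach described above can be sketched roughly as follows, using only base R (`adist` computes Levenshtein distance). The function names, the comma-splitting heuristic, the 0.9 threshold, and the sample reference string are all assumptions made for illustration; this is not the code referred to in the comment.

```r
# Hypothetical helper: take the longest comma-separated piece of a reference
# string as its title (a heuristic that fails when the title itself has commas).
extract_title <- function(ref) {
  parts <- trimws(strsplit(ref, ",", fixed = TRUE)[[1]])
  parts[which.max(nchar(parts))]
}

# Hypothetical helper: upper-case, strip punctuation, truncate to max_chars.
norm_title <- function(x, max_chars = 100) {
  x <- toupper(gsub("[^[:alnum:] ]", "", x))
  substr(trimws(x), 1, max_chars)
}

# For each reference, return the index of the best-matching local title,
# or NA when the normalised Levenshtein similarity is below the threshold.
match_refs <- function(refs, local_titles, threshold = 0.9, max_chars = 100) {
  ref_t <- norm_title(vapply(refs, extract_title, character(1), USE.NAMES = FALSE), max_chars)
  loc_t <- norm_title(local_titles, max_chars)
  d <- adist(ref_t, loc_t)                        # edit-distance matrix
  len <- outer(nchar(ref_t), nchar(loc_t), pmax)  # longer length of each pair
  sim <- 1 - d / pmax(len, 1)
  best <- apply(sim, 1, which.max)
  best_sim <- sim[cbind(seq_along(best), best)]
  ifelse(best_sim >= threshold, best, NA_integer_)
}

# Invented example data.
refs <- c(
  "Van der Waal J.W.H., Thijssens T., Corporate involvement in sustainable development goals: Exploring the territory, Journal of Cleaner Production, 252, (2020)",
  "Doe J., Some other paper about bibliometric networks, Scientometrics, 10, (2015)"
)
local_titles <- c(
  "A study of something unrelated entirely",
  "Corporate involvement in sustainable development goals: exploring the territory"
)
res <- match_refs(refs, local_titles)
```

The distance matrix has one cell per (reference, local document) pair, which is why the cost grows quickly with corpus size.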

massimoaria · 2024-06-20T08:44:15Z

Thanks, I appreciate your suggestion.
We have already tried this solution, but it is too computationally expensive for routine use.
We also tried identifying publication years to divide references into subgroups (by year) and then computing similarity only among titles with the same publication year.
That approach is faster than the previous one, but still unsatisfactory.
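For illustration, grouping by year before comparing titles might look like the base R sketch below. The function name, arguments, and threshold are hypothetical, and this is not the implementation that was tried in bibliometrix.

```r
# Hypothetical sketch: compare titles only within the same publication year.
# Returns, for each reference, the index of the matched local title or NA.
match_by_year <- function(ref_title, ref_year, loc_title, loc_year, threshold = 0.9) {
  out <- rep(NA_integer_, length(ref_title))
  years <- intersect(ref_year, loc_year)
  years <- years[!is.na(years)]
  for (yr in years) {
    r <- which(ref_year == yr)   # references published in this year
    l <- which(loc_year == yr)   # local documents published in this year
    d <- adist(toupper(ref_title[r]), toupper(loc_title[l]))
    len <- outer(nchar(ref_title[r]), nchar(loc_title[l]), pmax)
    sim <- 1 - d / pmax(len, 1)
    best <- apply(sim, 1, which.max)
    ok <- sim[cbind(seq_along(best), best)] >= threshold
    out[r[ok]] <- l[best[ok]]
  }
  out
}

# Invented example: the second reference has the right title but a year
# that does not occur in the local corpus, so it is never compared.
res_year <- match_by_year(
  ref_title = c("Alpha beta gamma", "Alpha beta gamma"),
  ref_year  = c(2020, 2019),
  loc_title = c("Alpha beta gamma", "Delta"),
  loc_year  = c(2020, 2020)
)
```

Each year contributes only a small distance matrix, which is where the speed-up comes from. The example also shows one way the approach can miss matches: a wrong or missing year in the reference rules out the comparison entirely.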
