Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should dedup_citations always return list? #199

Open
LukasWallrich opened this issue Aug 2, 2024 · 3 comments
Open

Should dedup_citations always return list? #199

LukasWallrich opened this issue Aug 2, 2024 · 3 comments
Labels
Internal Internal functionality Question Further information is requested

Comments

@LukasWallrich
Copy link
Collaborator

I believe we decided at some point that - in line with ASySD - dedup_citations should always return a list, with deduped citations in unique and (optionally) potential pairs for manual deduplication in another item. However, currently, it sometimes returns a dataframe (when manual = FALSE) and sometimes a list (when manual = TRUE) - which makes it less intuitive to use, and harder to write examples.

Does anyone object to changing it in that way? It means that we need to call dedup_result$unique to access the deduplicated citations, but can obviously assign that to unique_citations in any vignettes if we want to.

@LukasWallrich
Copy link
Collaborator Author

LukasWallrich commented Aug 2, 2024

I've just seen that this behaviour was framed as a bug in #137 - so we went back and forth on this.

If we keep it as is, we need to write in each example:

# Option A - if you used manual = TRUE during deduplication:

count_sources(dedup$unique)

# Option B - if you used manual = FALSE during deduplication:

count_sources(dedup)

Alternatively, I'd be happy to remove the manual option from dedup_citations, so that this always returns a dataframe - and offer a separate dedup_citations_with_manual that always returns a list? @TNRiley @kaitlynhair what do you think?

@TNRiley
Copy link
Collaborator

TNRiley commented Aug 5, 2024

While the split function seems cleaner in my mind, I think the split argument is better in line with ASySD.

It would be great to chat about this at our next meeting if we can get everyone on the call. I don't know if I can think of all the possible impacts or pros/cons outside of what @LukasWallrich mentioned, and I think there would be more. I'll email everyone and see if we can all make it.

@TNRiley TNRiley added Question Further information is requested Internal Internal functionality labels Aug 5, 2024
@LukasWallrich
Copy link
Collaborator Author

Wait for camaradesuk/ASySD#49 - ideally dedup_citations() should behave the same in both packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internal Internal functionality Question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants