Check and improve CSV Export after Library Pagination #2924
After checking, no changes are needed after the Library Pagination implementation.
@RafaPolit Please let me know if you disagree and/or if this affects the priority of this issue.
I agreed con Rafa to test a little bit further and re-check the scope of this issue.
I love the Spanglish version of "I agreed con Rafa"! :) Yes, please, report about what would be required to do batches of 10,000 for export of large collections.
After checking: if you try to access a set of documents with a query like <offset: 9990, limit: 30>, the API will tell you that the results window is too large (>10000) and suggest using one of the ES APIs meant for that purpose.
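To make the limitation concrete: offset/limit (ES `from`/`size`) paging cannot reach past the 10,000th result, while cursor-based paging with `search_after` can walk the whole set. The sketch below simulates that cursor pattern over an in-memory list; the `search` function is a hypothetical stand-in for an ES query sorted on a unique key, not Uwazi's actual code.

```python
def search(docs, sort_key, after=None, size=3):
    """Stand-in for an ES query sorted on a unique key.

    Returns at most `size` docs whose sort value is greater than `after`,
    mimicking ES's search_after cursor semantics.
    """
    hits = sorted((d for d in docs if after is None or d[sort_key] > after),
                  key=lambda d: d[sort_key])
    return hits[:size]


def export_all(docs, sort_key="_id", size=3):
    """Page through the whole result set, batch by batch."""
    cursor, out = None, []
    while True:
        batch = search(docs, sort_key, after=cursor, size=size)
        if not batch:
            return out
        out.extend(batch)
        cursor = batch[-1][sort_key]  # last sort value becomes the next cursor


docs = [{"_id": i, "title": f"doc {i}"} for i in range(10)]
assert [d["_id"] for d in export_all(docs)] == list(range(10))
```

Because each request only says "give me the next `size` results after this sort value", no request ever names an absolute offset, so the 10,000-window cap never applies.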
After some research I'm listing my findings here:
To be revisited when there is more urgency. For the time being, the workaround is to create filters that produce smaller batches (probably already implemented in every collection) and do partial exports.
I think this could be a higher-priority issue. Case: as a user, I ran into this issue trying to export from the UPR Info database to search for keywords related to water and sanitation in records that were not classified by the UPR database as water and sanitation. If I could understand how to paginate the API, I might try that route, but as a user it is difficult to figure out how to use the API, get a list of the entities for filters, and paginate. For now, there is no clear way to download more than 10,000 records without many manual steps.
Hello @nickdickinson, thank you for your input. cc @RafaPolit @txau:
I know this is not a "true" solution, but you could try to reach the "less than 10,000" goal by adding extra filters in the right-hand side panel. For example, once you have the filter you need that returns more than 10,000 results, try to find another parameter that would create exclusive searches. Options:
This is, as mentioned above, a temporary workaround until this is solved properly. Hope this helps.
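The workaround above amounts to splitting one oversized query into mutually exclusive sub-queries whose union covers every record exactly once, exporting each batch, and concatenating the partial CSVs. A minimal illustration, assuming a hypothetical `split_field` (e.g. country) whose distinct values partition the records; none of these function names come from Uwazi:

```python
import csv
import io


def export_csv(rows, fieldnames):
    """Serialize a batch of records to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def export_in_batches(records, split_field, fieldnames):
    """One partial CSV per distinct value of `split_field`.

    Each distinct value defines one exclusive sub-query, so the partial
    exports never overlap and together cover the whole collection.
    """
    partials = {}
    for value in sorted({r[split_field] for r in records}):
        batch = [r for r in records if r[split_field] == value]
        partials[value] = export_csv(batch, fieldnames)
    return partials


records = [
    {"country": "A", "recommendation": "r1"},
    {"country": "B", "recommendation": "r2"},
    {"country": "A", "recommendation": "r3"},
]
parts = export_in_batches(records, "country", ["country", "recommendation"])
# parts["A"] and parts["B"] together cover every record exactly once.
```

As long as each sub-query stays under the 10,000-result window, every batch is exportable through the existing CSV export, at the cost of manual stitching.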
Thanks @fnocetti and @RafaPolit. I appreciate you considering the issue, as I can imagine it coming up more often for researchers. The workaround is OK if it is not a repeated task. I also think it would be great for the user to be able to download a CSV even if it has more than 10,000 records. For now it is not preventing my main goal, which is to build a reporting database for Sanitation and Water for All, a coalition of countries and other stakeholders. We want to be able to report, per country, what percent of water and sanitation recommendations have been "supported". This is easy to do manually with the filter for the Human Right to Water and Sanitation, as it is only a few hundred recommendations. Presumably I could also use the API to refresh the database a few times per year. I noticed that there are just as many recommendations mentioning water and/or sanitation that are not classified as the HR to water and sanitation but do seem associated, so basically I wanted to research this. Perhaps I will try to use the search queries. Thanks again.
I have written a PoC for this and it works nicely.