Epic: create batch-editing options #330

Open
jechols opened this issue Sep 4, 2024 · 1 comment
Labels
epic Needs to be broken down and then closed

Comments

jechols commented Sep 4, 2024

This ticket outlines what we want to get done for the Scoop MVP to make it easier for non-developers to pull, fix, and reingest batches.

I see a small number of high-level "things" we can probably make happen without violating the NDNP spec or making massive changes to ONI.

Each option could have its own dedicated UI instead of trying to force in a general-purpose "editor" feature.

Stuff we have to keep in mind for all situations

The implementation, no matter which situation we see, will differ depending on batch status:

  • live means we still have the issues' files on disk, and nothing has been archived. This is the easiest scenario.
  • live_archived means the batch has been archived, but we still have files on disk. That lets us fairly easily regenerate batches or allow re-curation of issues, but we'll have to at least mention that the archive needs a fix, since NCA can't directly touch the dark archive.
  • live_done is the toughest: we don't have files on disk anymore. We'll have to pull files from the live batch, which means we won't have TIFFs. This could complicate things a lot. We might have to require the user to copy down the archived batch or something similar.
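
To make the differences concrete, here is a rough sketch of how the three statuses could map to what a UI is allowed to assume. The status names come from the list above; the `Capabilities` fields, the `warnings_for` helper, and the warning text are all hypothetical, and the flags for TIFFs and for live_done needing an archive fix are inferred from the descriptions rather than confirmed behavior.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical summary of what we can rely on for a batch."""
    files_on_disk: bool      # NCA still holds the issues' files locally
    tiffs_available: bool    # assumed: TIFFs exist only while files are on disk
    archive_needs_fix: bool  # the user must be told to fix the archive themselves


class BatchStatus(Enum):
    LIVE = "live"
    LIVE_ARCHIVED = "live_archived"
    LIVE_DONE = "live_done"


# Assumed mapping, derived from the status descriptions in this ticket.
CAPABILITIES = {
    BatchStatus.LIVE: Capabilities(True, True, False),
    BatchStatus.LIVE_ARCHIVED: Capabilities(True, True, True),
    BatchStatus.LIVE_DONE: Capabilities(False, False, True),
}


def warnings_for(status: BatchStatus) -> list[str]:
    """Return the notes a batch-editing page might show for this status."""
    caps = CAPABILITIES[status]
    notes = []
    if caps.archive_needs_fix:
        notes.append("The archived copy must be fixed by hand; NCA cannot touch the dark archive.")
    if not caps.tiffs_available:
        notes.append("No TIFFs on disk; files would have to come from the live batch.")
    return notes
```

Keeping the per-status differences in one table like this would let each dedicated UI ask the same questions instead of re-deriving them from the status name.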

We'll need to present batch information in all cases. For MVP, we only support batches that were generated by NCA, as processing non-NCA batches is a bigger task. If there's time, we could create a batch reader of some kind, but that's unlikely.

Question: how to handle dark archives? Just make a note that people will have to fix that themselves?

Situations

Issue Removal

Some number of issues need to be removed from a batch, but most are fine. Maybe issues have higher-quality replacements or maybe they need metadata re-entered.

  • Show a list of issues and let the user pick which ones shouldn't be in the batch. Similar to the existing batch QC view, but for a live issue.
  • Using the live files somehow, generate a new batch with just the remaining files. Same name as the live batch, but one version higher.
  • Show user how to purge the old batch and load the new one.
  • Pull the bad issues into NCA for re-curation or just outright deletion. Similar, again, to the batch QC page.
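
The "same name, one version higher" step could look roughly like the sketch below. It assumes batch names end in a `_verNN` suffix (for example `batch_oru_example_ver01`); that pattern, the function names, and the idea of identifying issues by a string key are assumptions for illustration, not NCA's actual code.

```python
import re

# Assumed naming pattern: anything, then "_ver", then a zero-padded number.
_VERSION_SUFFIX = re.compile(r"^(?P<stem>.+_ver)(?P<num>\d+)$")


def next_batch_name(name: str) -> str:
    """Return the same batch name with its version number bumped by one."""
    m = _VERSION_SUFFIX.match(name)
    if m is None:
        raise ValueError(f"batch name has no _verNN suffix: {name!r}")
    num = m.group("num")
    # Preserve the original zero-padding width (ver01 -> ver02, ver09 -> ver10).
    return f"{m.group('stem')}{int(num) + 1:0{len(num)}d}"


def remaining_issues(issue_keys, removed):
    """Keep the issues the user did not flag, preserving their order."""
    flagged = set(removed)
    return [key for key in issue_keys if key not in flagged]
```

For example, `next_batch_name("batch_oru_example_ver01")` gives `batch_oru_example_ver02`, and the list returned by `remaining_issues` would be what goes into the regenerated batch.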

Bulk edit

There is some kind of "search and replace" operation we need to run. It might span multiple batches. There are likely a variety of filters, not just a simple replace of every value matching some search.

Some examples:

  • All issues with LCCN A after publication date B need their LCCN changed to C
  • All issues with LCCN A need their MARC Org Code changed to B

For these cases it makes a lot more sense to generate a batch patch than to pull issues and stuff them back into NCA. That's especially true of the second case, given how MOCs work in NCA.

A batch patch will probably be something we need to standardize in some way. We'll probably want a general-purpose script that reads some kind of list of filters and directives, then finds and fixes batches appropriately. We'll need to document how to apply these patches on a reingest of data, and make it clear that users with archived batches will need to preserve the patches with exactly the same amount of care as they preserve their batches.
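
As a starting point for discussion, a patch could be modeled as a filter plus a set of directives. The sketch below covers the two examples above; the `Issue` and `Patch` types, their field names, and the choice to treat "after publication date B" as strictly after are all hypothetical, and no patch format has been decided.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Issue:
    """Hypothetical minimal view of an issue's metadata."""
    lccn: str
    published: date
    marc_org_code: str


@dataclass(frozen=True)
class Patch:
    """One filter (match_* / published_after) plus its directives (set_*)."""
    match_lccn: str
    published_after: Optional[date] = None  # assumed to mean strictly after
    set_lccn: Optional[str] = None
    set_marc_org_code: Optional[str] = None

    def matches(self, issue: Issue) -> bool:
        if issue.lccn != self.match_lccn:
            return False
        if self.published_after is not None and issue.published <= self.published_after:
            return False
        return True

    def apply(self, issue: Issue) -> Issue:
        """Return a patched copy if the filter matches, else the issue unchanged."""
        if not self.matches(issue):
            return issue
        changes = {}
        if self.set_lccn is not None:
            changes["lccn"] = self.set_lccn
        if self.set_marc_org_code is not None:
            changes["marc_org_code"] = self.set_marc_org_code
        return replace(issue, **changes)


def apply_patches(issues, patches):
    """Run every patch, in order, over every issue."""
    patched = []
    for issue in issues:
        for patch in patches:
            issue = patch.apply(issue)
        patched.append(issue)
    return patched
```

Patches run in list order here, so a later patch sees the changes made by an earlier one. Whatever format we settle on should say explicitly whether that ordering is intended, since it affects how patches replay on a reingest.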

Delete Batch

An entire batch needs to be pulled and all of its issues need to go back into NCA for some reason. Maybe it's a small batch that shouldn't have gone out yet (embargo rules were bad) or maybe the issues all need bulk edits, but in a way that just doesn't work well with whatever "batch patch" we come up with.

This scenario is the most time-consuming for users and would need some warnings. All issues would go back into NCA. They could keep their metadata, or be destroyed, but they're all basically treated as if they never were in a batch. They will have to be rebatched the same way any other issues are.

jechols added this to Scoop Sep 4, 2024
jechols added the epic Needs to be broken down and then closed label Sep 4, 2024
jechols self-assigned this Sep 4, 2024
jechols moved this to In progress in Scoop Sep 4, 2024
jechols commented Sep 4, 2024

See #231 for historical plans around the same issue

Projects
Status: In progress