Entity creation flow can produce orphan files (files pointing to non-existent entities) and db files without the S3 file #7520

Open
daneryl opened this issue Dec 3, 2024 · 0 comments

Describe the bug
In some circumstances, files can be created in the database pointing to a non-existent entity. This happens because it is possible to delete an entity while one of its files is still being processed. Deleting an entity also deletes its files, but the file being processed is still held in memory. When processing ends, we save that already-deleted file again, because the save uses upsert instead of update.
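The sequence can be sketched with an in-memory stand-in for the files collection. This is only an illustration under assumptions: `FilesCollection`, `FileDoc`, and the ids are hypothetical names, not Uwazi's actual code.

```typescript
// Hypothetical in-memory stand-in for the files collection (not Uwazi's API).
type FileDoc = { _id: string; entity: string; status: 'processing' | 'ready' };

class FilesCollection {
  private docs = new Map<string, FileDoc>();

  // Upsert semantics: writes the document whether or not it still exists.
  upsert(doc: FileDoc): void {
    this.docs.set(doc._id, { ...doc });
  }

  // Entity deletion also removes every file that points to the entity.
  deleteByEntity(entity: string): void {
    this.docs.forEach((doc, id) => {
      if (doc.entity === entity) this.docs.delete(id);
    });
  }

  get(id: string): FileDoc | undefined {
    return this.docs.get(id);
  }
}

const files = new FilesCollection();
const entities = new Set<string>(['entity1']);

// 1. Upload: the file is saved with processing status and kept in memory.
const inMemoryFile: FileDoc = { _id: 'file1', entity: 'entity1', status: 'processing' };
files.upsert(inMemoryFile);

// 2. The entity is deleted while processing is still running.
entities.delete('entity1');
files.deleteByEntity('entity1');

// 3. Processing ends and the in-memory copy is saved again with upsert.
files.upsert({ ...inMemoryFile, status: 'ready' });

const resurrected = files.get('file1');
const isOrphan = resurrected !== undefined && !entities.has(resurrected.entity);
console.log(isOrphan); // true: the file is back, its entity is not
```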

To Reproduce
Steps to reproduce the behavior:

  1. Start Uwazi from a blank state.
  2. Create an entity by uploading a PDF.
  3. Delete the entity before the file processing ends (you can achieve this by uploading a large PDF or by explicitly hard-coding a timeout).
  4. Look at the DB files collection. You should see the file there but not the entity.

Expected behavior
The file currently exists because we use upsert instead of an explicit update. A file should not be updated if the file itself no longer exists in the DB.
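A minimal sketch of the expected update-only behavior, again using a plain in-memory map rather than the real persistence layer. `StoredFile`, `fileStore`, and `updateExisting` are hypothetical names introduced for illustration. With MongoDB, the equivalent would roughly be an update issued without the `upsert` option, which matches nothing once the document has been deleted.

```typescript
// Hypothetical shapes and names, for illustration only.
type StoredFile = { _id: string; entity: string; status: string };

const fileStore = new Map<string, StoredFile>();

// Update-only write: returns false and writes nothing when the id is gone.
function updateExisting(id: string, changes: Partial<StoredFile>): boolean {
  const current = fileStore.get(id);
  if (current === undefined) return false;
  fileStore.set(id, { ...current, ...changes });
  return true;
}

// The file is saved with processing status...
fileStore.set('file1', { _id: 'file1', entity: 'entity1', status: 'processing' });
// ...then removed because its entity was deleted mid-processing.
fileStore.delete('file1');

// Processing ends: the save is rejected instead of re-creating the file.
const saved = updateExisting('file1', { status: 'ready' });
console.log(saved, fileStore.has('file1')); // false false
```

Returning a boolean is one possible design: it would let the processing code notice that the file is gone and skip or clean up any remaining work.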

Additional context
This issue can produce different scenarios depending on the point in the processing flow at which the entity is deleted:

  1. Delete the entity before the file is saved to the DB with processing status
    This produces 2 files without an entity (document and thumbnail), but no missing-in-storage situation (a file that exists in the DB but not in storage).

  2. Delete the entity after the file has been set to processing but before the actual processing starts
    This produces 2 files without an entity (document and thumbnail). The PDF will be missing in storage (it exists in the DB but not in storage), but the thumbnail will not.

There are more scenarios with different outputs, but everything points to the same root cause: deleting a file by deleting its entity while the document processing is not yet finished.
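Both outcomes described above (files without an entity, and DB files missing from storage) could be detected with a consistency check along these lines. This is a rough sketch under assumptions: `FileRecord`, its `entity` and `filename` fields, and `auditFiles` are hypothetical and do not reflect the real schema.

```typescript
// Hypothetical record shape; field names are assumptions, not the real schema.
type FileRecord = { _id: string; entity: string; filename: string };

type AuditReport = { orphans: string[]; missingInStorage: string[] };

// Reports files whose entity is gone and files whose stored object is gone.
function auditFiles(
  fileRecords: FileRecord[],
  entityIds: Set<string>,
  storedFilenames: Set<string>
): AuditReport {
  return {
    orphans: fileRecords.filter(f => !entityIds.has(f.entity)).map(f => f._id),
    missingInStorage: fileRecords
      .filter(f => !storedFilenames.has(f.filename))
      .map(f => f._id),
  };
}

// Scenario 2: both DB files lost their entity, only the thumbnail is stored.
const report = auditFiles(
  [
    { _id: 'doc', entity: 'deletedEntity', filename: 'doc.pdf' },
    { _id: 'thumb', entity: 'deletedEntity', filename: 'thumb.jpg' },
  ],
  new Set<string>(),
  new Set<string>(['thumb.jpg'])
);
console.log(report.orphans, report.missingInStorage);
```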

@daneryl daneryl changed the title Entity creation flow can produce orphan files (files pointing to non-existent entities) Entity creation flow can produce orphan files (files pointing to non-existent entities) and db files without the S3 file Dec 3, 2024