dirac-dms-remove-catalog-replicas should remove LFN from file catalog if it deletes the last replica of a file #7076

Open
1 of 2 tasks
marianne013 opened this issue Jun 23, 2023 · 2 comments

Comments

@marianne013
Contributor

marianne013 commented Jun 23, 2023

Related to #7075

While this is not the only way it can happen, we think the most likely cause of a database entry for a file with zero replicas was the following sequence of events:

  • A number of files were lost on disk
  • The file catalogue was tidied up using
    dirac-dms-remove-catalog-replicas specifying the misbehaving storage
    element
  • If a lost file was the last replica, dirac-dms-remove-catalog-replicas leaves an entry in the
    database behind
  • This entry can be removed with dirac-dms-remove-catalog-files, but in the
    (all too common) scenario of files being lost at a specific storage element,
    dirac-dms-remove-catalog-replicas seems the obvious tool for tidying up the catalogue, as other replicas are unaffected.
  • The user would then need an extra check to see whether the removed replica was
    the last one, and they will not be aware of this. The differences in
    output are also rather subtle (see below) and do not signal a problem even to an experienced admin.
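To make the failure mode concrete, here is a toy in-memory model of the sequence above. It is a hypothetical sketch, not DIRAC's FileCatalog API or database schema: each LFN maps to a set of storage element names, and removing catalog replicas by SE (the current behaviour described in this issue) never touches the file entry itself. The LFN `other.txt` and the SE name `SOME-OTHER-disk` are made up for illustration.

```python
from __future__ import annotations

# Hypothetical toy catalog: NOT DIRAC's real schema or API.
# A file entry and its replicas are tracked separately, so a file entry can
# survive with an empty replica set.


class ToyCatalog:
    def __init__(self) -> None:
        self.files = {}  # LFN -> set of storage element names holding a replica

    def add_file(self, lfn: str, se: str) -> None:
        self.files.setdefault(lfn, set()).add(se)

    def remove_catalog_replicas(self, lfns: list[str], se: str) -> int:
        """Drop the catalog replica at `se` for each LFN (current behaviour).

        The file entry is kept even when its last replica is removed.
        """
        removed = 0
        for lfn in lfns:
            replicas = self.files.get(lfn)
            if replicas is not None and se in replicas:
                replicas.discard(se)
                removed += 1
        return removed

    def zero_replica_lfns(self) -> list[str]:
        """LFNs that still exist in the catalog but have no replicas left."""
        return sorted(lfn for lfn, ses in self.files.items() if not ses)


catalog = ToyCatalog()
catalog.add_file("/t2k.org/user/d/reptest.txt", "UKI-LT2-IC-HEP-disk")
catalog.add_file("/t2k.org/user/d/other.txt", "UKI-LT2-IC-HEP-disk")
catalog.add_file("/t2k.org/user/d/other.txt", "SOME-OTHER-disk")

# Tidy up after the disk at one SE was lost
catalog.remove_catalog_replicas(
    ["/t2k.org/user/d/reptest.txt", "/t2k.org/user/d/other.txt"],
    "UKI-LT2-IC-HEP-disk",
)
print(catalog.zero_replica_lfns())  # ['/t2k.org/user/d/reptest.txt']
```

The file that still has a replica elsewhere is fine; the one whose only replica lived on the lost SE is left behind as an entry with no replicas.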

This scenario can be reproduced with the following sequence:

(base) gridpp_py3 > dirac-dms-add-file /t2k.org/user/d/reptest.txt reptest.txt UKI-LT2-IC-HEP-disk

Uploading /t2k.org/user/d/reptest.txt
Successfully uploaded file to UKI-LT2-IC-HEP-disk

[now sneakily delete file on disk to simulate storage meltdown]

Try to clean up the catalog:

(base) gridpp_py3 > dirac-dms-remove-catalog-replicas /t2k.org/user/d/reptest.txt UKI-LT2-IC-HEP-disk
Successfully remove 1 catalog replicas at UKI-LT2-IC-HEP-disk

It still knows the LFN:

(base) gridpp_py3 > dirac-dms-lfn-replicas /t2k.org/user/d/reptest.txt
No output

That's rather subtle and you only notice that something is amiss once you realize that the output for a truly non-existent LFN is different:

(base) gridpp_py3 > dirac-dms-lfn-replicas /t2k.org/user/d/reptest.txt0
LFN                          StorageElement URL
===============================================
/t2k.org/user/d/reptest.txt0  Unknown        No such file or directory
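The distinction that is easy to miss here is between three states, not two. A minimal sketch, assuming a lookup over a hypothetical LFN-to-replica-SE mapping; the function name `describe_lfn` and its returned labels are illustrative and are not the output format of dirac-dms-lfn-replicas:

```python
from __future__ import annotations


def describe_lfn(files: dict[str, set[str]], lfn: str) -> str:
    """Classify an LFN in a hypothetical LFN -> replica-SE mapping."""
    if lfn not in files:
        return "no such file"  # the reptest.txt0 case: clearly reported
    if not files[lfn]:
        return "zero replicas"  # the silent case: entry exists, nothing to list
    return "ok: " + ", ".join(sorted(files[lfn]))


files = {"/t2k.org/user/d/reptest.txt": set()}
print(describe_lfn(files, "/t2k.org/user/d/reptest.txt"))   # zero replicas
print(describe_lfn(files, "/t2k.org/user/d/reptest.txt0"))  # no such file
```

An empty replica list and a missing LFN both produce "nothing useful", which is why the first case prints no output at all unless it is checked for explicitly.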

Apply the nuclear option:

(base) gridpp_py3 > dirac-dms-remove-catalog-files /t2k.org/user/d/reptest.txt
Successfully removed 1 catalog files.

(base) lx04:2023_May_16_1234_gridpp_py3 > dirac-dms-lfn-replicas /t2k.org/user/d/reptest.txt
LFN                         StorageElement URL
==============================================
/t2k.org/user/d/reptest.txt Unknown        No such file or directory

There are probably other ways database entries can get into this state, but this is one of the more likely scenarios.

Can you please fix the following issues:

  • dirac-dms-remove-catalog-replicas should delete the file catalog entry if the replica it removes is the last remaining one (1 bonus point)
  • Instead of returning "No output", dirac-dms-lfn-replicas should return an error if a file has zero replicas, as this is an error state (i.e. not foreseen in the DIRAC code) (1 bonus point)
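The two requested behaviours can be sketched on a toy in-memory catalog. This is a hypothetical illustration only: `ProposedCatalog`, `CatalogError` and the result messages are invented here and are not DIRAC classes or the actual fix.

```python
from __future__ import annotations


class CatalogError(Exception):
    """Raised for a catalog state that should not exist (hypothetical)."""


class ProposedCatalog:
    """Toy catalog implementing the two behaviours requested in this issue."""

    def __init__(self) -> None:
        self.files = {}  # LFN -> set of SE names holding a replica

    def add_file(self, lfn: str, se: str) -> None:
        self.files.setdefault(lfn, set()).add(se)

    def remove_catalog_replicas(self, lfns: list[str], se: str) -> dict[str, str]:
        """Remove the replica at `se`; drop the LFN if that was its last replica."""
        result = {}
        for lfn in lfns:
            replicas = self.files.get(lfn)
            if replicas is None or se not in replicas:
                result[lfn] = "no replica at " + se
                continue
            replicas.discard(se)
            if replicas:
                result[lfn] = "replica removed"
            else:
                del self.files[lfn]  # request 1: last replica gone -> remove file entry
                result[lfn] = "replica removed, file entry removed"
        return result

    def lfn_replicas(self, lfn: str) -> set[str]:
        """Return replica SEs; raise instead of returning nothing for zero replicas."""
        if lfn not in self.files:
            raise CatalogError(f"{lfn}: No such file or directory")
        if not self.files[lfn]:
            raise CatalogError(f"{lfn}: file has zero replicas")  # request 2
        return set(self.files[lfn])


cat = ProposedCatalog()
cat.add_file("/t2k.org/user/d/reptest.txt", "UKI-LT2-IC-HEP-disk")
print(cat.remove_catalog_replicas(["/t2k.org/user/d/reptest.txt"], "UKI-LT2-IC-HEP-disk"))
# {'/t2k.org/user/d/reptest.txt': 'replica removed, file entry removed'}
```

With the first change in place, the zero-replica branch of `lfn_replicas` would only be reached by entries that were already in that state before the change (or created some other way), which is exactly where a loud error is more useful than empty output.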

Once your bonus stamp card is full, you can claim your free beer.

Tagging @sfayer so he knows it's all filed.

@chaen
Contributor

chaen commented Jun 24, 2023

Part of this is addressed in f6f0e37

For remove-catalog-replicas, why not just use dirac-dms-remove-replicas? It will tell you if you are trying to delete the last replica.

Also, this behavior is rather inconsistent with the rest of the system, which prevents you from calling removeReplicas on the last replica.
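The guard described here, where removing the last replica is refused, can be sketched on the same kind of toy LFN-to-replica mapping. This is a hypothetical illustration of the behaviour, not DIRAC's implementation; the function name `remove_replicas` and its messages are assumptions.

```python
from __future__ import annotations


def remove_replicas(files: dict[str, set[str]], lfn: str, se: str) -> str:
    """Toy guard: refuse to remove the last replica of a file (hypothetical)."""
    replicas = files.get(lfn, set())
    if se not in replicas:
        return "error: no replica at " + se
    if len(replicas) == 1:
        # Removing it would leave a file entry with zero replicas
        return "error: this is the last replica; remove the file instead"
    replicas.discard(se)
    return "removed"


files = {"/t2k.org/user/d/reptest.txt": {"UKI-LT2-IC-HEP-disk"}}
print(remove_replicas(files, "/t2k.org/user/d/reptest.txt", "UKI-LT2-IC-HEP-disk"))
# error: this is the last replica; remove the file instead
```

Such a guard only protects the catalog when removals go through it; in the reproduction above, dirac-dms-remove-catalog-replicas removed the last replica without complaint.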

@marianne013
Contributor Author

Sorry, my Google filter ate the reply.
dirac-dms-remove-replicas was ruled out because it would not be able to remove the replica, as the file is already gone from disk.
