
dirac-dms-remove-catalog-replicas should remove LFN from file catalog if it deletes the last replica of a file #7076

Open
@marianne013

Description


Related to #7075

While this is not the only way this can happen, we think the most likely cause of a catalog entry for a file with zero replicas is the following sequence of events:

  • A number of files were lost on disk.
  • The file catalogue was tidied up using
    dirac-dms-remove-catalog-replicas, specifying the misbehaving storage
    element.
  • If a lost file was the last replica, dirac-dms-remove-catalog-replicas leaves an entry in the
    database behind.
  • This entry can be removed with dirac-dms-remove-catalog-files. However, in the (all too common) scenario of files being lost at a specific storage element,
    dirac-dms-remove-catalog-replicas seems the obvious tool for tidying up the catalogue, as other replicas are unaffected.
  • Using it safely would then require the user to check separately whether each removed replica was
    the last one, and users will not be aware of this. The difference in
    output is also rather subtle (see below) and does not indicate a problem, even to an experienced admin.
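The extra check mentioned in the last bullet can be sketched in Python. This is a minimal, hypothetical helper, not part of DIRAC: it assumes the replica information for the affected LFNs has already been fetched into a plain dictionary mapping each LFN to `{storage element: URL}`, and it lists the LFNs whose only replica sits on the storage element about to be cleaned up, i.e. the ones that dirac-dms-remove-catalog-replicas would leave behind with zero replicas.

```python
from typing import Dict, List

# Hypothetical input shape: LFN -> {storage element name: replica URL}.
# Not a DIRAC API; fill it from whatever replica lookup you already use.
ReplicaMap = Dict[str, Dict[str, str]]


def lfns_left_orphaned(replicas: ReplicaMap, bad_se: str) -> List[str]:
    """Return the LFNs whose only catalogued replica is at bad_se.

    Removing the bad_se replica of these LFNs with
    dirac-dms-remove-catalog-replicas would leave a file entry with
    zero replicas in the catalog.
    """
    return sorted(lfn for lfn, ses in replicas.items() if set(ses) == {bad_se})


if __name__ == "__main__":
    example: ReplicaMap = {
        "/t2k.org/user/d/reptest.txt": {"UKI-LT2-IC-HEP-disk": "url-1"},
        # Hypothetical second file with a replica on a hypothetical other SE.
        "/t2k.org/user/d/other.txt": {
            "UKI-LT2-IC-HEP-disk": "url-2",
            "OTHER-SE-disk": "url-3",
        },
    }
    print(lfns_left_orphaned(example, "UKI-LT2-IC-HEP-disk"))
    # → ['/t2k.org/user/d/reptest.txt']
```

Any LFN this returns would need a follow-up dirac-dms-remove-catalog-files once its last replica is removed.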

This scenario can be reproduced with the following sequence:

(base) gridpp_py3 > dirac-dms-add-file /t2k.org/user/d/reptest.txt reptest.txt UKI-LT2-IC-HEP-disk

Uploading /t2k.org/user/d/reptest.txt
Successfully uploaded file to UKI-LT2-IC-HEP-disk

[now sneakily delete file on disk to simulate storage meltdown]

Try to clean up the catalog:

(base) gridpp_py3 > dirac-dms-remove-catalog-replicas /t2k.org/user/d/reptest.txt UKI-LT2-IC-HEP-disk
Successfully remove 1 catalog replicas at UKI-LT2-IC-HEP-disk

The catalog still knows the LFN:

(base) gridpp_py3 > dirac-dms-lfn-replicas /t2k.org/user/d/reptest.txt
[no output]

That's rather subtle: you only notice that something is amiss once you realize that the output for a truly non-existent LFN is different:

(base) gridpp_py3 > dirac-dms-lfn-replicas /t2k.org/user/d/reptest.txt0
LFN                          StorageElement URL
===============================================
/t2k.org/user/d/reptest.txt0  Unknown        No such file or directory
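The two outputs above, plus the normal case, can be told apart programmatically. The sketch below is hypothetical and assumes a bulk replica lookup returns DIRAC's usual `{'OK': ..., 'Value': {'Successful': ..., 'Failed': ...}}` structure. That the zero-replica LFN appears under `Successful` with an empty replica dictionary is an inference from the empty output above, not something verified against the DIRAC code.

```python
from typing import Any, Dict


def classify_replica_lookup(result: Dict[str, Any]) -> Dict[str, str]:
    """Label each LFN as 'ok', 'zero-replicas', 'missing' or 'error'.

    Assumes the DIRAC-style return structure
    {'OK': True, 'Value': {'Successful': {lfn: {se: url}}, 'Failed': {lfn: reason}}}.
    """
    if not result.get("OK"):
        raise RuntimeError(result.get("Message", "replica lookup failed"))
    value = result["Value"]
    status: Dict[str, str] = {}
    for lfn, replicas in value.get("Successful", {}).items():
        # An LFN that "succeeds" with no replicas is the orphaned state from
        # this report: the file entry exists but nothing points to data.
        status[lfn] = "ok" if replicas else "zero-replicas"
    for lfn, reason in value.get("Failed", {}).items():
        # Only a "No such file" failure means the LFN is genuinely unknown.
        status[lfn] = "missing" if "No such file" in str(reason) else "error"
    return status
```

Under this assumption, the orphaned /t2k.org/user/d/reptest.txt from the reproduction would be labelled `zero-replicas` instead of silently producing no output.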

Apply the nuclear option:

(base) gridpp_py3 > dirac-dms-remove-catalog-files /t2k.org/user/d/reptest.txt
Successfully removed 1 catalog files.

(base) lx04:2023_May_16_1234_gridpp_py3 > dirac-dms-lfn-replicas /t2k.org/user/d/reptest.txt
LFN                         StorageElement URL
==============================================
/t2k.org/user/d/reptest.txt Unknown        No such file or directory

There are probably other ways database entries can get into this state, but this is one of the more likely scenarios.

Can you please fix the following issues:

  • dirac-dms-remove-catalog-replicas should delete the file catalog entry if the replica it removes is the last remaining replica of that file (1 bonus point)
  • Instead of returning no output, dirac-dms-lfn-replicas should return an error if a file has zero replicas, as this is an error state (i.e. one not foreseen in the DIRAC code) (1 bonus point)
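A minimal sketch of the behaviour requested in the first bullet, using a toy in-memory catalog. `InMemoryCatalog` and its methods are hypothetical and unrelated to DIRAC's actual FileCatalog classes: after removing a replica, the remaining replica count is checked, and the LFN entry is dropped if it has reached zero.

```python
from typing import Dict, Set


class InMemoryCatalog:
    """Toy catalog mapping each LFN to the set of SEs holding a replica.

    Hypothetical model for illustration; not DIRAC's FileCatalog.
    """

    def __init__(self) -> None:
        self.files: Dict[str, Set[str]] = {}

    def add_replica(self, lfn: str, se: str) -> None:
        # Registering a replica creates the file entry if it is new.
        self.files.setdefault(lfn, set()).add(se)

    def remove_replica(self, lfn: str, se: str) -> bool:
        """Remove the replica of lfn at se; return True if the LFN was dropped.

        If no replicas remain afterwards, the file entry itself is deleted,
        so no zero-replica LFN is left behind.
        """
        if lfn not in self.files:
            raise KeyError(f"{lfn}: No such file or directory")
        replicas = self.files[lfn]
        replicas.discard(se)
        if not replicas:
            del self.files[lfn]
            return True
        return False


if __name__ == "__main__":
    catalog = InMemoryCatalog()
    catalog.add_replica("/t2k.org/user/d/reptest.txt", "UKI-LT2-IC-HEP-disk")
    print(catalog.remove_replica("/t2k.org/user/d/reptest.txt", "UKI-LT2-IC-HEP-disk"))
    # → True
    print("/t2k.org/user/d/reptest.txt" in catalog.files)
    # → False
```

Doing the count check inside the same operation removes the need for the separate manual check described earlier.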

Once your bonus stamp card is full, you can claim your free beer.

Tagging @sfayer so he knows it's all filed.
