crab report wrongly reports lumis as both failed and successful for Autom.Split #5328

belforte · 2024-07-15T07:57:56Z

ref: https://cms-talk.web.cern.ch/t/command-crab-report-not-generate-filed-files/43487

User submitted task 240704_211221:matheus_crab_EGammaEXO_Run2018A_20240704_231147 with Automatic splitting, which means that if a job at step one fails,
the lumisections that it did not process will be processed by a later stage.
This means that it is possible for a lumisection to have been assigned to a job that failed,
and then re-assigned to a job at a later stage that was successful.
In particular, the following lumisections [2] of run 316457 are both in
processedLumis.json and failedLumis.json, while they are not in notFinishedLumis.json.

[2]

{516, 12, 14, 30, 32, 66, 86, 601, 114, 116, 117, 1146, 1147, 129, 1155, 1156, 1159, 136, 1161, 1167, 656, 1169, 661, 662, 663, 664, 1177, 1180, 1181, 1185, 1190, 1192, 1194, 1195, 1196, 1197, 1198, 1199, 1207, 185, 1209, 723, 724, 225, 226, 228, 229, 232, 233, 234, 242, 271, 276, 280, 281, 283, 288, 294, 299, 307, 309, 823, 312, 824, 316, 326, 329, 330, 331, 332, 337, 341, 342, 351, 355, 356, 359, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 380, 394, 395, 396, 397, 398, 400, 402, 403, 407, 409, 410, 412, 416, 425, 429, 440, 448, 489, 492, 497, 499, 503, 504}

belforte · 2024-07-15T08:22:02Z

@mapellidario can you tell me how to reproduce the list above ? I can not find a failedLumis.json file what crab report writed

mapellidario · 2024-07-15T08:25:23Z

you need to use the --recovery=failed option, which will not produce any output unless you apply the PR that I submitted on Friday

belforte · 2024-07-15T08:26:36Z

but I can't find the code which writes that file !

belforte · 2024-07-15T08:27:31Z

anyhow the output is there

belforte · 2024-07-15T08:30:00Z

ok. found

belforte · 2024-07-15T08:37:14Z

this loop should be modified to only look at tail jobs when automatic splitting is used. Even then, when some tail fail, the input will be reprocessed in additional tail subdags, We need to somehow backtrace the job number hierarchy...

CRABClient/src/python/CRABClient/Commands/report.py

Lines 189 to 198 in 05e977b

    
           elif self.options.recovery == 'failed': 
        
               for jobid, status in reportData['jobList']: 
        
                   if status in ['failed']: 
        
                       for run, lumiRanges in lumisToProcessPerJob[jobid].items(): 
        
                           if run not in notProcessedLumis: 
        
                               notProcessedLumis[run] = [] 
        
                           for lumiRange in lumiRanges: 
        
                               notProcessedLumis[run].extend(range(lumiRange[0], lumiRange[1]+1)) 
        
               notProcessedLumis = LumiList(runsAndLumis=notProcessedLumis).getCompactList() 
        
               notProcLumisCalcMethMsg += " the lumis to process by jobs in status 'failed'."

I start to wonder if we should rather remove --failed option !

belforte · 2024-07-15T08:38:01Z

is there any bad side effect from using simply toProcess - Processed ?

mapellidario · 2024-07-15T09:34:07Z

I start to wonder if we should rather remove --failed option !

After spending a good 15min thinking about how to make this option work for automatic splitting, I came to the very same conclusion. If 2 people out of the 2 who thought about this came to the same conclusion i guess we have enough quorum to proceed! :)

And if we wonder about how impact on our users: well, it was broken, so nobody really could have ever used this feature!

is there any bad side effect from using simply toProcess - Processed ?

no, there is not. And if I understand correctly, this is exactly what the default crab report option does

CRABClient/src/python/CRABClient/Commands/report.py

Lines 171 to 172 in 65a8e7a

    
           if self.options.recovery == 'notFinished': 
        
               notProcessedLumis = BasicJobType.subtractLumis(lumisToProcess, processedLumis)

belforte · 2024-07-15T13:04:12Z

At the very least we need to deprecate --failed in case of automatic splitting. In a way, when there's a reliable list of failed jobs, the ability to directly show what lumis were to be processed in those jobs may be good.
POssibly this is also less work.

mapellidario mentioned this issue Jul 15, 2024

crab report wrongly reports lumis as both failed and successful for Autom.Split #5327

Closed

belforte self-assigned this Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crab report wrongly reports lumis as both failed and successful for Autom.Split #5328

crab report wrongly reports lumis as both failed and successful for Autom.Split #5328

belforte commented Jul 15, 2024 •

edited

Loading

belforte commented Jul 15, 2024

mapellidario commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

mapellidario commented Jul 15, 2024

belforte commented Jul 15, 2024

crab report wrongly reports lumis as both failed and successful for Autom.Split #5328

crab report wrongly reports lumis as both failed and successful for Autom.Split #5328

Comments

belforte commented Jul 15, 2024 • edited Loading

belforte commented Jul 15, 2024

mapellidario commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024

mapellidario commented Jul 15, 2024

belforte commented Jul 15, 2024

belforte commented Jul 15, 2024 •

edited

Loading