Correct how the temp bed files are created #9
davep wants to merge 3 commits into shenlab-sinai:master
This fixes a problem where sep_chrom_bed could easily create clashing file names while trying to avoid exactly that. Rather than using tempfile, the code was using seconds since the Unix epoch and then, if it looked like a clash was possible, adding a random number to the end of it. The trouble is that, mixed with a fork (which the code does), this creates multiple forked processes that all follow the same random sequence. See: https://metacpan.org/pod/Parallel::ForkManager#USING-RAND()-IN-FORKED-PROCESSES for why. Long story short: the code that tried to ensure there wasn't a file name clash would almost guarantee that there was. So this change switches to using an actual temp file name generation function to create the temporary bed file names.
- Separate the chromosome from the rest of the name.
- Have a slightly longer run of template characters.
- Add .bed as a suffix.

None of this should make the code work any differently, but it should make the file names easier on the eye when looking at tmp.
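To make the fork and rand interaction concrete, here is a minimal sketch rather than code from diffReps. It assumes plain fork instead of Parallel::ForkManager, and it assumes the parent has already seeded the random number generator before forking (simulated here by a single rand call). The helper name child_rand_values is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: fork $n children and return the first rand()
# value each one produces. If $reseed is true, each child calls
# srand() before using rand().
sub child_rand_values {
    my ($n, $reseed) = @_;
    my @values;
    for my $i (1 .. $n) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: report one random value back to the parent.
            close $reader;
            srand() if $reseed;
            print {$writer} int(rand(1_000_000)), "\n";
            close $writer;
            exit 0;
        }
        # Parent: read the child's value, then reap the child.
        close $writer;
        chomp(my $value = <$reader>);
        close $reader;
        waitpid($pid, 0);
        push @values, $value;
    }
    return @values;
}

# Assumption: the parent has used rand() before forking, so the
# generator state is already fixed when the children are created.
rand();

my @shared   = child_rand_values(3, 0);
my @reseeded = child_rand_values(3, 1);

print "inherited seed: @shared\n";
print "reseeded:       @reseeded\n";
```

Under these assumptions every child in the first group inherits the same generator state and so reports the same value, which is how a rand-based file name suffix can collide across processes. Calling srand in each child is the usual workaround, but it only makes a clash unlikely; a proper temp file function avoids the guesswork.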
Hello, I'm trying to use diffReps.pl script and have installed it using
And the result file So I refer to your advice and replace
Should I address this problem via specifying "TMPDIR" with a local path in line 268 of your modified Thanks for your time,
@hsuh001 I'd suggest raising this as an issue for the author if I were you; I don't personally have any experience with this software and just happen to be a software developer who noticed a very particular issue with a bit of Perl code.
@davep hi, I have got this problem addressed and run
This is slightly related to #4 and also to another issue that has been observed, where some of the resulting bed files could end up with "corrupt" content, but never in a consistent way.

There appears to be a problem with how `sep_chrom_bed` creates the bed files. The naming seems designed to avoid clashes but, unfortunately, stands a good chance of causing clashes in some situations.

The main problem is that, rather than using conventional methods of generating temp files, a file name is generated from `time` and checked for; if it looks like it already exists, a new name is generated with `rand` (the use of `+` rather than `.` in that being the cause of #4). Unfortunately, if multiple processes are being used, this can result in the same sequence of possible file names being generated.

Note that `Parallel::ForkManager` is used to handle parallel processes, and that it has a warning about using `rand` in forked processes.

My thinking here is that it might be possible for more than one process to arrive at the same file name at the same time, and then both write to the same file and cause issues and apparent corruption.

This pull request contains changes that switch to using Perl's own `tempfile` function to generate the file names.
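As an illustration of the approach, here is a minimal sketch using `tempfile` from the core File::Temp module. It is not the code from this pull request: the helper name make_chrom_bed, the exact template, and the use of the system temp directory are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# Hypothetical helper (not the pull request's code): create a uniquely
# named bed file for one chromosome and return the handle and name.
sub make_chrom_bed {
    my ($chrom, $dir) = @_;
    my ($fh, $filename) = tempfile(
        "${chrom}_XXXXXXXX",   # chromosome, then a run of template characters
        DIR    => $dir,        # directory to create the file in
        SUFFIX => '.bed',      # appended after the random part
        UNLINK => 0,           # keep the file; the caller removes it later
    );
    return ($fh, $filename);
}

# Assumption: the system temp directory stands in for the real one.
my $dir = File::Spec->tmpdir();

my ($fh1, $name1) = make_chrom_bed('chr1', $dir);
my ($fh2, $name2) = make_chrom_bed('chr1', $dir);

print "$name1\n$name2\n";

# Tidy up the example files.
close $_ for $fh1, $fh2;
unlink $name1, $name2;
```

Because `tempfile` creates and opens the file as it picks the name, two processes asking for a file for the same chromosome should not end up sharing one, which is the property the `time` plus `rand` scheme could not provide.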