Correct how the temp bed files are created by davep · Pull Request #9 · shenlab-sinai/diffreps

davep · 2019-04-11T16:22:09Z

This is slightly related to #4 and also to another issue that has been observed, the latter where some of the resulting bed files could end up with "corrupt" content, but never in a consistent way.

There appears to be a problem with how sep_chrom_bed creates the bed files. The naming seems designed to try and avoid clashes but, unfortunately, stands a good chance of ensuring clashes in some situations.

The main problem is that, rather than using conventional methods of generating temp files, a file name is generated from time, is checked for, and if it looks like it already exists a new name is generated with rand (the use of + rather than . in that being the cause of #4). Unfortunately, if multiple processes are being used, this can result in the same sequence of possible file names being generated.

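Note that Parallel::ForkManager is used to handle parallel processes, and that its documentation carries a warning about using rand in forked processes: https://metacpan.org/pod/Parallel::ForkManager#USING-RAND()-IN-FORKED-PROCESSES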

My thinking here is that it might be possible for more than one process to arrive on the same file name at the same time, and then both write to the same file and cause issues and apparent corruption.
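A minimal sketch of the effect, using plain fork rather than Parallel::ForkManager so that it needs only core Perl. The explicit srand(42) is an assumption for illustration: it stands in for whatever seeded the generator in the parent before the fork, and none of this is taken from the diffReps code. Each child inherits the parent's generator state, so every child draws the same "random" number.

```perl
use strict;
use warnings;

# Seed the generator in the parent. In real code this can also happen
# implicitly, the first time rand() is called before forking.
srand(42);

my @draws;
for my $child (1 .. 3) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: inherits a copy of the parent's generator state.
        close $reader;
        print {$writer} int(rand(100000)), "\n";
        close $writer;
        exit 0;
    }

    # Parent: collect the child's draw without calling rand() itself,
    # so the state handed to the next child is unchanged.
    close $writer;
    chomp(my $value = <$reader>);
    close $reader;
    waitpid($pid, 0);
    push @draws, $value;
}

print "child draws: @draws\n";
```

If a file name is built from the current second plus a draw like this, children started within the same second can settle on the same name.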

This pull request contains changes that switch to using perl's own tempfile function to generate the file names.

This fixes a problem where sep_chrom_bed could easily create clashing file names, despite trying to avoid exactly that. Rather than using tempfile, the code was using seconds since the Unix epoch and then, if it looked like a clash was possible, adding a random number to the end of it. Mixed with a fork (which it would do), this would create multiple forks that all followed the same random sequence. See: https://metacpan.org/pod/Parallel::ForkManager#USING-RAND()-IN-FORKED-PROCESSES for why. Long story short: the code that tried to ensure there wasn't a filename clash would almost guarantee that there was. So this change switches to using an actual temp file name generation function to create the temporary bed file names.

- Separate the chromosome from the rest of the name.
- Have a slightly longer run of template characters.
- Add .bed as a suffix.

None of this should make the code work any differently, but it should make the file names easier on the eye when looking at tmp.
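As a rough sketch of the naming scheme described above, assuming tempfile refers to the one exported by the core File::Temp module. The helper name make_chrom_bed, its arguments, and the "diffreps-" prefix are illustrative choices for this example and are not the actual code from the pull request.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# Hypothetical helper (name and signature are illustrative only): create a
# uniquely named bed file for one chromosome and return its handle and path.
sub make_chrom_bed {
    my ($chrom, $dir) = @_;
    $dir = File::Spec->tmpdir unless defined $dir;

    # tempfile replaces the trailing run of X characters with random
    # characters and opens the file exclusively, so two processes cannot
    # end up sharing the same name.
    my ($fh, $path) = tempfile(
        "diffreps-$chrom-XXXXXXXXXX",
        DIR    => $dir,
        SUFFIX => '.bed',
        UNLINK => 0,    # the caller removes the file when done
    );
    return ($fh, $path);
}

my ($fh, $path) = make_chrom_bed('chr7');
print {$fh} "chr7\t100\t200\n";
close $fh or die "close failed: $!";
print "wrote $path\n";

unlink $path or die "Cannot delete file $path: $!";
```

Because the file is created and opened in a single step, there is no gap between checking that a name is free and writing to it.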

hsuh001 · 2020-09-24T03:57:56Z

Hello,

I'm trying to use the diffReps.pl script and have installed it, with all its dependencies, using cpanm diffReps-1.55.3.tar.gz.
But when I run it, I keep getting the error message:

"Cannot delete file .1600851433.95585.chr7.bed: No such file or directory".

And the result file diff.nb.txt only includes headers, with no other results.

So I followed your advice and replaced MyShortRead.pm with your modified version. Things improved somewhat: the result file diff.nb.txt now includes some results, but only a part, e.g. only for "chr1". Note that I still keep getting error messages:

"Open bed file /tmp/diffreps-chr10-XHhRSQVv27.bed error: No such file or directory"
"Cannot delete file /tmp/diffreps-chr7-0NAoqUewWz.bed: No such file or directory"

Should I address this problem via specifying "TMPDIR" with a local path in line 268 of your modified MyShortRead.pm? Or any other ideas?

Thanks for your time,
Jing Xiao

davep · 2020-09-24T08:45:38Z

@hsuh001 I'd suggest raising this as an issue for the author if I were you; I don't personally have any experience with this software and just happen to be a software developer who noticed a very particular issue with a bit of perl code.

hsuh001 · 2020-09-24T10:19:29Z

@davep hi, I have resolved this problem and can now run the diffReps.pl script successfully. Thanks for your reply.

davep added 3 commits April 11, 2019 10:22

Tidy trailing whitespace

2172660

davep wants to merge 3 commits into shenlab-sinai:master from davep:bugfix
