fix: allow soft-link with docker, allowing singularity to use soft-linking and a wider variety of caching strategies #6676
base: develop
Conversation
Did you test this in real life? Due to the mounting system in containers, soft-links may not work at all. This is why they are rightfully banned in Docker.
We're trying to run on an HPC cluster and would prefer to lower the load on the filesystem as much as possible. If we use any of the hashing-based caching mechanisms, it hits the filesystem hard, which tends to slow everything down. Our production is currently running with "fingerprint" hashing and hard-links with Singularity containers. The Samba mounts on the nodes can do 2 Gbps, and my Cromwell server instance maxes that out pretty much right away. On top of that, doing that much I/O over a GPFS mount led to an increase in the GPFS buffer size, which ballooned enough to kill the Cromwell server process. We'd like to use "path+modtime" hashing, so we'd prefer a soft-link option. We tested this internally and it works as long as the target location is mounted at the same path inside the Singularity containers. We also think Cromwell should let users soft-link if they so choose, perhaps with a warning if they're running containers.
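For context, these strategies are configured per backend under the local filesystem settings. A minimal sketch of the setup described above, assuming a local-filesystem backend (the backend name is a placeholder):

```hocon
# Sketch only: "MyCluster" is a placeholder backend name.
backend.providers.MyCluster.config.filesystems.local {
  # How task inputs are localized into the execution directory.
  localization: ["soft-link", "copy"]

  caching {
    # How cache hits are duplicated into the new call directory.
    duplication-strategy: ["soft-link", "copy"]
    # Cheap hashing based on path and modification time; avoids reading file contents.
    hashing-strategy: "path+modtime"
  }
}
```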
Fingerprint only reads 10 MB per file, and you can set it lower if you like; there is a fingerprint-size option. Did you try limiting the threads on your Cromwell instance? You can set them like this:
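(A sketch of one way to cap those threads via the standard Akka dispatcher settings in Cromwell's HOCON config; whether this is the exact option the comment refers to is an assumption.)

```hocon
# Assumption: capping Cromwell's default Akka dispatcher. These are
# standard Akka fork-join-executor settings, not Cromwell-specific keys.
akka.actor.default-dispatcher.fork-join-executor {
  parallelism-min = 3
  parallelism-max = 3
}
```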
This will limit the number of threads to 3, so Cromwell can only handle 3 files at the same time. That should massively reduce the load on your file storage server.
Oh yeah, you might also be interested in this feature: #4900
Thanks @rhpvorderman for the suggestions. We are running WDL pipelines for single-cell workloads that have thousands of concurrent tasks working on a dozen files each. The filesystem metadata operations alone are an issue for the filesystem, regardless of how little data is actually fetched. We were already hitting a wall in job submission speed because of this, and we've been running Cromwell with these changes in production without issues. Reducing the number of threads would also reduce task throughput and limit performance. #4900 is not what we need because we don't want to waste time copying when we can just soft-link. I have little doubt that this solution is optimal for our team. However, I understand your concerns about Docker. We are happy to do a little extra work to make this PR palatable to your team, perhaps by adding warnings in the appropriate places?
I am not part of the Cromwell team, so it is not up to me whether this gets merged. However, allowing soft-links in containers will produce errors for a lot of people who are not aware of the implementation details, and those people will post bug reports on the Cromwell bug tracker. If this is to go in, I think the best way is to allow a config override such as "allow-softlinking-in-containers" with a huge warning in the documentation. That way the unaware will not get caught by surprise, since active action needs to be taken to run into this error.
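A sketch of what such an opt-in could look like in a backend's filesystem config; the "allow-softlinking-in-containers" key is the hypothetical override proposed above, not an existing Cromwell setting:

```hocon
filesystems.local {
  localization: ["soft-link", "copy"]
  # Hypothetical opt-in proposed above -- not an existing Cromwell option.
  # Soft-linking inside containers would stay disabled unless explicitly enabled.
  allow-softlinking-in-containers: true
}
```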
Off-topic: this is not necessarily always the case. Cromwell uses a very large number of threads by default if the server has a lot of cores. Even with the soft-linking strategy I would recommend playing with that setting a little; more threads is not necessarily better. Task and context switching are expensive operations too, not to mention the filesystem's limited ability to handle many requests at once.