Past, Present and Future of Open Science (Emergent session): Containers: Ticket to Valhalla or Ticket to the Inferno? #87
Comments
tagging @satra, @dnkennedy, @jaetzel and @pbellec as interested folks based on the Mattermost thread. Did I miss anyone/who else should be tagged? |
Great! Don't forget @stebo85 |
Thx @gllmflndn, was also thinking about @stebo85, but wasn't sure if he could make it given the time zones! @stebo85 is there a time that would work for you? |
LGTM. I think Gael Varoquaux has pretty strong opinions about 'dangerous bandaid for masking bad software development practices' (could not locate the link, but I think he wrote a blog post about that a while back). It would also be great to have someone speak about reproducibility and containers. Maybe Valerie Hayot? I am happy to be dropped from the discussion, as I don't think I have expertise not covered by others. I guess I could play the role of devil's advocate, as I am not sold on the utility of containers as a software distribution tool. |
Please stay @pbellec, we need a diversity of opinions! Searching for Gael's blog post, I found this: |
tagging @GaelVaroquaux to check if he would be interested and has bandwidth to stop by. |
Happy to complain :). When exactly do you need me? |
thx @GaelVaroquaux, the rock of complaints, hehe! Also tagging @hcp4715 who did a lot of work to introduce containers in his lab/institute and certainly has important and interesting points to add. |
@PeerHerholz, FYI, yesterday an excellent master's student of ours in China wrote a Chinese tutorial that covers the whole process, from installing docker, to using heudiconv, to running fmriprep, on both Linux and Windows (you can imagine the frustrations he experienced ;)). We put it on OSF: https://osf.io/naxgd/ |
Thx @hcp4715, cool! Following up on our conversation in Mattermost: we need a diverse set of experience levels, use cases and backgrounds in order to create a fruitful discussion. So far I think the following have been mentioned (thx @emdupre):
Please discuss and add further groups! |
It looks like there's already a pretty clear mapping:
If you want to add more, then I'd aim for the slots with only one person. But I'm not sure how big you're envisioning this! |
Any comments on time for this? While there are still a number of open slots? |
All but the 5 am EDT slot are ok for me, either day. 5 am EDT is possible, but very early for me.
|
5 am, 2 pm, and 3 pm EDT works for me (time zone CEST). |
wed: 2,3,9 EDT |
wed, 2 and 3pm EDT? |
No preference on my side, all times would work for me! |
5:00 am, 9pm and 10pm work for us in Australia :) |
Good morning from Asia Pacific.
I am marginally interested in containers, in the sense that I had not used
them before coming to neuroscience. My experience is that, while they are great in
principle in terms of containing the full data and code, they are not user
friendly and can become an end in themselves. When I saw it running,
docker reminded me of COBOL, and it felt like jumping thirty years back,
before Visual Basic. It was painful to see people in the lab spend hours, if not days, weeks and months,
trying to learn docker, which would not install or was awkward to use. User
guides and tutorials exist aplenty, yet people said they wanted to write a
tutorial. That was a bit of a waste of time, and it felt as if using docker and
getting stuck writing docker tutorials had become a fashionable core activity that was sidetracking resources that should be devoted to neuroscience instead.
So my contribution to this discussion, if you can take it on board, is that
containers could be developed to be less geeky, more stable and less brittle (we discussed
higher-end interfaces with the Datalad panel, for example),
and that good practices should include multiple methods/tools to carry out
experiments. Ultimately, a researcher should not be forced to use a
container if they find it awkward; alternatives should be sought and
developed.
P
|
With huge apologies to the APAC time zone, due to the voting above, Wed 1 Jul 2020 7pm - 8pm (GMT) (Wed 7/1/2020 3:00 PM - 4:00 PM) is the time slot I requested. Can we come up with a way to extend the conversation (more or less formally?) to include the APAC later in the day, perhaps seeded by the initial discussions in the above time slot? I am sorry, this was one of the hardest parts of agreeing to be the abstract submitter :-( |
@Starborn, your comments ring so true for me, both the initial reaction at seeing what accessing a container actually looks like, and the amount of time involved. Even the impulse for more and more tutorials ... |
pasting from mattermost. a few questions to consider for discussion
it may also be useful to create and evaluate a set of polls prior to the discussion |
Thanks for organizing this!! |
I would be a happy defense attorney for containers, hammers, chainsaws, and any other useful tool or tech! |
@guiomar :
"Efficiently" - not sure. But you can just place matlab inside and then expose license from outside. I know that
|
Thank you, such great work!
I should first get an account. I'll start looking for a suitable program
that I may be able to join remotely,
and will sync up after that.
PDM
…On Sun, Jun 28, 2020 at 12:07 PM Steffen Bollmann ***@***.***> wrote:
@Starborn <https://github.com/Starborn>, I believe you need to have a
collaboration with an Australian institution to get access to a cluster
running CVL. Data integration into these platforms is crucial but not easy:
at the University of Queensland we also make this seamless to users enabled
by our underlying data management fabric, called MeDiCI (
https://rcc.uq.edu.au/data-storage) - it’s a system consisting of
multiple GPFS caches that automatically transports the data to the right
location.
In VNM we are planning to support multiple ways of getting your data, but
that is all work in progress and not ready yet. We are planning to
integrate datalad and at the moment you can already access your data on the
local disk via a mount point. Would be great to hear what your specific
application and use case is to see if we can soon enable that. Please feel
free to open an issue on https://github.com/NeuroDesk/vnm, describe
exactly what you would like to do and maybe we can get this working pretty
quickly.
|
Hello All, |
@dnkennedy - was the time decided? and are there any todo's? |
This event has been scheduled to be run on 01.07.2020, 19:00- 20:00 UTC For more information, please go to https://ohbm.github.io/osr2020/schedule/emea |
OK, for better or worse, I've tried to distill what and how we might present this OSR Containers session. We have a handful of invited folks that cover a variety of application areas (software developers, container developers, consumers, and educators). Each presenter gets 4 minutes to briefly say something about the 'good', the 'bad' or the 'good but difficult' issues with using containers in 'their real world'. We will keep a time clock, and a scoreboard: https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. I then also want to open it up to the rest of the community for similar 4 minute statements about the good/bad/problem container issues in their world. This will be about collecting these issues, not solving them (we do not have enough time here to argue/solve/discuss the details of any of the issues at much length). With these proceedings, I posit that we can then, as a community (off line), attempt to develop a document along the lines @satra suggested and, by way of doing that, discuss/argue/debate/resolve (I hope) the details of the various issues. Of course, having @satra 's points of discussion in mind can influence what good/bad/problem anyone brings up, but addressing those directly, I think, is too far reaching for a 1-hour session with community involvement... |
@dnkennedy - sounds like a plan! using the forum to listen to and aggregate different viewpoints would indeed be a great starting point. |
@satra Any TODO's you ask? Well, if folks can tolerate the design I put together, we need to promote the session and make sure that those who will be speaking know the 'ground rules' and scope. Some questions remain: should we let the speakers pre fill in their bullet points on the 'scoreboard'? I think there will only be one shared screen (mine) which can just be the 'scoreboard' with the community filling it in as we go. It MIGHT be possible for a speaker to provide me 1 slide or webpage that I could display. |
The @satra post-session community white paper topics for discussion (as alluded to above):
|
So, do @GaelVaroquaux @PeerHerholz @gllmflndn @satra @ValHayot @stebo85 @hcp4715 @jaetzel and @pbellec consent to the plan outlined above and in https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0? The session is scheduled to run on 01.07.2020, 19:00-20:00 UTC. I have to provide email addresses to the OSR to get the Zoom links sent to y'all. I will be sharing my screen, with the 'scoreboard' of the aforementioned google spreadsheet. You can share with me 1 slide or one URL that I can try to show during your 4 minutes! Please stick to the 4 minutes, I will be draconian. Please try to stick to the enumeration of (any of the) good things, bad things, problem things and solution things about containers. You can pre-fill the aforementioned scoreboard/spreadsheet, if you want. These are all longer discussions for the future; in this session we are collecting... Feel free to also share links to other presentations, tools and resources that you want, even though we cannot get into their details in this session's format. |
One important aspect I think that's missing is the |
One aspect which containers facilitate is standardization of the application interfaces: BIDS-Apps, Flywheel gears, Brainlife ABC apps, Boutiques; and even more generic Singularity SCI-F Apps (harmonization of entry points within single container). Although such APIs can be used without containers, IMHO abstraction away from "software distribution" aspect helped to concentrate on APIs, and now they are typically used only with the containers. I think some exposure to those and discussion on possible ways to improve interoperability (and metadata harmonization to facilitate discovery between associated platforms) would be a valuable topic. |
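To make the "standardized application interface" point a bit more concrete, here is a minimal sketch of a BIDS-App-style entry point, assuming the usual BIDS-Apps convention of positional `bids_dir`, `output_dir` and `analysis_level` arguments plus an optional `--participant_label`. The helper names `build_parser` and `describe_run` and the example paths are made up for illustration and are not taken from any particular app.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the BIDS-Apps positional-argument convention."""
    parser = argparse.ArgumentParser(description="Hypothetical BIDS-App-style entry point")
    parser.add_argument("bids_dir", help="root directory of the input BIDS dataset")
    parser.add_argument("output_dir", help="directory where results are written")
    parser.add_argument("analysis_level", choices=["participant", "group"],
                        help="which stage of the analysis to run")
    parser.add_argument("--participant_label", nargs="+", default=[],
                        help="subject labels to process (default: all)")
    return parser


def describe_run(argv):
    """Parse an argument list and return a one-line summary of the requested run."""
    args = build_parser().parse_args(argv)
    who = ", ".join(args.participant_label) or "all participants"
    return f"{args.analysis_level} analysis of {args.bids_dir} -> {args.output_dir} ({who})"


# Example call with explicit arguments (paths are placeholders).
print(describe_run(["/data/bids", "/data/out", "participant", "--participant_label", "01"]))
```

Because every app built this way accepts the same arguments, the same wrapper can call it whether it runs natively or as the entry point of a container image, which is the interoperability angle raised above.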
Questions concentrate around "research", but many participants and audience will also be "scientific software developers". So discussion of aspects related to software development where containers provided huge assistance IMHO is a worthwhile topic: use of containers for troubleshooting/debugging, continuous integration, etc. |
A bunch of good points above. 1) scale. I collapsed the (larger and smaller) scale into the section on Software Consumers of various scale, so that scale dimension can still be explored... @yarikoptic 's additional lovely points may need a whole additional emergent session to really get to. But, to the extent that these are some of the pointers to some of the 'good' of containers, make sure you get them into the 'good' column of the 'scoreboard'! |
Hi @gllmflndn @GaelVaroquaux @ValHayot @hcp4715. Please confirm that you're on board with this plan, and you have the zoom info. Sorry for the chaotic communication, too many channels of communication for my small internet-less brain... |
Yep! works for me / got the email. |
Another point to be considered, that I noted in mattermost last week and that is very important IMHO, is to prioritize numerical and algorithmic stability/reproducibility as the first resort to achieving reproducibility. When possible (might not always be), this would return better bang for the buck IMHO, over "nuking" the app with tons of layers of containers (even if one doesn't see them), which adds to the complexity of the app as well as to the difficulty of using it. My experience with BIDS-App OPPNI and graynet/hiwenet partly contributed to the above point of view. Looking back, I feel standardizing HPC environments with the same stack would save a ton of effort and money, while moving the science forward. Just my 2¢. |
@dnkennedy Yes, got your email, thanks! Time is not ideal for me so if I miss the beginning of the session or lurk in the background, just skip me - or I'll try to send you a short summary of some of my thoughts on the topic. |
Hi. In an above comment I put an incorrect link to the 'scorecard'. I corrected it above, but am repeating the correct 'scorecard' link here: https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. Apologies to anyone who tried that above link and didn't get let into an internal doc that was just my reconstruction of the Mattermost /Town hall container discussion thread before it moved to the containers channel. |
OK, @gllmflndn Would love your input and thoughts regarding the containerization of all things SPM and beyond... Either in person, or at least in the scorecard doc (https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0). |
@dnkennedy Thanks, just seeing the scorecard now - to be honest, my thoughts (and beyond) seem to be nicely covered by @satra and @GaelVaroquaux. |
@gllmflndn It's ok to reiterate a little, that way we effectively "+1" some of the common topics that are important to multiple folks. If it's easier, I guess you can annotate the other points with a "+1" in some other way... |
Hello again. Yesterday's session was the 'fun' part. Now the 'hard' work starts of trying to sift and consolidate the raw observations, in order to see what came out. Any good ideas about how to proceed? Can we get volunteers to take a column each (C (Good), D (Bad), E (Problems), F (Solutions)) to distill into a bullet list of points (with a counter of how many times a similar thing came up)? [vertical integration]. Then we can follow that up with a horizontal integration... |
And also make it available for comments. E.g. although I agreed with @GaelVaroquaux about "Encourage bad behavior from tool developer perspective (not worrying about portability, dependences)" I later reconsidered it: I saw many projects where trying to create a Dockerfile led developers to realize shortcomings of their build process/infrastructure and have them addressed. So it is again the stick of two ends and not all "black and white". |
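For the "bullet list with a counter" step, a rough first pass could be scripted along these lines. This is only a sketch: it assumes one column of the scoreboard has been exported as a plain list of cell strings, the example entries are invented, and `tally_column` and `as_bullets` are hypothetical helper names.

```python
from collections import Counter


def tally_column(cells):
    """Count how often each point appears in one column, ignoring case and extra spaces."""
    normalized = (" ".join(cell.lower().split()) for cell in cells if cell and cell.strip())
    return Counter(normalized).most_common()


def as_bullets(tallied):
    """Render (point, count) pairs as a bullet list with a counter."""
    return "\n".join(f"- {point} (x{count})" for point, count in tallied)


# Invented example entries standing in for column C (Good).
good_column = ["Reproducible environment", "reproducible  environment", "Easy to share", ""]
print(as_bullets(tally_column(good_column)))
```

This only merges entries that are identical after normalization; deciding that two differently worded entries are "a similar thing" would still need a volunteer's judgment.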
Hello,
As with other sessions this week,
I did not participate but remain very interested to learn what was said.
Please share the summaries! Looking forward to them.
PDM
|
Hi @Starborn ; the raw notes from the session are at https://docs.google.com/spreadsheets/d/1LzAHP9RIkuSBG-fB_6Jza_vUngoc_4jmEXqZymxo588/edit#gid=0. The whole community is invited to help refactor these raw notes into a more coherent set of observations and then a more formal 'best practices' recommendation. |
Thank you this is great
will sure follow up
|
A thought that stuck with me following the online discussion was: if all the people who are spending time on making FreeSurfer (*) containers would contribute a bit to improving FreeSurfer's release/packaging/installation/deployment/infrastructure mechanisms, would that not be much more effective? (*) You can insert your favourite software here instead of FreeSurfer, but it was one that was explicitly mentioned. I think that for many computer scientists it is more interesting to spend time on "your own" software/container than on someone else's open-source project. This reflects a problem with the academic incentive structure, which does not favour contributions to "someone else's" projects or software. The same problem applies not only to analysis software, but also to other people's containers. |
This sentiment resonates with me. Is that to say, there really should only be 1 FreeSurfer 6.0 container (again, taking a 'random' example), and it should live in some well known standard place, and everyone should use that unless there is a really good reason to make a new FreeSurfer 6.0 container, then fine, document why, and put it in a standard place? |
even for freesurfer there are many use cases: neurodocker distributes a minimized freesurfer just for recon-all while most of these big packages have many needs. the freesurfer group themselves now release a version of freesurfer as a whole container. yes, whole installations can (and are) be(ing) distributed by people who develop the software. but there are many use cases for container construction (e.g., fmriprep, giraffe.tools, optimize size for running/shipping). take a look at the ga4gh registry of containers to see what can be done to help users. i think in this area they did a really good job: https://dockstore.org/ |
Containers: Ticket to Valhalla or Ticket to the Inferno?
By David Kennedy, University of Massachusetts Medical School
Abstract
The containerization of neuroimaging analysis workflows has quickly become a hot topic in the OSR and beyond. But with great power comes great responsibility. Containers sometimes get presented as the 'end all and be all' by some and as a 'dangerous bandaid for masking bad software development practices' by others. What's the poor researcher to do? In this session we hope to have a pleasant discussion of the pros and cons, useful application areas, and practical logistics about using containers in the 'real world'.
We propose to present this as a round table with input from a number of perspectives, then followed by a dialog and public discussion aimed at determining where the community stands regarding 'best practices' and use of containers. The round table may include (subject to confirmation and further discussion): Jo Etzel, Pierre Bellec, Peer Herholz, Satra Ghosh, Agah Karakuzu.
Useful Links
https://github.com/ReproNim/neurodocker
https://ww5.aievolution.com/hbm1901/index.cfm?do=abs.viewAbs&abs=4639
Tagging @dnkennedy