05 Call Digest [ 2022-nasa-champions ] #13

Open
jules32 opened this issue May 6, 2022 · 0 comments
Labels
digest Cohort Call Digests

jules32 commented May 6, 2022

Hi @NASA-Openscapes/2022-nasa-champions-team !

Thanks for a great fifth (final!) Cohort Call last week. We talked about everyone's pathways forward and plans for more open and inclusive data science as you migrate Earthdata workflows to the Cloud. Thank you all for sharing your progress with the group. We are so impressed with how much you have all learned through this cohort! We're looking forward to supporting you with what you need - we're dedicated to helping migrate your workflows to the cloud.

In the next two months, we'll be supporting you in a few ways:

  • Slack. Our #2022-nasa-champions channel will be our primary place to get help asynchronously. Please check in, share links, screenshots, error messages, questions (and wins!) and we (NASA Openscapes) will review code, test things, confer with colleagues, and share back (we'll reply and use 👀 to let everyone know about progress). When it makes sense to have video calls to screenshare and talk, we'll share the video link in the channel, and anyone is welcome to join to watch or ask questions. We can also schedule these as additional co-working sessions (see next).

  • Co-working. On alternating Thursdays at 11:30-1pm PT. We'll continue to have breakout rooms for you to work with Mentors, and we will also use these times to teach/screenshare, and will record. Some topics we're working on:

    • Practicing GitHub workflows and teaching others on your team

    • Cloud spatial subsetting

    • Environment management for creating cloud computing space that is reproducible and scalable (e.g. Docker images)

    • Dask/Pangeo software stack to enable scalable processing

    • Cloud costs and setup

    • NetCDF to Zarr

  • 1:1 chats. We are available if you or your team wants to chat. We know that identifying what you need is hard, and talking and screensharing how you work can help you think things through, and through this we'll likely see places to help. We'll be checking in with your teams - and please know this is an offer for anyone in the cohort, not only leads. We're all here to help each other learn.

  • Survey. This is an anonymous survey we ask each Champions Cohort to complete so we can improve future cohorts - and for our Cohort it's also another chance to share what you need. We'd appreciate your feedback - please fill out this survey by May 20.

Hope you have a great weekend - below is a light digest of Call 05.

Cheers,

The NASA-Openscapes Mentors @NASA-Openscapes/mentors-2021 @erinmr @jules32

Digest: Cohort Call 05 [ 2022-nasa-champions ]

Cohort Folder - contains agendas, video recordings, pathways folder

Cohort webpage: https://nasa-openscapes.github.io/2022-nasa-champions 

Goals: Each team shared their Pathways and we discussed next steps.

A few lines from shared notes in the Agenda doc during Pathway shares:

  • use MATLAB & moving into Python

  • no more emailing code! working to share code via GitHub

  • "Roiling chaos" and legacy code - how do you transition?

    • Language choice not as important as the commitment to being more open and having things in the right place
  • challenge: only 3 people in Openscapes but team is actually large. How might we get others to work in GitHub?

  • "kindness" is key

  • now Jupyter Notebooks are shared, not just local

  • Kodi created earthdata R function and will work on documentation and CRAN

    • function keeps track of data provenance - wraps up complicated analyses
  • bringing new folks in - there's so much tech stuff - how do we help folks with onboarding?

  • need to track which version of data they gave people because they have lots of open collabs

  • challenge of bridging uses of R and Python by the different communities they deal with in the cloud

  • Pathways & Landscape resonates

  • hard to share openly if you don't feel safe. kindness = :-)

  • understanding power of markdown and ability to put it on GitHub

  • Cloud needs, cloud barriers - we'd need to set up billing (administrative and budgeting needs)

    • Setting up infrastructure is a huge challenge - beyond using the tools themselves
  • Costing in the cloud is a question that comes up a lot as the cloud paradigm unfolds. I wonder if one approach to understanding cost and setting things up "on your own" (at your own institution) is to first become more familiar and comfortable with doing data analysis in the cloud through "training" cloud accounts like Openscapes/2i2c, taking advantage of programs like this. That could give a better understanding of how much cloud computing you may need and what environment is required, so that when you go and do this within your labs or at your institutions, you have a better idea of what to ask for and set up. I also wonder if there is a parallel between setting up cloud computing and how we've traditionally been setting up and paying for HPC environments

    • "I like the idea of learning about "guardrails" so you don't mess up ($$$) on the cloud -- akin to benefits of adopting version control but for cloud credits"
  • Realized "how informal I am" in project setup and this could be improved to work with teams

  • xarray is the magic access package, but slow in cloud -- working on optimization with GES-DISC wizards and gurus

  • if using NASA ESDIS data, could use the zarr-eosdis-store e.g. https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/09_Zarr_Access.html

  • Aaron Friesz is working on a prototype with kerchunk & Alexis's colleague Chris has a *very* similar prototype going with fsspec. Stay tuned!

  • Erin Robinson has really helped me - MATLAB & Fortran code > running in the Cloud

  • This example can be generalized to a use case where one needs to use cloud-archived data with non-cloud data (e.g. GCM output, in-situ data, other institution's data etc)

  • Allan: I like the idea of learning about "guardrails" so you don't mess up ($$$) on the cloud -- akin to benefits of adopting version control but for cloud credits 
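
One way to picture the "guardrails" idea from the notes above is a small budget check that runs before launching work. This is only a rough sketch in plain Python: the `BudgetGuard` class, the instance names, and the hourly rates are all hypothetical and made up for illustration, not real cloud prices or any provider's API. Actual guardrails would more likely come from your cloud provider's billing alerts or from your hub administrators.

```python
from dataclasses import dataclass

# Hypothetical hourly rates in USD, for illustration only.
# Real prices vary by provider, region, and instance type.
HYPOTHETICAL_RATES_USD_PER_HOUR = {"small": 0.10, "large": 1.00}


@dataclass
class BudgetGuard:
    """Tracks estimated spend against a monthly budget (illustrative only)."""

    monthly_budget_usd: float
    warn_fraction: float = 0.8  # warn once this share of the budget is used
    spent_usd: float = 0.0

    def estimate(self, instance: str, hours: float) -> float:
        """Estimated cost of running one instance type for some hours."""
        return HYPOTHETICAL_RATES_USD_PER_HOUR[instance] * hours

    def record(self, instance: str, hours: float) -> str:
        """Return 'blocked', 'warning', or 'ok' for a proposed job.

        A blocked job is not added to the running total.
        """
        cost = self.estimate(instance, hours)
        if self.spent_usd + cost > self.monthly_budget_usd:
            return "blocked"
        self.spent_usd += cost
        if self.spent_usd >= self.warn_fraction * self.monthly_budget_usd:
            return "warning"
        return "ok"


guard = BudgetGuard(monthly_budget_usd=100.0)
print(guard.record("large", 50))  # well under budget
print(guard.record("large", 35))  # past the warning threshold
print(guard.record("large", 20))  # would exceed the budget
```

The point is the habit rather than the code: estimate before you run, and decide ahead of time what happens when you get close to the limit.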

Friendly learning resources

@jules32 jules32 added the digest Cohort Call Digests label May 6, 2022