NetCDF files store self-describing data in a concise and consistent way. If you have never encountered a NetCDF file before, the format can seem daunting. For a quick and friendly way of looking at the data, try out Panoply, or see whether the files that you are interested in are already hosted on an in-browser viewing tool such as ERDDAP.
If you are ready to dive in and get started with analysis, then choose your language, get set up, and check out the introductory materials.
This repository contains introductory materials for Python, R, and Matlab. These are the giants in data analysis in the scientific and engineering communities. As an overview: Python is an open-source language with a broad base of support that can be used for development as well as data analysis; R is an open-source, statistics-focused language; and Matlab is a proprietary language with great support and easy startup (no installing packages). Choosing which to use can be hard, but there is no reason to limit yourself to one. If you really dive in, you will find that the best tool for the job isn't always a single language, but some combination. If you are interested in reading more about the nitty-gritty of the differences between R and Python, check out this datacamp blog post; for Python and Matlab, check out this pyzo blog post.
For the purposes of this introduction, all the code in this repository is run in Jupyter notebooks. Jupyter is my favorite development environment and lends itself to linear explanations, but there are different and often more typical environments depending on the language you choose to use. All of the commands in the notebooks will work just the same no matter where you use them, whether at the command line or in a console window within an integrated development environment (IDE). Just copy and paste the lines that you want to run from the notebook and make sure that if your code is dependent on outside functions those files are saved in the same folder where you are working. Descriptions of different environments and instructions for setup are included below.
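If you copy code out of a notebook, a quick check like the one below can confirm that the helper files it depends on really are in your working folder. This is only a minimal sketch: the function name `missing_helper_files` and the file name `my_helpers.py` are made up for illustration and do not refer to files in this repository.

```python
from pathlib import Path


def missing_helper_files(filenames, folder="."):
    """Return the names in `filenames` that are not files inside `folder`."""
    folder = Path(folder)
    return [name for name in filenames if not (folder / name).is_file()]


if __name__ == "__main__":
    # "my_helpers.py" is a hypothetical helper file name; replace it with
    # whatever files your copied code actually imports or calls.
    missing = missing_helper_files(["my_helpers.py"])
    if missing:
        print("Not found in the current folder:", ", ".join(missing))
    else:
        print("All helper files are in the current folder.")
```

The check looks in the current working directory by default, which is the folder you started the notebook, console, or script from.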
There are several packages that you will want to install before you start analysis with Python. If you are already comfortable with Python, then install netCDF4 and xarray; both are available via conda or pip. If you don't already have them installed, you will also want to install matplotlib for graphing and pandas for data analysis.
If you aren't comfortable already, then the easiest way to get started is to install miniconda. If you don't know which version of Python to pick, then I recommend Python 3, but all these examples will also work in Python 2.7 (for more info on this decision look here).
Now you have Python, and a tool for managing packages in Python: conda. To use conda well, we will create a new environment into which we will install all the packages for our project. To learn more about environments check out conda/envs. For now just go to the command line and run:
$ conda create --name myproject netCDF4 xarray matplotlib pandas
Now you have an environment called myproject, and this environment contains all the packages that we will use in the rest of this tutorial. You will be prompted with how to activate this environment using either:
$ activate myproject
or:
$ source activate myproject
depending on your operating system. Activate it now and you should see that the command line prompt is now preceded by your environment name in parentheses. It should look like this:
(myproject) $
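To confirm that the activated environment can actually see the packages, you could run a small check like the following from Python inside that environment. This is a rough sketch, assuming Python 3; the helper name `missing_packages` is made up for illustration, and the package list simply mirrors the `conda create` command above.

```python
import importlib.util

# Packages installed into the example environment above.
REQUIRED = ["netCDF4", "xarray", "matplotlib", "pandas"]


def missing_packages(names):
    """Return the top-level package names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If anything is reported missing, the most likely cause is that the environment is not activated in the window you are working in.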
Jupyter is my favorite development environment, but there are many other environments in which you can run Python (some popular ones: Spyder, PyCharm). To install Jupyter just run:
(myproject) $ conda install jupyter
To start it, open a command window in the folder in which you want to store your notebooks and run:
(myproject) $ jupyter notebook
There is also always the option of running Python directly from the terminal. If you choose that route, this is what your work environment will look like:
As with Python, there is one package that you will need, and several others that will make analysis easier. If you are comfortable with R, then install ncdf4. To make handling time series easier install xts, and to make plotting easier install ggplot2.
If you don't yet have R, download it and follow the install instructions.
If you want to work from the command line, start R and run the following at the R prompt:
> install.packages('ncdf4')
> install.packages('xts')
> install.packages('ggplot2')
If you want a more user-friendly environment, try the free version of RStudio and use the package manager to install ncdf4, xts, and ggplot2. There are some nice tools for inspecting data tables and saving plots in a more familiar way. This is what your work environment will look like if you choose to use RStudio:
If instead you choose to run R directly from the terminal, your environment will look more like this:
Unlike the other two languages, Matlab requires a license. If you already have Matlab you are all set to move forward.
Matlab comes with its own user interface which has some helpful tools for inspecting data in table form and changing how plots look. If you use the Matlab user interface, this is what your work environment will look like:
If for whatever reason you are interested in getting Jupyter set up to run Matlab, just download miniconda and run:
$ conda install jupyter -c ioos matlab_kernel
To start it, open a command window in the folder in which you want to store your notebooks and run:
$ jupyter notebook
You should notice a Matlab option when selecting which kernel to use in a notebook.
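If the Matlab option does not show up, one way to see which kernels Jupyter knows about is to read the output of `jupyter kernelspec list --json`. The sketch below is only an illustration, assuming Python 3.7 or later. The helper name `kernels_matching` is made up, and matching on the word "matlab" is an assumption about how the kernel is registered.

```python
import json
import shutil
import subprocess


def kernels_matching(kernelspec_json, keyword):
    """Return sorted kernel names whose name or display name contains `keyword`."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    keyword = keyword.lower()
    matches = []
    for name, info in specs.items():
        display = info.get("spec", {}).get("display_name", "")
        if keyword in name.lower() or keyword in display.lower():
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    if shutil.which("jupyter") is None:
        print("jupyter was not found on the PATH")
    else:
        result = subprocess.run(
            ["jupyter", "kernelspec", "list", "--json"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            print("Could not list kernels:", result.stderr.strip())
        else:
            try:
                # Assumption: the kernel's name or display name mentions "matlab".
                found = kernels_matching(result.stdout, "matlab")
            except ValueError:
                print("Could not parse the kernel list as JSON")
            else:
                print(found if found else "No Matlab kernel found")
```

Running `jupyter kernelspec list` directly at the command line gives the same information in a plainer form.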