This session covered obtaining the data and some data preparation procedures.
Commands, functions, and methods:
- `!wget` - Linux shell command for downloading data (the leading `!` runs it from a notebook cell)
- `pd.read_csv()` - read CSV files
- `df.head()` - take a look at the dataframe
- `df.head().T` - take a look at the transposed dataframe
- `df.columns` - retrieve the column names of a dataframe
- `df.columns.str.lower()` - lowercase all the letters in the column names
- `df.columns.str.replace(' ', '_')` - replace the space separator with an underscore
- `df.dtypes` - retrieve the data types of all series
- `df.index` - retrieve the index of a dataframe
- `pd.to_numeric()` - convert the values of a series to numerical values. The `errors='coerce'` argument sets values that cannot be parsed to NaN instead of raising an error.
- `df.fillna()` - replace NAs with some value
- `(df.x == "yes").astype(int)` - convert the series `x` of yes-no values to numerical values (1 for "yes", 0 otherwise)
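The calls listed above can be chained into a small preparation pipeline. The following is only a minimal sketch: the in-memory CSV, its column names, and its values are invented for illustration and stand in for a file that would normally be downloaded with `!wget`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded CSV file; the column names and
# values are invented for illustration only.
raw_csv = io.StringIO(
    "Customer ID,Total Charges,Churn\n"
    "a-1,29.85,no\n"
    "b-2,_,yes\n"
    "c-3,108.15,no\n"
)

df = pd.read_csv(raw_csv)

# Normalize column names: lowercase, underscores instead of spaces.
df.columns = df.columns.str.lower().str.replace(" ", "_")

# "total_charges" is read as strings because one row holds "_".
# errors="coerce" turns values that cannot be parsed into NaN.
df["total_charges"] = pd.to_numeric(df["total_charges"], errors="coerce")

# Replace the resulting missing value with 0 (an arbitrary choice here).
df["total_charges"] = df["total_charges"].fillna(0)

# Convert the yes-no column to 1/0.
df["churn"] = (df["churn"] == "yes").astype(int)

# Inspect the result: transposed preview and column data types.
print(df.head().T)
print(df.dtypes)
```

Filling missing values with 0 is just one option; whether it is appropriate depends on what the column represents.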
The entire code of this project is available in this Jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.