add python DA #3332

16 changes: 16 additions & 0 deletions python/data-analysis/README.md
@@ -0,0 +1,16 @@
name: Python Data Analysis
description: How to use Python to analyze data.

sections:
'0':
- pyda-introduction
- pyda-analysis-environments
- pyda-initializing-and-cleaning-datasets
- pyda-analyzing
- pyda-analyzing-ii
- pyda-analyzing-iii
- pyda-analyzing-iv
- pyda-da-tips

next:
- python:functional-programming
13 changes: 13 additions & 0 deletions python/data-analysis/pyda-analysis-environments/README.md
@@ -0,0 +1,13 @@
name: Analysis Environments

description: Get familiar with different analysis environments.

insights:
- pyda-what-are-analysis-environments
- pyda-ipython-vs-shell-vs-scripts
- pyda-different-tools-to-use
- pyda-notebooks
- pyda-creating-your-first-notebook

aspects:
- introduction
@@ -0,0 +1,49 @@
---
author: Stefan-Stojanovic

type: normal

category: how-to

links:
- >-
[Markdown Guide](https://www.markdownguide.org/basic-syntax/){website}

---

# Creating Your First Notebook

---
## Content

Open your terminal and type `jupyter-lab` if you installed the application locally, or go to [the Jupyter website](https://jupyter.org/try) to open the online version.

Next, to create a new notebook, go to `File -> New -> Notebook`.

This will give you an empty notebook with a single empty cell.

At the top, you can choose what kind of cell this will be. There are two types: `Code` and `Text/Markdown` cells.

`Code` is used for any code you want to write and run.

`Text/Markdown` is used for adding text as well as elements like headers, paragraphs, and others.

![preview](https://img.enkipro.com/2b3ab5584c545906ee8ccbf7119ea3e9.png)

The first two cells are `Text/Markdown` cells. The first one contains markdown and the second one contains that same markdown but *executed* to show the resulting text.

> 💡 To learn more about markdown, check the *Learn More* section.

The third cell is a `Code` cell. The `[1]:` to the left of it means that it was the first `Code` cell that was executed. This cell prints `Hello World` to the console, which is shown underneath it.

The final two cells both have a `[2]:` before them.

The first `[2]:` is shown because that was the second `Code` cell that was executed. The next `[2]:` marks the output of that cell: its last line is the variable `x`, so the notebook displays the value of `x`, which is `"I am a String"`.

Any `Code` cell that returns something will have a matching cell after it with the same number.
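
As a rough sketch, the two `Code` cells described above could contain something like the following. The strings and the variable `x` come from the example; the comments describe how a notebook typically displays each cell, which is an assumption about the screenshot rather than a copy of it.

```python
# First Code cell: print() writes text under the cell, but the cell
# itself has no value, so no numbered output cell appears.
print("Hello World")

# Second Code cell: the last line is a bare expression (just `x`),
# so the notebook shows its value in a matching numbered output cell.
x = "I am a String"
x
```

If you run the same lines as a regular script, only the `print` call produces visible output; displaying the value of a bare expression is a feature of interactive environments.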

> 💡 The cell numbers don't have to be in order! If you run the same cell again or run a different cell, the number will increase.

Once you're done playing with the example, save the notebook.

> 💡 We will be using the same notebook in the next few workouts to import and analyze a dataset.
@@ -0,0 +1,103 @@
---
author: Stefan-Stojanovic

type: normal

category: must-know

links:
- '[Google Colab](https://colab.research.google.com/notebooks/basic_features_overview.ipynb#scrollTo=JyG45Qk3qQLS){documentation}'
- '[VS Code Jupyter Notebooks](https://code.visualstudio.com/docs/python/jupyter-support){website}'
- '[Spyder Notebook](https://github.com/spyder-ide/spyder-notebook){documentation}'
- '[Binder](https://mybinder.org/){website}'
- '[Jupyter.org](https://jupyter.org/){website}'
- '[CoCalc](https://cocalc.com/){website}'


---

# Python Coding Tools

---
## Content

Before we start coding, we have to decide which tool we're going to use.

One important category split between most tools is whether they're accessible offline or online.

The main difference is that online tools are immediately available (assuming you have an internet connection) and usually come with all the necessary libraries[1] included (no need to install anything).

Offline tools need to be installed first, including any library you might need, but can afterwards be used without internet.

> 💡 A popular kind of Python coding tool (available both offline and online) is called a *notebook*.

Notebooks offer interactive Python environments that can combine code with other visual elements such as text, charts, and images.

One offline tool that lets you use notebooks is VS Code[2].

As for online tools, here are two:

| Name         | Unique Feature                      |
|--------------|-------------------------------------|
| Google Colab | Real-time collaboration             |
| CoCalc       | Collaboration based on edit history |

For some in-depth information on these tools and more, check out the **Learn More** section.

> 💡 We mentioned in the first insight that we'll be using Jupyter Lab in this course. All of our examples will be available online. If you prefer to run Jupyter Lab offline, you can install it using `pip`[3] or **Anaconda**[4].

> 💡 All of the tools mentioned above support Jupyter notebooks. Whichever tool you choose, you can still follow along.

---

## Footnotes
[1: Libraries]
Think of libraries as external programs you can use to quickly get some functionality you don't want to write by yourself.

For example, if you wanted to draw plots and charts, you'd typically use an existing charting library instead of writing all the code for graphical and spatial calculations yourself.

Most online Python environments come with popular libraries pre-installed.
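
As a small illustration of the idea above, the sketch below uses `statistics`, a module from Python's standard library, as a stand-in for a library. The numbers are made up for the example.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 6, 8]

# Instead of writing the averaging logic ourselves,
# we call a function that the library already provides.
average = statistics.mean(values)
print(average)
```

Third-party libraries work the same way once they are installed: you import them and call the functions they provide.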

[2: VS Code]
VS Code is an advanced text editor with extensions that let you create, edit, delete, and run notebooks.

[3: pip]
pip is a package manager for Python.

> 💡 Before installing through `pip`, it's a good idea to upgrade `pip` itself to the latest version.

To upgrade:
```sh
# use pip to upgrade itself :)
python -m pip install --upgrade pip
```

If you don't have `pip`, download the latest Python installer from the [official website](https://www.python.org/downloads/) and make sure the checkbox for `pip` is ticked.

> 💡 Along with Jupyter we'll also install IPython.

```sh
# the jupyterlab package provides the jupyter-lab command used below
pip install --upgrade jupyterlab ipython
```

To run Jupyter, type this in your terminal:
```sh
jupyter-lab
```
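
To check that the installation worked, you could run a short script like the sketch below. It assumes the packages are importable under the module names `IPython` and `jupyterlab`; the helper `is_installed` is a hypothetical name used only for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Module names are assumptions about how the packages are imported.
for name in ("IPython", "jupyterlab"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing, re-run the install command above in the same environment you use to start Python.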

[4: Anaconda]
Anaconda is a package and environment manager for Python and R. It provides a graphical user interface and a terminal.

To download Anaconda, visit [their official website](https://www.anaconda.com/products/individual).

> 💡 Along with Jupyter we'll also install IPython.

Installing through Anaconda:
```sh
# the jupyterlab package provides the jupyter-lab command used below
conda install jupyterlab ipython
```

To run Jupyter, type this in your terminal:
```sh
jupyter-lab
```
@@ -0,0 +1,106 @@
---
author: Stefan-Stojanovic

type: normal

category: must-know

practiceQuestion:
formats:
- fill-in-the-gap
context: standalone
revisionQuestion:
formats:
- fill-in-the-gap
context: standalone

---

# IPython (and Scripts)

---
## Content

### IPython

The Python language comes with a basic interpreter and a [REPL](https://www.enki.com/glossary/general/repl) that lets us write and run Python programs.

IPython is an enhanced version of that with user-friendly features such as:
- Auto-completion
- Support for data visualization
- Multi-line editing
- Syntax highlighting
- and more

The IPython shell is usually the recommended shell as it runs your Python code just like the normal Python shell does while also providing a richer set of features on top[1].

Both the basic Python interpreter and the IPython interpreter are interactive shells that are accessed through the terminal, via the `python` and `ipython` commands respectively.

If we save code into a file, we call that file a *script*.

If we pass the name of our file to the `python` command, it executes the code in that file for us.

```sh
# scripts with python code
# are usually saved with
# the .py extension
python my_script.py
```

> 💡 The code in a script is executed the same way as code typed directly into the shell: from top to bottom, one statement at a time.
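
For illustration, a file like `my_script.py` could contain something as small as the sketch below. The contents, including the `greet` function, are hypothetical and only meant to show what a script looks like.

```python
# my_script.py (hypothetical contents, for illustration only)

def greet(name):
    """Build a greeting for the given name."""
    return f"Hello, {name}!"


# This part runs only when the file is executed as a script,
# for example with: python my_script.py
if __name__ == "__main__":
    print(greet("World"))
```

Running `python my_script.py` in the folder that contains the file would execute these lines in order and print the greeting.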


---
## Practice

??? are command-line tools used to execute code.

??? are pieces of code saved into files.

??? are sessions on the computer used to communicate code to shells.

- Shells
- Scripts
- Terminals

---
## Revision

When you save code in a file, what do you typically call that file?

???.

- a script
- a terminal
- text
- an interpreter

---
## Footnotes
[1: Multi Line Execution]

Here is the same code run on the basic interpreter vs IPython:

Basic interpreter:

![windows-10-example](https://img.enkipro.com/cb342ec6c5fb4860fee889d907ee176b.png)

IPython:

![ipython-example](https://img.enkipro.com/02420b736677cad5a5d5d8bcaac54bf4.png)

> 💡 In IPython, you can re-run any part of the code you've already run, with or without modifying it.


IPython numbers every block of code we enter with an `In [N]` prompt, and a single block can span multiple lines. If you look at the IPython example above, all of the code was entered at `In [1]`. This lets us split code into sections and re-run or re-use any section at any time.

On the other hand, in the regular interpreter, we had to write all lines of code one by one.

In the regular interpreter, pressing the ⬆️ key gives us only the last executed line.

If we press ⬆️ in IPython, it gives us the whole last executed `In [N]` block.

> ⚠️ Multi-line editing in the IPython terminal requires IPython 5.0 or newer. On older versions, it is only available in notebooks (more on this to come later).

You can think of notebooks as interactive Python environments that can combine code execution, rich text, charts, and rich media.
