This repository provides a concise presentation of some necessary tools for a data-centric Python development environment.
It has been created for the University of Athens, Department of Economics, MSc. in Business Administration, Analytics and Information Systems Postgraduate Program.
The course was created and is curated and taught by Thanasis Argyriou, @linkedin.
It is a crash-course on some necessary development tools, as part of the "Research Methods" seminars of the third semester.
This is not a complete guide, but rather a quick start tutorial on principles and foundational concepts with practical examples.
All tools are used as examples, not as necessary endorsements.
The material is in the form of a GitHub repository and is also available on e-class: Research Seminars sub-course.
Excellent working knowledge of the second-semester courses is assumed.
Necessary prerequisites are:
- Creating Python virtual environments and installing Python packages.
- Working with Python editors and Jupyter Notebooks.
- The Markdown language.
- The Windows (or Linux) command-line interface.
- Understanding of basic concepts of relational and non-relational databases.
- Understanding of basic concepts of working with "relative" and "absolute" OS paths.
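As a quick self-check on the last prerequisite, the following minimal sketch contrasts a relative path with its absolute counterpart using the standard `pathlib` module. The `data/sales.csv` location is a hypothetical example, not a file in this repository.

```python
from pathlib import Path

# Hypothetical relative path: it is interpreted from the current working directory.
relative = Path("data") / "sales.csv"

# The same location expressed as an absolute path (the file does not need to exist).
absolute = (Path.cwd() / relative).resolve()

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
print(absolute)                # depends on where the script is started from
```

A relative path changes meaning when the script is launched from a different folder, which is a common source of "file not found" errors when moving between an IDE and the command line.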
Students are kindly encouraged to register and read the material before the first lecture.
Important things to do before the first lecture:
There are great free resources for students at GitHub Education.
- Explore the offers in the GitHub Student Developer Pack.
- Sign up for a GitHub account.
- Install PyCharm IDE.
- Activate GitHub Copilot.
- Register for Digital Ocean credits.
- Get a Name Cheap domain name.
The material is presented in four live three-hour seminars and three optional, self-paced, advanced "asynchronous" lectures.
Each lecture requires 9 hours for self-study and 8 hours for practice on tasks/assignments.
The pace is intensive by design and each lecture requires good working knowledge of all the previous ones.
Therefore, it is highly recommended to complete the tasks and assignments before each next lecture and attend all lectures.
Support is provided on a personal basis via the e-class platform, the GitHub repository and "on-demand" meetings.
Announcements are made via the e-class platform.
At the end of the course, students should be able to:
- Work in a Python virtual environment.
- Use a version control system like git, manage local and remote repositories and work collaboratively on a repo.
- Use a Python package manager (poetry) to pin the Python version and package dependencies.
- Use a Python IDE like PyCharm to develop Python code.
- Integrate and use GitHub Copilot to help with code suggestions, completions and documentation.
- Integrate and use a Python linter (ruff) and a code formatter (black) to check code quality, syntax and style.
- Use Streamlit to create interactive web apps.
- Use MongoDB and pymongo to work with non-relational databases.
- Use SQL and SQLAlchemy to interact with relational databases.
- Do all of the above on a remote Linux server.
- Mission accomplished: combine all of the above to work in a more holistic, productive, efficient and comprehensive development environment.
There will be a final 90-minute exam in the lab, based on the material of the first four lectures.
At the exam you will be asked to:
a) use git and GitHub to:
- Create a new GitHub repository and clone the remote repository to your local machine.
- Create a .gitignore and a README markdown file.
- Push the code to the remote repository as frequently as you deem necessary.
Correct usage of Git (2.5 points)
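The two files in this task can also be generated from Python, as in the minimal sketch below. The helper name `init_repo_files`, the ignore patterns and the README text are illustrative assumptions, not the expected exam answer.

```python
import tempfile
from pathlib import Path


def init_repo_files(repo_dir: Path, project_name: str) -> list[Path]:
    """Write a minimal .gitignore and README.md into repo_dir (hypothetical helper)."""
    repo_dir.mkdir(parents=True, exist_ok=True)

    # Assumed ignore patterns: virtual environment, bytecode cache, PyCharm settings.
    gitignore = repo_dir / ".gitignore"
    gitignore.write_text(".venv/\n__pycache__/\n.idea/\n", encoding="utf-8")

    readme = repo_dir / "README.md"
    readme.write_text(f"# {project_name}\n\nExam project.\n", encoding="utf-8")
    return [gitignore, readme]


# Demo in a temporary folder so nothing in the working directory is touched.
demo_dir = Path(tempfile.mkdtemp()) / "exam-project"
created = init_repo_files(demo_dir, "exam-project")
print([path.name for path in created])
```

Inside a cloned repository, the files would then typically be recorded with `git add`, `git commit -m "..."` and `git push`.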
b) use venv and poetry to:
- Create a new Python virtual environment.
- Install the necessary Python packages using poetry.
- Start a new Python project using PyCharm and use the existing virtual environment interpreter.
- The code should be automatically formatted with black and checked with ruff (integrated with PyCharm).
Correct usage of Python dev tools (2.5 points)
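For reference, a virtual environment can be created with the standard library `venv` module, which is what `python -m venv .venv` does under the hood. The sketch below is only an illustration: the temporary location and `with_pip=False` are assumptions made to keep the demo fast, and a real project environment would normally include pip.

```python
import tempfile
import venv
from pathlib import Path

# Assumed demo location; in a project this would usually be a ".venv" folder in the repo.
env_dir = Path(tempfile.mkdtemp()) / ".venv"

# with_pip=False keeps the sketch quick; drop it (or set True) for a usable environment.
builder = venv.EnvBuilder(with_pip=False)
builder.create(env_dir)

# Every virtual environment contains a pyvenv.cfg describing its base interpreter.
config = (env_dir / "pyvenv.cfg").read_text(encoding="utf-8")
print(config)
```

With poetry, dependencies are then usually declared with `poetry add <package>` and installed with `poetry install`.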
c) use functions and modules to:
- Create a new Python function in a module, to be called from the main script.
- Create a new Python script that calls the function from the module.
- Run the script from the command line.
Correct project structure (2.5 points)
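The structure described above can be sketched as follows. For brevity both parts are shown in one block; the file names `metrics.py` and `main.py` and the function `mean` are hypothetical, chosen only for illustration.

```python
import argparse


# --- Would live in a module, e.g. a hypothetical metrics.py ---
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# --- Would live in the main script, e.g. a hypothetical main.py ---
# In a two-file layout the script would start with: from metrics import mean
def main(argv=None):
    parser = argparse.ArgumentParser(description="Print the mean of the given numbers.")
    # Default values are used when no numbers are passed on the command line.
    parser.add_argument("numbers", nargs="*", type=float, default=[1.0, 2.0, 3.0])
    args = parser.parse_args(argv)
    result = mean(args.numbers)
    print(f"mean = {result}")
    return result


if __name__ == "__main__":
    main()
```

Run from the command line as `python main.py 2 4`, this sketch prints `mean = 3.0`. The `if __name__ == "__main__":` guard lets the same file be imported elsewhere without running `main()`.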
d) use Streamlit to:
- Create an interactive web app from the data provided.
- Add extra functionality and features to the Streamlit app we created.
- Add one plot that you think is useful for the data provided.
- You may use any Python library you like to aggregate or calculate useful metrics for the data.
Correct usage of Streamlit and creation of proper plots (2.5 points)
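One possible shape for such an app is sketched below, with the aggregation kept separate from the page rendering. The `SALES` records and the names `total_by` and `render_app` are hypothetical stand-ins for "the data provided", and the sketch assumes `streamlit` and `pandas` are installed in the environment.

```python
from collections import defaultdict

# Hypothetical sample records standing in for the data provided at the exam.
SALES = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 30.0},
]


def total_by(records, key, value):
    """Sum the `value` field of each record, grouped by the `key` field."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row[value]
    return dict(totals)


def render_app(records):
    """Draw a small Streamlit page: a title, a table and one bar chart."""
    # Imported here so the aggregation helper can be used without Streamlit installed.
    import pandas as pd
    import streamlit as st

    totals = total_by(records, "region", "amount")
    frame = pd.DataFrame({"amount": list(totals.values())}, index=list(totals.keys()))

    st.title("Sales overview")
    st.dataframe(frame)
    st.bar_chart(frame)


# In a file launched with `streamlit run app.py`, end the script with:
# render_app(SALES)
```

Keeping `total_by` free of Streamlit calls makes the metric easy to check on its own before it is plotted.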
Use of AI is mandatory and correct usage of prompting will be part of the exam.
You are encouraged to use freely any source of info that works best for you.
Assuming everything delivered is correct, an extra rule applies to the exam grades.
The highest grades are awarded by order of completion:
10 for the first student to finish, 9.5 for the second, 9 for the third and 8.5 for the fourth, provided that they finish at least 15 minutes before the end of the exam.
Everyone else who finishes everything at least 15 minutes before the end of the exam will get at least 8.5.
The lecture material is presented in detail in a separate file: README_lectures_outline.md
Please read the lectures outline before the first lecture.
- Lecture 1: Version control tools
- Lecture 2: Python development tools, virtual environment and package management
- Lecture 3: Modular development. User defined functions and "modules".
- Lecture 4: Building interactive web apps with Python
- Lecture 5: MongoDB tools (optional, self-paced): Non-Relational Databases
- Lecture 6: Asynchronous education (optional, self-paced): Remote Linux servers
- Lecture 7: Asynchronous education (optional, self-paced): SQL tools
Thank you very much for your interest, and have fun on your journey in programming and data science!
Did you finish the course with flying colors and enjoy it?
Please check out and contribute to the Run4more Public Analytics Repository.
It is a young project, taking its first steps, that is being developed as an "Academy" for working on real data and tasks from the Run4more StartUp!
In the very near future, data from more StartUps will be added and the project will be expanded to become a "StartUps Analytics Academy".