A short, clear one-liner describing what this project does and why it exists.
Explain the problem this project addresses, the approach taken, and key outcomes in 2–4 sentences. If applicable, mention important methods, datasets, or benchmarks.
ℹ️ Hint: Link to any key paper or dataset, and mention where detailed analysis lives (e.g., specific notebooks).
The repository should separate raw/processed data, notebooks, and generated artifacts.
```
.
├── data/        # Raw, interim, and processed data (see recommendations below)
├── notebooks/   # Jupyter notebooks for exploration, training, analysis
├── results/     # Generated outputs (figures, tables, metrics, models)
└── README.md
```
💡 Tip: Keep raw, immutable data in `data/raw`, derived data in `data/processed`, and temporary caches in `data/interim`.
- Python 3.11+ recommended (Mention your specific version here).
- One of: pip or uv (a fast Python package manager).
- JupyterLab or VSCode with the Python extension to run notebooks.
ℹ️ Hint: On macOS, you can install Python 3.11+ via Homebrew: `brew install python@3.11`.
Create a virtual environment and install dependencies:
```bash
python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
💡 Tip: Use `python3.11` explicitly to avoid accidentally using an older interpreter.
Install uv and set up the environment quickly:
```bash
# Install uv (choose one)
brew install uv
# or:
curl -Ls https://astral.sh/uv/install.sh | sh

# Create venv and install deps
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```
⚡ Hint: `uv` is much faster than pip and supports pip-compatible commands like `uv pip install`, `uv pip freeze`, and `uv run`.
```bash
# pip environment
jupyter lab
# or
jupyter notebook

# uv environment
uv run jupyter lab
```
Ensure the selected kernel points to your `.venv` or uv environment.
💡 Tip: In Jupyter, use Kernel → Change Kernel to pick the correct environment.
Recommended subfolders (create as needed):
- `data/raw/` — immutable, original data dumps
- `data/interim/` — temporary, intermediate files
- `data/processed/` — cleaned, ready-to-model data
Consider adding a short data/README.md describing sources, schema, and preprocessing steps.
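The subfolders above can be created in one step; this is a sketch, so adjust the names to your own layout:

```shell
# Create the recommended data subfolders (run from the repo root)
mkdir -p data/raw data/interim data/processed
# Placeholder files keep the otherwise-empty folders visible to git
touch data/raw/.gitkeep data/interim/.gitkeep data/processed/.gitkeep
```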
If data is downloaded or processed, provide scripts/notebooks and commands here. For example:
```bash
# examples (adapt to your project layout)
python scripts/download_data.py --out data/raw
python scripts/prepare_data.py --in data/raw --out data/processed
```
ℹ️ Hint: Document required environment variables in a `.env.example` file and never commit secrets.
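As a minimal sketch of the hint above, a script can read configuration from environment variables with safe fallbacks. `DATA_DIR` and `API_KEY` are hypothetical names; use whatever your project documents in `.env.example`:

```python
import os

# Hypothetical variable names -- adapt to your project.
# A default keeps the script runnable even when the variable is unset.
data_dir = os.environ.get("DATA_DIR", "data/raw")
api_key = os.environ.get("API_KEY")  # None if not set

if api_key is None:
    print("Warning: API_KEY is not set; see .env.example")

print(f"Reading raw data from: {data_dir}")
```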
Add .gitignore patterns for large or sensitive artifacts:
```
# Data (raw and intermediate should not be versioned)
data/raw/
data/interim/

# Large or derived artifacts
results/models/
results/logs/

# Secrets
*.env
```
⚠️ Warning: Never commit credentials, API keys, or private datasets. Use environment variables and secure storage.
If multiple notebooks exist, number them to indicate order, e.g.:
```
01-data-exploration.ipynb
02-preprocess.ipynb
03-train-model.ipynb
04-evaluate.ipynb
```
Describe the expected inputs/outputs of each notebook and where artifacts are saved (e.g., results/ or data/processed/).
💡 Tip: Keep notebooks short and focused. Split long pipelines into clear steps.
To ensure reproducible results, fix seeds for all libraries you use. Example (Python, NumPy, optional PyTorch):
```python
import random

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Optional: if using PyTorch
try:
    import torch

    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
except ImportError:
    pass
```
Also set any framework-specific determinism flags where applicable.
Pin exact versions for reproducibility and record them with the results.
pip:
```bash
pip freeze > environment.txt
```
uv:
```bash
uv pip freeze > environment.txt
```
ℹ️ Hint: Save seeds, config files, and key hyperparameters alongside results for traceability.
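The traceability hint above can be sketched as a small snippet that writes run metadata next to the results. Field names and paths here are illustrative:

```python
import json
import platform
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical run-metadata record -- adjust fields to your project.
run_info = {
    "seed": 42,
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "python": platform.python_version(),
    "hyperparameters": {"learning_rate": 1e-3, "epochs": 10},
}

# Save alongside the results so every run is traceable.
out_dir = Path("results/logs")
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "run_info.json").write_text(json.dumps(run_info, indent=2))
```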
Make notebooks easy to read, rerun, and review:
- Start with a top-level title and a short objective (what this notebook does).
- Include sections for:
- Inputs (paths, parameters, environment variables)
- Outputs (files saved, metrics generated)
- Steps (data load → preprocess → train → evaluate)
- Prefer descriptive Markdown cells over long inline comments.
- Use clear headings (e.g., `## Load Data`, `## Train Model`) and keep code cells small.
- Record runtime notes (e.g., hardware, run time, dataset version).
- Check that notebooks run cleanly top to bottom before committing.
```bash
# Clear outputs before committing (helps diffs and reduces repo noise)
jupyter nbconvert --clear-output --inplace notebooks/*.ipynb
```
💡 Tip: Use consistent naming, e.g., `03-train-model.ipynb`. Avoid committing large outputs; save them under `results/`.
Recommended practices:
- Commit early and often with meaningful messages.
- Keep a clean
.gitignore(see Data Privacy and Large Files). - If notebooks change frequently, consider Jupytext pairing for reviewability.
```bash
# Initialize and push a new repository
git init
git add .
git commit -m "Initial commit: project structure and README"
git branch -M main
git remote add origin https://github.com/your-org/your-repo.git
git push -u origin main

# Typical feature flow
git checkout -b feature/your-change
git add -A
git commit -m "Describe what changed and why"
git push -u origin feature/your-change
```
ℹ️ Hint: Use `pre-commit` to enforce formatting and clear notebook outputs automatically.
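A minimal `.pre-commit-config.yaml` along these lines covers both needs; the hook repos (`pre-commit-hooks`, `nbstripout`) are real projects, but pin `rev` to whatever releases are current when you set this up:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0  # pin to a current release
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-added-large-files
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1  # pin to a current release
    hooks:
      - id: nbstripout  # clears notebook outputs before commit
```

Install the hooks once with `pre-commit install`; they then run on every `git commit`.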
- Place figures, tables, and serialized models under `results/`.
- Optionally include a summary table of key metrics here.
- Suggested layout:
  - `results/figures/` — PNG/SVG plots
  - `results/tables/` — CSV/Markdown tables
  - `results/models/` — checkpoints or final models
  - `results/logs/` — training logs or run metadata
💡 Tip: Save a short `results/README.md` summarizing the latest key outcomes. You can copy text between it and your report, and keep the file version controlled.