JRC-COMBINE/project-template

Project Name ✨

A short, clear one-liner describing what this project does and why it exists.

Python 3.11+ · Jupyter · uv supported


Overview

Explain the problem this project addresses, the approach taken, and key outcomes in 2–4 sentences. If applicable, mention important methods, datasets, or benchmarks.

ℹ️ Hint: Link to any key paper or dataset, and mention where detailed analysis lives (e.g., specific notebooks).


Project Structure

The repository should separate raw/processed data, notebooks, and generated artifacts.

.
├── data/              # Raw, interim, and processed data (see recommendations below)
├── notebooks/         # Jupyter notebooks for exploration, training, analysis
├── results/           # Generated outputs (figures, tables, metrics, models)
└── README.md

💡 Tip: Keep raw, immutable data in data/raw, derived data in data/processed, and temporary caches in data/interim.
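The layout above, including the recommended data subfolders, can be scaffolded in one command, assuming a Unix shell:

```shell
# Create the recommended skeleton (-p makes it safe to re-run)
mkdir -p data/raw data/interim data/processed notebooks results
```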


Getting Started

Prerequisites

  • Python 3.11+ recommended (state your project's exact version here).
  • One of: pip or uv (a fast Python package manager).
  • JupyterLab or VSCode with the Python extension to run notebooks.

ℹ️ Hint: On macOS, you can install Python 3.11+ via Homebrew: brew install python@3.11.

Setup with pip

Create a virtual environment and install dependencies:

python3.11 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

💡 Tip: Use python3.11 explicitly to avoid accidentally using an older interpreter.

Setup with uv

Install uv and set up the environment quickly:

# Install uv (choose one)
brew install uv
# or:
curl -Ls https://astral.sh/uv/install.sh | sh

# Create venv and install deps
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

⚡ Hint: uv is much faster than pip and supports pip-compatible commands like uv pip install, uv pip freeze, and uv run.

Run Notebooks

# pip environment
jupyter lab
# or
jupyter notebook

# uv environment
uv run jupyter lab

Ensure the selected kernel points to your .venv or uv environment.

💡 Tip: In Jupyter, use Kernel → Change Kernel to pick the correct environment.


Data

Layout and Conventions

Recommended subfolders (create as needed):

  • data/raw/ — immutable, original data dumps
  • data/interim/ — temporary, intermediate files
  • data/processed/ — cleaned, ready-to-model data

Consider adding a short data/README.md describing sources, schema, and preprocessing steps.
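As a starting point, such a data/README.md might look like this minimal sketch (all fields are placeholders to fill in):

```markdown
# Data

- Source: <where the data came from, with link or DOI>
- Schema: <columns/fields and their types>
- Preprocessing: <which script or notebook produces data/processed/>
```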

Data Preparation

If data is downloaded or processed, provide scripts/notebooks and commands here. For example:

# examples (adapt to your project layout)
python scripts/download_data.py --out data/raw
python scripts/prepare_data.py --in data/raw --out data/processed
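A script like the hypothetical prepare_data.py above could be sketched as follows; the flag names and defaults are illustrative, not a fixed interface:

```python
# Illustrative sketch of scripts/prepare_data.py; adapt to your pipeline.
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Clean raw data into data/processed.")
    # "--in" shadows a Python keyword, so store it under dest="input_dir"
    parser.add_argument("--in", dest="input_dir", type=Path, default=Path("data/raw"))
    parser.add_argument("--out", dest="output_dir", type=Path, default=Path("data/processed"))
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    args.output_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder: read from args.input_dir, clean, write to args.output_dir
    return args


if __name__ == "__main__":
    main()
```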

ℹ️ Hint: Document required environment variables in a .env.example file and never commit secrets.
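A minimal pattern for reading such environment variables in Python, with DATA_DIR and API_KEY as assumed (illustrative) names:

```python
import os
from pathlib import Path

# Read configuration from the environment, with a safe default.
# "DATA_DIR" and "API_KEY" are illustrative names; document yours in .env.example.
DATA_DIR = Path(os.environ.get("DATA_DIR", "data"))
API_KEY = os.environ.get("API_KEY")  # secrets stay out of the code and the repo

if API_KEY is None:
    print("API_KEY not set; steps that need it will be skipped.")
```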

Data Privacy and Large Files

Add .gitignore patterns for large or sensitive artifacts:

# Data (raw and intermediate should not be versioned)
data/raw/
data/interim/

# Large or derived artifacts
results/models/
results/logs/

# Secrets
*.env

⚠️ Warning: Never commit credentials, API keys, or private datasets. Use environment variables and secure storage.


Notebooks & Reproducibility

Recommended Execution Order

If multiple notebooks exist, number them to indicate order, e.g.:

01-data-exploration.ipynb
02-preprocess.ipynb
03-train-model.ipynb
04-evaluate.ipynb

Describe the expected inputs/outputs of each notebook and where artifacts are saved (e.g., results/ or data/processed/).

💡 Tip: Keep notebooks short and focused. Split long pipelines into clear steps.

Randomness and Seeds

To ensure reproducible results, fix seeds for all libraries you use. Example (Python, NumPy, optional PyTorch):

import os
import random
import numpy as np

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)  # affects subprocesses; export it before launching Python to seed this process's hashing
random.seed(SEED)
np.random.seed(SEED)

# Optional: if using PyTorch
try:
    import torch
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
except ImportError:
    pass

Also set any framework-specific determinism flags where applicable.

Capturing Environments

Pin exact versions for reproducibility and record them with the results.

pip:

pip freeze > environment.txt

uv:

uv pip freeze > environment.txt

ℹ️ Hint: Save seeds, config files, and key hyperparameters alongside results for traceability.
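One lightweight way to follow that hint is to write a small metadata record next to your results; the field names below are assumptions, not a fixed schema:

```python
import json
import platform
from datetime import datetime, timezone
from pathlib import Path

# Illustrative run-metadata record saved alongside results for traceability.
run_info = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "python": platform.python_version(),
    "seed": 42,
    "params": {"learning_rate": 1e-3, "epochs": 10},  # example hyperparameters
}

out_dir = Path("results/logs")
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "run_info.json").write_text(json.dumps(run_info, indent=2))
```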


Documentation in Notebooks

Make notebooks easy to read, rerun, and review:

  • Start with a top-level title and a short objective (what this notebook does).
  • Include sections for:
    • Inputs (paths, parameters, environment variables)
    • Outputs (files saved, metrics generated)
    • Steps (data load → preprocess → train → evaluate)
  • Prefer descriptive Markdown cells over long inline comments.
  • Use clear headings (e.g., ## Load Data, ## Train Model) and keep code cells small.
  • Record runtime notes (e.g., hardware, run time, dataset version).
  • Check that notebooks run cleanly top to bottom before committing.

# Clear outputs before committing (helps diffs and reduces repo noise)
jupyter nbconvert --clear-output --inplace notebooks/*.ipynb

💡 Tip: Use consistent naming, e.g., 03-train-model.ipynb. Avoid committing large outputs; save them under results/.


Git & Version Control

Recommended practices:

  • Commit early and often with meaningful messages.
  • Keep a clean .gitignore (see Data Privacy and Large Files).
  • If notebooks change frequently, consider Jupytext pairing for reviewability.

# Initialize and push a new repository
git init
git add .
git commit -m "Initial commit: project structure and README"
git branch -M main
git remote add origin https://github.com/your-org/your-repo.git
git push -u origin main

# Typical feature flow
git checkout -b feature/your-change
git add -A
git commit -m "Describe what changed and why"
git push -u origin feature/your-change

ℹ️ Hint: Use pre-commit to enforce formatting and clear notebook outputs automatically.
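A hypothetical .pre-commit-config.yaml along those lines might look like this; pin the revs to the versions you actually use:

```yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1
    hooks:
      - id: nbstripout     # strips notebook outputs before commit
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4
    hooks:
      - id: ruff           # lint
      - id: ruff-format    # format
```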


Results

  • Place figures, tables, and serialized models under results/.
  • Optionally include a summary table of key metrics here.
  • Suggested layout:
    • results/figures/ — PNG/SVG plots
    • results/tables/ — CSV/Markdown tables
    • results/models/ — checkpoints or final models
    • results/logs/ — training logs or run metadata

💡 Tip: Save a short results/README.md summarizing the latest key outcomes; copy content between it and your report as needed, and keep the file under version control.

About

A template to start a new research project
