A high-performance Python library for processing and analysing data from CARP (Copenhagen Research Platform) studies.
> [!WARNING]
> The CARP package is in beta; its APIs and methods may change frequently between releases.
- Schema Discovery: Automatically scans and infers the schema of the data
- Data Grouping: Efficiently groups data by any field (e.g., data type, device ID) into separate files
- Parquet Export: Convert JSON data to Parquet for faster subsequent analysis
- Participant Management: Link and track participants across multiple study phases
- Visualization: Generate location heatmaps and other visualizations
- Pandas Integration: Seamlessly work with DataFrames
```bash
pip install carp-analytics-python

# For pandas/parquet support
pip install carp-analytics-python[pandas]

# For visualization support
pip install carp-analytics-python[viz]

# For scientific computing (numpy, scipy, scikit-learn)
pip install carp-analytics-python[science]

# Install everything
pip install carp-analytics-python[all]
```

To install from source for development:

```bash
git clone https://github.com/carp-dk/carp-analytics-python.git
cd carp-analytics-python

# Using uv (recommended)
uv sync

# Or using pip
pip install -e .
```

```python
from carp import CarpDataStream

# Initialize with a data file
data = CarpDataStream("data/study-phase-1/data-streams.json")
# Scan and print the schema
data.print_schema()
# Convert to Parquet for faster analysis
data.convert_to_parquet("output_parquet")
# Load data as a DataFrame
df = data.get_dataframe("dk.cachet.carp.stepcount", "output_parquet")
print(df.head())
```

```python
from carp import CarpDataStream

# Load data from multiple phases
data = CarpDataStream([
"data/phase-1/data-streams.json",
"data/phase-2/data-streams.json",
])
# Print participant summary
data.print_participants()
# Access participant data via email
participant = data.participant("user@example.com")
# Get participant info
print(participant.info())
# Get available data types for this participant
participant.print_data_types()
# Get a DataFrame of step count data
df = participant.dataframe("dk.cachet.carp.stepcount", "output_parquet")
```

```python
# Export specific data type to JSON
data.export_to_json("heartbeat_data.json", data_type="dk.cachet.carp.heartbeat")
# Group data by data type
data.group_by_field("dataStream.dataType.name", "output_by_type")
# Group data by participant
data.group_by_participant("output_by_participant")
```

```python
# Generate location heatmap for a participant
participant = data.participant("user@example.com")
participant.visualize.location(output_file="user_locations.html")
```

The package includes a CLI for common operations:

```bash
# Show schema of data files
carp schema data/study/data-streams.json
# Convert JSON to Parquet
carp convert data/study/data-streams.json -o output_parquet
# Count items in data files
carp count data/study/data-streams.json
# List participants
carp participants data/study/data-streams.json
# Export filtered data
carp export data/study/data-streams.json -o output.json -t dk.cachet.carp.stepcount
# Group data by field
carp group data/study/data-streams.json -f dataStream.dataType.name -o grouped_output
```

`CarpDataStream` is the main class for working with CARP data streams.
| Method | Description |
|---|---|
| `scan_schema()` | Scan and infer the data schema |
| `print_schema()` | Print the inferred schema as a table |
| `convert_to_parquet(output_dir)` | Convert JSON to Parquet files |
| `get_dataframe(data_type, parquet_dir)` | Load data as a pandas DataFrame |
| `export_to_json(output_path, data_type)` | Export data to JSON file |
| `group_by_field(field_path, output_dir)` | Group data by a specific field |
| `group_by_participant(output_dir)` | Group data by participant |
| `participant(email)` | Access participant data via fluent API |
| `print_participants()` | Print participant summary table |
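
The schema and grouping methods compose: any field path revealed by `scan_schema()` can be fed to `group_by_field()`. A minimal sketch, assuming `scan_schema()` returns an inspectable value and using a hypothetical device-related field path (take the real path from the printed schema):

```python
from carp import CarpDataStream

data = CarpDataStream("data/study-phase-1/data-streams.json")

# Infer the schema once; print_schema() renders the same result as a table
schema = data.scan_schema()  # assumption: the return value is inspectable
data.print_schema()

# Group by a field discovered in the schema, e.g. one output file per device
# ("dataStream.deviceRoleName" is a hypothetical path; check your schema)
data.group_by_field("dataStream.deviceRoleName", "output_by_device")
```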
The object returned by `participant(email)` provides a fluent API for accessing individual participant data.
| Method | Description |
|---|---|
| `info()` | Get participant information as a dictionary |
| `print_info()` | Print participant info as a table |
| `all_data(data_type)` | Generator for all participant data |
| `data_types()` | Get all unique data types |
| `print_data_types()` | Print the participant's available data types |
| `dataframe(data_type, parquet_dir)` | Get data as a pandas DataFrame |
| `visualize.location()` | Generate location heatmap |
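
As a minimal sketch of how these methods fit together (the records yielded by `all_data()` are assumed to be dict-like, which the table above does not specify):

```python
from carp import CarpDataStream

data = CarpDataStream("data/phase-1/data-streams.json")
participant = data.participant("user@example.com")

# Print a summary table for this participant
participant.print_info()

# See which data types this participant actually has
print(participant.data_types())

# Stream records lazily via the generator instead of building a DataFrame
# (assumption: records are dict-like; here we only count them)
count = sum(1 for _ in participant.all_data("dk.cachet.carp.stepcount"))
print(f"{count} step-count records")
```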
Required:
- Python 3.10+
- ijson (for streaming JSON parsing)
- rich (for terminal output)
- tqdm (for progress bars)
Optional:
- pandas, pyarrow (for DataFrame and Parquet support)
- matplotlib, folium (for visualization)
- numpy, scipy, scikit-learn (for scientific computing)
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/featA`)
- Commit your changes (`git commit -m 'Add some featA'`)
- Push to the branch (`git push origin feature/featA`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.