This repository contains a Python implementation of indexing and query algorithms for Social Spatio-temporal Keyword Pattern (S²KP) queries over Multiple Aspect Trajectory (MAT) data, supporting the following episode-level constraints:
- Spatial: episode bounding box `(minx, maxx, miny, maxy)`
- Temporal: episode timeframe `(tmin, tmax)`
- Social rating: rating range `(rating_min, rating_max)`
- Textual: episode keyword set
It provides multiple index structures for efficient candidate pruning and sequence verification over MAT episode sequences.
A Python implementation of the EMT Dual Index (Gryllakis et al., 2022).
`build_emt_dual_index(...)` builds two complementary trees over region-of-interest (ROI) nodes:
- Spatial Box Sort (SBS) Tree: a kd-tree-like structure over spatial bounds, used to prune by ROI.
- Episode Sort (ES) Tree: a kd-tree-like structure over rating/time bounds, used to prune by social/temporal constraints.
Each `EMTNode` aggregates:
- union spatial bounds over its episodes
- union rating bounds
- union time bounds
- an inverted index `keyword -> [episodes]`
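As a concrete illustration, such per-node aggregates can be computed from a node's episodes roughly as follows. This is a minimal sketch over the episode dict layout documented below; `union_bounds` is a hypothetical helper, not the repository's `EMTNode`:

```python
def union_bounds(episodes):
    """Aggregate union spatial/rating/time bounds and an inverted index
    over a node's episodes (illustrative sketch, not the repo's EMTNode)."""
    minx = min(ep["spatial_bounds"][0] for ep in episodes)
    maxx = max(ep["spatial_bounds"][1] for ep in episodes)
    miny = min(ep["spatial_bounds"][2] for ep in episodes)
    maxy = max(ep["spatial_bounds"][3] for ep in episodes)
    rmin = min(ep["social_rankings"][0] for ep in episodes)
    rmax = max(ep["social_rankings"][1] for ep in episodes)
    tmin = min(ep["timeframe"][0] for ep in episodes)
    tmax = max(ep["timeframe"][1] for ep in episodes)
    # keyword -> list of episodes containing it
    inv = {}
    for ep in episodes:
        for kw in ep["keywords"]:
            inv.setdefault(kw, []).append(ep)
    return {
        "spatial": (minx, maxx, miny, maxy),
        "rating": (rmin, rmax),
        "time": (tmin, tmax),
        "inverted_index": inv,
    }
```

A node whose union bounds or inverted index cannot satisfy a constraint can then be skipped along with its whole subtree.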
Query entrypoint: `s2kp_query(emt_index, query_sequence, keyword_freq_map)`
A keyword-frequency heuristic decides whether to start the search from SBS (spatial) or ES (social/temporal), then verifies the full episode sequence.
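One plausible shape for such a heuristic is sketched below; the function name, the `rare_threshold` parameter, and the exact decision rule are illustrative assumptions, not the repository's implementation:

```python
def choose_start_tree(query_sequence, keyword_freq_map, rare_threshold=0.01):
    """Pick the starting tree for an S2KP query (illustrative heuristic).

    If the query contains a rare keyword, keyword-driven pruning on the
    SBS (spatial) tree is likely selective; otherwise start from the ES
    tree and prune on rating/time bounds first.
    """
    total = sum(keyword_freq_map.values()) or 1
    for ea in query_sequence:
        kws = ea.get("keywords")
        if kws in ("*", None):
            continue  # wildcard abstraction: no keyword evidence
        for kw in kws:
            if keyword_freq_map.get(kw, 0) / total < rare_threshold:
                return "SBS"
    return "ES"
```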
`build_hilbert_emt_index(...)` extends EMT with Hilbert curve keys for spatial locality:
- Computes per-node Hilbert key intervals over episode centers.
- Uses those intervals to prune SBS traversal early when a Hilbert interval is available.
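For reference, a standard 2D Hilbert mapping that could produce such keys from grid-quantized episode centers (the classic bit-manipulation formulation; the repository's key layout and quantization may differ):

```python
def hilbert_key(x, y, p=16):
    """Map grid cell (x, y) in a 2**p x 2**p grid to its index along the
    Hilbert curve. Per-node key intervals can then be kept as the
    [min, max] of the keys of the episodes in a subtree."""
    d = 0
    s = 1 << (p - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate the quadrant so the sub-curve is oriented consistently
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

During SBS traversal, a node whose key interval does not overlap the query's key range can be skipped without inspecting its spatial bounds.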
Query entrypoint: `hemt_query(hemt_index, query_sequence, keyword_freq_map)`
The EMR tree is an enhanced Python implementation, adapted to MAT data, of the Episode-based Semantic trajectory R-tree (ESR) for semantic trajectory data (Gryllakis et al., 2017).
`build_emr_tree(...)` builds a hierarchical structure over episodes:
- Leaves hold up to `max_leaf_size` episodes (sorted by a 4D center heuristic).
- Internal nodes store union bounds across children.
- Each node maintains an inverted file `(keyword -> episode IDs)` for fast keyword pruning.
A separate STR histogram is used for selectivity estimation:
- `build_str_histogram(...)`
- `estimate_selectivity_episode(...)`
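To show the idea behind histogram-based selectivity, here is a minimal stand-in over the time dimension only; the real STR histogram covers more dimensions, and the function names, bucket scheme, and domain bounds below are assumptions:

```python
def build_time_histogram(trajectories, B=10, t_lo=0.0, t_hi=100.0):
    """Equi-width bucket counts over episode timeframe centers
    (illustrative stand-in for the repository's STR histogram)."""
    counts = [0] * B
    width = (t_hi - t_lo) / B
    total = 0
    for t in trajectories:
        for ep in t["episodes"]:
            tmin, tmax = ep["timeframe"]
            center = 0.5 * (tmin + tmax)
            b = min(B - 1, max(0, int((center - t_lo) / width)))
            counts[b] += 1
            total += 1
    return counts, total

def estimate_time_selectivity(counts, total, tmin, tmax, t_lo=0.0, t_hi=100.0):
    """Estimated fraction of episodes whose bucket overlaps [tmin, tmax]."""
    B = len(counts)
    width = (t_hi - t_lo) / B
    lo = min(B - 1, max(0, int((tmin - t_lo) / width)))
    hi = min(B - 1, max(0, int((tmax - t_lo) / width)))
    return sum(counts[lo:hi + 1]) / total if total else 1.0
```

Lower estimated selectivity means fewer candidate episodes, which is what makes an abstraction a good starting point for the search.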
Query entrypoint (selectivity-driven): `emr_sskp_query(root, query_sequence, hist_str, global_if, traj_eps)`
The query selects the most selective episode abstraction in the sequence, filters candidate leaf nodes, then verifies remaining episodes forward/backward.
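The forward/backward verification step can be sketched as follows, using the `next_episode`/`prev_episode` links of the episode dicts; `matches_ea` is a hypothetical per-episode constraint check, and the anchor episode is assumed to have matched already:

```python
def verify_sequence(anchor_ep, query_sequence, anchor_idx, matches_ea):
    """Check whether a candidate episode matched at position anchor_idx
    extends to the full query sequence via next/prev episode links
    (illustrative sketch of the verification phase)."""
    # walk forward through the abstractions after the anchor
    ep = anchor_ep
    for ea in query_sequence[anchor_idx + 1:]:
        ep = ep["next_episode"]
        if ep is None or not matches_ea(ep, ea):
            return False
    # walk backward through the abstractions before the anchor
    ep = anchor_ep
    for ea in reversed(query_sequence[:anchor_idx]):
        ep = ep["prev_episode"]
        if ep is None or not matches_ea(ep, ea):
            return False
    return True
```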
Each trajectory episode is expected to be a dict with at least:
```python
episode = {
    "trajectory_id": int,
    "episode_id": int,
    "spatial_bounds": (minx, maxx, miny, maxy),
    "timeframe": (tmin, tmax),
    "social_rankings": (rmin, rmax),
    "keywords": set([...]),
    "next_episode": episode_or_None,
    "prev_episode": episode_or_None,
}
```

Queries are sequences of "episode abstractions" (constraints). Supported fields:
- `roi_bounds`: `'*'` or `(x1, x2, y1, y2)`
- `timeframe`: `'*'` or `(tmin, tmax)`
- `rating_min`: `'*'` or a number
- `rating_max`: `'*'` or a number
- `keywords`: `'*'`, `None`, or an iterable of keywords

The repository normalizes all abstractions internally via `normalize_ea(...)` to a canonical schema (`roi_bounds`, `tmin`/`tmax`, `rating_min`/`rating_max`, `keywords`).
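A sketch of what that normalization might look like, based only on the schema documented here; the repository's `normalize_ea(...)` may differ in details such as defaults for missing fields:

```python
def normalize_ea(ea):
    """Normalize a raw episode abstraction to the canonical schema
    (illustrative sketch of the documented behavior)."""
    tf = ea.get("timeframe", "*")
    tmin, tmax = (None, None) if tf == "*" else tf
    kws = ea.get("keywords", "*")
    # '*' and None both mean "no keyword constraint"
    kws = None if kws in ("*", None) else set(kws)
    return {
        "roi_bounds": ea.get("roi_bounds", "*"),
        "tmin": tmin,
        "tmax": tmax,
        "rating_min": ea.get("rating_min", "*"),
        "rating_max": ea.get("rating_max", "*"),
        "keywords": kws,
    }
```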
Example:

```python
query_sequence = [
    {"roi_bounds": (0, 10, 0, 10), "timeframe": (0, 100), "rating_min": 3, "rating_max": 5, "keywords": {"park"}},
    {"roi_bounds": "*", "timeframe": (50, 200), "rating_min": 2, "rating_max": 5, "keywords": {"cafe"}},
]
```

Build EMT (or HEMT):
```python
emt = build_emt_dual_index(trajectories_with_episodes)
kw_freq = compute_global_keyword_frequency(emt)
```

Run a sequence query:

```python
matches = s2kp_query(emt, query_sequence, kw_freq)
print(matches)
```
Hilbert-enhanced variant:

```python
hemt = build_hilbert_emt_index(trajectories_with_episodes, p=16)
kw_freq = compute_global_keyword_frequency(hemt)
matches = hemt_query(hemt, query_sequence, kw_freq)
```
EMR variant:

```python
root = build_emr_tree(trajectories_with_episodes, max_leaf_size=20)
hist = build_str_histogram(trajectories_with_episodes, B=10)

# Build a global inverted file for selectivity estimation
global_if = {}
for t in trajectories_with_episodes:
    for ep in t["episodes"]:
        ep_id = (ep["trajectory_id"], ep["episode_id"])
        for kw in ep["keywords"]:
            global_if.setdefault(kw, set()).add(ep_id)

matches = emr_sskp_query(root, query_sequence, hist, global_if, trajectories_with_episodes)
print(matches)
```

This dataset contains 2D trajectory points grouped into trajectories (tracks). Each row represents one observation (one point) along a trajectory with a time, coordinates, a rating, keyword coordinates, and a keyword.
- One row = one timestep for one trajectory
- Rows belonging to the same trajectory share the same trajectory_id
- Within a trajectory, `time` is expected to be monotonic (typically increasing)
Each line has 8 comma-separated fields:
- `trajectory_id` (int): identifier of the trajectory / track (e.g., 0).
- `time` (int): time index or frame number within the trajectory (e.g., 29, 54, 97).
- `x` (float): X coordinate in the dataset's coordinate system.
- `y` (float): Y coordinate in the dataset's coordinate system.
- `rating` (float): social rating.
- `kx` (float): keyword X coordinate.
- `ky` (float): keyword Y coordinate.
- `keyword` (string): the associated keyword (e.g., industry, complex, …).
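A minimal loader for this row format might look like the following; `load_trajectories` is an illustrative helper, not part of the repository:

```python
import csv
from collections import defaultdict

def load_trajectories(lines):
    """Parse the 8-field CSV rows into per-trajectory point lists,
    sorted by time (illustrative loader, not the repository's own)."""
    tracks = defaultdict(list)
    for row in csv.reader(lines):
        if not row or row[0] == "trajectory_id":
            continue  # skip blank lines and a header row if present
        tid = int(row[0])
        tracks[tid].append({
            "time": int(row[1]),
            "x": float(row[2]),
            "y": float(row[3]),
            "rating": float(row[4]),
            "kx": float(row[5]),
            "ky": float(row[6]),
            "keyword": row[7].strip(),
        })
    for pts in tracks.values():
        pts.sort(key=lambda p: p["time"])  # enforce monotonic time
    return dict(tracks)
```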
Example:

```text
trajectory_id,time,x,y,rating,kx,ky,keyword
0,29,34037.70761806609,9833.138607576757,8.4,3.206948757171631,34.005638122558594,industry
0,54,34782.25592839948,9270.427745288769,9.6,-9.44599723815918,16.407695770263672,complex
0,97,36103.710020125865,8309.520759541527,9.1,-19.66252326965332,-37.67576599121094,delicious
```

If you use this code or reproduce these results, please cite:

F. Gryllakis, N. Pelekis, C. Doulkeridis and Y. Theodoridis, "Multiple aspect trajectory indexing with Hilbert curve enhancements for efficient S²KP queries", 2025. (Submitted for publication.)