Skip to content

DataStories-UniPi/MAT-indexing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Multiple aspect trajectory indexing with Hilbert curve enhancements for efficient S²KP queries

This repository contains a Python implementation of indexing and query algorithms for Social Spatio-temporal Keyword Pattern (S²KP) queries over Multiple Aspect Trajectory (MAT) data, supporting the following episode-level constraints:

  • Spatial: episode bounding box (minx, maxx, miny, maxy)
  • Temporal: episode timeframe (tmin, tmax)
  • Social rating: rating range (rating_min, rating_max)
  • Textual: episode keyword set

It supports S²KP queries over MAT data and provides multiple index structures for efficient candidate pruning and sequence verification over MAT episode sequences.


What’s inside

1) Episode-based Multiple aspect Trajectory (EMT) Dual index

A Python implementation of EMT Dual Index (Gryllakis et. al, 2022).

build_emt_dual_index(...) builds two complementary trees over regions-of-interest (ROI) nodes:

  • Spatial Box Sort (SBS) Tree a kd-tree–like structure over spatial bounds, used to prune by ROI.
  • Episode Sort (ES Tree) a kd-tree–like structure over rating/time bounds, used to prune by social/temporal constraints.

Each EMTNode aggregates:

  • union spatial bounds over its episodes
  • union rating bounds
  • union time bounds
  • an inverted index keyword -> [episodes]

Query entrypoint:

  • s2kp_query(emt_index, query_sequence, keyword_freq_map)

A keyword-frequency heuristic decides whether to start the search from SBS (spatial) or ES (social/temporal), then verifies the full episode sequence.


2) Hilbert Episode-based Multiple aspect Trajectory (HEMT) Dual index

build_hilbert_emt_index(...) extends EMT with Hilbert curve keys for spatial locality:

  • Computes per-node Hilbert key intervals over episode centers.
  • Uses those intervals to prune SBS traversal early when a Hilbert interval is available.

Query entrypoint:

  • hemt_query(hemt_index, query_sequence, keyword_freq_map)

3) Episode-based Multiple aspect trajectory R-tree (EMR-tree)

EMR tree is an enhanced Python implementation for MAT data from Episode-based Semantic trajectory R-tree (ESR) that is used for semantic trajectory data (Gryllakis et. al, 2017).

build_emr_tree(...) builds a hierarchical structure over episodes:

  • Leaves hold up to max_leaf_size episodes (sorted by a 4D center heuristic).
  • Internal nodes store union bounds across children.
  • Each node maintains an inverted file (keyword -> episode IDs) for fast keyword pruning.

A separate STR histogram is used for selectivity estimation:

  • build_str_histogram(...)
  • estimate_selectivity_episode(...)

Query entrypoint (selectivity-driven):

  • emr_sskp_query(root, query_sequence, hist_str, global_if, traj_eps)

The query selects the most selective episode abstraction in the sequence, filters candidate leaf nodes, then verifies remaining episodes forward/backward.


Data model

Each trajectory episode is expected to be a dict with at least:

episode = {
  "trajectory_id": int,
  "episode_id": int,
  "spatial_bounds": (minx, maxx, miny, maxy),
  "timeframe": (tmin, tmax),
  "social_rankings": (rmin, rmax),
  "keywords": set([...]),
  "next_episode": episode_or_None,
  "prev_episode": episode_or_None,
}

Query format (episode abstractions)

Queries are sequences of “episode abstractions” (constraints). Supported fields:

  • roi_bounds: '*' or (x1, x2, y1, y2)
  • timeframe: '*' or (tmin, tmax)
  • rating_min: '*' or number
  • rating_max: '*' or number
  • keywords: '*', None, or an iterable of keywords The repository normalizes all abstractions internally via normalize_ea(...) to a canonical schema (roi_bounds, tmin/tmax, rating_min/rating_max, keywords).

Example:

query_sequence = [
  {"roi_bounds": (0, 10, 0, 10), "timeframe": (0, 100), "rating_min": 3, "rating_max": 5, "keywords": {"park"}},
  {"roi_bounds": "*",            "timeframe": (50, 200), "rating_min": 2, "rating_max": 5, "keywords": {"cafe"}},
]

Minimal usage

  1. Build EMT (or HEMT)
emt = build_emt_dual_index(trajectories_with_episodes)
kw_freq = compute_global_keyword_frequency(emt)
  1. Run a sequence query
matches = s2kp_query(emt, query_sequence, kw_freq)
print(matches)
Hilbert-enhanced variant:
hemt = build_hilbert_emt_index(trajectories_with_episodes, p=16)
kw_freq = compute_global_keyword_frequency(hemt)

matches = hemt_query(hemt, query_sequence, kw_freq)
#EMR variant:
root = build_emr_tree(trajectories_with_episodes, max_leaf_size=20)
hist = build_str_histogram(trajectories_with_episodes, B=10)

#Build a global inverted file for selectivity estimation
global_if = {}
for t in trajectories_with_episodes:
    for ep in t["episodes"]:
        ep_id = (ep["trajectory_id"], ep["episode_id"])
        for kw in ep["keywords"]:
            global_if.setdefault(kw, set()).add(ep_id)

matches = emr_sskp_query(root, query_sequence, hist, global_if, trajectories_with_episodes)
print(matches)

Synthetic dataset format

This dataset contains 2D trajectory points grouped into trajectories (tracks). Each row represents one observation (one point) along a trajectory with time, coordinates, rating, keyword coordinates and keywords.

File structure

  • One row = one timestep for one trajectory
  • Rows belonging to the same trajectory share the same trajectory_id
  • Within a trajectory, timestep is expected to be monotonic (typically increasing)

Columns (CSV)

Each line has 8 comma-separated fields:

  • trajectory_id (int) Identifier of the trajectory / track (e.g., 0).
  • time (int) Time index or frame number within the trajectory (e.g., 29, 54, 97).
  • x (float) X coordinate in the dataset’s coordinate system.
  • y (float) Y coordinate in the dataset’s coordinate system.
  • rating (float) Social rating.
  • kx (float) Keyword X coordinate.
  • ky (float) Keyword Y coordinate.
  • keyword (string) related keywords (e.g., industry, complex, …).

Example

trajectory_id,time, x, y, rating, kx, ky
0,29,34037.70761806609,9833.138607576757,8.4,3.206948757171631,34.005638122558594,industry
0,54,34782.25592839948,9270.427745288769,9.6,-9.44599723815918,16.407695770263672,complex
0,97,36103.710020125865,8309.520759541527,9.1,-19.66252326965332,-37.67576599121094,delicious

Citation

If you use this code or reproduce these results, please cite: F.Gryllakis, N. Pelekis, C. Doulkeridis and Y. Theodoridis, "Multiple aspect trajectory indexing with Hilbert curve enhancements for efficient S²KP queries", 2025. (Submitted for publication.)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages