Critical: Cross-Dataset Translation Failure in Abnormal Trajectory Generation #63

@matercomus

🚨 Issue Summary

Status: Critical Bug
Type: Data Pipeline Failure
Priority: P0
Affected Component: Cross-dataset evaluation pipeline

The HOSER distillation evaluation pipeline failed completely during abnormal OD pair trajectory generation for the BJUT_Beijing cross-dataset evaluation. The root cause is a road network translation error that prevented BJUT_Beijing road IDs from being translated to Beijing HOSER road IDs, causing all generation attempts to fail with validation errors.

🔍 Error Details

Primary Error

ValueError: Destination 86683 is not reachable from any origin. 
OD pair (18567, 86683) is invalid.

Location: gene.py:2076-2081 in generate_trajectories_programmatic()
Phase: abnormal_od_generate

Secondary Error (Translation Phase)

ERROR - ❌ Phase road_network_translate failed: int() argument must be a string, a bytes-like object or a real number, not 'dict'

Location: Pipeline phase execution at line 1667
Impact: Prevented road network mapping from being applied
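
The message is what Python raises when int() receives a dict. A minimal reproduction, assuming mapping entries are dicts that hold the ID under target_road_id (the shape the reverse-mapping step in this issue relies on); the sample IDs and the distance_m field are made up for illustration:

```python
# Hypothetical mapping entry: only 'target_road_id' is attested in this issue,
# the IDs and the 'distance_m' field are illustrative.
forward_mapping = {"101": {"target_road_id": 202, "distance_m": 3.2}}

entry = forward_mapping["101"]

try:
    int(entry)  # passes the whole dict instead of the ID field
except TypeError as exc:
    message = str(exc)
    print(message)

# Extracting the field first converts cleanly.
road_id = int(entry["target_road_id"])
```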

Timeline

  1. Nov 17, 13:46: Previous successful run created road_mapping_beijing_to_bjut_beijing.json
  2. Nov 19, 17:15:45: Current run attempted translation but failed with type error
  3. Nov 19, 17:26:13: Abnormal OD pairs extracted from BJUT_Beijing dataset (untranslated)
  4. Nov 19, 17:26:15: Generation failed trying to use BJUT_Beijing IDs in the Beijing network

🔬 Root Cause Analysis

The Problem

The pipeline extracts abnormal OD pairs from the target dataset (BJUT_Beijing) but fails to translate them back to the source dataset (Beijing HOSER) before trajectory generation.

Data Flow Issue

Expected Flow:

BJUT_Beijing OD pairs → Reverse Translation → Beijing HOSER IDs → Trajectory Generation

Actual Flow:

BJUT_Beijing OD pairs → [TRANSLATION FAILED] → BJUT_Beijing IDs → VALIDATION ERROR

Evidence

  • Mapping file exists: road_mapping_beijing_to_bjut_beijing.json (2.4MB, 51,579 mappings)
  • OD pair (18567, 86683) contains BJUT_Beijing road IDs
  • Destination 86683 does NOT exist in Beijing HOSER road network
  • Translation phase failed to process the mapping file due to type conversion error

Technical Root Cause

  1. Translation Phase Bug: Type conversion error when processing mapping file
  2. Missing Reverse Translation: No logic to translate target dataset IDs back to the source dataset
  3. Validation Without Context: Generation validates OD pairs against the source network without considering the cross-dataset scenario

🛠️ Fix Strategy

Immediate Fix (Without Pipeline Re-run)

Step 1: Create Reverse Mapping

import json

# Load forward mapping (Beijing → BJUT_Beijing)
with open('road_mapping_beijing_to_bjut_beijing.json', 'r') as f:
    forward_mapping = json.load(f)

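# Create reverse mapping (BJUT_Beijing → Beijing)
# Each entry is expected to be a dict with a 'target_road_id' field.
# If several Beijing roads map to the same BJUT_Beijing road, the last one wins.
reverse_mapping = {}
for beijing_id, info in forward_mapping.items():
    bjut_id = str(info['target_road_id'])
    reverse_mapping[bjut_id] = beijing_id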

Step 2: Translate OD Pairs

# Load BJUT_Beijing OD pairs
with open('abnormal_od_pairs_bjut_beijing.json', 'r') as f:
    od_data = json.load(f)

# Translate to Beijing HOSER IDs, keeping each pair under its original category
translated_by_category = {}
skipped = 0
for category, pairs in od_data['od_pairs_by_category'].items():
    translated_by_category[category] = []
    for origin, dest in pairs:
        origin_str, dest_str = str(origin), str(dest)
        if origin_str in reverse_mapping and dest_str in reverse_mapping:
            beijing_origin = int(reverse_mapping[origin_str])
            beijing_dest = int(reverse_mapping[dest_str])
            translated_by_category[category].append([beijing_origin, beijing_dest])
        else:
            skipped += 1  # one or both endpoints have no Beijing counterpart
print(f"Skipped {skipped} OD pairs with unmapped road IDs")

Step 3: Validate and Generate

# Create Beijing OD pairs file (categories carried over from the input file)
beijing_od_pairs = {
    "dataset": "Beijing",
    "od_pairs_by_category": translated_by_category
}

# Save and use for generation
with open('abnormal_od_pairs_beijing.json', 'w') as f:
    json.dump(beijing_od_pairs, f)

Long-term Fix (Pipeline Code)

Fix 1: Road Network Translation Type Error

  • File: python_pipeline.py - road network translation phase
  • Issue: int() argument must be a string, a bytes-like object or a real number, not 'dict'
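  • Fix: Check the type of each mapping value before calling int(); for dict-valued entries, extract the road ID field (e.g. target_road_id) first, and raise a descriptive error for any other unexpected type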
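
One possible shape for this fix is a small coercion helper. This is a sketch only: to_road_id and its key parameter are hypothetical names, not existing functions in python_pipeline.py, and it assumes dict entries store the ID under target_road_id.

```python
from numbers import Integral


def to_road_id(value, key="target_road_id"):
    """Coerce a mapping value to an int road ID.

    Accepts a bare int, a numeric string, or a dict holding the ID under `key`.
    """
    if isinstance(value, dict):
        if key not in value:
            raise KeyError(f"mapping entry has no {key!r} field: {value!r}")
        value = value[key]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (Integral, str)):
        raise TypeError(f"unsupported road ID type: {type(value).__name__}")
    return int(value)
```

Calling the helper wherever the translation phase currently calls int() directly would turn the opaque TypeError into either a valid ID or an error that names the offending entry.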

Fix 2: Add Reverse Translation Logic

  • Location: abnormal_od_extract phase
  • Issue: No reverse translation of target dataset OD pairs
  • Fix: Automatically translate BJUT_Beijing OD pairs to Beijing HOSER IDs

Fix 3: Cross-Dataset Validation

  • Location: generate_trajectories_programmatic function
  • Issue: Validation doesn't consider cross-dataset context
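  • Fix: Add dataset-aware validation logic, e.g. check that both road IDs exist in the source network before testing reachability, so that untranslated IDs produce a clear error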
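
A pre-check along these lines could run before the reachability test. It is a sketch under assumptions: check_od_pair_ids, network_road_ids and dataset_name are hypothetical names, and the real validation in gene.py may be structured differently.

```python
def check_od_pair_ids(origin, dest, network_road_ids, dataset_name):
    """Fail early if an OD pair references road IDs missing from the network."""
    missing = [rid for rid in (origin, dest) if rid not in network_road_ids]
    if missing:
        raise ValueError(
            f"OD pair ({origin}, {dest}) references road IDs {missing} that do "
            f"not exist in the {dataset_name} network; the pair may contain "
            "untranslated IDs from another dataset."
        )


# Usage with a toy network: the first pair passes, the second is rejected.
toy_network = {1, 2, 3}
check_od_pair_ids(1, 2, toy_network, "Beijing")
try:
    check_od_pair_ids(1, 86683, toy_network, "Beijing")
except ValueError as exc:
    rejection = str(exc)
```

Compared with "Destination ... is not reachable from any origin", this error points directly at a translation problem rather than a routing problem.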

📊 Impact Assessment

Severity

  • Critical (P0): Complete failure of cross-dataset evaluation functionality
  • User Impact: Cannot evaluate HOSER models on BJUT_Beijing dataset
  • Data Loss: No trajectory files generated for abnormal OD pairs

Affected Areas

  • ✅ Cross-dataset evaluation workflow
  • ✅ Abnormal trajectory generation
  • ✅ Trajectory generation validation
  • ⚠️ Road network translation phase

Success Criteria

  • OD pairs successfully translated from BJUT_Beijing to Beijing HOSER
  • All translated OD pairs pass validation in Beijing network
  • Trajectory generation completes without validation errors
  • Generated trajectories saved to gene_abnormal/Beijing/seed42/
  • Evaluation phase processes generated trajectories

🔧 Files to Modify

Critical Files

  1. python_pipeline.py - Main pipeline logic (road network translation, OD extraction)
  2. gene.py - Trajectory generation and validation (lines 2076-2081)

Data Files

  1. abnormal_od_pairs_beijing.json - Translated OD pairs (create new)
  2. road_mapping_beijing_to_bjut_beijing.json - Existing mapping file

Configuration

  1. config/abnormal_detection_statistical.yaml - Cross-dataset settings

🧪 Testing Strategy

Unit Tests

  • Test reverse mapping creation
  • Test OD pair translation accuracy
  • Test cross-dataset validation logic
  • Test type conversion error handling
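
The first of these tests might look like the sketch below. build_reverse_mapping is a hypothetical function that wraps the inversion loop from the immediate fix; the tests are plain assert-style functions that a runner such as pytest can collect.

```python
def build_reverse_mapping(forward_mapping):
    """Invert {source_id: {'target_road_id': ...}} into {target_id: source_id}."""
    reverse = {}
    for source_id, info in forward_mapping.items():
        reverse[str(info["target_road_id"])] = source_id
    return reverse


def test_reverse_mapping_inverts_entries():
    forward = {"1": {"target_road_id": 10}, "2": {"target_road_id": 20}}
    assert build_reverse_mapping(forward) == {"10": "1", "20": "2"}


def test_reverse_mapping_last_entry_wins_on_collision():
    # Two source roads share one target road; dicts keep insertion order,
    # so the later entry overwrites the earlier one.
    forward = {"1": {"target_road_id": 10}, "3": {"target_road_id": 10}}
    assert build_reverse_mapping(forward) == {"10": "3"}


def test_reverse_mapping_of_empty_input_is_empty():
    assert build_reverse_mapping({}) == {}
```

The collision test documents the current overwrite behavior; if many-to-one mappings turn out to be common, it is the natural place to pin down a different policy.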

Integration Tests

  • Run full pipeline on test datasets
  • Verify translation works for known OD pairs
  • Confirm generated trajectories are valid

Manual Testing

  • Validate translated pairs against Beijing network
  • Run generation with corrected OD pairs
  • Verify evaluation metrics compute correctly

📝 Additional Notes

Context

  • This is a research pipeline for knowledge distillation (LM-TAD → HOSER)
  • Cross-dataset evaluation tests model generalization to different road networks
  • BJUT_Beijing is an alternative road network dataset for the Beijing region

Related Issues

  • May affect other cross-dataset evaluation scenarios (Porto, other cities)
  • Could impact abnormal detection on translated road networks

Next Steps

  1. Implement manual OD pair translation (1-2 hours)
  2. Fix pipeline code to prevent recurrence (4-6 hours)
  3. Add comprehensive tests (2-3 hours)
  4. Verify fix with full pipeline run (1-2 hours)

Reporter: Error Analysis System
Logs: /home/matt/Dev/HOSER/hoser-distill-beijing/hoser-beijing-eval-resume-20251117_141730.log
Configuration: config/abnormal_detection_statistical.yaml
