Description
🚨 Issue Summary
Status: Critical Bug
Type: Data Pipeline Failure
Priority: P0
Affected Component: Cross-dataset evaluation pipeline
The HOSER distillation evaluation pipeline failed completely during abnormal OD pair trajectory generation for the BJUT_Beijing cross-dataset evaluation. The root cause is a road network translation error that prevented BJUT_Beijing road IDs from being translated to Beijing HOSER road IDs, causing all generation attempts to fail with validation errors.
🔍 Error Details
Primary Error
ValueError: Destination 86683 is not reachable from any origin.
OD pair (18567, 86683) is invalid.
Location: gene.py:2076-2081 in generate_trajectories_programmatic()
Phase: abnormal_od_generate
Secondary Error (Translation Phase)
ERROR - ❌ Phase road_network_translate failed: int() argument must be a string, a bytes-like object or a real number, not 'dict'
Location: Pipeline phase execution at line 1667
Impact: Prevented road network mapping from being applied
Timeline
- Nov 17, 13:46: Previous successful run created road_mapping_beijing_to_bjut_beijing.json
- Nov 19, 17:15:45: Current run attempted translation but failed with type error
- Nov 19, 17:26:13: Abnormal OD pairs extracted from BJUT_Beijing dataset (untranslated)
- Nov 19, 17:26:15: Generation failed trying to use BJUT_Beijing IDs in Beijing network
🔬 Root Cause Analysis
The Problem
The pipeline extracts abnormal OD pairs from the target dataset (BJUT_Beijing) but fails to translate them back to the source dataset (Beijing HOSER) before trajectory generation.
Data Flow Issue
Expected Flow:
BJUT_Beijing OD pairs → Reverse Translation → Beijing HOSER IDs → Trajectory Generation
Actual Flow:
BJUT_Beijing OD pairs → [TRANSLATION FAILED] → BJUT_Beijing IDs → VALIDATION ERROR
Evidence
- Mapping file exists: road_mapping_beijing_to_bjut_beijing.json (2.4MB, 51,579 mappings)
- OD pair (18567, 86683) contains BJUT_Beijing road IDs
- Destination 86683 does NOT exist in Beijing HOSER road network
- Translation phase failed to process the mapping file due to type conversion error
Technical Root Cause
- Translation Phase Bug: Type conversion error when processing mapping file
- Missing Reverse Translation: No logic to translate target dataset IDs back to source dataset
- Validation Without Context: Generation validates OD pairs against source network without considering cross-dataset scenario
🛠️ Fix Strategy
Immediate Fix (Without Pipeline Re-run)
Step 1: Create Reverse Mapping
import json
# Load forward mapping (Beijing → BJUT_Beijing)
with open('road_mapping_beijing_to_bjut_beijing.json', 'r') as f:
    forward_mapping = json.load(f)
# Create reverse mapping (BJUT_Beijing → Beijing)
reverse_mapping = {}
for beijing_id, info in forward_mapping.items():
    bjut_id = str(info['target_road_id'])
    reverse_mapping[bjut_id] = beijing_id
Step 2: Translate OD Pairs
# Load BJUT_Beijing OD pairs
with open('abnormal_od_pairs_bjut_beijing.json', 'r') as f:
    od_data = json.load(f)
# Translate to Beijing HOSER IDs
translated_pairs = []
for category, pairs in od_data['od_pairs_by_category'].items():
    for origin, dest in pairs:
        origin_str, dest_str = str(origin), str(dest)
        if origin_str in reverse_mapping and dest_str in reverse_mapping:
            beijing_origin = int(reverse_mapping[origin_str])
            beijing_dest = int(reverse_mapping[dest_str])
            translated_pairs.append([beijing_origin, beijing_dest])
Step 3: Validate and Generate
# Create Beijing OD pairs file
beijing_od_pairs = {
    "dataset": "Beijing",
    "od_pairs_by_category": {
        "wang_temporal_delay": translated_pairs
    }
}
# Save and use for generation
with open('abnormal_od_pairs_beijing.json', 'w') as f:
    json.dump(beijing_od_pairs, f)
Long-term Fix (Pipeline Code)
Fix 1: Road Network Translation Type Error
- File: python_pipeline.py - road network translation phase
- Issue: int() argument must be a string, a bytes-like object or a real number, not 'dict'
- Fix: Add proper type checking and conversion before int() calls
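A minimal sketch of the type check described in Fix 1. It assumes a mapping value is either a bare road ID or a dict that stores the ID under `target_road_id` (the key used in the Step 1 snippet). The helper name `coerce_road_id` is hypothetical and not part of the pipeline:

```python
from typing import Any


def coerce_road_id(value: Any, key: str = "target_road_id") -> int:
    """Return an integer road ID from a raw mapping value.

    Assumption: a mapping value is either a bare ID (int or numeric string)
    or a dict that stores the ID under `key`.
    """
    if isinstance(value, dict):
        if key not in value:
            raise KeyError(f"mapping entry has no {key!r} field: {value!r}")
        value = value[key]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"unsupported road ID type: {type(value).__name__}")
    return int(value)
```

Calling `coerce_road_id(info)` instead of `int(info)` would accept both value shapes and fail with a descriptive message on anything else.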
Fix 2: Add Reverse Translation Logic
- Location: abnormal_od_extract phase
- Issue: No reverse translation of target dataset OD pairs
- Fix: Automatically translate BJUT_Beijing OD pairs to Beijing HOSER IDs
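One possible shape for this reverse translation step, sketched under the assumption that the reverse mapping has string keys and values as built in Step 1. The function name `translate_od_pairs` and its return shape are illustrative, not existing pipeline code. Unlike the manual fix, it keeps each pair in its original category and reports the pairs it could not translate:

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def translate_od_pairs(
    od_pairs_by_category: Dict[str, List[List[int]]],
    reverse_mapping: Dict[str, str],
) -> Tuple[Dict[str, List[List[int]]], List[Pair]]:
    """Translate target-dataset OD pairs to source-dataset road IDs.

    Returns the translated pairs grouped by their original category, plus
    the list of pairs that could not be translated.
    """
    translated: Dict[str, List[List[int]]] = {}
    skipped: List[Pair] = []
    for category, pairs in od_pairs_by_category.items():
        translated[category] = []
        for origin, dest in pairs:
            src_origin = reverse_mapping.get(str(origin))
            src_dest = reverse_mapping.get(str(dest))
            if src_origin is None or src_dest is None:
                # At least one endpoint has no counterpart in the source network.
                skipped.append((origin, dest))
                continue
            translated[category].append([int(src_origin), int(src_dest)])
    return translated, skipped
```

Returning the skipped pairs lets the phase log how many OD pairs were dropped rather than losing them silently.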
Fix 3: Cross-Dataset Validation
- Location: generate_trajectories_programmatic function
- Issue: Validation doesn't consider cross-dataset context
- Fix: Add dataset-aware validation logic
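As a rough illustration of dataset-aware validation, the sketch below checks OD pair IDs against the set of road IDs in the generation network and, when the OD pairs come from a different dataset, says so in the error. Both function names and the `od_dataset` / `network_dataset` parameters are assumptions made for this example and do not reflect the actual signature in gene.py:

```python
from typing import Iterable, List, Set, Tuple


def find_unknown_od_ids(
    od_pairs: Iterable[Tuple[int, int]],
    network_road_ids: Set[int],
) -> List[int]:
    """Return the sorted road IDs used in od_pairs that are absent from the network."""
    unknown = set()
    for origin, dest in od_pairs:
        for road_id in (origin, dest):
            if road_id not in network_road_ids:
                unknown.add(road_id)
    return sorted(unknown)


def check_od_pairs_match_network(
    od_pairs: Iterable[Tuple[int, int]],
    network_road_ids: Set[int],
    od_dataset: str,
    network_dataset: str,
) -> None:
    """Raise a descriptive error when OD pairs reference roads outside the network."""
    unknown = find_unknown_od_ids(od_pairs, network_road_ids)
    if not unknown:
        return
    hint = ""
    if od_dataset != network_dataset:
        hint = (
            f" OD pairs come from {od_dataset!r} but the network is "
            f"{network_dataset!r}; they may not have been translated."
        )
    raise ValueError(f"{len(unknown)} road ID(s) not in network: {unknown[:5]}.{hint}")
```

Running a check like this before generation would surface an untranslated OD file as a dataset mismatch instead of a reachability error.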
📊 Impact Assessment
Severity
- Critical (P0): Complete failure of cross-dataset evaluation functionality
- User Impact: Cannot evaluate HOSER models on BJUT_Beijing dataset
- Data Loss: No trajectory files generated for abnormal OD pairs
Affected Areas
- ✅ Cross-dataset evaluation workflow
- ✅ Abnormal trajectory generation
- ✅ Trajectory generation validation
- ⚠️ Road network translation phase
Success Criteria
- OD pairs successfully translated from BJUT_Beijing to Beijing HOSER
- All translated OD pairs pass validation in Beijing network
- Trajectory generation completes without validation errors
- Generated trajectories saved to gene_abnormal/Beijing/seed42/
- Evaluation phase processes generated trajectories
🔧 Files to Modify
Critical Files
- python_pipeline.py - Main pipeline logic (road network translation, OD extraction)
- gene.py - Trajectory generation and validation (lines 2076-2081)
Data Files
- abnormal_od_pairs_beijing.json - Translated OD pairs (create new)
- road_mapping_beijing_to_bjut_beijing.json - Existing mapping file
Configuration
- config/abnormal_detection_statistical.yaml - Cross-dataset settings
🧪 Testing Strategy
Unit Tests
- Test reverse mapping creation
- Test OD pair translation accuracy
- Test cross-dataset validation logic
- Test type conversion error handling
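The reverse mapping test could start from something like the sketch below. `build_reverse_mapping` is a hypothetical helper that wraps the Step 1 loop, and the source-side IDs in the fixtures are made up for illustration:

```python
def build_reverse_mapping(forward_mapping):
    """Invert {source_id: {"target_road_id": target_id}} into {target_id: source_id}."""
    reverse = {}
    for source_id, info in forward_mapping.items():
        reverse[str(info["target_road_id"])] = source_id
    return reverse


def test_reverse_mapping_inverts_forward_mapping():
    # Fixture values are illustrative, not taken from the real mapping file.
    forward = {"10": {"target_road_id": 18567}, "20": {"target_road_id": 86683}}
    assert build_reverse_mapping(forward) == {"18567": "10", "86683": "20"}


def test_reverse_mapping_last_duplicate_wins():
    # Two source roads mapped to the same target road: the later entry overwrites.
    forward = {"10": {"target_road_id": 5}, "20": {"target_road_id": 5}}
    assert build_reverse_mapping(forward) == {"5": "20"}
```

The second test documents that a many-to-one forward mapping loses entries when inverted, which may matter if the 51,579 mappings are not one-to-one.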
Integration Tests
- Run full pipeline on test datasets
- Verify translation works for known OD pairs
- Confirm generated trajectories are valid
Manual Testing
- Validate translated pairs against Beijing network
- Run generation with corrected OD pairs
- Verify evaluation metrics compute correctly
📝 Additional Notes
Context
- This is a research pipeline for knowledge distillation (LM-TAD → HOSER)
- Cross-dataset evaluation tests model generalization to different road networks
- BJUT_Beijing is an alternative road network dataset for Beijing region
Related Issues
- May affect other cross-dataset evaluation scenarios (Porto, other cities)
- Could impact abnormal detection on translated road networks
Next Steps
- Implement manual OD pair translation (1-2 hours)
- Fix pipeline code to prevent recurrence (4-6 hours)
- Add comprehensive tests (2-3 hours)
- Verify fix with full pipeline run (1-2 hours)
Reporter: Error Analysis System
Logs: /home/matt/Dev/HOSER/hoser-distill-beijing/hoser-beijing-eval-resume-20251117_141730.log
Configuration: config/abnormal_detection_statistical.yaml