# Improved Caching Strategy for OpenSensor API

## Problem Analysis

The original caching implementation had a fundamental flaw: cache keys were built from every function parameter, timestamps included. This meant:

```python
# OLD: the cache key was built from ALL parameters, including timestamps
@redis_cache(ttl_seconds=300)
def get_device_info_cached(device_id: str, start_date: datetime, end_date: datetime, resolution: int):
    # Cache key: opensensor:get_device_info_cached:md5(device_id + start_date + end_date + resolution)
    ...
```
| 13 | + |
| 14 | +**Issues:** |
| 15 | +- Different time ranges = different cache keys |
| 16 | +- Different resolutions = different cache keys |
| 17 | +- Cache hit rate was essentially 0% for time-series data |
| 18 | +- Wasted Redis memory with duplicate device lookups |
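
A minimal illustration of the failure mode, using a hypothetical `make_cache_key` helper that mirrors the old decorator's behavior (the device ID and dates are made up):

```python
import hashlib

def make_cache_key(func_name: str, *args) -> str:
    # Mimic the old decorator: hash EVERY argument, timestamps included
    digest = hashlib.md5("".join(str(a) for a in args).encode()).hexdigest()
    return f"opensensor:{func_name}:{digest}"

# Two dashboard refreshes, seconds apart, both asking for "the last 24 hours"
k1 = make_cache_key("get_device_info_cached", "dev-1", "2024-01-01T00:00:00", "2024-01-02T00:00:00", 15)
k2 = make_cache_key("get_device_info_cached", "dev-1", "2024-01-01T00:00:05", "2024-01-02T00:00:05", 15)
assert k1 != k2  # same device, same data, zero cache reuse
```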

## Solution: Granular Caching Strategy

### 1. **Device Metadata Caching** (Long TTL - 24 hours)
```python
# Cache device info separately from query parameters
cache_key = f"opensensor:device_meta:{device_id}"
# TTL: 24 hours (device metadata rarely changes)
```
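
A minimal sketch of this layer using redis-py; the `redis_client` connection and the JSON encoding are assumptions, not the project's actual helpers:

```python
import json

import redis

redis_client = redis.Redis()  # assumed connection; the real client is configured elsewhere

DEVICE_META_TTL = 24 * 60 * 60  # 24 hours, in seconds

def cache_device_metadata(device_id: str, metadata: dict) -> None:
    # The key depends ONLY on the device, never on query parameters
    key = f"opensensor:device_meta:{device_id}"
    redis_client.setex(key, DEVICE_META_TTL, json.dumps(metadata, default=str))

def get_device_metadata(device_id: str):
    raw = redis_client.get(f"opensensor:device_meta:{device_id}")
    return json.loads(raw) if raw else None
```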

### 2. **Pipeline Result Caching** (Short TTL - 15 minutes)
```python
# Cache MongoDB aggregation results by pipeline hash
pipeline_hash = md5(core_pipeline_without_pagination)
cache_key = f"opensensor:pipeline:{pipeline_hash}"
# TTL: 15 minutes (good balance for time-series data)
```
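
A sketch of the hash construction; treating `$skip` and `$limit` as the pagination stages to strip is an assumption about what "without pagination" means here:

```python
import hashlib
import json

PAGINATION_STAGES = {"$skip", "$limit"}

def generate_pipeline_hash(pipeline: list) -> str:
    # Drop pagination stages so every page of the same query shares one key
    core = [stage for stage in pipeline if not PAGINATION_STAGES & stage.keys()]
    # Canonical JSON (sorted keys) so logically equal pipelines hash identically
    canonical = json.dumps(core, sort_keys=True, default=str)
    return hashlib.md5(canonical.encode()).hexdigest()
```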

### 3. **Aggregated Data Chunks** (Medium TTL - 30 minutes)
```python
# Cache pre-aggregated data by time buckets
cache_key = f"opensensor:agg:{data_type}:{device_id}:{time_bucket}:{resolution}"
# TTL: 30 minutes (for frequently accessed data ranges)
```
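
The `time_bucket` component is assumed to be the query's start time floored to the resolution boundary, so requests for nearby windows collapse onto the same key:

```python
from datetime import datetime, timezone

def time_bucket(ts: datetime, resolution_minutes: int) -> int:
    # Floor the timestamp to the nearest resolution boundary (in epoch minutes)
    epoch_minutes = int(ts.timestamp() // 60)
    return epoch_minutes - (epoch_minutes % resolution_minutes)

# Hypothetical key for 15-minute temperature buckets on device "outdoor-1"
now = datetime.now(timezone.utc)
cache_key = f"opensensor:agg:temp:outdoor-1:{time_bucket(now, 15)}:15"
```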

## Key Improvements

### ✅ **Cache Key Independence**
- Device metadata cached independently of query parameters
- Pipeline results cached by content hash, not input parameters
- Time-based bucketing for aggregated data

### ✅ **Intelligent TTL Strategy**
- **Device metadata**: 24 hours (rarely changes)
- **Pipeline results**: 15 minutes (balance freshness vs. performance)
- **Aggregated chunks**: 30 minutes (frequently accessed ranges)
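
Collected as module-level constants (the names are illustrative, not the project's):

```python
# TTLs in seconds, one per cache layer
DEVICE_META_TTL = 24 * 60 * 60  # device metadata: rarely changes
PIPELINE_TTL = 15 * 60          # aggregation results: freshness vs. performance
AGG_CHUNK_TTL = 30 * 60         # pre-aggregated time buckets
```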

### ✅ **Smart Cache Invalidation**
```python
# Invalidate relevant caches when new data arrives
def _record_data_to_ts_collection(collection, environment, user=None):
    # ... insert data ...
    device_id = environment.device_metadata.device_id
    sensor_cache.invalidate_device_cache(device_id)
```
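
A sketch of what `invalidate_device_cache` could look like, assuming pattern-based deletion with a non-blocking SCAN (the real method may differ):

```python
def invalidate_device_cache(self, device_id: str) -> None:
    # Remove every cached entry tied to this device; scan_iter avoids
    # blocking Redis the way KEYS would on a large keyspace
    patterns = (
        f"opensensor:device_meta:{device_id}",
        f"opensensor:agg:*:{device_id}:*",
    )
    for pattern in patterns:
        for key in self.redis_client.scan_iter(match=pattern):
            self.redis_client.delete(key)
```

Pipeline-result keys are content-hashed, so targeting them per device would need a secondary index mapping devices to hashes; that detail is omitted here.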

### ✅ **Size-Aware Caching**
```python
# Don't cache results larger than ~1 MB of serialized JSON
if result and len(json.dumps(result, default=str)) < 1024 * 1024:
    sensor_cache.cache_pipeline_result(pipeline_hash, result, ttl_minutes)
```

### ✅ **Graceful Degradation**
- Falls back to direct database queries when Redis is unavailable
- No service disruption if caching fails
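
The fallback can be as simple as swallowing connection errors around each cache call (a sketch, not the project's exact wrapper):

```python
import logging

import redis

logger = logging.getLogger("opensensor.cache_strategy")

def safe_cache_get(redis_client: redis.Redis, key: str):
    # Return the cached value, or None if Redis is down or the key is absent;
    # the caller treats None as a cache miss and queries the database directly
    try:
        return redis_client.get(key)
    except redis.RedisError as exc:
        logger.warning("Cache unavailable, falling back to DB: %s", exc)
        return None
```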

## Implementation Details

### Cache-Aware Device Lookup
```python
from datetime import datetime
from typing import List, Tuple

def cache_aware_device_lookup(device_id: str) -> Tuple[List[str], str]:
    # Try the cache first
    cached_metadata = sensor_cache.get_device_metadata(device_id)
    if cached_metadata:
        return cached_metadata['device_ids'], cached_metadata['device_name']

    # Cache miss - fetch from the database and cache the result
    api_keys, _ = get_api_keys_by_device_id(device_id)
    device_ids, device_name = reduce_api_keys_to_device_ids(api_keys, device_id)

    metadata = {
        'device_ids': device_ids,
        'device_name': device_name,
        'cached_at': datetime.utcnow().isoformat(),
    }
    sensor_cache.cache_device_metadata(device_id, metadata)
    return device_ids, device_name
```

### Cache-Aware Aggregation
```python
import json
from typing import List

def cache_aware_aggregation(collection, pipeline: List[dict], cache_ttl_minutes: int = 15) -> List[dict]:
    # Generate a hash that excludes pagination stages
    pipeline_hash = sensor_cache.generate_pipeline_hash(pipeline)

    # Try the cache
    cached_result = sensor_cache.get_pipeline_result(pipeline_hash)
    if cached_result is not None:
        return cached_result

    # Cache miss - run the aggregation, then cache results under ~1 MB
    result = list(collection.aggregate(pipeline))
    if result and len(json.dumps(result, default=str)) < 1024 * 1024:
        sensor_cache.cache_pipeline_result(pipeline_hash, result, cache_ttl_minutes)

    return result
```

## Performance Benefits

### Expected Improvements:
- **Cache Hit Rate**: 80-90% for device lookups (vs. ~0% before)
- **Response Time**: 10-50 ms improvement for cached requests
- **Database Load**: Significant reduction in MongoDB aggregation queries
- **Memory Efficiency**: No duplicate device metadata across cache entries

### Cache Effectiveness by Use Case:

1. **Dashboard Loading**: High hit rate for device metadata
2. **Time Series Charts**: Medium hit rate for common time ranges
3. **Real-time Updates**: Smart invalidation ensures data freshness
4. **API Pagination**: Same core data cached across pages (see the sketch below)
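
A sketch of that pagination pattern: pages are sliced in memory out of the one cached core result (hypothetical helper; assumes the core result fits under the 1 MB caching cap):

```python
from typing import List

def paginate_cached_result(core_result: List[dict], page: int, page_size: int) -> List[dict]:
    # Every page slices the same cached core result, so page 2 of a chart
    # is a cache hit even if only page 1 was requested before
    start = page * page_size
    return core_result[start:start + page_size]
```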

## Migration Strategy

### Phase 1: ✅ **Implement New Strategy**
- [x] Create `cache_strategy.py` with improved caching logic
- [x] Update `collection_apis.py` to use the new caching functions
- [x] Add cache invalidation to data recording functions
- [x] Create comprehensive tests

### Phase 2: **Deploy and Monitor**
- [ ] Deploy to staging environment
- [ ] Monitor cache hit rates via the `/cache/stats` endpoint
- [ ] Verify performance improvements
- [ ] Monitor Redis memory usage

### Phase 3: **Optimize and Scale**
- [ ] Fine-tune TTL values based on usage patterns
- [ ] Add more granular cache invalidation
- [ ] Consider implementing cache warming for popular devices

## Monitoring and Debugging

### Cache Statistics Endpoint
```bash
GET /cache/stats
```

Returns:
```json
{
  "status": "connected",
  "opensensor_keys": 1250,
  "redis_version": "6.2.0",
  "used_memory": "15.2M",
  "keyspace_hits": 8420,
  "keyspace_misses": 1580,
  "hit_rate": "84.2%"
}
```
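
A sketch of how the handler can assemble these fields from Redis `INFO`, assuming a FastAPI router and a module-level `redis_client` (both are assumptions about the app's structure):

```python
import redis
from fastapi import APIRouter

router = APIRouter()
redis_client = redis.Redis()  # assumed; the real client is configured elsewhere

@router.get("/cache/stats")
def cache_stats():
    try:
        info = redis_client.info()  # server-wide stats, including hit/miss counters
    except redis.RedisError:
        return {"status": "disconnected"}
    hits = info.get("keyspace_hits", 0)
    misses = info.get("keyspace_misses", 0)
    total = hits + misses
    return {
        "status": "connected",
        "opensensor_keys": sum(1 for _ in redis_client.scan_iter(match="opensensor:*")),
        "redis_version": info.get("redis_version"),
        "used_memory": info.get("used_memory_human"),
        "keyspace_hits": hits,
        "keyspace_misses": misses,
        "hit_rate": f"{hits / total:.1%}" if total else "n/a",
    }
```

Note that `keyspace_hits`/`keyspace_misses` are server-wide counters, so the hit rate reflects the whole Redis instance, not only OpenSensor keys.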

### Cache Management
```bash
# Clear the entire cache
POST /cache/clear

# Invalidate keys matching a pattern
POST /cache/invalidate
{"pattern": "device_meta:*"}
```

### Debugging Cache Behavior
```python
# Enable debug logging for the caching layer
import logging
logging.getLogger('opensensor.cache_strategy').setLevel(logging.DEBUG)
```

## Backward Compatibility

- ✅ **Zero Breaking Changes**: All existing API endpoints work unchanged
- ✅ **Graceful Fallback**: Works without Redis (falls back to direct DB queries)
- ✅ **Incremental Adoption**: Can be deployed alongside existing caching

## Future Enhancements

### Potential Optimizations:
1. **Cache Warming**: Pre-populate the cache for popular devices
2. **Compression**: Compress large cached results
3. **Distributed Caching**: Shard the cache across multiple Redis instances
4. **Predictive Caching**: Cache likely-to-be-requested time ranges
5. **Cache Analytics**: Track which data types benefit most from caching

### Advanced Features:
1. **Time-based Expiration**: Expire cache entries based on data age
2. **Smart Prefetching**: Prefetch adjacent time ranges
3. **Cache Hierarchy**: Multi-level caching (Redis + in-memory)
4. **Query Optimization**: Cache intermediate aggregation steps

## Conclusion

This improved caching strategy addresses the fundamental issue with timestamp-dependent cache keys by:

1. **Separating concerns**: Device metadata vs. query results
2. **Using content-based hashing**: Pipeline results cached by content, not input parameters
3. **Implementing intelligent TTLs**: Different expiration times for different data types
4. **Adding smart invalidation**: Caches cleared when new data arrives

The result is a much more effective caching system that should significantly improve API performance while maintaining data freshness and consistency.