
Commit 52fb81b
Iterate on caching strategy
1 parent 9e768a9 commit 52fb81b

4 files changed: +711 -12 lines changed

IMPROVED_CACHING_STRATEGY.md

Lines changed: 219 additions & 0 deletions
# Improved Caching Strategy for OpenSensor API

## Problem Analysis

The original caching implementation had a fundamental flaw: cache keys were built from every function parameter, timestamps included. This meant:

```python
# OLD: Cache key included ALL parameters, including timestamps
@redis_cache(ttl_seconds=300)
def get_device_info_cached(device_id: str, start_date, end_date, resolution):
    # Cache key: opensensor:get_device_info_cached:md5(device_id + start_date + end_date + resolution)
    ...
```

**Issues:**
- Different time ranges = different cache keys
- Different resolutions = different cache keys
- Cache hit rate was essentially 0% for time-series data
- Redis memory wasted on duplicate device lookups

## Solution: Granular Caching Strategy

### 1. **Device Metadata Caching** (Long TTL - 24 hours)
```python
# Cache device info separately from query parameters
cache_key = f"opensensor:device_meta:{device_id}"
# TTL: 24 hours (device metadata rarely changes)
```

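For illustration, a minimal sketch of what this tier's get/set pair might look like. The method names mirror the `sensor_cache` calls used later in this document; the Redis connection details are assumptions:

```python
import json
import redis

class SensorCache:
    """Minimal sketch of the device-metadata tier; connection details are illustrative."""

    def __init__(self, redis_url: str = "redis://localhost:6379/0"):
        # decode_responses=True returns str instead of bytes
        self._redis = redis.Redis.from_url(redis_url, decode_responses=True)

    def cache_device_metadata(self, device_id: str, metadata: dict,
                              ttl_hours: int = 24) -> None:
        # One key per device, independent of any query parameters
        key = f"opensensor:device_meta:{device_id}"
        self._redis.setex(key, ttl_hours * 3600, json.dumps(metadata, default=str))

    def get_device_metadata(self, device_id: str) -> dict | None:
        raw = self._redis.get(f"opensensor:device_meta:{device_id}")
        return json.loads(raw) if raw else None
```
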
### 2. **Pipeline Result Caching** (Short TTL - 15 minutes)
```python
# Cache MongoDB aggregation results by a hash of the pipeline itself
pipeline_hash = md5(core_pipeline_without_pagination)
cache_key = f"opensensor:pipeline:{pipeline_hash}"
# TTL: 15 minutes (good balance for time-series data)
```

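The `md5(core_pipeline_without_pagination)` line above is shorthand; a concrete version of `generate_pipeline_hash` might look like this (a sketch, assuming pagination appears as standard `$skip`/`$limit` stages):

```python
import hashlib
import json

PAGINATION_STAGES = {"$skip", "$limit"}

def generate_pipeline_hash(pipeline: list[dict]) -> str:
    # Drop pagination so every page of the same query shares one cache entry
    core = [stage for stage in pipeline if not PAGINATION_STAGES & stage.keys()]
    # sort_keys gives a stable hash regardless of dict insertion order
    payload = json.dumps(core, sort_keys=True, default=str)
    return hashlib.md5(payload.encode()).hexdigest()
```
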
### 3. **Aggregated Data Chunks** (Medium TTL - 30 minutes)
```python
# Cache pre-aggregated data by time bucket
cache_key = f"opensensor:agg:{data_type}:{device_id}:{time_bucket}:{resolution}"
# TTL: 30 minutes (for frequently accessed data ranges)
```

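How `time_bucket` is computed isn't shown in this commit; one plausible scheme floors the timestamp to a fixed bucket width so nearby requests collide on the same key (the one-hour width is an assumption):

```python
from datetime import datetime, timezone

def time_bucket(ts: datetime, bucket_seconds: int = 3600) -> str:
    # Floor to the bucket boundary: all timestamps inside the same
    # hour produce the same cache-key component
    epoch = int(ts.replace(tzinfo=ts.tzinfo or timezone.utc).timestamp())
    return str(epoch - epoch % bucket_seconds)

# 12:47 and 12:59 on the same day map to the same bucket:
assert time_bucket(datetime(2024, 5, 1, 12, 47)) == time_bucket(datetime(2024, 5, 1, 12, 59))
```
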
## Key Improvements

### **Cache Key Independence**
- Device metadata cached independently of query parameters
- Pipeline results cached by content hash, not input parameters
- Time-based bucketing for aggregated data

### **Intelligent TTL Strategy**
- **Device metadata**: 24 hours (rarely changes)
- **Pipeline results**: 15 minutes (balances freshness vs performance)
- **Aggregated chunks**: 30 minutes (frequently accessed ranges)

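Expressed as configuration, the three tiers above are simply:

```python
# TTL per cache tier, in seconds (values from the strategy above)
CACHE_TTLS = {
    "device_meta": 24 * 60 * 60,  # device metadata: 24 hours
    "pipeline": 15 * 60,          # pipeline results: 15 minutes
    "agg": 30 * 60,               # aggregated chunks: 30 minutes
}
```
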
### **Smart Cache Invalidation**
```python
# Invalidate relevant caches when new data arrives
def _record_data_to_ts_collection(collection, environment, user=None):
    # ... insert data ...
    device_id = environment.device_metadata.device_id
    sensor_cache.invalidate_device_cache(device_id)
```

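`invalidate_device_cache` itself isn't shown here; a reasonable implementation (as a `SensorCache` method) scans the device-scoped key patterns from the formats above. Pipeline-result keys are content-hashed and carry no device id, so they are left to age out via their TTL. A sketch:

```python
def invalidate_device_cache(self, device_id: str) -> int:
    """Delete this device's cached entries; returns the number removed."""
    deleted = 0
    patterns = (
        f"opensensor:device_meta:{device_id}",
        f"opensensor:agg:*:{device_id}:*",
    )
    for pattern in patterns:
        # scan_iter is non-blocking, unlike the KEYS command
        for key in self._redis.scan_iter(match=pattern):
            deleted += self._redis.delete(key)
    return deleted
```
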
### **Size-Aware Caching**
```python
# Don't cache results larger than 1MB
if result and len(json.dumps(result, default=str)) < 1024 * 1024:
    sensor_cache.cache_pipeline_result(pipeline_hash, result, ttl_minutes)
```

### **Graceful Degradation**
- Falls back to direct database queries when Redis is unavailable (see the sketch below)
- No service disruption if caching fails

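One way to guarantee both bullets is to trap Redis errors at the cache boundary and treat them as misses (a sketch; assumes the redis-py client):

```python
import logging
from redis.exceptions import RedisError

logger = logging.getLogger("opensensor.cache_strategy")

def safe_cache_get(cache, key: str):
    """Return the cached value, or None (a miss) when Redis is unreachable."""
    try:
        return cache.get(key)
    except RedisError as exc:
        # Any Redis failure degrades to a cache miss, so the caller
        # falls through to the direct MongoDB query path
        logger.warning("Cache unavailable, falling back to DB: %s", exc)
        return None
```
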
## Implementation Details

### Cache-Aware Device Lookup
```python
from datetime import datetime
from typing import List, Tuple

def cache_aware_device_lookup(device_id: str) -> Tuple[List[str], str]:
    # Try cache first
    cached_metadata = sensor_cache.get_device_metadata(device_id)
    if cached_metadata:
        return cached_metadata['device_ids'], cached_metadata['device_name']

    # Cache miss - fetch and cache
    api_keys, _ = get_api_keys_by_device_id(device_id)
    device_ids, device_name = reduce_api_keys_to_device_ids(api_keys, device_id)

    metadata = {
        'device_ids': device_ids,
        'device_name': device_name,
        'cached_at': datetime.utcnow().isoformat()
    }
    sensor_cache.cache_device_metadata(device_id, metadata)
    return device_ids, device_name
```

### Cache-Aware Aggregation
```python
import json
from typing import List

def cache_aware_aggregation(collection, pipeline: List[dict], cache_ttl_minutes: int = 15) -> List[dict]:
    # Generate hash excluding pagination
    pipeline_hash = sensor_cache.generate_pipeline_hash(pipeline)

    # Try cache
    cached_result = sensor_cache.get_pipeline_result(pipeline_hash)
    if cached_result is not None:
        return cached_result

    # Execute and cache
    result = list(collection.aggregate(pipeline))
    if result and len(json.dumps(result, default=str)) < 1024 * 1024:
        sensor_cache.cache_pipeline_result(pipeline_hash, result, cache_ttl_minutes)

    return result
```

## Performance Benefits

### Expected Improvements:
- **Cache Hit Rate**: 80-90% for device lookups (vs ~0% before)
- **Response Time**: 10-50ms improvement for cached requests
- **Database Load**: Significant reduction in MongoDB aggregation queries
- **Memory Efficiency**: No duplicate device metadata across cache entries

### Cache Effectiveness by Use Case:

1. **Dashboard Loading**: High hit rate for device metadata
2. **Time Series Charts**: Medium hit rate for common time ranges
3. **Real-time Updates**: Smart invalidation ensures data freshness
4. **API Pagination**: Same core data cached across pages (see the sketch after this list)

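Use case 4 follows from hashing only the core pipeline: every page shares one cached result, and pagination can be applied to it afterwards. `paginated_aggregation` below is a hypothetical helper for illustration, not code from this commit:

```python
def paginated_aggregation(collection, core_pipeline: list, skip: int, limit: int) -> list:
    # Every page hits the same cache entry, because the hash covers
    # only the core pipeline (pagination stages are excluded)
    core_result = cache_aware_aggregation(collection, core_pipeline)
    return core_result[skip:skip + limit]
```
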
## Migration Strategy

### Phase 1: ✅ **Implement New Strategy**
- [x] Create `cache_strategy.py` with improved caching logic
- [x] Update `collection_apis.py` to use new caching functions
- [x] Add cache invalidation to data recording functions
- [x] Create comprehensive tests

### Phase 2: **Deploy and Monitor**
- [ ] Deploy to staging environment
- [ ] Monitor cache hit rates via `/cache/stats` endpoint
- [ ] Verify performance improvements
- [ ] Monitor Redis memory usage

### Phase 3: **Optimize and Scale**
- [ ] Fine-tune TTL values based on usage patterns
- [ ] Add more granular cache invalidation
- [ ] Consider implementing cache warming for popular devices

## Monitoring and Debugging

### Cache Statistics Endpoint
```bash
GET /cache/stats
```

Returns:
```json
{
  "status": "connected",
  "opensensor_keys": 1250,
  "redis_version": "6.2.0",
  "used_memory": "15.2M",
  "keyspace_hits": 8420,
  "keyspace_misses": 1580,
  "hit_rate": "84.2%"
}
```

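A sketch of how the endpoint could assemble these fields from Redis `INFO` (the FastAPI framing is an assumption about the app; the field names mirror the JSON above):

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/cache/stats")
def cache_stats() -> dict:
    try:
        info = sensor_cache._redis.info()
    except Exception:
        return {"status": "disconnected"}
    hits = info.get("keyspace_hits", 0)
    misses = info.get("keyspace_misses", 0)
    total = hits + misses
    return {
        "status": "connected",
        # Count only this app's keys, not the whole keyspace
        "opensensor_keys": sum(1 for _ in sensor_cache._redis.scan_iter("opensensor:*")),
        "redis_version": info.get("redis_version"),
        "used_memory": info.get("used_memory_human"),
        "keyspace_hits": hits,
        "keyspace_misses": misses,
        "hit_rate": f"{hits / total:.1%}" if total else "n/a",
    }
```
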
### Cache Management
```bash
# Clear all cache
POST /cache/clear

# Invalidate specific patterns
POST /cache/invalidate
{"pattern": "device_meta:*"}
```

### Debugging Cache Behavior
```python
# Enable debug logging
import logging
logging.getLogger('opensensor.cache_strategy').setLevel(logging.DEBUG)
```

## Backward Compatibility

- **Zero Breaking Changes**: All existing API endpoints work unchanged
- **Graceful Fallback**: Works without Redis (falls back to direct DB queries)
- **Incremental Adoption**: Can be deployed alongside existing caching

## Future Enhancements

### Potential Optimizations:
1. **Cache Warming**: Pre-populate the cache for popular devices
2. **Compression**: Compress large cached results
3. **Distributed Caching**: Shard the cache across multiple Redis instances
4. **Predictive Caching**: Cache likely-to-be-requested time ranges
5. **Cache Analytics**: Track which data types benefit most from caching

### Advanced Features:
1. **Time-based Expiration**: Expire cache entries based on data age
2. **Smart Prefetching**: Prefetch adjacent time ranges
3. **Cache Hierarchy**: Multi-level caching (Redis + in-memory)
4. **Query Optimization**: Cache intermediate aggregation steps

## Conclusion

This improved caching strategy addresses the fundamental problem of timestamp-dependent cache keys by:

1. **Separating concerns**: Device metadata vs query results
2. **Using content-based hashing**: Pipeline results cached by content, not parameters
3. **Implementing intelligent TTLs**: Different expiration times for different data types
4. **Adding smart invalidation**: Cache cleared when new data arrives

The result is a much more effective caching system that should significantly improve API performance while maintaining data freshness and consistency.
