Improve structure and add change_functions.py#26
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a comprehensive module for calculating time series percentage changes and moving averages. The module provides multiple implementations optimized for different use cases, including pandas-based and numpy-based functions.
- Adds standardized percentage change calculations with proper edge case handling (NaN, zero values)
- Implements both pandas and numpy versions for different performance requirements
- Provides specialized functions for common time periods (1-day, 7-day, 30-day changes)
| else: | ||
| # Use time-based shift for irregular frequencies | ||
| time_shift = pd.Timedelta(hours=change_period_hours) | ||
| old_ma = ma.shift(freq=time_shift) |
There was a problem hiding this comment.
The shift(freq=...) parameter is deprecated in newer pandas versions. Use ma.shift(periods=1, freq=time_shift) or consider using ma.tshift(time_shift) for time-based shifting.
| old_ma = ma.shift(freq=time_shift) | |
| old_ma = ma.shift(periods=1, freq=time_shift) |
|
|
||
| old_value = series.shift(days) | ||
| value = series | ||
|
|
||
| # Handle different cases | ||
| result = pd.Series(index=series.index, dtype=float) | ||
|
|
||
| # Both NaN -> NaN | ||
| both_nan = pd.isna(value) | pd.isna(old_value) | ||
| result[both_nan] = np.nan | ||
|
|
||
| # Both zero -> 0% change | ||
| both_zero = (old_value == 0) & (value == 0) & ~both_nan | ||
| result[both_zero] = 0 | ||
|
|
||
| # Old value is zero, new value is not -> 100% increase | ||
| old_zero_new_nonzero = (old_value == 0) & (value != 0) & ~both_nan | ||
| result[old_zero_new_nonzero] = 1 | ||
|
|
||
| # Normal case: calculate percentage change | ||
| normal_case = ~both_nan & ~both_zero & ~old_zero_new_nonzero | ||
| result[normal_case] = (value[normal_case] / old_value[normal_case]) - 1 | ||
|
|
||
| return result | ||
|
|
||
|
|
There was a problem hiding this comment.
The function compute_nd_change_vectorized duplicates most of the logic from compute_nd_change. Consider having one function call the other to reduce code duplication and improve maintainability.
| old_value = series.shift(days) | |
| value = series | |
| # Handle different cases | |
| result = pd.Series(index=series.index, dtype=float) | |
| # Both NaN -> NaN | |
| both_nan = pd.isna(value) | pd.isna(old_value) | |
| result[both_nan] = np.nan | |
| # Both zero -> 0% change | |
| both_zero = (old_value == 0) & (value == 0) & ~both_nan | |
| result[both_zero] = 0 | |
| # Old value is zero, new value is not -> 100% increase | |
| old_zero_new_nonzero = (old_value == 0) & (value != 0) & ~both_nan | |
| result[old_zero_new_nonzero] = 1 | |
| # Normal case: calculate percentage change | |
| normal_case = ~both_nan & ~both_zero & ~old_zero_new_nonzero | |
| result[normal_case] = (value[normal_case] / old_value[normal_case]) - 1 | |
| return result | |
| return compute_nd_change_vectorized(series, days) |
| both_nan = np.isnan(current_values) | np.isnan(old_values) | ||
| both_zero = (old_values == 0) & (current_values == 0) & ~both_nan | ||
| old_zero_new_nonzero = (old_values == 0) & (current_values != 0) & ~both_nan | ||
| normal_case = ~both_nan & ~both_zero & ~old_zero_new_nonzero & (old_values != 0) |
There was a problem hiding this comment.
The condition (old_values != 0) in the normal_case mask is redundant since old_zero_new_nonzero already handles the case where old_values == 0. This extra condition could be removed for clarity.
| normal_case = ~both_nan & ~both_zero & ~old_zero_new_nonzero & (old_values != 0) | |
| normal_case = ~both_nan & ~both_zero & ~old_zero_new_nonzero |
| # Then compute the change - use frequency-aware shift | ||
| if hasattr(ma.index, "freq") and ma.index.freq is not None: | ||
| # If we have a regular frequency, calculate periods needed | ||
| freq_seconds = ma.index.freq.delta.total_seconds() |
There was a problem hiding this comment.
Accessing ma.index.freq.delta may raise an AttributeError for some frequency types that don't have a delta attribute. Consider using pd.Timedelta(ma.index.freq).total_seconds() for safer frequency handling.
| freq_seconds = ma.index.freq.delta.total_seconds() | |
| freq_seconds = pd.Timedelta(ma.index.freq).total_seconds() |
No description provided.