Add ARIMA Normalization Functionality #89
gs_quant/timeseries/arima.py
Outdated
@@ -0,0 +1,246 @@
# Copyright 2020 Goldman Sachs.
the file should live in one of the packages - either statistics or econometrics
gs_quant/timeseries/arima.py
Outdated
        self.best_params = {}

    def _evaluate_arima_model(self, X: Union[pd.Series, pd.DataFrame], arima_order: Tuple[int, int, int], train_size: float, freq: str) -> Tuple[float, dict]:
train size should be an int
I changed it so that it can take a float, an int, or None (similar to what scikit-learn does).
- If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, it represents the absolute number of train samples. If None, the value is automatically set to 0.75 (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
Is that too complicated, or should we just use int to simplify things?
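For reference, the float / int / None handling described above could be sketched roughly as follows. The helper name `resolve_train_size` and its validation rules are assumptions made for illustration; they are not taken from the PR, and the split is assumed to be chronological (the first rows form the train set).

```python
import math
from typing import Optional, Union


def resolve_train_size(n_samples: int,
                       train_size: Optional[Union[float, int]] = None) -> int:
    """Hypothetical helper: turn a float / int / None into a row count."""
    if train_size is None:
        train_size = 0.75  # default proportion mentioned in the comment above
    if isinstance(train_size, bool):
        # bool is a subclass of int, so reject it explicitly
        raise TypeError("train_size must be a float, an int, or None")
    if isinstance(train_size, float):
        if not 0.0 < train_size < 1.0:
            raise ValueError("a float train_size must lie strictly between 0 and 1")
        n_train = math.floor(n_samples * train_size)
    elif isinstance(train_size, int):
        n_train = train_size
    else:
        raise TypeError("train_size must be a float, an int, or None")
    if not 0 < n_train < n_samples:
        raise ValueError("train_size must leave at least one train and one test row")
    return n_train
```

With something like this, `_evaluate_arima_model` would only ever deal with an integer row count, which may address the original concern while keeping the flexible argument.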
gs_quant/timeseries/arima.py
Outdated
    best_ma_coef = ma_coef
    best_resid = resid
except Exception as e:
    print(' {}'.format(e))
pls raise exception, remove print
Certain combinations of (p, d, q) raise the following exception: "Estimation requires the inclusion of least one AR term, MA term, a constant or an exogenous variable."
Raising the exception would break the training loop. Maybe it's a better idea to just print the error and move on to the next combination of (p, d, q)?
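One possible middle ground, sketched below: skip the failing order, record it through `logging` instead of `print`, and raise only if no order could be fitted at all. The names `select_best_order` and `evaluate` are hypothetical, and the sketch assumes the estimation error surfaces as a `ValueError`, which should be checked against the statsmodels version in use.

```python
import logging
from typing import Callable, Dict, Iterable, Optional, Tuple

logger = logging.getLogger(__name__)
Order = Tuple[int, int, int]


def select_best_order(orders: Iterable[Order],
                      evaluate: Callable[[Order], float]
                      ) -> Tuple[Order, float, Dict[Order, str]]:
    """Return (best order, its MSE, skipped orders with their error messages)."""
    best_order: Optional[Order] = None
    best_mse = float("inf")
    skipped: Dict[Order, str] = {}
    for order in orders:
        try:
            mse = evaluate(order)
        except ValueError as exc:  # assumption: fit failures raise ValueError
            logger.warning("Skipping ARIMA order %s: %s", order, exc)
            skipped[order] = str(exc)
            continue
        if mse < best_mse:
            best_order, best_mse = order, mse
    if best_order is None:
        # Nothing could be fitted, so surface the problem to the caller
        raise ValueError("no ARIMA order could be fitted: {}".format(skipped))
    return best_order, best_mse, skipped


# Usage with a stand-in evaluator (made-up scores, for illustration only)
def fake_evaluate(order: Order) -> float:
    if order == (0, 1, 0):
        raise ValueError("Estimation requires the inclusion of least one AR term")
    return {(1, 0, 0): 0.4, (2, 0, 0): 0.7}[order]


best_order, best_mse, skipped = select_best_order(
    [(0, 1, 0), (1, 0, 0), (2, 0, 0)], fake_evaluate)
```

Catching a narrow exception type rather than `Exception` keeps unrelated bugs from being silently swallowed inside the loop.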
ARIMA is used here without the moving-average component (q = 0) to normalize and forecast time series data.
An ARIMA model is selected from nine candidate (p, d, q) orders: (0,0,0), (1,0,0), (2,0,0), (0,1,0), (1,1,0), (2,1,0), (0,2,0), (1,2,0), (2,2,0). The time series is split into train and test sets, and an ARIMA model is fit on the training set for every candidate order. The model with the lowest mean squared error (MSE) on the test set is selected as the best model. The original time series can then be transformed by the best model.
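The selection procedure described above can be illustrated with a small NumPy-only sketch. It is not the PR's implementation: it fits AR(p) with a constant by ordinary least squares on the d-times differenced series instead of calling statsmodels, and it scores one-step-ahead forecasts on the test rows. The function name `one_step_mse`, the synthetic random-walk data, and the 75-row train split are all assumptions.

```python
import numpy as np


def one_step_mse(y: np.ndarray, order: tuple, n_train: int) -> float:
    """Test-set MSE of one-step-ahead forecasts for an ARIMA(p, d, 0)-style fit."""
    p, d, _ = order
    z = np.diff(np.asarray(y, dtype=float), n=d)  # z[i] lines up with y[i + d]
    split = n_train - d                           # first test index within z
    rows = np.arange(p, len(z))                   # rows that have p lags available
    lags = [z[rows - k] for k in range(1, p + 1)]
    design = np.column_stack([np.ones(len(rows))] + lags)
    target = z[rows]
    is_train = rows < split
    # Fit the constant and AR coefficients on training rows only
    beta, *_ = np.linalg.lstsq(design[is_train], target[is_train], rcond=None)
    errors = target[~is_train] - design[~is_train] @ beta
    return float(np.mean(errors ** 2))


# The nine candidate orders, in the same sequence as listed above
grid = [(p, d, 0) for d in range(3) for p in range(3)]

# Synthetic random walk, used purely as example data
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=100))

scores = {order: one_step_mse(y, order, n_train=75) for order in grid}
best = min(scores, key=scores.get)  # order with the lowest test MSE
```

Because each forecast uses the actual lagged values, the forecast error on the differenced series equals the error on the original scale, so MSEs for different values of d remain comparable in this sketch.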