ARX with exogenous variable forecasts residual variance is not as expected #749

luchungi · 2024-10-25T08:42:59Z

import numpy as np
import pandas as pd
import arch
from arch.univariate import Normal as ARCHNormal

seed = 42
np.set_printoptions(precision=3, suppress=True, linewidth=100, floatmode='fixed')

test_df = pd.DataFrame({'y':np.array([0,7,2,5,9,3,8]).astype(float), 'x':np.arange(7).astype(float)})
y = test_df['y'].values
x = test_df['x'].values

test_model = arch.arch_model(y=test_df['y'], x=test_df['x'], mean='ARX', vol='GARCH', lags=1, p=1, q=0, rescale=False)
y_coeff = 0.3   # lag 1 coefficient for AR(1) process
x_coeff = 0.2   # coefficient for exogenous variable
res_coeff = 0.5 # lag 1 coefficient for volatility process
start = 1       # start from second observation
test_model.distribution = ARCHNormal(seed=seed)
test_res = test_model.fix([0., y_coeff, x_coeff, 0., res_coeff]) # constants set to zero
sims = test_res.forecast(horizon=1, start=start, method='simulation', simulations=1, x=x[:,np.newaxis], align='origin')
means = sims.mean.values.squeeze()
vars = sims.variance.values.squeeze()
expected_means = y_coeff*y + x_coeff*x # calculate expected means
eps = y[start:] - expected_means[start-1:-1] if start > 0 else y[1:] - expected_means[:-1] # calculate residuals

# eps = eps - x_coeff # NOTE: adding this line aligns the residuals with the variance otherwise, it does not match

values = sims.simulations.values.squeeze()
print(f'y:              {y[start:]}')
print(f'x:              {x[start:]}')
print(f'Expected means: {expected_means[start:]}')
print(f'Actual means:   {means}')
print(f'Residuals:      {eps}')
print(f'Expected vars:  {res_coeff*(eps**2)}')
print(f'Actual vars:    {vars}')

The above code tries to simulate a AR(1) with 1 exogenous variable with a GARCH(1,0) volatility process with horizon=1 and start=1. Unexpectedly, the residual variance does not align with expectations unless the coefficient for the exogenous variable is subtracted from the residual from the previous time step i.e. eps = eps - x_coeff.

The text was updated successfully, but these errors were encountered:

bashtage · 2024-10-25T09:05:13Z

Thanks for the report. When I try and copy and paste the code above, I see ValueError: could not broadcast input array from shape (6,) into shape (6,1). So might need some digging.

luchungi · 2024-10-25T09:12:21Z

I realised I am not using the latest version. Let me update and check again before coming back to either update or close the issue.

bashtage · 2024-10-25T09:13:11Z

I think it is NumPy changes on strictness of broadcasting. This issue has already helped me find a future bug.

luchungi · 2024-10-25T09:21:19Z

Glad to hear it helped!
For this particular case, I think line 1033 in mean.py of arch.univariate has an issue when # simulations=1. The squeeze() will remove the # simulations dimension. If you change the following line to simulations=2 then the code above will run.
sims = test_res.forecast(horizon=1, start=start, method='simulation', simulations=2, x=x[:,np.newaxis], align='origin')

bashtage · 2024-10-25T10:36:01Z

Yup. That is exactly the problem. Still need to work through he actual issue here

luchungi · 2024-10-27T03:48:20Z

To add a bit more detail of the expected behaviour which is that in the set up above, we should have
$\sigma_t^2 = \alpha \epsilon_{t-1}^2$
with $\epsilon_{t-1} = y_{t-1} - E[y_{t-1}] = y_{t-1} - (\phi y_{t-2} + \beta x_{t-1})$
where res_coeff $=\alpha$, y_coeff $=\phi$ and x_coeff $=\beta$
However, it seems that what is being calculated is $\epsilon_{t-1} = y_{t-1} - E[y_{t-1}] - \beta$
Let me know if my expectation is incorrect.

bashtage · 2024-10-29T12:56:44Z

When I manually compute the means and variances, I get the same as the arch package computes.

When I run this code (after yours):

resids = test_res._resid
for i in range(1, y.shape[0]):
    _m = 0 + y_coeff * y[i] + x_coeff * x[i]
    # i-1 here since there is no resid for y[0]
    _v = 0.0 + res_coeff * test_res._resid[i-1] ** 2
    print(f"{i}: {_m:0.4f} {_v:0.4f}")

print(f'Actual means:   {means}')
print(f'Actual vars:    {vars}')

I get

1: 2.3000 23.1200
2: 1.0000 0.1250
3: 2.1000 7.2200
4: 3.5000 22.4450
5: 1.9000 0.2450
6: 3.6000 17.4050
Actual means:   [2.300 1.000 2.100 3.500 1.900 3.600]
Actual vars:    [23.120  0.125  7.220 22.445  0.245 17.405]

Systematically find an fix bugs that affect fixed results closes #749

bashtage · 2024-10-29T23:39:43Z

The code appears to be correct. I think you have the timing of the x variable off by 1.

Systematically find an fix bugs that affect fixed results closes #749

luchungi · 2024-10-30T09:39:34Z

Yes, the calculation from the residual from test_res._resid to variance seems correct. However, I do not understand how these residual values are derived. I expected the following
$\epsilon_{t-1} = y_{t-1} - E[y_{t-1}] = y_{t-1} - (\phi y_{t-2} + \beta x_{t-1})$ since the constants are set to 0.

I changed the coefficients and values of x in the code below to get integer outcomes.

import numpy as np
import pandas as pd
import arch
from arch.univariate import Normal as ARCHNormal

seed = 42

test_df = pd.DataFrame({'y':np.array([0,7,2,5,9,3,8]).astype(float), 'x':np.array([1,2,4,2,1,9,6]).astype(float)})
y = test_df['y'].values
x = test_df['x'].values

test_model = arch.arch_model(y=test_df['y'], x=test_df['x'], mean='ARX', vol='GARCH', lags=1, p=1, q=0, rescale=False)
y_coeff = 1.0   # lag 1 coefficient for AR(1) process
x_coeff = 1.0   # coefficient for exogenous variable
res_coeff = 1.0 # lag 1 coefficient for volatility process
start = 1       # start from second observation
test_model.distribution = ARCHNormal(seed=seed)
test_res = test_model.fix([0., y_coeff, x_coeff, 0., res_coeff]) # constants set to zero
sims = test_res.forecast(horizon=1, start=start, method='simulation', simulations=1, x=x[:,np.newaxis], align='origin')
means = sims.mean.values.squeeze()
vars = sims.variance.values.squeeze()
residuals = test_res._resid
expected_means = y_coeff*y + x_coeff*x # calculate expected means

values = sims.simulations.values.squeeze()
print(f'y:                  {y[start:]}')
print(f'x:                  {x[start:]}')
print(f'Expected means:     {expected_means[start:]}')
print(f'Actual means:       {means}')
print(f'Actual Residuals:   {residuals}')

which gets us

y:                  [7. 2. 5. 9. 3. 8.]
x:                  [2. 4. 2. 1. 9. 6.]
Expected means:     [ 9.  6.  7. 10. 12. 14.]
Actual means:       [ 9.  6.  7. 10. 12. 14.]
Actual Residuals:   [ -9.   1.   3. -15.  -1.]

Actual means and Actual residuals are from sims.mean and test_res._resid.
Take the -15 residual value for example. The highest mean in this series is 14 and all the values of y are positive so there should not be a residual value of -15 anywhere unless my expectation of how the residual is calculated is wrong?

Systematically find an fix bugs that affect fixed results closes #749

bashtage · 2024-10-30T16:20:13Z

You need to shift the data to be aligned right to see where the -15 comes from. Since the y is lagged, the correctly aligned series are

y:                  [7,  2, 5, 9,   3,  8]
y_lag:              [0,  7, 2, 5,   9,  3]
x:                  [2,  4, 2, 1,   9,  6]
exp_val:            [2, 11, 4, 6,  18,  9]
resid:              [5, -9, 1, 3, -15, -1]

Which matches the output. Alignment of the series is the key to understanding what is going on.

bashtage · 2024-10-30T16:20:41Z

I have closed since I completed the PR, but feel free to continue to post if the isn't clear.

luchungi · 2024-10-30T22:59:22Z

Thanks for the clarification. Your calculation makes perfect sense but in this case, it means sims.mean does not actually contain the expected value which you calculated as exp_val?

bashtage · 2024-10-31T10:14:22Z

Probably need one more check that the X is being handled correctly in the forecast. There are tests for it, but alignment is always one of the trickiest parts.

bashtage · 2024-11-01T08:56:59Z

Here is the simplest explnation.

import numpy as np
import pandas as pd
import arch
from arch.univariate import Normal as ARCHNormal

seed = 42

test_df = pd.DataFrame(
    {
        "y": np.array([0, 7, 2, 5, 9, 3, 8]).astype(float),
        "x": np.array([0, 1, 2, 3, 4, 5, 6]).astype(float),
    }
)
y = test_df["y"].values
x = test_df["x"].values

test_model = arch.arch_model(
    y=test_df["y"],
    x=test_df["x"],
    mean="ARX",
    vol="GARCH",
    lags=0,
    p=1,
    q=0,
    rescale=False,
)
# LS mean y = a + b X, ARCH(1) variance
x_coeff = 1.0  # coefficient for exogenous variable
res_coeff = 1.0  # lag 1 coefficient for volatility process
start = 1  # start from second observation
test_model.distribution = ARCHNormal(seed=seed)
# Parameters are a=0, b=1, omega=0, alpha=1
test_res = test_model.fix([0.0, x_coeff, 0.0, res_coeff])  # constants set to zero
sims = test_res.forecast(
    horizon=1,
    start=start,
    method="simulation",
    simulations=1,
    x=x[:, np.newaxis],
    align="origin",
)

You can see the model is an ARX with no lags of y and just X, so that the mean model is $y_t = a + b x_t + \epsilon_t$. When we forecast starting at 1 (inclusive) and give it the X data (as x_oos), we see that it uses x[1] for the first forecast, x[2] for the second, and so on.

I still need to verify that this is the correct behavior with the docs, but this is how it is working. The docs are a bit confusing and so some examples showing the correct behavior shoudl be added (and verified).

Also see this example with start=2

sims = test_res.forecast(
    horizon=1,
    start=2,
    method="simulation",
    simulations=1,
    x=x_oos,
    align="origin",
)
print(sims.mean)

Add additional test Fix bug that occurs when simulations=1 closes #749

bashtage · 2024-11-01T16:03:37Z

I have added tests in #752 and the results are correct. The way x variables are handled depends on the shape of x. In the case you are using, since x has the same shape as the original y, the forecast for period 1 uses y[0] and x[1]. For period 2 it uses y[1] and x[2], and so on.

luchungi · 2024-11-02T01:38:49Z

However, if you add a lag for y, the calculations for sims.mean are not correctly aligned as seen in #749 (comment) vs #749 (comment)

If you look at how to get the same values as sims.mean in #749 (comment)
expected_means = y_coeff*y + x_coeff*x
The alignment is different from what you described e.g. it is sims.mean[1] = y_coeff * y[1] + x_coeff * x[1] instead of sims.mean[1] = y_coeff * y[0] + x_coeff * x[1]. I agree with your calculation in #749 (comment) makes more sense in terms of how x and y are aligned and all the values being forecasted i.e. sims.simulations.values seem to be using this alignment so it is only sims.mean that is off.

bashtage · 2024-11-04T19:16:28Z

Hmm - I swear I wrote a reply earlier. Seems to have gotten eaten. I have re double-checked things and they are correct. The challenge is that the x alignment in modeling and in forecasting is not the same. I have attempted to explain it a bit more in the notebook. The key difference is that if you use x in the modeling stage, you need to use x.shift(-1) when forecasting to recover the conditional mean.

luchungi · 2024-11-05T01:52:08Z

You are right! This whole section in your link is missing from the equivalent chapter in the docs here. This all makes sense now. Thanks.

bashtage added a commit that referenced this issue Oct 29, 2024

BUG: Fix bugs in fixed result

839e734

Systematically find an fix bugs that affect fixed results closes #749

bashtage mentioned this issue Oct 29, 2024

BUG: Fix bugs in fixed result #750

Merged

bashtage added a commit that referenced this issue Oct 30, 2024

BUG: Fix bugs in fixed result

7c11b78

Systematically find an fix bugs that affect fixed results closes #749

bashtage added a commit that referenced this issue Oct 30, 2024

BUG: Fix bugs in fixed result

cf90d37

Systematically find an fix bugs that affect fixed results closes #749

bashtage added a commit that referenced this issue Oct 30, 2024

BUG: Fix bugs in fixed result

ff7524e

Systematically find an fix bugs that affect fixed results closes #749

bashtage added a commit that referenced this issue Oct 30, 2024

BUG: Fix bugs in fixed result

52423a1

Systematically find an fix bugs that affect fixed results closes #749

bashtage added a commit that referenced this issue Oct 30, 2024

BUG: Fix bugs in fixed result

d430c5f

Systematically find an fix bugs that affect fixed results closes #749

bashtage closed this as completed in #750 Oct 30, 2024

bashtage reopened this Oct 31, 2024

bashtage added a commit that referenced this issue Nov 1, 2024

TST: Add tests for fcasts with exog

abd020e

Add additional test Fix bug that occurs when simulations=1 closes #749

bashtage mentioned this issue Nov 1, 2024

TST: Add tests for fcasts with exog #752

Merged

bashtage closed this as completed Nov 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARX with exogenous variable forecasts residual variance is not as expected #749

ARX with exogenous variable forecasts residual variance is not as expected #749

luchungi commented Oct 25, 2024

bashtage commented Oct 25, 2024

luchungi commented Oct 25, 2024

bashtage commented Oct 25, 2024

luchungi commented Oct 25, 2024

bashtage commented Oct 25, 2024

luchungi commented Oct 27, 2024

bashtage commented Oct 29, 2024 •

edited

Loading

bashtage commented Oct 29, 2024

luchungi commented Oct 30, 2024

bashtage commented Oct 30, 2024

bashtage commented Oct 30, 2024

luchungi commented Oct 30, 2024

bashtage commented Oct 31, 2024

bashtage commented Nov 1, 2024

bashtage commented Nov 1, 2024

luchungi commented Nov 2, 2024

bashtage commented Nov 4, 2024

luchungi commented Nov 5, 2024

ARX with exogenous variable forecasts residual variance is not as expected #749

ARX with exogenous variable forecasts residual variance is not as expected #749

Comments

luchungi commented Oct 25, 2024

bashtage commented Oct 25, 2024

luchungi commented Oct 25, 2024

bashtage commented Oct 25, 2024

luchungi commented Oct 25, 2024

bashtage commented Oct 25, 2024

luchungi commented Oct 27, 2024

bashtage commented Oct 29, 2024 • edited Loading

bashtage commented Oct 29, 2024

luchungi commented Oct 30, 2024

bashtage commented Oct 30, 2024

bashtage commented Oct 30, 2024

luchungi commented Oct 30, 2024

bashtage commented Oct 31, 2024

bashtage commented Nov 1, 2024

bashtage commented Nov 1, 2024

luchungi commented Nov 2, 2024

bashtage commented Nov 4, 2024

luchungi commented Nov 5, 2024

bashtage commented Oct 29, 2024 •

edited

Loading