The discrepancy between the Python statsmodels standard errors (slide 31 of https://ionides.github.io/531w26/05/slides.pdf) vs the R arima standard errors (slide 31 of https://ionides.github.io/531w25/05/slides.pdf) turns out to be a difference in how the Fisher information is approximated.
Python uses the empirical Fisher information, defined as $\sum_{n=1}^N \nabla \ell_n(\theta) \nabla\ell_n(\theta)^T$, whereas R uses the observed Fisher information, defined as $-(\nabla \nabla^T) \sum_{n=1}^N \ell_n(\theta)$, where $\ell_n$ is the conditional log-likelihood for $y_n$ given $y_{1:n-1}$. Both are calculated at the MLE, $\theta=\hat\theta$. Both are asymptotically justified as estimates of the expected Fisher information. Both are untrustworthy in the absence of a good quadratic approximation to the log-likelihood, as we find in this example. However, the methods apparently fail in different ways.
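To make the two definitions concrete, here is a minimal numerical sketch of both approximations, assuming a user-supplied function `cond_loglik(theta, n)` (a hypothetical helper, not from the slides) that returns the conditional log-likelihood $\ell_n(\theta)$ of $y_n$ given $y_{1:n-1}$:

```python
import numpy as np

def grad(f, theta, eps=1e-5):
    """Central-difference gradient of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

def empirical_fisher(cond_loglik, theta_hat, N):
    """Sum over n of the outer product of the per-observation score vectors."""
    scores = [grad(lambda th: cond_loglik(th, n), theta_hat) for n in range(1, N + 1)]
    return sum(np.outer(s, s) for s in scores)

def observed_fisher(cond_loglik, theta_hat, N, eps=1e-4):
    """Negative Hessian of the total log-likelihood, by finite differences."""
    total = lambda th: sum(cond_loglik(th, n) for n in range(1, N + 1))
    p = len(theta_hat)
    H = np.zeros((p, p))
    for i in range(p):
        e = np.zeros(p)
        e[i] = eps
        H[:, i] = (grad(total, theta_hat + e) - grad(total, theta_hat - e)) / (2 * eps)
    return -H

# Either way, the reported standard errors are
# np.sqrt(np.diag(np.linalg.inv(fisher_matrix))).
```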
Python statsmodels reports a small standard error for $\phi_2$, whereas R arima reports small standard errors for both $\phi_2$ and $\theta_1$.
In order to make the point that the Fisher standard errors can be too small, it is better to construct a profile likelihood for $\phi_2$ when using statsmodels, as in the sketch below. I'll experiment with this in a revision of the notes.
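A minimal sketch of such a profile, assuming the data are in a 1-d array `y` fitted as ARMA(2,1); the grid limits and variable names are illustrative, and I'm assuming `fit_constrained` accepts the `ar.L2` parameter name used by the new `statsmodels.tsa.arima.model.ARIMA` interface:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

phi2_grid = np.linspace(-0.5, 0.9, 30)   # illustrative grid of fixed phi_2 values
profile = []
for phi2 in phi2_grid:
    model = ARIMA(y, order=(2, 0, 1))
    # maximize over the remaining parameters with ar.L2 (phi_2) held fixed
    res = model.fit_constrained({"ar.L2": phi2})
    profile.append(res.llf)

# Approximate 95% profile confidence interval: keep the phi_2 values whose
# profile log-likelihood is within 1.92 (= qchisq(0.95, df=1) / 2) of the max.
cutoff = max(profile) - 1.92
ci = phi2_grid[np.array(profile) >= cutoff]
```

If the profile interval for $\phi_2$ is much wider than the Fisher-based interval, that demonstrates directly that the quadratic approximation, and hence the reported standard error, is unreliable here.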