
\chapter{Methods for Manipulating Paleomagnetic Data on a Globe in Chapter
2}\label{appen4chp2}

\section{Testing Whether a Coeval Pole Pair Is Distinguishable with the Bootstrap}
Whether a pair of coeval mean poles is statistically distinguishable is
determined by checking whether the confidence intervals of their bootstrap means
(based on the two poles' uncertainty attributes) overlap~\citep{T91}. The 95\%
confidence bounds of the Cartesian coordinates of the bootstrap means are
determined and compared. If the confidence bounds along at least one coordinate
axis do not overlap, the poles are distinguishable. Otherwise, if the confidence
bounds along all three coordinate axes overlap, the poles are
indistinguishable~\citep{T91}. The actual test used depends on the number of
paleopoles ($\mathbf{N}$) used to calculate the mean pole in the APWP:

\begin{itemize}
  \item when $\mathbf{N}>25$, a simple bootstrap~\citep{T91} generates a
    pseudo-mean pole from $\mathbf{N}$ directions drawn randomly, with
    replacement, from the original set of paleopoles. Here 1000 such simple
    bootstraps are performed.
  \item when $1<\mathbf{N}\leq25$, a parametric bootstrap~\citep{T91} generates a
    pseudo-mean from $\mathbf{N}$ directions drawn from a \emph{Fisher}
    distribution with the same precision parameter $\kappa$ and $\mathbf{N}$ as
    the mean pole. Here 1000 such parametric bootstraps are performed.
  \item when $\mathbf{N}=1$, a pseudo-mean is drawn from a bivariate normal
    distribution defined by the properties of the associated A95 uncertainty
    circle or $\mathbf{dm}/\mathbf{dp}$ ellipse (see
    Section~\ref{sec:biv}). Here 1000 samples are drawn from this normal
    distribution.
  \item when $\mathbf{N}$ is not given, for example because the pole is an
    interpolated result, a negligible A95 such as 0.1\degree\ or 0\degree\ is
    assigned and the same sampling scheme as for the $\mathbf{N}=1$ case is
    applied. This covers the situation in which only one of the coeval poles
    is interpolated and one would like to keep the pair. If both coeval poles
    are interpolated, we suggest removing the pair altogether.
\end{itemize}
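
The simple-bootstrap case and the overlap test described above can be sketched
as follows. This is only a minimal illustration under stated assumptions, not
the code used to produce the results: the function names
(\texttt{to\_cartesian}, \texttt{simple\_bootstrap\_means}, \texttt{bounds95},
\texttt{distinguishable}) are hypothetical, the pseudo-means are assumed to be
normalised vector sums, and the 95\% bounds are assumed to be the 2.5th and
97.5th percentiles of each Cartesian coordinate. Pseudo-means produced by the
parametric or bivariate normal schemes could be passed to the same test.

```python
import numpy as np

def to_cartesian(lon_deg, lat_deg):
    """Convert longitudes/latitudes (degrees) to unit vectors, shape (n, 3)."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return np.column_stack([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])

def simple_bootstrap_means(xyz, n_boot=1000, rng=None):
    """Resample the N poles with replacement; return n_boot unit pseudo-means."""
    rng = np.random.default_rng(rng)
    n = len(xyz)
    idx = rng.integers(0, n, size=(n_boot, n))   # N random picks per bootstrap
    sums = xyz[idx].sum(axis=1)                  # vector sum, shape (n_boot, 3)
    return sums / np.linalg.norm(sums, axis=1, keepdims=True)

def bounds95(means):
    """Assumed 95% bounds: 2.5th and 97.5th percentiles of x, y and z."""
    return np.percentile(means, [2.5, 97.5], axis=0)

def distinguishable(means_a, means_b):
    """True if the bounds fail to overlap along at least one axis."""
    lo_a, hi_a = bounds95(means_a)
    lo_b, hi_b = bounds95(means_b)
    overlap = (lo_a <= hi_b) & (lo_b <= hi_a)    # per-axis interval overlap
    return bool(np.any(~overlap))
```

For two sets of paleopoles stored as unit vectors \texttt{a} and \texttt{b},
the test would then read
\texttt{distinguishable(simple\_bootstrap\_means(a), simple\_bootstrap\_means(b))}.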

\paragraph{Special cases} Sometimes, as in the cases shown in
Fig.~\ref{fig:2traj} and Fig.~\ref{fig:T12Fig13a}, we have complete access to the
paleopoles and to the parameters of the mean poles, e.g.\ $\mathbf{N}$ and the
\emph{Fisher} precision parameter $\kappa$. However, this is not always the
case. If, for instance, only the mean poles of a path and their spatial
uncertainties are available, bootstrap sampling can be kept consistent across
all the mean poles by drawing the bootstrapped means from a bivariate normal
distribution based on the geometry of each spatial uncertainty. This is
implemented by arbitrarily setting $\mathbf{N}=1$. Sampling consistently in this
way makes the test independent of the state of knowledge of the underlying
dataset, and even of the method used to calculate the uncertainty. The method
can therefore be generalised beyond APWPs, because the metrics and the
significance testing procedure are applicable to comparisons of other
trajectories with associated spatial uncertainties, such as hurricane tracks and
bird migration routes.

The final results for each coeval pole pair of all seven APWP pairs
(Fig.~\ref{fig:2traj}) are given in the sub-folder ``0.result\_tables'', which
is contained in the main ``data'' folder. The results for length and angular
differences are listed starting from the rows for the second and third poles,
respectively, because a single pole cannot form an APWP segment and at least
three poles are needed to define a change in APWP orientation.

\section{Bivariate Sampling}\label{sec:biv}
For some poles of an APWP, either only one paleomagnetic pole makes up the
``mean'' (i.e., $N=1$), or there is no paleomagnetic pole in that bin at all
(i.e., $N=0$) and the authors instead give an interpolated pole at that age. In
these cases, a bivariate normal distribution is used to generate random samples
based on the semi-axes of the pole's uncertainty ellipse and the azimuth of its
major axis. The cumulative distributions of the Cartesian coordinates of those
random samples are then used to check whether the confidence intervals overlap.

However, the domain here is not a two-dimensional (2D) plane but a spherical
surface, and directly simulating random points for an ellipse on a sphere is a
complicated problem~\citep{K82}. An analogue approach is proposed here instead.
First, random points of a 2D bivariate normal distribution are generated with
NumPy's random sampling routine ``multivariate\_normal''~\citep{W11}. The
lengths of the uncertainty ellipse's semi-major and semi-minor axes are taken as
about 1.96 standard deviations of the bivariate normal distribution. The center
of the ellipse is placed at the intersection of the equator (0\degree\
latitude) and the prime meridian (0\degree\ longitude), with its major axis
lying equator-ward (blue point cloud in Fig.~\ref{fig:ellip_sim}). Second, from
the actual pole coordinates (red star in Fig.~\ref{fig:ellip_sim}), an Euler
rotation~\citep{G99} (black star and blue angle arc) is calculated along the
great circle (progressing from blue to red) from the location (0\degree,
0\degree) to the actual pole location. The random points (blue points) are
rotated with this Euler rotation to their new locations (red point cloud in
Fig.~\ref{fig:ellip_sim}). Finally, the elliptical cloud (red point cloud) is
rotated to its actual azimuth, i.e., the major-axis azimuth of the pole's
uncertainty ellipse (the red dashed line is rotated to the yellow dashed line
about the red star as the Euler pole in Fig.~\ref{fig:ellip_sim}).
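
The three steps above can be sketched in Python as follows. This is a hedged
illustration rather than the implementation behind Fig.~\ref{fig:ellip_sim}:
the function names are hypothetical, the 2D samples are assumed to be
(longitude, latitude) offsets in degrees with the major axis initially along the
equator, and the final azimuth adjustment is passed in as an assumed
right-handed rotation angle \texttt{twist\_deg} about the pole vector instead of
being derived from the ellipse azimuth.

```python
import numpy as np

def lonlat_to_xyz(lon_deg, lat_deg):
    """Longitude/latitude in degrees to unit vector(s); last axis has size 3."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def xyz_to_lonlat(xyz):
    """Unit vectors back to longitude and latitude in degrees."""
    lon = np.degrees(np.arctan2(xyz[..., 1], xyz[..., 0]))
    lat = np.degrees(np.arcsin(np.clip(xyz[..., 2], -1.0, 1.0)))
    return lon, lat

def rotate(xyz, axis, angle_deg):
    """Rodrigues rotation of points (n, 3) about an axis through the origin."""
    k = axis / np.linalg.norm(axis)
    t = np.radians(angle_deg)
    return (xyz * np.cos(t) + np.cross(k, xyz) * np.sin(t)
            + np.outer(xyz @ k, k) * (1.0 - np.cos(t)))

def sample_pole_ellipse(pole_lon, pole_lat, semi_major, semi_minor,
                        twist_deg=0.0, n=1000, rng=None):
    """Draw n random pole positions for an elliptical uncertainty (degrees)."""
    rng = np.random.default_rng(rng)
    # Step 1: 2D cloud at (0, 0); the semi-axes are taken as 1.96 sigma.
    cov = np.diag([(semi_major / 1.96) ** 2, (semi_minor / 1.96) ** 2])
    offsets = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    pts = lonlat_to_xyz(offsets[:, 0], offsets[:, 1])
    # Step 2: Euler rotation along the great circle from (0, 0) to the pole.
    start = np.array([1.0, 0.0, 0.0])
    target = lonlat_to_xyz(pole_lon, pole_lat)
    axis = np.cross(start, target)
    angle = np.degrees(np.arctan2(np.linalg.norm(axis), start @ target))
    if np.linalg.norm(axis) < 1e-12:
        axis = np.array([0.0, 0.0, 1.0])  # pole at (0, 0) or its antipode
    pts = rotate(pts, axis, angle)
    # Step 3: spin the cloud about the pole to the assumed orientation.
    pts = rotate(pts, target, twist_deg)
    return xyz_to_lonlat(pts)
```

Because the cloud is generated where the 2D approximation is best (at the
equator) and only then moved by rigid rotations, its shape is the same for a
pole at any latitude.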

Note that directly using NumPy's ``random.multivariate\_normal'' or
``random.normal'' routine (2D calculations) together with spherical
trigonometry to draw random points for an elliptical uncertainty distorts the
point cloud away from a bivariate normal distribution, especially at high
latitudes~\citep[see the examples in Figure~7 of][]{P18}, and makes the
simulation inaccurate. The analogue approach does not need declination and
inclination vectors to be produced first; it generates random pole vectors
directly, which avoids the transformation from declination and inclination to
pole and thus helps to avoid the distortion.

\begin{figure}[!ht]
\includegraphics[width=1.0\linewidth]{../../paper/tex/ComputGeosci/figures/rd.pdf}
\caption[How an $N\leq1$ uncertainty ellipse is simulated]{Example of modeling
random points for an elliptical uncertainty on the Earth's surface. Sample
points (blue) from a bivariate normal distribution centered at the intersection
of the equator and the prime meridian are rotated to their new locations (red
points), such that the center of the uncertainty ellipse (i.e.\ the point at
0\degree\ longitude and 0\degree\ latitude prior to the rotation) is rotated
exactly to the actual pole coordinates (red star). The cloud is then adjusted to
its true orientation (yellow dashed line).}\label{fig:ellip_sim}\end{figure}

\section{Synchronization}
This algorithm is developed for comparing time-synchronized APWPs. In other
words, the compared APWPs should have the same timestamps. If their timestamps
differ, the unpaired pole(s) are removed so that the timestamps match before
the comparison. APWPs in which a pole has been interpolated to pair with an
otherwise unpaired pole can be processed by our tool, as noted earlier, but this
is not recommended for a valid analysis. Such cases arise in paleomagnetic
APWPs, for example, when there are no paleopoles for a given time window
($N=0$) or when a mean pole is an interpolated result.
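
The removal of unpaired poles can be illustrated with a short sketch. The data
layout is an assumption made only for this example: each APWP is represented as
a dictionary that maps an age (in Ma) to a tuple of pole attributes, and the
function name \texttt{synchronize} is hypothetical.

```python
def synchronize(path_a, path_b):
    """Keep only the poles whose ages occur in both paths.

    Each path is assumed to be a dict mapping age (Ma) to pole attributes,
    e.g. (longitude, latitude, A95). Returns two new dicts with identical,
    sorted ages; the inputs are left unchanged.
    """
    shared = sorted(set(path_a) & set(path_b))   # ages present in both paths
    return ({age: path_a[age] for age in shared},
            {age: path_b[age] for age in shared})
```

With this layout, a pole at an age that exists in only one of the two paths is
dropped from the comparison rather than paired with an interpolated pole.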

\subsection{Equally Treated Random Weights}
Assigning equally likely (not necessarily equal in value) random values to
$W_s$, $W_a$ and $W_l$ is also tested. Sets of three uniformly distributed
random numbers that sum to 1 are generated (10 000 sets here) and substituted
into the $\mathcal{CPD}$ formula to derive $D_{full}$, $D_{0-100Ma}$, etc.\ for
the seven APWP pairs, in order to check the probability that ``one pair is
superior to the other pair'' (Fig.~\ref{fig:rall}).
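
One way to draw such weight sets is sketched below. Two assumptions are made
that are not stated in this appendix: the three weights are drawn uniformly over
all non-negative triples summing to 1 (a flat Dirichlet distribution), and the
$\mathcal{CPD}$ score is taken to be the weighted sum of the three metrics, with
a lower score meaning higher similarity. The function names are hypothetical.

```python
import numpy as np

def random_weights(n_sets=10_000, rng=None):
    """Draw n_sets weight triples (W_s, W_a, W_l), each summing to 1.

    A flat Dirichlet distribution is uniform over the set of non-negative
    triples with unit sum (an assumed reading of 'uniformly distributed').
    """
    rng = np.random.default_rng(rng)
    return rng.dirichlet([1.0, 1.0, 1.0], size=n_sets)

def fraction_more_similar(metrics_x, metrics_y, weights):
    """Fraction of weight sets for which pair x scores lower than pair y.

    metrics_x and metrics_y hold (d_s, d_a, d_l) for each pair; the score is
    assumed to be the weighted sum W_s*d_s + W_a*d_a + W_l*d_l.
    """
    score_x = weights @ np.asarray(metrics_x, dtype=float)
    score_y = weights @ np.asarray(metrics_y, dtype=float)
    return float(np.mean(score_x < score_y))
```

Under these assumptions, a returned fraction near 0.5 would correspond to two
pairs that cannot be ranked, and a fraction above 0.95 to a significant ranking.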

\begin{figure}[!ht]
\centering
\includegraphics[width=.9\linewidth]{../../paper/tex/ComputGeosci/figures/rAll.png}
\caption[Comparisons of Pairs a-g with random weights involved]{Differences of
$\mathcal{CPD}$s between \emph{Pair}~\textbf{a}, \emph{Pair}~\textbf{b},
\emph{Pair}~\textbf{c}, \emph{Pair}~\textbf{d}, \emph{Pair}~\textbf{e},
\emph{Pair}~\textbf{f} and \emph{Pair}~\textbf{g}, when 10 000 sets of three
uniformly distributed random weights (summing to 1) are applied. If the
difference $D$ is positive, the subtrahend pair ranks higher in similarity, and
if it is negative, the minuend pair ranks higher. The $y$ axis in each upper
plot shows the percentage $P$ of cases in which the subtrahend pair has the
higher similarity.}\label{fig:rall}
\end{figure}

The full-path results (Fig.~\ref{fig:rall}) again verify Order (4) and
the results shown in Fig.~\ref{fig:sspercni}. Although the probability that
\emph{Pair}~\textbf{d} is more similar than \emph{Pair}~\textbf{e} is not
significant (around 50\%), the probability that
\emph{Pairs}~\textbf{f},\textbf{g} are more similar than
\emph{Pairs}~\textbf{c},\textbf{d},\textbf{e} is significant (more than 95\%).

All the sub-path results (Fig.~\ref{fig:rall}) can be explained by the results
shown in Fig.~\ref{fig:sspercni}. For example, for 0\textendash100 Ma, both
\emph{Pair}~\textbf{a} and \emph{Pair}~\textbf{b} are assigned values of zero
for all three metrics $d_s^{0-100Ma}$, $d_l^{0-100Ma}$ and $d_a^{0-100Ma}$
(Fig.~\ref{fig:sspercni},~\subref{fig:sd2ni},~\subref{fig:ld2ni}
and~\subref{fig:ad2ni}), which means the two pairs can never be differentiated
from each other.