Commit 4fe38fc: Merge branch 'development'

WeiqunZhang committed Feb 1, 2019
2 parents d56f966 + d202588
Showing 301 changed files with 15,389 additions and 10,485 deletions.
205 changes: 105 additions & 100 deletions Docs/sphinx_documentation/source/AmrCore.rst
@@ -33,107 +33,12 @@

\end{center}


.. _ss:amrcore:

AmrCore Source Code: Details
============================

Here we provide more information about the source code in ``amrex/Src/AmrCore``.

AmrMesh and AmrCore
-------------------
@@ -299,16 +204,22 @@ Note that at the coarsest level,
the interior and domain boundary ghost cells (the latter can be periodic or prescribed based on physical considerations)
need to be filled. At finer levels, the ghost cells can likewise be interior or at the domain boundary,
but they can also lie at coarse-fine interfaces away from the domain boundary.
:cpp:`AMReX_FillPatchUtil.cpp/H` contains two primary functions of interest.

#. :cpp:`FillPatchSingleLevel()` fills a :cpp:`MultiFab` and its ghost region at a single level of
refinement. The routine is flexible enough to interpolate in time between two MultiFabs
associated with different times.

#. :cpp:`FillPatchTwoLevels()` fills a :cpp:`MultiFab` and its ghost region at a single level of
refinement, assuming there is an underlying coarse level. This routine is flexible enough to interpolate
the coarser level in time first using :cpp:`FillPatchSingleLevel()`.

Note that :cpp:`FillPatchSingleLevel()` and :cpp:`FillPatchTwoLevels()` call the
single-level routines :cpp:`MultiFab::FillBoundary` and :cpp:`FillDomainBoundary()`
to fill interior, periodic, and physical boundary ghost cells. In principle, you can
write a single-level application that calls :cpp:`FillPatchSingleLevel()` instead
of using :cpp:`MultiFab::FillBoundary` and :cpp:`FillDomainBoundary()`.
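
A single-level fill might be sketched as follows. This is a hedged sketch
modeled on the Advection_AmrCore tutorial: the helper ``GetData()`` and the
variables ``mf``, ``lev``, ``geom``, ``phi_old``, and ``phi_new`` are
assumptions of the sketch, and argument lists can vary across AMReX versions
(see ``AMReX_FillPatchUtil.H`` for the authoritative interface).

.. highlight:: c++

::

    // Gather source data at the times bracketing "time", then let
    // FillPatchSingleLevel interpolate in time and fill all ghost cells.
    amrex::Vector<amrex::MultiFab*> smf;   // e.g., {&phi_old, &phi_new}
    amrex::Vector<amrex::Real> stime;      // e.g., {t_old, t_new}
    GetData(lev, time, smf, stime);        // hypothetical helper

    amrex::PhysBCFunctNoOp physbc;         // no-op physical BCs for this sketch
    amrex::FillPatchSingleLevel(mf, time, smf, stime,
                                0, 0, mf.nComp(), geom[lev], physbc, 0);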

A :cpp:`FillPatchUtil` uses an :cpp:`Interpolater`. This is largely hidden from application codes.
``AMReX_Interpolater.cpp/H`` contains the virtual base class :cpp:`Interpolater`, which provides
an interface for coarse-to-fine spatial interpolation operators. The fillpatch routines described
@@ -390,6 +301,100 @@ the class :cpp:`ParGDBBase` (in ``amrex/Src/Particle/AMReX_ParGDB``).
Example: Advection_AmrCore
==========================

The Advection Equation
----------------------

We seek to solve the advection equation on a multi-level, adaptive grid structure:

.. math:: \frac{\partial\phi}{\partial t} = -\nabla\cdot(\phi{\bf U}).

The velocity field is a specified divergence-free (so the flow field is incompressible)
function of space and time. The initial scalar field is a
Gaussian profile. To integrate these equations on a given level, we use a simple conservative update,

.. math:: \frac{\phi_{i,\,j}^{n+1}-\phi_{i,\,j}^n}{\Delta t} = \frac{(\phi u)_{i-^1\!/_2,\,j}^{n+^1\!/_2}-(\phi u)_{i+^1\!/_2,\,j}^{n+^1\!/_2}}{\Delta x} + \frac{(\phi v)_{i,\,j-^1\!/_2}^{n+^1\!/_2} - (\phi v)_{i,\,j+^1\!/_2}^{n+^1\!/_2}}{\Delta y},

where the velocities on faces are prescribed functions of space and time, and the scalars on faces
are computed using a Godunov advection integration scheme. The fluxes in this case are the face-centered,
time-centered “:math:`\phi u`” and “:math:`\phi v`” terms.
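
A minimal sketch of this update as it might appear in an AMReX C++ kernel
follows, using :cpp:`amrex::ParallelFor` with :cpp:`Array4` views. The names
``phi_old``, ``phi_new``, ``flxx``, ``flxy``, ``dtdx``, and ``dtdy`` (the
latter two being :math:`\Delta t/\Delta x` and :math:`\Delta t/\Delta y`) are
illustrative assumptions, not the tutorial's exact variables.

.. highlight:: c++

::

    // Conservative update: difference the face-centered, time-centered fluxes.
    amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k) noexcept
    {
        phi_new(i,j,k) = phi_old(i,j,k)
            - dtdx * (flxx(i+1,j,k) - flxx(i,j,k))   // (phi u) differences in x
            - dtdy * (flxy(i,j+1,k) - flxy(i,j,k));  // (phi v) differences in y
    });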

We use a subcycling-in-time approach where finer levels are advanced with smaller
time steps than coarser levels, and then synchronization is later performed between levels.
More specifically, the multi-level procedure can most
easily be thought of as a recursive algorithm in which, to advance level :math:`\ell`,
:math:`0\le\ell\le\ell_{\rm max}`, the following steps are taken:

- Advance level :math:`\ell` in time by one time step, :math:`\Delta t^{\ell}`, as if it were
the only level. If :math:`\ell>0`, obtain boundary data (i.e. fill the level :math:`\ell` ghost cells)
using space- and time-interpolated data from the grids at :math:`\ell-1` where appropriate.

- If :math:`\ell<\ell_{\rm max}`:

- Advance level :math:`(\ell+1)` for :math:`r` time steps with :math:`\Delta t^{\ell+1} = \frac{1}{r}\Delta t^{\ell}`.

- Synchronize the data between levels :math:`\ell` and :math:`\ell+1`.

.. raw:: latex

\begin{center}

.. _fig:subcycling:

.. figure:: ./AmrCore/figs/subcycling.png
:width: 4in

Schematic of subcycling-in-time algorithm.

.. raw:: latex

\end{center}

Specifically, for a 3-level simulation, depicted graphically in the :ref:`fig:subcycling` figure above, the procedure is:

#. Integrate :math:`\ell=0` over :math:`\Delta t`.

#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Synchronize levels :math:`\ell=1,2`.

#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Synchronize levels :math:`\ell=1,2`.

#. Synchronize levels :math:`\ell=0,1`.
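
The recursion sketched in the numbered steps above might look like the
following; the names (``timeStepWithSubcycling``, ``AdvancePhiAtLevel``,
``SynchronizeLevels``, ``nsubsteps``, ``dt``) are illustrative placeholders in
the spirit of the Advection_AmrCore tutorial, not an exact AMReX interface.

.. highlight:: c++

::

    // Conceptual sketch of subcycling-in-time for level lev.
    void timeStepWithSubcycling (int lev, amrex::Real time)
    {
        // Advance this level by dt[lev] as if it were the only level;
        // for lev > 0, ghost cells are filled from level lev-1 data.
        AdvancePhiAtLevel(lev, time, dt[lev]);

        if (lev < finest_level) {
            // Advance the finer level r times with dt[lev+1] = dt[lev]/r.
            for (int i = 0; i < nsubsteps[lev+1]; ++i) {
                timeStepWithSubcycling(lev+1, time + i*dt[lev+1]);
            }
            // Synchronize data between levels lev and lev+1
            // (average down and reflux; see the sketches below).
            SynchronizeLevels(lev, lev+1);
        }
    }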



For the scalar field, we keep track of area- and time-weighted fluxes at coarse-fine interfaces.
We accumulate these fluxes in :cpp:`FluxRegister` objects, which can be
thought of as special boundary FABsets associated with coarse-fine interfaces.
Since the fluxes are area- and time-weighted (and sign-weighted, depending on whether they
come from the coarse or fine level), the flux registers essentially store the extent to
which the solution fails to maintain conservation. Conservation holds only if the
sum of the (area- and time-weighted) fine fluxes equals the coarse flux, which in general
is not true.
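
A hedged sketch of the accumulation step follows, mirroring the pattern used
in the Advection_AmrCore tutorial: the coarse-side contribution enters with a
minus sign and the fine-side contribution with a plus sign. The ``fluxes``
array and the ``flux_reg`` vector are assumptions of this sketch.

.. highlight:: c++

::

    // After computing face fluxes on level lev, accumulate them into the
    // FluxRegisters straddling the coarse-fine interfaces.
    if (flux_reg[lev+1]) {  // lev is the coarse side of the lev/(lev+1) interface
        for (int d = 0; d < AMREX_SPACEDIM; ++d) {
            flux_reg[lev+1]->CrseInit(fluxes[d], d, 0, 0, ncomp, -1.0);
        }
    }
    if (flux_reg[lev]) {    // lev is the fine side of the (lev-1)/lev interface
        for (int d = 0; d < AMREX_SPACEDIM; ++d) {
            flux_reg[lev]->FineAdd(fluxes[d], d, 0, 0, ncomp, 1.0);
        }
    }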

The idea behind the level :math:`\ell/(\ell+1)` synchronization step is to correct for sources of
mismatch in the composite solution:

#. The data at level :math:`\ell` that underlie the level :math:`\ell+1` data are not synchronized with the level :math:`\ell+1` data.
This is simply corrected by overwriting covered coarse cells with the average of the overlying fine cells.

#. The area and time-weighted fluxes from the level :math:`\ell` faces and the level :math:`\ell+1` faces
do not agree at the :math:`\ell/(\ell+1)` interface, resulting in a loss of conservation.
The remedy is to modify the solution in the coarse cells immediately next to the coarse-fine interface
to account for the mismatch stored in the flux register (computed by taking the coarse-level divergence of the
flux register data).
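
In code, these two corrections might look like the following sketch.
:cpp:`amrex::average_down` and :cpp:`FluxRegister::Reflux` are actual AMReX
routines, but the surrounding names (``phi_new``, ``flux_reg``, ``ncomp``,
``refRatio``) are placeholders for this sketch.

.. highlight:: c++

::

    // 1) Overwrite covered coarse cells with the average of the fine cells.
    amrex::average_down(phi_new[lev+1], phi_new[lev],
                        geom[lev+1], geom[lev],
                        0, ncomp, refRatio(lev));

    // 2) Add the coarse-level divergence of the flux-register mismatch to
    //    the coarse cells adjacent to the coarse-fine interface.
    flux_reg[lev+1]->Reflux(phi_new[lev], 1.0, 0, 0, ncomp, geom[lev]);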

Code Structure
--------------

Expand Down
48 changes: 47 additions & 1 deletion Docs/sphinx_documentation/source/AsyncIter.rst
@@ -40,10 +40,56 @@ Initially, an object of RGIter (i.e. rgi) is instantiated, taking vectors of Fil
Based on these arguments, a task dependency graph spanning two AMR levels will be established.
Next, isValid() asks the runtime system for FABs that have received all dependent data.
When there is such a FAB, the computations in the loop body can execute on the FAB's data.
When the computations on a FAB finish, the ++ operator is called.
We overload this operator to traverse to the next runnable FAB.

Note: RGIter also supports data tiling.
Specifically, when the tiling flag of a FAB is enabled, the overloaded ++ operator traverses the data tiles of that FAB before moving on to the next FAB.
Instead of applying the computations in the loop body to the entire FAB, it executes them on a single tile at a time.
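
The loop structure implied by this description might be sketched as follows;
the :cpp:`RGIter` constructor arguments are elided here because they depend on
the application (vectors of fillpatch iterators, level information, and so on).

.. highlight:: c++

::

    // Sketch of the asynchronous iteration pattern described above.
    for (RGIter rgi(/* fillpatch iterators, level info, ... */);
         rgi.isValid(); ++rgi)
    {
        // Loop body: compute on the FAB (or on one tile of it, when
        // tiling is enabled) whose dependent data have all arrived.
    }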


Generated Task Graph Code
=========================

The real input to the runtime system is an AMR program containing task dependency graphs (or task graphs for short).
Thus, code written with the above asynchronous iterators is transformed into a task graph form.
The definition of a task dependency graph is as follows.
Each task of a graph performs some computations on an FArrayBox (FAB).
Tasks are connected to each other via edges, which denote data dependencies.
A task can execute when all of its data dependencies have been satisfied.
The code snippet below queries runnable tasks of a task dependency graph named regionGraph.
Note that each task dependency graph is more or less a wrapper around a MultiFab.
In this example, a task of regionGraph runs the body of the while loop to update the associated FAB.
Each task of this graph receives data arriving at the runtime system and injects them into the associated FAB.
After updating the FAB, it lets the runtime know about the change.
The runtime system uses AMR domain knowledge to establish data dependencies among tasks, so it can determine which tasks are runnable and how to update neighboring FABs when the current FAB changes.

.. highlight:: c++

::

    while (!regionGraph->isGraphEmpty())
    {
        f = regionGraph->getAnyFireableRegion();
        multifabCopyPull(..., f, ...); // inject arrived dependent data into fab f, if any
        syncWorkerThreads();
        ... // compute on fab f of the multifab associated with regionGraph
        syncWorkerThreads();
        multifabCopyPush(..., f, ...); // tell the runtime that the data of fab f changed
        regionGraph->finalizeRegion(f);
    }

The process by which the runtime learns this domain knowledge is as follows.
At the beginning of the program, the runtime extracts the metadata needed for establishing data dependencies among tasks of the same graph or between two different graphs.
Every time the AMR grid hierarchy changes (i.e. when some or all AMR levels regrid), the runtime re-extracts the metadata to correct the task dependency graphs.
Once the metadata extraction completes, the runtime system invokes the computation on the AMR levels (e.g., timeStep, initTimeStep, and postTimeStep).

Known Limitations
=================

To realize enough task parallelism, the runtime system constructs a task dependency graph for the whole coarse time step and executes it asynchronously until the step completes.
As a result, any request to regrid an AMR level must be foreseen before the execution of a coarse time step begins.
If a regridding request arrives during the graph execution, the runtime system simply ignores it.
In the future we may relax this constraint in the programming model.
However, such support would come at a significant performance cost due to the required checkpointing and rollback activities.
