Commit 4fe38fc: Merge branch 'development'

WeiqunZhang committed Feb 1, 2019
2 parents d56f966 + d202588
Showing 301 changed files with 15,389 additions and 10,485 deletions.
205 changes: 105 additions & 100 deletions Docs/sphinx_documentation/source/AmrCore.rst
@@ -33,107 +33,12 @@

\end{center}


.. _ss:amrcore:

AmrCore Source Code: Details
============================

Here we provide more information about the source code in ``amrex/Src/AmrCore``.

AmrMesh and AmrCore
-------------------
@@ -299,16 +204,22 @@ Note that at the coarsest level,
the interior and domain boundary ghost cells (the latter can be periodic or prescribed based on physical considerations)
need to be filled. At finer levels, the ghost cells can likewise be interior or at the domain boundary,
but they can also lie at coarse-fine interfaces away from the domain boundary.
:cpp:`AMReX_FillPatchUtil.cpp/H` contains two primary functions of interest.

#. :cpp:`FillPatchSingleLevel()` fills a :cpp:`MultiFab` and its ghost region at a single level of
refinement. The routine is flexible enough to interpolate in time between two MultiFabs
associated with different times.

#. :cpp:`FillPatchTwoLevels()` fills a :cpp:`MultiFab` and its ghost region at a single level of
refinement, assuming there is an underlying coarse level. This routine is flexible enough to interpolate
the coarser level in time first using :cpp:`FillPatchSingleLevel()`.

Note that :cpp:`FillPatchSingleLevel()` and :cpp:`FillPatchTwoLevels()` call the
single-level routines :cpp:`MultiFab::FillBoundary` and :cpp:`FillDomainBoundary()`
to fill interior, periodic, and physical boundary ghost cells. In principle, you can
write a single-level application that calls :cpp:`FillPatchSingleLevel()` instead
of using :cpp:`MultiFab::FillBoundary` and :cpp:`FillDomainBoundary()`.
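
A single-level fill might be sketched as follows. This is a hedged sketch
modeled on the Advection_AmrCore tutorial: the helper ``GetData()`` and the
variables ``mf``, ``lev``, ``geom``, ``phi_old``, and ``phi_new`` are
assumptions of the sketch, and argument lists can vary across AMReX versions
(see ``AMReX_FillPatchUtil.H`` for the authoritative interface).

.. highlight:: c++

::

    // Gather source data at the times bracketing "time", then let
    // FillPatchSingleLevel interpolate in time and fill all ghost cells.
    amrex::Vector<amrex::MultiFab*> smf;   // e.g., {&phi_old, &phi_new}
    amrex::Vector<amrex::Real> stime;      // e.g., {t_old, t_new}
    GetData(lev, time, smf, stime);        // hypothetical helper

    amrex::PhysBCFunctNoOp physbc;         // no-op physical BCs for this sketch
    amrex::FillPatchSingleLevel(mf, time, smf, stime,
                                0, 0, mf.nComp(), geom[lev], physbc, 0);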

A :cpp:`FillPatchUtil` uses an :cpp:`Interpolater`. This is largely hidden from application codes.
``AMReX_Interpolater.cpp/H`` contains the virtual base class :cpp:`Interpolater`, which provides
an interface for coarse-to-fine spatial interpolation operators. The fillpatch routines described
@@ -390,6 +301,100 @@ the class :cpp:`ParGDBBase` (in ``amrex/Src/Particle/AMReX_ParGDB``).
Example: Advection_AmrCore
==========================

The Advection Equation
----------------------

We seek to solve the advection equation on a multi-level, adaptive grid structure:

.. math:: \frac{\partial\phi}{\partial t} = -\nabla\cdot(\phi{\bf U}).

The velocity field is a specified divergence-free (so the flow field is incompressible)
function of space and time. The initial scalar field is a
Gaussian profile. To integrate these equations on a given level, we use a simple conservative update,

.. math:: \frac{\phi_{i,\,j}^{n+1}-\phi_{i,\,j}^n}{\Delta t} = \frac{(\phi u)_{i-^1\!/_2,\,j}^{n+^1\!/_2}-(\phi u)_{i+^1\!/_2,\,j}^{n+^1\!/_2}}{\Delta x} + \frac{(\phi v)_{i,\,j-^1\!/_2}^{n+^1\!/_2} - (\phi v)_{i,\,j+^1\!/_2}^{n+^1\!/_2}}{\Delta y},

where the velocities on faces are prescribed functions of space and time, and the scalars on faces
are computed using a Godunov advection integration scheme. The fluxes in this case are the face-centered,
time-centered “:math:`\phi u`” and “:math:`\phi v`” terms.
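
A minimal sketch of this update as it might appear in an AMReX C++ kernel
follows, using :cpp:`amrex::ParallelFor` with :cpp:`Array4` views. The names
``phi_old``, ``phi_new``, ``flxx``, ``flxy``, ``dtdx``, and ``dtdy`` (the
latter two being :math:`\Delta t/\Delta x` and :math:`\Delta t/\Delta y`) are
illustrative assumptions, not the tutorial's exact variables.

.. highlight:: c++

::

    // Conservative update: difference the face-centered, time-centered fluxes.
    amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k) noexcept
    {
        phi_new(i,j,k) = phi_old(i,j,k)
            - dtdx * (flxx(i+1,j,k) - flxx(i,j,k))   // (phi u) differences in x
            - dtdy * (flxy(i,j+1,k) - flxy(i,j,k));  // (phi v) differences in y
    });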

We use a subcycling-in-time approach where finer levels are advanced with smaller
time steps than coarser levels, and then synchronization is later performed between levels.
More specifically, the multi-level procedure can most
easily be thought of as a recursive algorithm in which, to advance level :math:`\ell`,
:math:`0\le\ell\le\ell_{\rm max}`, the following steps are taken:

- Advance level :math:`\ell` in time by one time step, :math:`\Delta t^{\ell}`, as if it were
the only level. If :math:`\ell>0`, obtain boundary data (i.e. fill the level :math:`\ell` ghost cells)
using space- and time-interpolated data from the grids at :math:`\ell-1` where appropriate.

- If :math:`\ell<\ell_{\rm max}`:

- Advance level :math:`(\ell+1)` for :math:`r` time steps with :math:`\Delta t^{\ell+1} = \frac{1}{r}\Delta t^{\ell}`.

- Synchronize the data between levels :math:`\ell` and :math:`\ell+1`.

.. raw:: latex

\begin{center}

.. _fig:subcycling:

.. figure:: ./AmrCore/figs/subcycling.png
:width: 4in

Schematic of subcycling-in-time algorithm.

.. raw:: latex

\end{center}

Specifically, for a 3-level simulation, depicted graphically in the :ref:`fig:subcycling` figure above, the procedure is:

#. Integrate :math:`\ell=0` over :math:`\Delta t`.

#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Synchronize levels :math:`\ell=1,2`.

#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.

#. Synchronize levels :math:`\ell=1,2`.

#. Synchronize levels :math:`\ell=0,1`.
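
The recursion sketched in the numbered steps above might look like the
following; the names (``timeStepWithSubcycling``, ``AdvancePhiAtLevel``,
``SynchronizeLevels``, ``nsubsteps``, ``dt``) are illustrative placeholders in
the spirit of the Advection_AmrCore tutorial, not an exact AMReX interface.

.. highlight:: c++

::

    // Conceptual sketch of subcycling-in-time for level lev.
    void timeStepWithSubcycling (int lev, amrex::Real time)
    {
        // Advance this level by dt[lev] as if it were the only level;
        // for lev > 0, ghost cells are filled from level lev-1 data.
        AdvancePhiAtLevel(lev, time, dt[lev]);

        if (lev < finest_level) {
            // Advance the finer level r times with dt[lev+1] = dt[lev]/r.
            for (int i = 0; i < nsubsteps[lev+1]; ++i) {
                timeStepWithSubcycling(lev+1, time + i*dt[lev+1]);
            }
            // Synchronize data between levels lev and lev+1
            // (average down and reflux; see the sketches below).
            SynchronizeLevels(lev, lev+1);
        }
    }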



For the scalar field, we keep track of area- and time-weighted fluxes at coarse-fine interfaces.
We accumulate these fluxes in :cpp:`FluxRegister` objects, which can be
thought of as special boundary FABsets associated with coarse-fine interfaces.
Since the fluxes are area- and time-weighted (and sign-weighted, depending on whether they
come from the coarse or fine level), the flux registers essentially store the extent to
which the solution fails to maintain conservation. Conservation holds only if the
sum of the (area- and time-weighted) fine fluxes equals the coarse flux, which in general
is not true.
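
A hedged sketch of the accumulation step follows, mirroring the pattern used
in the Advection_AmrCore tutorial: the coarse-side contribution enters with a
minus sign and the fine-side contribution with a plus sign. The ``fluxes``
array and the ``flux_reg`` vector are assumptions of this sketch.

.. highlight:: c++

::

    // After computing face fluxes on level lev, accumulate them into the
    // FluxRegisters straddling the coarse-fine interfaces.
    if (flux_reg[lev+1]) {  // lev is the coarse side of the lev/(lev+1) interface
        for (int d = 0; d < AMREX_SPACEDIM; ++d) {
            flux_reg[lev+1]->CrseInit(fluxes[d], d, 0, 0, ncomp, -1.0);
        }
    }
    if (flux_reg[lev]) {    // lev is the fine side of the (lev-1)/lev interface
        for (int d = 0; d < AMREX_SPACEDIM; ++d) {
            flux_reg[lev]->FineAdd(fluxes[d], d, 0, 0, ncomp, 1.0);
        }
    }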

The idea behind the level :math:`\ell/(\ell+1)` synchronization step is to correct for sources of
mismatch in the composite solution:

#. The data at level :math:`\ell` that underlie the level :math:`\ell+1` data are not synchronized with the level :math:`\ell+1` data.
This is simply corrected by overwriting covered coarse cells with the average of the overlying fine cells.

#. The area and time-weighted fluxes from the level :math:`\ell` faces and the level :math:`\ell+1` faces
do not agree at the :math:`\ell/(\ell+1)` interface, resulting in a loss of conservation.
The remedy is to modify the solution in the coarse cells immediately next to the coarse-fine interface
to account for the mismatch stored in the flux register (computed by taking the coarse-level divergence of the
flux register data).
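
In code, these two corrections might look like the following sketch.
:cpp:`amrex::average_down` and :cpp:`FluxRegister::Reflux` are actual AMReX
routines, but the surrounding names (``phi_new``, ``flux_reg``, ``ncomp``,
``refRatio``) are placeholders for this sketch.

.. highlight:: c++

::

    // 1) Overwrite covered coarse cells with the average of the fine cells.
    amrex::average_down(phi_new[lev+1], phi_new[lev],
                        geom[lev+1], geom[lev],
                        0, ncomp, refRatio(lev));

    // 2) Add the coarse-level divergence of the flux-register mismatch to
    //    the coarse cells adjacent to the coarse-fine interface.
    flux_reg[lev+1]->Reflux(phi_new[lev], 1.0, 0, 0, ncomp, geom[lev]);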

Code Structure
--------------

Expand Down
48 changes: 47 additions & 1 deletion Docs/sphinx_documentation/source/AsyncIter.rst
@@ -40,10 +40,56 @@ Initially, an object of RGIter (i.e. rgi) is instantiated, taking vectors of Fil
Based on these arguments, a task dependency graph spanning two AMR levels will be established.
Next, isValid() asks the runtime system for FABs that have received all dependent data.
When there is such a FAB, the computations in the loop body can execute on the FAB's data.
When the computations on a FAB finish, the ++ operator is called.
We overload this operator to traverse to the next runnable FAB.

Note: RGIter also supports data tiling.
Specifically, when the tiling flag of a FAB is enabled, the overloaded ++ operator traverses the data tiles of that FAB before moving on to the next FAB.
Instead of applying the computations in the loop body to the entire FAB, it executes them on a single tile at a time.
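
The loop structure implied by this description might be sketched as follows;
the :cpp:`RGIter` constructor arguments are elided here because they depend on
the application (vectors of fillpatch iterators, level information, and so on).

.. highlight:: c++

::

    // Sketch of the asynchronous iteration pattern described above.
    for (RGIter rgi(/* fillpatch iterators, level info, ... */);
         rgi.isValid(); ++rgi)
    {
        // Loop body: compute on the FAB (or on one tile of it, when
        // tiling is enabled) whose dependent data have all arrived.
    }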


Generated Task Graph Code
=========================

The real input to the runtime system is an AMR program containing task dependency graphs (or task graphs for short).
Thus, code written with the above asynchronous iterators is transformed into a task graph form.
The definition of a task dependency graph is as follows.
Each task of a graph performs some computations on an FArrayBox (FAB).
Tasks are connected to each other via edges, which denote data dependencies.
A task can execute when all of its data dependencies have been satisfied.
The code snippet below queries runnable tasks of a task dependency graph named regionGraph.
Note that each task dependency graph is more or less a wrapper around a MultiFab.
In this example, a task of regionGraph runs the body of the while loop to update the associated FAB.
Each task of this graph receives data arriving at the runtime system and injects them into the associated FAB.
After updating the FAB, it lets the runtime know about the change.
The runtime system uses AMR domain knowledge to establish data dependencies among tasks, so it can determine which tasks are runnable and how to update neighboring FABs when the current FAB changes.

.. highlight:: c++

::

    while (!regionGraph->isGraphEmpty())
    {
        f = regionGraph->getAnyFireableRegion();
        multifabCopyPull(..., f, ...); // inject arrived dependent data into fab f, if any
        syncWorkerThreads();
        ... // compute on fab f of the multifab associated with regionGraph
        syncWorkerThreads();
        multifabCopyPush(..., f, ...); // tell the runtime that the data of fab f changed
        regionGraph->finalizeRegion(f);
    }

The process by which the runtime learns this domain knowledge is as follows.
At the beginning of the program, the runtime extracts the metadata needed for establishing data dependencies among tasks of the same graph or between two different graphs.
Every time the AMR grid hierarchy changes (i.e. when some or all AMR levels regrid), the runtime re-extracts the metadata to correct the task dependency graphs.
Once the metadata extraction completes, the runtime system invokes the computation on the AMR levels (e.g., timeStep, initTimeStep, and postTimeStep).

Known Limitations
=================

To realize enough task parallelism, the runtime system constructs a task dependency graph for the whole coarse time step and executes it asynchronously until the step completes.
As a result, any request to regrid an AMR level must be foreseen before the execution of a coarse time step begins.
If a regridding request arrives during the graph execution, the runtime system simply ignores it.
In the future we may relax this constraint in the programming model.
However, such support would come at a significant performance cost due to the required checkpointing and rollback activities.
