
Commit 4fe38fc
Merge branch 'development'
2 parents: d56f966 + d202588

301 files changed: +15389 additions, -10485 deletions


Docs/sphinx_documentation/source/AmrCore.rst

Lines changed: 105 additions & 100 deletions
@@ -33,107 +33,12 @@
\end{center}

The Advection Equation
======================

We seek to solve the advection equation on a multi-level, adaptive grid structure:

.. math:: \frac{\partial\phi}{\partial t} = -\nabla\cdot(\phi{\bf U}).

The velocity field is a specified divergence-free (so the flow field is incompressible)
function of space and time. The initial scalar field is a Gaussian profile.
To integrate these equations on a given level, we use a simple conservative update,

.. math:: \frac{\phi_{i,\,j}^{n+1}-\phi_{i,\,j}^n}{\Delta t} = -\frac{(\phi u)_{i+^1\!/_2,\,j}^{n+^1\!/_2}-(\phi u)_{i-^1\!/_2,\,j}^{n+^1\!/_2}}{\Delta x} - \frac{(\phi v)_{i,\,j+^1\!/_2}^{n+^1\!/_2} - (\phi v)_{i,\,j-^1\!/_2}^{n+^1\!/_2}}{\Delta y},

where the velocities on faces are prescribed functions of space and time, and the scalars on faces
are computed using a Godunov advection scheme. The fluxes in this case are the face-centered,
time-centered “:math:`\phi u`” and “:math:`\phi v`” terms.

We use a subcycling-in-time approach in which finer levels are advanced with smaller
time steps than coarser levels, and synchronization is later performed between levels.
More specifically, the multi-level procedure is most easily thought of as a recursive
algorithm in which, to advance level :math:`\ell`, :math:`0\le\ell\le\ell_{\rm max}`,
the following steps are taken:

- Advance level :math:`\ell` in time by one time step, :math:`\Delta t^{\ell}`, as if it were
  the only level. If :math:`\ell>0`, obtain boundary data (i.e., fill the level :math:`\ell` ghost cells)
  using space- and time-interpolated data from the grids at level :math:`\ell-1` where appropriate.

- If :math:`\ell<\ell_{\rm max}`:

  - Advance level :math:`\ell+1` for :math:`r` time steps with :math:`\Delta t^{\ell+1} = \frac{1}{r}\Delta t^{\ell}`.

  - Synchronize the data between levels :math:`\ell` and :math:`\ell+1`.

.. raw:: latex

   \begin{center}

.. _fig:subcycling:

.. figure:: ./AmrCore/figs/subcycling.png
   :width: 4in

   Schematic of the subcycling-in-time algorithm.

.. raw:: latex

   \end{center}

Specifically, for a 3-level simulation, depicted graphically in the
:ref:`fig:subcycling` figure above:

#. Integrate :math:`\ell=0` over :math:`\Delta t`.
#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Synchronize levels :math:`\ell=1,2`.
#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Synchronize levels :math:`\ell=1,2`.
#. Synchronize levels :math:`\ell=0,1`.

For the scalar field, we keep track of area- and time-weighted fluxes at coarse-fine interfaces.
We accumulate these fluxes in :cpp:`FluxRegister` objects, which can be
thought of as special boundary FABsets associated with coarse-fine interfaces.
Since the fluxes are area- and time-weighted (and sign-weighted, depending on whether they
come from the coarse or fine level), the flux registers essentially store the extent to
which the solution fails to maintain conservation. Conservation holds only if the
sum of the (area- and time-weighted) fine fluxes equals the coarse flux, which in general
is not true.

The idea behind the level :math:`\ell/(\ell+1)` synchronization step is to correct for the sources of
mismatch in the composite solution:

#. The data at level :math:`\ell` that underlie the level :math:`\ell+1` data are not synchronized with the level :math:`\ell+1` data.
   This is corrected by overwriting covered coarse cells with the average of the overlying fine cells.

#. The area- and time-weighted fluxes from the level :math:`\ell` faces and the level :math:`\ell+1` faces
   do not agree at the :math:`\ell/(\ell+1)` interface, resulting in a loss of conservation.
   The remedy is to modify the solution in the coarse cells immediately adjacent to the coarse-fine interface
   to account for the mismatch stored in the flux register (computed by taking the coarse-level divergence of the
   flux register data).

.. _ss:amrcore:

-AmrCore Source Code
-===================
+AmrCore Source Code: Details
+============================

-Here we provide a high-level overview of the source code in ``amrex/Src/AmrCore``.
+Here we provide more information about the source code in ``amrex/Src/AmrCore``.

AmrMesh and AmrCore
-------------------

@@ -299,16 +204,22 @@ Note that at the coarsest level,
the interior and domain boundary (which can be periodic or prescribed based on physical considerations)
need to be filled. At the non-coarsest level, the ghost cells can also be interior or domain,
but can also be at coarse-fine interfaces away from the domain boundary.
-AMReX_FillPatchUtil.cpp/H contains two primary functions of interest.
+:cpp:`AMReX_FillPatchUtil.cpp/H` contains two primary functions of interest.

#. :cpp:`FillPatchSingleLevel()` fills a :cpp:`MultiFab` and its ghost region at a single level of
   refinement. The routine is flexible enough to interpolate in time between two MultiFabs
   associated with different times.

-#. :cpp:`FillPatchTwoLevels()` fills a MultiFab and its ghost region at a single level of
+#. :cpp:`FillPatchTwoLevels()` fills a :cpp:`MultiFab` and its ghost region at a single level of
   refinement, assuming there is an underlying coarse level. This routine is flexible enough to interpolate
   the coarser level in time first using :cpp:`FillPatchSingleLevel()`.

Note that :cpp:`FillPatchSingleLevel()` and :cpp:`FillPatchTwoLevels()` call the
single-level routines :cpp:`MultiFab::FillBoundary` and :cpp:`FillDomainBoundary()`
to fill interior, periodic, and physical boundary ghost cells. In principle, you can
write a single-level application that calls :cpp:`FillPatchSingleLevel()` instead
of using :cpp:`MultiFab::FillBoundary` and :cpp:`FillDomainBoundary()`.

A :cpp:`FillPatchUtil` uses an :cpp:`Interpolator`. This is largely hidden from application codes.
AMReX_Interpolater.cpp/H contains the virtual base class :cpp:`Interpolater`, which provides
an interface for coarse-to-fine spatial interpolation operators. The fillpatch routines described

@@ -390,6 +301,100 @@ the class :cpp:`ParGDBBase` (in ``amrex/Src/Particle/AMReX_ParGDB``).
Example: Advection_AmrCore
==========================

The Advection Equation
----------------------

We seek to solve the advection equation on a multi-level, adaptive grid structure:

.. math:: \frac{\partial\phi}{\partial t} = -\nabla\cdot(\phi{\bf U}).

The velocity field is a specified divergence-free (so the flow field is incompressible)
function of space and time. The initial scalar field is a Gaussian profile.
To integrate these equations on a given level, we use a simple conservative update,

.. math:: \frac{\phi_{i,\,j}^{n+1}-\phi_{i,\,j}^n}{\Delta t} = -\frac{(\phi u)_{i+^1\!/_2,\,j}^{n+^1\!/_2}-(\phi u)_{i-^1\!/_2,\,j}^{n+^1\!/_2}}{\Delta x} - \frac{(\phi v)_{i,\,j+^1\!/_2}^{n+^1\!/_2} - (\phi v)_{i,\,j-^1\!/_2}^{n+^1\!/_2}}{\Delta y},

where the velocities on faces are prescribed functions of space and time, and the scalars on faces
are computed using a Godunov advection scheme. The fluxes in this case are the face-centered,
time-centered “:math:`\phi u`” and “:math:`\phi v`” terms.

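As a concrete illustration (a minimal sketch, not AMReX code), the update above can be written for a single uniform grid using hypothetical flat arrays for the cell-centered field and the face-centered fluxes:

```cpp
#include <cstddef>
#include <vector>

// Minimal single-level sketch of the conservative update above, using flat
// arrays (hypothetical layout, not AMReX's MultiFab). phi is cell-centered
// (nx x ny); fx and fy hold the time-centered face fluxes "phi*u" and "phi*v".
void conservative_update(std::vector<double>& phi,
                         const std::vector<double>& fx,  // (nx+1) x ny x-faces
                         const std::vector<double>& fy,  // nx x (ny+1) y-faces
                         std::size_t nx, std::size_t ny,
                         double dt, double dx, double dy)
{
    for (std::size_t j = 0; j < ny; ++j) {
        for (std::size_t i = 0; i < nx; ++i) {
            const double dFx = fx[(i+1) + j*(nx+1)] - fx[i + j*(nx+1)];
            const double dFy = fy[i + (j+1)*nx]     - fy[i + j*nx];
            // phi^{n+1} = phi^n - dt * (dFx/dx + dFy/dy)
            phi[i + j*nx] -= dt * (dFx/dx + dFy/dy);
        }
    }
}
```

Note that spatially constant fluxes leave :math:`\phi` unchanged, since the discrete flux divergence then vanishes.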
We use a subcycling-in-time approach in which finer levels are advanced with smaller
time steps than coarser levels, and synchronization is later performed between levels.
More specifically, the multi-level procedure is most easily thought of as a recursive
algorithm in which, to advance level :math:`\ell`, :math:`0\le\ell\le\ell_{\rm max}`,
the following steps are taken:

- Advance level :math:`\ell` in time by one time step, :math:`\Delta t^{\ell}`, as if it were
  the only level. If :math:`\ell>0`, obtain boundary data (i.e., fill the level :math:`\ell` ghost cells)
  using space- and time-interpolated data from the grids at level :math:`\ell-1` where appropriate.

- If :math:`\ell<\ell_{\rm max}`:

  - Advance level :math:`\ell+1` for :math:`r` time steps with :math:`\Delta t^{\ell+1} = \frac{1}{r}\Delta t^{\ell}`.

  - Synchronize the data between levels :math:`\ell` and :math:`\ell+1`.

.. raw:: latex

   \begin{center}

.. _fig:subcycling:

.. figure:: ./AmrCore/figs/subcycling.png
   :width: 4in

   Schematic of the subcycling-in-time algorithm.

.. raw:: latex

   \end{center}

Specifically, for a 3-level simulation, depicted graphically in the
:ref:`fig:subcycling` figure above:

#. Integrate :math:`\ell=0` over :math:`\Delta t`.
#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Synchronize levels :math:`\ell=1,2`.
#. Integrate :math:`\ell=1` over :math:`\Delta t/2`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Integrate :math:`\ell=2` over :math:`\Delta t/4`.
#. Synchronize levels :math:`\ell=1,2`.
#. Synchronize levels :math:`\ell=0,1`.

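The recursion that generates this sequence can be sketched as follows; ``advance_single_level``, ``synchronize``, and ``g_log`` are hypothetical stand-ins for illustration, not AMReX API:

```cpp
#include <string>
#include <vector>

// Schematic of the recursive subcycling algorithm. The function names and
// the g_log bookkeeping are placeholders, not AMReX API.
std::vector<std::string> g_log;  // records the order of operations

void advance_single_level(int lev)
{
    // advance level lev by its own dt, filling ghost cells from lev-1 if lev > 0
    g_log.push_back("advance L" + std::to_string(lev));
}

void synchronize(int crse, int fine)
{
    // average down covered coarse cells and apply the reflux correction
    g_log.push_back("sync L" + std::to_string(crse) + "/" + std::to_string(fine));
}

// Advance level lev by dt; each finer level takes r substeps of dt/r.
void timestep(int lev, double dt, int max_level, int r)
{
    advance_single_level(lev);
    if (lev < max_level) {
        for (int i = 0; i < r; ++i) {
            timestep(lev + 1, dt / r, max_level, r);
        }
        synchronize(lev, lev + 1);
    }
}
```

For a 3-level run with :math:`r=2`, calling ``timestep(0, dt, 2, 2)`` records exactly the 10 steps enumerated above.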
For the scalar field, we keep track of area- and time-weighted fluxes at coarse-fine interfaces.
We accumulate these fluxes in :cpp:`FluxRegister` objects, which can be
thought of as special boundary FABsets associated with coarse-fine interfaces.
Since the fluxes are area- and time-weighted (and sign-weighted, depending on whether they
come from the coarse or fine level), the flux registers essentially store the extent to
which the solution fails to maintain conservation. Conservation holds only if the
sum of the (area- and time-weighted) fine fluxes equals the coarse flux, which in general
is not true.

The idea behind the level :math:`\ell/(\ell+1)` synchronization step is to correct for the sources of
mismatch in the composite solution:

#. The data at level :math:`\ell` that underlie the level :math:`\ell+1` data are not synchronized with the level :math:`\ell+1` data.
   This is corrected by overwriting covered coarse cells with the average of the overlying fine cells.

#. The area- and time-weighted fluxes from the level :math:`\ell` faces and the level :math:`\ell+1` faces
   do not agree at the :math:`\ell/(\ell+1)` interface, resulting in a loss of conservation.
   The remedy is to modify the solution in the coarse cells immediately adjacent to the coarse-fine interface
   to account for the mismatch stored in the flux register (computed by taking the coarse-level divergence of the
   flux register data).

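Both corrections can be illustrated in one dimension; the flat arrays and helper names below are hypothetical stand-ins, not the actual :cpp:`FluxRegister` interface:

```cpp
#include <cstddef>
#include <vector>

// 1D sketch of the two synchronization steps, with flat arrays standing in
// for MultiFab/FluxRegister (hypothetical helpers, not the AMReX API).
// The fine patch covers coarse cells [lo, lo + fine.size()/r - 1].

// Step 1: overwrite covered coarse cells with the average of the overlying
// fine cells (refinement ratio r).
void average_down(std::vector<double>& crse, const std::vector<double>& fine,
                  int lo, int r)
{
    for (std::size_t k = 0; k < fine.size() / r; ++k) {
        double s = 0.0;
        for (int m = 0; m < r; ++m) s += fine[k*r + m];
        crse[lo + k] = s / r;
    }
}

// Step 2: reflux. delta_left/delta_right are the flux-register contents at
// the two coarse-fine faces: (coarse flux) - (time-averaged fine fluxes).
// Taking the coarse-level divergence of these deltas corrects the coarse
// cells immediately outside the fine patch [lo, hi].
void reflux(std::vector<double>& crse, double dt, double dx,
            int lo, int hi, double delta_left, double delta_right)
{
    if (lo > 0)                    crse[lo - 1] += dt/dx * delta_left;
    if (hi + 1 < (int)crse.size()) crse[hi + 1] -= dt/dx * delta_right;
}
```

The opposite signs on the two faces reflect the conservative update: replacing a coarse face flux by the fine-level flux changes the adjacent coarse cell by the flux mismatch divided by the cell width.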
Code Structure
--------------

Docs/sphinx_documentation/source/AsyncIter.rst

Lines changed: 47 additions & 1 deletion
@@ -40,10 +40,56 @@ Initially, an object of RGIter (i.e. rgi) is instantiated, taking vectors of Fil
Based on these arguments, a task dependency graph spanning two AMR levels will be established.
Next, isValid() asks the runtime system for FABs that have received all dependent data.
When there is such a FAB, the computations in the loop body can execute on the FAB's data.
-When the computations on a FAB finishes, the ++ operator is called.
+When the computations on a FAB finish, the ++ operator is called.
We overload this operator to traverse to the next runnable FAB.

Note: RGIter also supports data tiling.
Specifically, we overload the ++ operator so that it traverses the data tiles in a FAB before moving on to the next FAB, if the tiling flag of the FAB is enabled.
Instead of applying the computations in the loop body to the entire FAB, it executes them on a single tile at a time.

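The tile-before-FAB traversal can be sketched generically (a hypothetical skeleton, not the real :cpp:`RGIter` class):

```cpp
#include <cstddef>
#include <vector>

// Skeleton of an RGIter-like traversal: operator++ walks the tiles of the
// current FAB (when tiling is enabled) before moving on to the next runnable
// FAB. Hypothetical stand-in; the real RGIter asks the runtime for the next
// FAB whose dependent data have arrived.
struct FabWork {
    int ntiles;  // number of tiles in this FAB (1 if tiling is disabled)
};

struct RGIterSketch {
    const std::vector<FabWork>& fabs;
    std::size_t fab = 0;  // current FAB
    int tile = 0;         // current tile within that FAB

    explicit RGIterSketch(const std::vector<FabWork>& f) : fabs(f) {}

    bool isValid() const { return fab < fabs.size(); }

    void operator++() {
        if (tile + 1 < fabs[fab].ntiles) {
            ++tile;      // next tile of the same FAB
        } else {
            tile = 0;    // done with this FAB
            ++fab;       // a real runtime would return the next fireable FAB
        }
    }
};
```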

Generated Task Graph Code
=========================

The real input to the runtime system is an AMR program containing task dependency graphs (or task graphs for short).
Thus, the code written with the above asynchronous iterators will be transformed into a task graph form.
The definition of a task dependency graph is as follows.
Each task of a graph performs some computations on an FArrayBox (FAB).
Tasks are connected to each other via edges, denoting dependencies on data.
A task can execute once all of its data dependencies have been satisfied.
The code snippet below queries runnable tasks of a task dependency graph named regionGraph.
Note that each task dependency graph is more or less a wrapper around a MultiFab.
In this example, a task of regionGraph computes the body of the while loop to update the associated FAB.
Each task of this graph receives data that have arrived at the runtime system and injects them into the associated FAB.
After updating the FAB, it lets the runtime know about the change.
The runtime system uses AMR domain knowledge to establish data dependencies among tasks, and thus it can answer which tasks are runnable and how to update neighboring FABs when the current FAB changes.

.. highlight:: c++

::

    while (!regionGraph->isGraphEmpty())
    {
        f = regionGraph->getAnyFireableRegion();
        multifabCopyPull(..., f, ...);  // inject arrived dependent data into the FAB, if any
        syncWorkerThreads();
        ...  // compute on the FAB f of the MultiFab associated with regionGraph
        syncWorkerThreads();
        multifabCopyPush(..., f, ...);  // tell the runtime that the data of FAB f changed
        regionGraph->finalizeRegion(f);
    }

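The graph mechanics behind this loop can be illustrated with a minimal dependency-counting sketch (hypothetical types and names; the real runtime fires tasks on FAB-level data arrival rather than on a simple counter):

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Minimal dependency-counting task graph: a task becomes fireable once all
// of its predecessor tasks have finalized. Hypothetical stand-in for the
// region-graph runtime described above.
struct TaskGraph {
    std::vector<std::vector<int>> succ;  // edges: task -> dependent tasks
    std::vector<int> pending;            // unfinished predecessors per task
    std::queue<int> fireable;            // tasks whose dependencies are met
    std::size_t remaining;

    explicit TaskGraph(std::size_t n) : succ(n), pending(n, 0), remaining(n) {}

    void addEdge(int from, int to) { succ[from].push_back(to); ++pending[to]; }

    void start() {  // seed the fireable queue with dependency-free tasks
        for (std::size_t t = 0; t < succ.size(); ++t)
            if (pending[t] == 0) fireable.push(static_cast<int>(t));
    }

    bool isGraphEmpty() const { return remaining == 0; }

    int getAnyFireableRegion() {  // assumes at least one task is fireable
        int t = fireable.front();
        fireable.pop();
        return t;
    }

    void finalizeRegion(int t) {  // mark t done and release its dependents
        --remaining;
        for (int s : succ[t])
            if (--pending[s] == 0) fireable.push(s);
    }
};
```

Driving this structure with the same ``while (!isGraphEmpty())`` loop as above executes the tasks in dependency order.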
The process of learning the domain knowledge is as follows.
At the beginning of the program, the runtime extracts the metadata needed for establishing data dependencies among tasks of the same graph or between two different graphs.
Every time the AMR grid hierarchy changes (i.e., when a few or all AMR levels regrid), the runtime re-extracts the metadata to correct the task dependency graphs.
Once the metadata extraction completes, the runtime system invokes the computation on AMR levels (e.g., timeStep, initTimeStep, and postTimeStep).

Known Limitations
=================

To realize enough task parallelism, the runtime system constructs a task dependency graph for the whole coarse time step and executes it asynchronously through to the completion of the step.
As a result, any request to regrid an AMR level must be foreseen before the execution of a coarse time step.
If there is a regridding request during the graph execution, the runtime system simply ignores it.
In the future we may relax this constraint in the programming model.
However, such support would come at a significant performance cost due to the required checkpointing and rollback activities.
