feature/omptarget #1266

Draft
wants to merge 292 commits into
base: develop
Changes from 1 commit
Commits
292 commits
11a6747
omp: fix reduction by properly initializing omp_priv
jxy Apr 7, 2021
b76a1ed
omp: allow program to run when offloading is disabled
jxy Apr 7, 2021
b652488
Merge branch 'feature/generic_kernel' into omp
jxy Apr 21, 2021
7ba3c0c
omp: fix last merge
jxy Apr 21, 2021
3907f29
omp: ignore *_ctest
jxy Apr 21, 2021
08f6a52
omp: add explicit specialization for zero()
jxy May 10, 2021
e893ceb
omp: add more static functions for omp declare reduction [WIP]
jxy May 10, 2021
b1fae7d
omp: use global variable in omp target for emulating 3D kernels
jxy May 10, 2021
a14a446
omp: use multiple parallel regions in target teams region for block-l…
jxy May 10, 2021
e3e5cc1
Merge branch 'feature/generic_kernel' into omp
jxy May 10, 2021
a93bf9c
omp: add QUDA_RT_CONSTS in coarse_op_kernel for the last merge
jxy May 10, 2021
789839a
omp: remove unused j in reduction_kernel.h
jxy May 11, 2021
45c0f1c
omp: fix diagnostic output in reductions
jxy May 11, 2021
027acc8
omp: revert blas_test debug
jxy May 12, 2021
967bf34
omp: add a few commented out printf in reduction
jxy May 12, 2021
80a2535
omp: copy kernel arg to stack for arg modifying kernels like caxpy…
jxy May 12, 2021
20600b7
Revert "omp: workaround rng"
jxy May 17, 2021
06fa0c4
omp: use rocrand_mrg32k3a for omp target
jxy May 17, 2021
7616d7d
omp: better handling of memcpyDefault
jxy May 17, 2021
0dfb5c7
omp: update default device parameters
jxy May 17, 2021
736600d
omp: try allocator(omp_pteam_mem_alloc) for shared memory [WIP]
jxy May 17, 2021
33db744
Merge branch 'feature/generic_kernel' into omp
jxy May 17, 2021
f299358
omp: only warning if no device
jxy May 21, 2021
a19a5b8
omp: add omp_init/reduce to MomUpdate
jxy May 21, 2021
06ca380
omp: update ompwip functions
jxy May 21, 2021
6a57605
omp: remove cpu side debug print in reduce_helper
jxy May 21, 2021
83b93dc
omp: remove debug print in tunable_nd/reduction
jxy May 21, 2021
ca6939e
omp: remove debug print in cuda_color_spinor_field
jxy May 21, 2021
bf353b3
omp: target/kernel cast pointer to void* before memcpy
jxy May 21, 2021
5302454
omp: target/math_helper updates and uses generic versions
jxy May 21, 2021
0dc1f61
omp: target/quda_api.cpp fix memset and add 2D versions
jxy May 21, 2021
081a685
omp: update malloc debug print
jxy May 21, 2021
c08b764
omp: try turning off shared in dslash_domain_wall_m5
jxy May 21, 2021
f46d30e
Merge branch 'feature/generic_kernel' into omp
jxy May 21, 2021
a80465b
omp: fix index_helper
jxy May 24, 2021
f318f1a
omp: cleanup malloc and device, add event timing
jxy May 24, 2021
3427941
omp: remove debug print in kernels
jxy May 24, 2021
0d656b5
omp: update math_helper
jxy May 24, 2021
c33546b
omp: remove debug print in tunable_nd
jxy May 24, 2021
67bc5b8
omp: test kernel launches to make sure the runtime allows the specifi…
jxy May 24, 2021
7c168dc
tunable: correctly respect constraints on launch parameters
jxy May 24, 2021
d4e05df
Merge branch 'feature/generic_kernel' into omp
jxy May 24, 2021
8345546
omp: clean up OMP WIP print messages
jxy May 25, 2021
c6d4f17
omp: reduce constexpr max_block_size to 512
jxy May 25, 2021
0fe3d55
Merge branch 'feature/generic_kernel' into omp
jxy Jun 9, 2021
76cab69
Merge branch 'feature/generic_kernel' into omp
jxy Jun 9, 2021
45c0703
omp: incorporate kernel launch changes
jxy Jun 11, 2021
40fc8a6
omp: temporary workaround for compiler bugs
jxy Jun 16, 2021
5cce322
omp: extend omptarget tuning parameters
jxy Jun 16, 2021
fd99e5e
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Jun 17, 2021
79689cd
omp: remove unused qudaLaunchKernel
jxy Jun 17, 2021
e4919e0
omp: add declaration
jxy Jun 17, 2021
2286ff0
omp: fix verbose timing
jxy Jun 17, 2021
1f1f67f
omp: fix qudaSetupLauchParam's namespace
jxy Jun 17, 2021
7ff2dae
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Jun 25, 2021
befa972
Merge branch 'feature/generic_kernel' into omp
jxy Aug 18, 2021
c024c2a
omp: omptarget conforms to cuda from last merge
jxy Aug 18, 2021
56b1e17
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Sep 2, 2021
479bfaf
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Sep 10, 2021
9d7a0d1
omp: finish merging target changes
jxy Sep 10, 2021
c6fa602
Revert "omp: temporary workaround for compiler bugs"
jxy Sep 10, 2021
871d034
omp: fix return type of __uint_as_float
jxy Sep 11, 2021
d5550c8
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Sep 15, 2021
e738528
Revert "omp: try allocator(omp_pteam_mem_alloc) for shared memory [WIP]"
jxy Sep 15, 2021
28633d6
Revert "omp: try turning off shared in dslash_domain_wall_m5"
jxy Sep 15, 2021
110d267
omp: only define atomics for limited types
jxy Sep 17, 2021
556ffab
omp: implement shared local memory using stack array inside teams
jxy Sep 17, 2021
0fffba4
omp: implement block/warpreduction pretending to be cub
jxy Sep 17, 2021
31e915b
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Sep 17, 2021
c32cdd9
omp: enable cub_helper and block_orthogonalize
jxy Sep 17, 2021
d33e67d
omp: raise warp_size to 16
jxy Sep 21, 2021
f738d52
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Sep 21, 2021
8b840f5
Introduce QUDA_UNROLL for setting #pragma unroll
jxy Oct 4, 2021
a9819a5
omp: use omp atomic capture compare properly
jxy Oct 4, 2021
04a4798
Merge remote-tracking branch 'origin/feature/generic_kernel' into omp
jxy Oct 5, 2021
77c0be1
omp: fix last merge
jxy Oct 5, 2021
5fd11fd
omp: replace extra pragma unroll
jxy Oct 6, 2021
6488f40
Merge remote-tracking branch 'origin/develop' into omp
jxy Oct 11, 2021
5ad6e6b
Merge remote-tracking branch 'origin/develop' into omp
jxy Oct 29, 2021
f3676ee
omp: define threadIdx for DW m5 dslash
jxy Oct 29, 2021
3b7d8ab
merge upstream develop
jxy Nov 10, 2021
65f79df
omp: fix last merge and clean up target files
jxy Dec 8, 2021
86fc838
Merge remote-tracking branch 'origin/develop' into omp
jxy Dec 8, 2021
5f91ffd
omp: save unknown pointers in get_pointer_location
jxy Dec 8, 2021
77b2c0f
omp: introduce defines for unknown device properties
jxy Dec 8, 2021
ce5f14b
omp: back out change to maxGridSize
jxy Dec 9, 2021
f024788
replace remaining pragma unroll with QUDA_UNROLL
jxy Dec 10, 2021
3df91c9
omp: prepare for cmake (not cmake ready yet)
jxy Dec 10, 2021
319e197
omp: fix quda_arch.h
jxy Dec 10, 2021
c7b70c5
omp: remove extra omp.h includes
jxy Dec 10, 2021
73a4e98
Merge remote-tracking branch 'origin/develop' into omp
jxy Dec 16, 2021
544849c
Merge remote-tracking branch 'origin/develop' into omp
jxy Dec 16, 2021
9bc64c1
omp: include math_helper in blas_helper
jxy Dec 16, 2021
fce494a
Merge remote-tracking branch 'origin/develop' into omp
jxy Jan 19, 2022
c743dbc
Merge remote-tracking branch 'origin/develop' into omp
jxy Feb 17, 2022
1f5aa69
Merge remote-tracking branch 'origin/develop' into omp
jxy Mar 2, 2022
77208a9
Merge remote-tracking branch 'origin/develop' into omp
jxy Mar 14, 2022
76bd682
Merge remote-tracking branch 'origin/develop' into omp
jxy Mar 22, 2022
1fb4e9c
Merge remote-tracking branch 'origin/develop' into omp
jxy Mar 30, 2022
3f06b80
Merge remote-tracking branch 'origin/develop' into omp
jxy Apr 11, 2022
2c2c392
omp: update reducer with generic block reduce
jxy Apr 11, 2022
ca04701
omp: fix get_pointer_location with a hack
jxy Apr 14, 2022
fc69826
omp: update from cuda (mostly formatting)
jxy Apr 15, 2022
3efc505
omp: workaround issues with omp barrier
jxy Apr 16, 2022
4ccd102
omp: reorder reduce() so all threads call BlockReduce at a single place
jxy Apr 16, 2022
b11cb1c
omp: use explicit memcpy for arg in block reduction
jxy Apr 19, 2022
f942ec2
omp: cmake works
jxy Apr 19, 2022
80cc4a5
omp: use our own implementation of MRG32k3a
jxy Apr 20, 2022
c560194
Merge remote-tracking branch 'origin/develop' into omp
jxy Apr 20, 2022
db9d29b
omp: revert generic/fast_intdiv.h to upstream version
jxy Apr 20, 2022
6329d57
omp: use "omp for" for block reduce
jxy Apr 18, 2022
b2b4c5a
omp/mrg32k3a: use numeric_limits::min for TINY
jxy Apr 21, 2022
778056c
Merge remote-tracking branch 'origin/develop' into omp
jxy May 6, 2022
c04ab72
omp: update function types in comm_target.cpp
jxy May 6, 2022
e2f9f73
omp: reworked shared cache system, replaced single address passing
jxy May 18, 2022
2d6335f
omp: allocate global memory for shared local memory
jxy May 18, 2022
1bdd388
omp: fix min/max for arrays
jxy May 20, 2022
da4a880
omp: cmake allows larger parameters for testing
jxy May 20, 2022
e3fff02
Merge remote-tracking branch 'origin/develop' into omp
jxy May 20, 2022
b8141d9
Merge remote-tracking branch 'origin/develop' into omp
jxy May 25, 2022
f0fc798
omp: disable simulation of get_num_threads in target
jxy May 27, 2022
6e0b064
omp: use firstprivate on teams to get thread private copy (non-starda…
jxy May 27, 2022
c449c4a
Merge remote-tracking branch 'origin/develop' into omp
jxy May 27, 2022
493e2e2
omp: update reductions to upstream API
jxy May 31, 2022
f95738f
omp: use large shared memory size (in global mem space) by default
jxy May 31, 2022
7de3592
omp: thread/block_idx use uint
jxy May 31, 2022
078fe8a
omp: check shared cache allocation
jxy Jun 1, 2022
53f70d7
omp: fix race condition in omp for; replace omp for with manual reduc…
jxy Jun 1, 2022
b776f9f
omp: test implementations of block reduce
jxy Jun 1, 2022
2f8380a
omp: using share memory for warp_combine
jxy Jun 3, 2022
834cbcf
omp: change the constraint on qudaLaunchKernel a bit
jxy Jun 3, 2022
a402f6b
Merge remote-tracking branch 'origin/develop' into omp
jxy Jun 9, 2022
97872e2
omp: update math_helper
jxy Jun 9, 2022
67b5047
omp: block_reduce array element iteratively with limited shared mem
jxy Jun 9, 2022
6cf1c68
omp: block_reduce split large arrays into smaller ones
jxy Jun 24, 2022
7f47a40
omp: use omp_target_alloc_host for pinned/mapped malloc
jxy Jun 24, 2022
ac5771b
omp: clean up kernel launch
jxy Jun 24, 2022
f639261
omp: reduce max_kernel_arg_size to 2048
jxy Jun 24, 2022
8cba780
omp: use a single buffer for arg and extern in header
jxy Jun 24, 2022
f5dec10
Merge remote-tracking branch 'origin/develop' into omp
jxy Jun 25, 2022
b14af6d
omp: clean up malloc
jxy Jun 29, 2022
a3abf65
omp: remove device::get_arg, directly passing arg pointer to kernels …
jxy Jun 29, 2022
3d434c5
omp: use constant_kernel_arg for coarse_op
jxy Jun 29, 2022
7d3838d
Merge remote-tracking branch 'origin/develop' into omp
jxy Jul 7, 2022
1622d0f
omp: fix max_nthr type in warp_collective
jxy Jul 7, 2022
c63f03a
Merge remote-tracking branch 'origin/develop' into omp
jxy Jul 13, 2022
9f8055f
Merge remote-tracking branch 'origin/develop' into omp
jxy Jul 13, 2022
03aeb63
omp: use MKL cgetrf/i and ?gemm, with the 32 bit integer interface
jxy Jul 19, 2022
8b1a57f
omp: reinstate #pragma unroll
jxy Jul 21, 2022
4ff05f6
omp: remove some unnecessary QUDA_RT_CONSTS
jxy Jul 21, 2022
24af9ef
omp: add constant_kernel_arg.h for dslash5_domain_wall
jxy Jul 22, 2022
c3de292
omp: use host pinned memory for global device var
jxy Jul 26, 2022
5dc20af
Revert "omp: use constant_kernel_arg for coarse_op"
jxy Jul 26, 2022
4b9a3ce
Revert "omp: add constant_kernel_arg.h for dslash5_domain_wall"
jxy Jul 26, 2022
79c6c7d
Revert "omp: reduce max_kernel_arg_size to 2048"
jxy Jul 26, 2022
c98f8c2
blas_core: explicitly static assert kernel arg for caxpyxmazMR_
jxy Jul 27, 2022
0fc6bc2
kernel_param: force use_kernel_arg when larger than 1
jxy Jul 27, 2022
1bbdf17
omp: reduce max_kernel_arg_size down to 64
jxy Jul 27, 2022
1469d48
omp: correct setup launch param behavior
jxy Jul 27, 2022
9bd0fda
cmake: use C++20 by default
jxy Jul 28, 2022
b976d80
array: remove default for c++20 compatibility
jxy Jul 28, 2022
decaacb
omp: introducing requires_threads_sync in some kernel args to avoid h…
jxy Jul 28, 2022
b9f1d1e
omp: reserve a fixed amount of global mem to emulate SLM with env QUD…
jxy Aug 3, 2022
7432447
Merge remote-tracking branch 'origin/develop' into omp
jxy Aug 3, 2022
02ae61a
omp: only filter out by arg.threads%tp.block
jxy Aug 4, 2022
f4f9499
omp: fix reserve mem size printf
jxy Aug 4, 2022
6768d31
omp: tag init_arg and rngArg with ThreadsSyncNo
jxy Aug 4, 2022
e91fe52
Merge remote-tracking branch 'origin/develop' into omp
jxy Aug 8, 2022
08aa798
Merge remote-tracking branch 'me/use_kernel_arg' into omp
jxy Aug 8, 2022
18aacdb
omp: update reduce_helper with use_kernel_arg
jxy Aug 8, 2022
e7dfe22
Merge remote-tracking branch 'origin/develop' into omp
jxy Aug 9, 2022
588db8a
Merge remote-tracking branch 'me/use_kernel_arg' into omp
jxy Aug 9, 2022
556f767
omp: follow use_kernel_arg changes
jxy Aug 9, 2022
87197d9
Merge remote-tracking branch 'origin/develop' into omp
jxy Aug 10, 2022
30f2fdd
omp/mkl: use strided API; add zgetrf/i; use 64bit for large batch
jxy Aug 11, 2022
001d4b0
Merge remote-tracking branch 'origin/develop' into omp
jxy Aug 22, 2022
62b2423
Merge remote-tracking branch 'origin/develop' into omp
jxy Oct 26, 2022
60eb524
omp: add pragma once to thread_array.h
jxy Oct 26, 2022
2b5663f
Merge branch 'use_kernel_arg' into omp
jxy Oct 27, 2022
09b4c2d
omp: remove our copy of mrg32k3a, use generic
jxy Nov 7, 2022
9dac653
omp: mark launch_param declare target
jxy Nov 7, 2022
b91feea
omp/blas_lapack_mkl: use 5.1 standard omp dispatch instead of Intel e…
jxy Nov 17, 2022
e2281c9
Merge remote-tracking branch 'upstream/develop' into omp
jxy Nov 28, 2022
7a9705d
omp: declare target some static constexpr
jxy Dec 6, 2022
5d34044
color_spinor_pack: tag PackGhostArg with require thread sync X
jxy Dec 9, 2022
27baa06
Merge remote-tracking branch 'upstream/develop' into omp
jxy Dec 13, 2022
fa86a35
Merge remote-tracking branch 'upstream/develop' into omp
jxy Jan 12, 2023
fdd58d6
omp: static_assert enough shared cache for shared_atomic VUV
jxy Jan 12, 2023
5580276
Merge remote-tracking branch 'upstream/develop' into omp
jxy Jan 24, 2023
55e44c4
omp: follow upstream changes to shared_memory_cache
jxy Jan 24, 2023
8baca17
omp: set pragma unroll threshold; teach cmake to leave it be
jxy Jan 25, 2023
368a7aa
Merge remote-tracking branch 'upstream/develop' into omp
jxy Feb 21, 2023
eebc074
omp: remove pragma-unroll-threshold from default options
jxy Feb 22, 2023
79f2cf4
Merge branch 'hotfix/init_stag_test_link' into omp
jxy Feb 23, 2023
c3b6058
Merge remote-tracking branch 'upstream/develop' into omp
jxy Mar 14, 2023
983d09e
omp: better control of thread limit
jxy Mar 15, 2023
d4be5f6
omp: optionally build omp target with jit
jxy Mar 15, 2023
22ae40f
Merge remote-tracking branch 'upstream/develop' into omp
jxy Mar 22, 2023
07bd6bb
Fix initialization bug for test_split_grid
maddyscientist Feb 8, 2023
2be6636
omp: update backend following upstream
jxy Mar 23, 2023
301bc9d
omp: fix sizeof
jxy Mar 24, 2023
b9cd6f3
Merge remote-tracking branch 'upstream/develop' into omp
jxy Mar 30, 2023
3e8b1a2
Revert "omp: declare target some static constexpr"
jxy Mar 30, 2023
e642e40
omp: make sure extra args/spaces does not break version string
jxy Apr 26, 2023
0b687fe
Merge remote-tracking branch 'upstream/develop' into omp
jxy May 23, 2023
5f438d4
omp: update target specific code to conform to upstream
jxy May 24, 2023
9937883
omp: tests/cmake WA: ignore unresolved symbols from device functions …
jxy May 24, 2023
dce8a94
Merge remote-tracking branch 'upstream/develop' into omp
jxy May 24, 2023
7167569
Merge remote-tracking branch 'upstream/develop' into omp
jxy Jun 12, 2023
df7f415
Merge remote-tracking branch 'upstream/develop' into omp
jxy Jul 11, 2023
9b7a00e
Merge remote-tracking branch 'upstream/develop' into omp
jxy Jul 19, 2023
96c7f39
Merge remote-tracking branch 'upstream/develop' into omp
jxy Nov 1, 2023
c77b2d3
omp: update target api following upstream
jxy Nov 2, 2023
36276e9
omp: use omp groupprivate for slm
jxy Nov 9, 2023
754cb56
omp: add option for fixed simd16
jxy Dec 5, 2023
1bc80f9
omp: provide variants of kernel launching methods
jxy Dec 5, 2023
fc5303b
omp: update env names and compilation params
jxy Dec 6, 2023
de9bed4
omp: avoid direct char access in memset, use float when possible
jxy Dec 6, 2023
754bece
Merge remote-tracking branch 'upstream/develop' into omp
jxy Dec 13, 2023
426017f
omp: remove per_kernel compilation and add huge-device-code options
jxy Feb 1, 2024
5b9a8b8
Merge remote-tracking branch 'upstream/develop' into omp
jxy Feb 29, 2024
a68544f
omp: update following upstream
jxy Feb 29, 2024
629d0b4
omp: provide QUDA_OMPTARGET_THREAD_ARRAY_SLM to move thread_array loc…
jxy Feb 29, 2024
6522f88
Merge remote-tracking branch 'upstream/develop' into omp
jxy Mar 30, 2024
a5678dd
Merge remote-tracking branch 'upstream/develop' into omp
jxy Apr 5, 2024
c2b5fd7
Merge remote-tracking branch 'upstream/develop' into omp
jxy Apr 17, 2024
156d81a
omp: include <utility>
jxy Apr 17, 2024
872c8bf
omp: QUDA_OMPTARGET_DEBUG only CPU side for now
jxy Apr 26, 2024
aded7a1
Merge remote-tracking branch 'upstream/develop' into omp
jxy Apr 26, 2024
605edaf
Merge remote-tracking branch 'upstream/develop' into omp
jxy Apr 29, 2024
86648d0
Merge branch 'unpack_fix' into omp
jxy May 1, 2024
8b87595
Merge remote-tracking branch 'upstream/develop' into omp
jxy May 13, 2024
ea0aaba
Merge remote-tracking branch 'upstream/develop' into omp
jxy May 23, 2024
98addc4
omp: fix omptarget after merge
jxy May 23, 2024
1568397
Merge remote-tracking branch 'upstream/hotfix/complex_template' into omp
jxy May 23, 2024
41eec7e
Merge remote-tracking branch 'upstream/develop' into omp
jxy Jun 13, 2024
16b7eab
omp: work around a compiler bug
jxy Jun 13, 2024
c687a04
Merge remote-tracking branch 'upstream/develop' into omp
jxy Aug 2, 2024
9a5547a
omp: add empty device::get_state
jxy Aug 5, 2024
f6fe04e
BlockKernel2D_host: create block inside omp parallel
jxy Aug 9, 2024
1d9db13
Merge remote-tracking branch 'upstream/develop' into omp
jxy Aug 30, 2024
71231b4
omp: use memcpy for vector_load/store
jxy Aug 30, 2024
891f080
Merge remote-tracking branch 'upstream/develop' into omp
jxy Sep 6, 2024
c918bde
Merge remote-tracking branch 'upstream/develop' into omp
jxy Sep 24, 2024
8badbd9
Merge remote-tracking branch 'upstream/develop' into omp
jxy Sep 27, 2024
a67e6e3
Merge remote-tracking branch 'upstream/develop' into omp
jxy Oct 16, 2024
cb647c0
Merge remote-tracking branch 'upstream/develop' into omp
jxy Oct 24, 2024
af51dd0
omptarget/reduce_helper.h: atomic read the partial results
jxy Nov 1, 2024
8794c0d
Merge remote-tracking branch 'upstream/develop' into omp
jxy Nov 1, 2024
d351388
omp: omptarget/atomic_helper: add atomic_read for complex<T>
jxy Nov 5, 2024
Merge remote-tracking branch 'upstream/develop' into omp
jxy committed May 23, 2024
commit ea0aabaf62068deccb1455a2f14514485db5ccff
9 changes: 2 additions & 7 deletions include/kernels/gauge_stout.cuh
@@ -151,13 +151,8 @@ namespace quda
 dir = dir + (dir >= arg.dir_ignore);

 Link U, Q;
-#ifdef QUDA_OMPTARGET_THREAD_ARRAY_SIMPLE
-Link Stap;
-Link Rect;
-#else
-ThreadLocalCache<Link, 0, computeStapleRectangleOps> Stap;
-ThreadLocalCache<Link, 0, decltype(Stap)> Rect; // offset by Stap type to ensure non-overlapping allocations
-#endif
+typename OvrImpSTOUTOps<Arg>::StapCacheT Stap {*this};
+typename OvrImpSTOUTOps<Arg>::RectCacheT Rect {*this};

 // This function gets stap = S_{mu,nu} i.e., the staple of length 3,
 // and the 1x2 and 2x1 rectangles of length 5. From the following paper:
2 changes: 1 addition & 1 deletion include/kernels/gauge_utils.cuh
@@ -1,7 +1,7 @@
 #include <gauge_field_order.h>
 #include <index_helper.cuh>
 #include <quda_matrix.h>
-#include <shared_memory_cache_helper.h>
+#include <thread_local_cache.h>
 #include <thread_array.h>

 namespace quda
11 changes: 3 additions & 8 deletions include/kernels/gauge_wilson_flow.cuh
@@ -83,14 +83,9 @@ namespace quda
 // This function gets stap = S_{mu,nu} i.e., the staple of length 3,
 // and the 1x2 and 2x1 rectangles of length 5. From the following paper:
 // https://arxiv.org/abs/0801.1165
-#ifdef QUDA_OMPTARGET_THREAD_ARRAY_SIMPLE
-Link Stap;
-Link Rect;
-#else
-ThreadLocalCache<Link, 0, computeStapleRectangleOps> Stap;
-ThreadLocalCache<Link, 0, decltype(Stap)> Rect; // offset by Stap type to ensure non-overlapping allocations
-#endif
-computeStapleRectangle(arg, x, arg.E, parity, dir, Stap, Rect, Arg::wflow_dim);
+typename computeStapleOpsWF<Arg>::StapOp Stap {ftor};
+typename computeStapleOpsWF<Arg>::RectOp Rect {ftor};
+computeStapleRectangle(ftor, x, arg.E, parity, dir, Stap, Rect, Arg::wflow_dim);
 Z = arg.coeff1x1 * static_cast<const Link &>(Stap) + arg.coeff2x1 * static_cast<const Link &>(Rect);
 }
 return Z;
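The `ThreadLocalCache` declarations in this hunk carry the comment "offset by Stap type to ensure non-overlapping allocations". The sketch below illustrates that general idea only and is not QUDA's implementation: `ToyThreadCache`, `NoOffset`, `scratch`, `kThreads`, `kScratchBytes` and `toy_cache_demo` are all hypothetical names, and an ordinary static byte array stands in for block-shared memory. Each cache type exposes where its region ends, and the next cache takes that type as a template parameter so its region starts there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Toy sizes, chosen only for this illustration (not QUDA values).
constexpr std::size_t kThreads = 4;        // threads sharing the scratch buffer
constexpr std::size_t kScratchBytes = 256; // total scratch space in bytes

// Stand-in for a block-shared buffer (plain static storage here).
static unsigned char scratch[kScratchBytes];

// Marker type: nothing has been allocated before the first cache.
struct NoOffset {
  static constexpr std::size_t end_offset = 0;
};

// Hypothetical per-thread cache of T. Its region starts where Prev's ends,
// so caches chained this way cannot share bytes.
template <typename T, typename Prev = NoOffset>
class ToyThreadCache {
  static_assert(std::is_trivially_copyable<T>::value, "memcpy needs a trivially copyable T");

public:
  static constexpr std::size_t begin_offset = Prev::end_offset;
  static constexpr std::size_t end_offset = begin_offset + kThreads * sizeof(T);
  static_assert(end_offset <= kScratchBytes, "scratch buffer too small");

  explicit ToyThreadCache(std::size_t thread) : thread_(thread) { assert(thread < kThreads); }

  // memcpy avoids alignment and aliasing assumptions on the byte buffer.
  void save(const T &value) const { std::memcpy(scratch + pos(), &value, sizeof(T)); }

  T load() const
  {
    T value;
    std::memcpy(&value, scratch + pos(), sizeof(T));
    return value;
  }

private:
  std::size_t pos() const { return begin_offset + thread_ * sizeof(T); }
  std::size_t thread_;
};

using StapCache = ToyThreadCache<double>;            // first region
using RectCache = ToyThreadCache<double, StapCache>; // starts where StapCache ends

// Returns true when writes through one cache leave the other untouched.
bool toy_cache_demo()
{
  for (std::size_t t = 0; t < kThreads; ++t) {
    StapCache(t).save(1.0 + t);
    RectCache(t).save(100.0 + t);
  }
  for (std::size_t t = 0; t < kThreads; ++t) {
    if (StapCache(t).load() != 1.0 + t) return false;
    if (RectCache(t).load() != 100.0 + t) return false;
  }
  return true;
}
```

Chaining through a type rather than a hand-written byte count means the second region moves automatically if the element type or thread count of the first one changes.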
You are viewing a condensed version of this merge commit. You can view the full changes here.