Description
Hello,
I am applying your nice tool to typical stencil applications and I am observing very long simulation runtimes on high-dimensional stencils (several orders of magnitude longer than the actual execution time). Most of the time is spent in the "warmup phase", and I am wondering about this line:
kerncraft/kerncraft/cacheprediction.py
Line 563 in b5a302d
warmup_increment = ceildiv(max_cache_size // element_size, max_steps // 2)
Does it assume that only one element is loaded/stored to the cache per iteration? On higher-dimensional stencils, I easily read 100-1000 elements per iteration.
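To make the concern concrete, here is a minimal sketch of the arithmetic in the quoted line, taken in isolation. The `ceildiv` helper below is my own stand-in with assumed ceiling-division semantics, and the cache size, element size and step count are hypothetical values, not ones taken from kerncraft.

```python
def ceildiv(a, b):
    """Ceiling integer division (assumed semantics of the helper in the quoted line)."""
    return -(-a // b)


def warmup_increment(max_cache_size, element_size, max_steps):
    # Mirrors only the quoted expression; the surrounding logic is not reproduced.
    return ceildiv(max_cache_size // element_size, max_steps // 2)


# Hypothetical inputs: 32 MiB cache, 8-byte elements, 100 steps.
max_cache_size = 32 * 1024 * 1024
element_size = 8
max_steps = 100

inc = warmup_increment(max_cache_size, element_size, max_steps)
print(inc)  # 83887
```

With these inputs the increment depends only on the cache capacity in elements and on `max_steps`; the number of elements an iteration actually touches does not enter the expression.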
So could something like this be used instead of element_size:
kerncraft/kerncraft/cacheprediction.py
Line 548 in b5a302d
sympy.Integer(self.kernel.bytes_per_iteration))
That is, a per-iteration quantity like the one above, but estimated from the number of elements read per iteration? If this introduces inaccuracy, would the result still be reasonably accurate?
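What I have in mind could be sketched roughly as follows. This is only an illustration of the idea, not a patch: `read_bytes_per_iteration` is a hypothetical quantity that I introduce here, `ceildiv` is again my own stand-in, and all numbers are made up.

```python
def ceildiv(a, b):
    """Ceiling integer division (assumed semantics of the helper in the quoted line)."""
    return -(-a // b)


def warmup_increment_current(max_cache_size, element_size, max_steps):
    # The expression as quoted from cacheprediction.py.
    return ceildiv(max_cache_size // element_size, max_steps // 2)


def warmup_increment_proposed(max_cache_size, read_bytes_per_iteration, max_steps):
    # Hypothetical variant: divide by the bytes read per iteration
    # instead of the size of a single element.
    return ceildiv(max_cache_size // read_bytes_per_iteration, max_steps // 2)


# Hypothetical inputs: 32 MiB cache, 8-byte elements, 100 steps,
# and a stencil that reads 500 elements per iteration.
max_cache_size = 32 * 1024 * 1024
element_size = 8
max_steps = 100
elements_read = 500

current = warmup_increment_current(max_cache_size, element_size, max_steps)
proposed = warmup_increment_proposed(
    max_cache_size, elements_read * element_size, max_steps
)
print(current, proposed)  # 83887 168
```

In this made-up case the increment shrinks by roughly the number of elements read per iteration. One caveat I can see myself: neighbouring stencil iterations reuse many of the same elements, so the bytes read per iteration overestimate the new data brought into the cache, and such an increment might be too small to fill the cache. That is part of why I am asking about the accuracy.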
I would have researched this in the related publications, but I couldn't find those details.
Thanks in advance!