Cache Simulator Speedup Improvements: Elements per Iteration assumption? #138

Hello,

I am applying your nice tool to typical stencil applications and am seeing very long simulation runtimes on high-dimensional stencils (several orders of magnitude longer than the execution time). Most of the time is spent in the "warmup phase", and I am wondering about this line:

https://github.com/RRZE-HPC/kerncraft/blob/b5a302d20669fe6c1d1ee08ebcc68457968d9257/kerncraft/cacheprediction.py#L563

Does it assume that only one element is loaded/stored to the cache per iteration? On higher-dimensional stencils, I easily read 100-1000 elements per iteration.

So could something like the following be used instead of element_size, but estimated from the number of elements read per iteration?
https://github.com/RRZE-HPC/kerncraft/blob/b5a302d20669fe6c1d1ee08ebcc68457968d9257/kerncraft/cacheprediction.py#L548

If this makes the warmup less exact, would the prediction still be reasonably accurate?
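To make the idea concrete, here is a rough sketch of the arithmetic I have in mind. It is not taken from kerncraft: the function name and all numbers are hypothetical, and it assumes the warmup length is chosen so that the touched data covers the total cache size. It also assumes every read in an iteration brings new data into the cache, which overlapping stencil reads would violate, so the second figure is an optimistic lower bound.

```python
import math


def warmup_iterations(cache_bytes: int, bytes_per_iteration: int) -> int:
    """Iterations needed to touch at least `cache_bytes` bytes.

    Hypothetical helper: assumes each iteration brings exactly
    `bytes_per_iteration` bytes of not-yet-cached data into the cache.
    """
    if bytes_per_iteration <= 0:
        raise ValueError("bytes_per_iteration must be positive")
    return math.ceil(cache_bytes / bytes_per_iteration)


# Made-up example values, not measured on any machine.
cache_bytes = 32 * 1024 * 1024   # total size of all cache levels
element_size = 8                 # one double
reads_per_iteration = 100        # elements read by one stencil update

# Assumption A: one element enters the cache per iteration.
per_element = warmup_iterations(cache_bytes, element_size)

# Assumption B: all elements read in an iteration count as new data.
per_iteration = warmup_iterations(cache_bytes,
                                  element_size * reads_per_iteration)

print(per_element, per_iteration)
```

With these made-up values, the second estimate needs roughly 100 times fewer warmup iterations, which is the kind of speedup I am hoping for.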

I would have researched this in the related publications, but I couldn't find those details.

Thanks in advance!
