Commit 920ea7c

hdeller authored and mehmetb0 committed
crypto: xor - fix template benchmarking
BugLink: https://bugs.launchpad.net/bugs/2086242

[ Upstream commit ab9a244c396aae4aaa34b2399b82fc15ec2df8c1 ]

Commit c055e3e ("crypto: xor - use ktime for template benchmarking") switched from using jiffies to ktime-based performance benchmarking.

This works nicely on machines which have a fine-grained ktime() clocksource as e.g. x86 machines with TSC. But other machines, e.g. my 4-way HP PARISC server, don't have such fine-grained clocksources, which is why it seems that 800 xor loops take zero seconds, which then shows up in the logs as:

 xor: measuring software checksum speed
    8regs           : -1018167296 MB/sec
    8regs_prefetch  : -1018167296 MB/sec
    32regs          : -1018167296 MB/sec
    32regs_prefetch : -1018167296 MB/sec

Fix this with some small modifications to the existing code to improve the algorithm to always produce correct results without introducing major delays for architectures with a fine-grained ktime() clocksource:

a) Delay start of the timing until ktime() just advanced. On machines with a fast ktime() this should be just one additional ktime() call.
b) Count the number of loops. Run at minimum 800 loops and finish earliest when the ktime() counter has progressed.

With that the throughput can now be calculated more accurately under all conditions.

Fixes: c055e3e ("crypto: xor - use ktime for template benchmarking")
Signed-off-by: Helge Deller <[email protected]>
Tested-by: John David Anglin <[email protected]>

v2:
- clean up coding style (noticed & suggested by Herbert Xu)
- rephrased & fixed typo in commit message

Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Koichiro Den <[email protected]>
Signed-off-by: Roxana Nicolescu <[email protected]>
1 parent 3d1a7c1 commit 920ea7c

1 file changed: +14 -17 lines changed

crypto/xor.c

+14-17
@@ -83,33 +83,30 @@ static void __init
 do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 {
 	int speed;
-	int i, j;
-	ktime_t min, start, diff;
+	unsigned long reps;
+	ktime_t min, start, t0;
 
 	tmpl->next = template_list;
 	template_list = tmpl;
 
 	preempt_disable();
 
-	min = (ktime_t)S64_MAX;
-	for (i = 0; i < 3; i++) {
-		start = ktime_get();
-		for (j = 0; j < REPS; j++) {
-			mb(); /* prevent loop optimization */
-			tmpl->do_2(BENCH_SIZE, b1, b2);
-			mb();
-		}
-		diff = ktime_sub(ktime_get(), start);
-		if (diff < min)
-			min = diff;
-	}
+	reps = 0;
+	t0 = ktime_get();
+	/* delay start until time has advanced */
+	while ((start = ktime_get()) == t0)
+		cpu_relax();
+	do {
+		mb(); /* prevent loop optimization */
+		tmpl->do_2(BENCH_SIZE, b1, b2);
+		mb();
+	} while (reps++ < REPS || (t0 = ktime_get()) == start);
+	min = ktime_sub(t0, start);
 
 	preempt_enable();
 
 	// bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
-	if (!min)
-		min = 1;
-	speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
+	speed = (1000 * reps * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
 	tmpl->speed = speed;
 
 	pr_info(" %-16s: %5d MB/sec\n", tmpl->name, speed);
