Instrumentation could be faster #1

vanhauser-thc · 2020-05-10T08:56:21Z

I was looking at tracehook_line_trace_hook in _tracehookmodule.c - the instrumentation you are doing is overly complex and therefore is very, very time consuming.

if the bytecode offset is something that does not change during fuzzing, why not solely use this information? maybe do some assessment of the value and e.g. if its always of minimum increments of 8 then do a (bytecode_offset >> 3) % MAP_SIZE and be done with it.
and if it is important that a lineno exits just check that it exists?
IMHO the multiplication just decreases speed and mixing the different values gives no benefit.

Try to remove everything that is not absolutely needed or can be precomputed somewhere else.

also this is useless:

uint32_t this_loc = state >> (32-afl_map_size_bits);

if the map size would change during a run, afl would not know about it anyway.

The text was updated successfully, but these errors were encountered:

risicle · 2020-05-10T10:44:04Z

I was looking at tracehook_line_trace_hook in _tracehookmodule.c - the instrumentation you are doing is overly complex and therefore is very, very time consuming.

Well, because I'm working with CPython, I rather have the luxury of not having to worry about micro-optimizations because in the grand scheme of things, the branches and multiplies I add here are negligible in their impact - doing the three python attribute lookups in this tracehook each result in many hundreds of instructions, with much branching and pointer chasing.

The bytecode offset is a code-object-local offset (code objects are like functions), so if I just used that I would just end up with a bunch of numbers usually not going past a couple hundred and we'd be in collision city.

A value more likely for direct use might be f_lineno, because it doesn't really measure line number at all - I'm abusing it to count basic blocks and it has a per-code-object pseudorandom base offset. It would provide clusters of consecutive values at random places in the map space, which I guess could be alright. If I think about it, the traditional wisdom I've always received about making sure your hash functions are nicely mixed is usually about their use in hash tables, where clusters of values will result in overweighted buckets - but there are no buckets in AFL's map so maybe it doesn't matter. But then again, the first thing AFL does with a prev_loc value is >>1, so there's one of the bits you were relying on gone already.

So, the reason for the mixing is mostly paranoia - but I think it's affordable paranoia given the context - I'd rather a tiny slowdown than have me wiping out all my entropy through some stupid bit shifting mistake.

I will continue to think about how necessary this really is though, and definitely appreciate the analysis of my code 👍

if the map size would change during a run, afl would not know about it anyway.

Maybe not, but my tests will 😁. One bit of that will otherwise get shifted into the final result through the prev_loc>>1 and that will cause my tests to fail. Another affordable laziness I'm afraid.

risicle · 2020-05-10T10:45:29Z

Damn, you've just made me realize that it will cope with a decrease in map size, but not an increase. Maybe it is useless.

vanhauser-thc · 2020-05-10T13:32:30Z

I would recommend to compare the speed of the existing implementation and my proposed changes. I understand your paranoia but IMHO you just decrease speed and increase the chance for collisions in the map and no getting anything positive from it.
but this is always easily arguable in both directions. only real tests show the impact and difference :)

risicle · 2020-05-10T14:02:42Z

Absolutely - I'll probably look into this next time I'm getting frustrated with the fuzzing speed of a python-heavy program. As it is, most of my current targets spend the majority of their time in native extensions.

risicle · 2020-05-10T14:30:22Z

(though I'm not sure how mixing etc will increase collisions...?)

risicle changed the title ~~instrumentation~~ Instrumentation could be faster Jun 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instrumentation could be faster #1

Instrumentation could be faster #1

vanhauser-thc commented May 10, 2020

risicle commented May 10, 2020

risicle commented May 10, 2020

vanhauser-thc commented May 10, 2020

risicle commented May 10, 2020

risicle commented May 10, 2020

Instrumentation could be faster #1

Instrumentation could be faster #1

Comments

vanhauser-thc commented May 10, 2020

risicle commented May 10, 2020

risicle commented May 10, 2020

vanhauser-thc commented May 10, 2020

risicle commented May 10, 2020

risicle commented May 10, 2020