How to reverse engineer a buffer by hand

Daniel Wirtz edited this page Jan 3, 2017 · 32 revisions

Let's say you have the following buffer and that you want to know what's wrong with it:

Buffer <0a 0a 32 08 08 07 10 00 18 1a 20 00 0a 08 22 06 08 02 10 19 18 03 0a 09 22 07 08 02 10 a2 03 18 20 0a 09 22 07 08 02 10 8d 02 ...>

First you need to know whether the message is length delimited or not. If it is length delimited, the first byte(s) specify the message's total length as a varint. Otherwise the message starts directly with the first field's tag.

How to decode a varint?

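Varints occupy 1 to 10 bytes (var = variable) to encode integer values. The most significant bit (msb) of each byte indicates whether more bytes follow (msb = 1, tested with & 0x80) or not (msb = 0). The remaining 7 bits (& 0x7f) carry the payload. Most of the time the actual value isn't important when just reverse engineering a standard buffer. In these cases you can just skip all bytes with the msb set plus one more. Otherwise you can calculate the value by taking the lower 7 bits of each byte (all except the msb), shifting them left by 0, 7, 14, 21 etc. (the first byte is the least significant) and combining the results. In code: var value = 0, i = 0, b; do { b = buffer[pos++]; value |= (b & 0x7f) << i++ * 7; } while (b & 0x80); Note that the last byte (msb not set) still contributes its 7 bits, and that JavaScript's bitwise operators work on 32 bits, so this one-liner is only reliable for varints of up to four bytes.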
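The varint rule above can be sketched as a small helper. This is a minimal illustration, not the library's reader: the name decodeVarint and its return shape are made up for this page, and it uses plain number arithmetic instead of bit shifts, so it is exact only up to 2^53.

```javascript
// Decode one varint starting at `pos`.
// Returns the decoded value and the position of the byte after the varint.
function decodeVarint(bytes, pos) {
  let value = 0;
  let scale = 1; // 2^(7 * number of bytes consumed so far)
  for (;;) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    value += (b & 0x7f) * scale;               // lower 7 bits are payload
    if ((b & 0x80) === 0) return { value, pos }; // msb not set: last byte
    scale *= 128;
  }
}

console.log(decodeVarint([0x0a], 0).value);       // 10
console.log(decodeVarint([0xa2, 0x03], 0).value); // 418
```

The second call decodes the a2 03 pair that shows up later in the buffer: 0x22 + 3 * 128 = 418.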

Back to our buffer: In our case the message isn't length delimited, hence the message starts with the first field's tag. Tags are encoded as a varint. The lowest 3 bits of the tag's value represent the wire type, all remaining higher bits represent the field id (wireType = tag & 7, id = tag >> 3).

Let's start decoding:

0a	convert to binary: 1010
  	split into id and wireType: 1 | 010
  	convert to decimal: id = 1, wireType = 2
  	note that the msb isn't set here, hence it is a varint of just 1 byte
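The same split can be done with two bit operations. A minimal sketch, assuming the tag has already been decoded from its varint form (for single-byte tags like the ones here, the byte is the tag); splitTag is a name chosen for this example only.

```javascript
// Split a decoded tag value into field id and wire type.
function splitTag(tag) {
  return {
    id: tag >>> 3,     // everything above the lowest 3 bits
    wireType: tag & 7, // lowest 3 bits
  };
}

console.log(splitTag(0x0a)); // { id: 1, wireType: 2 }
```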

So, what are those wire types?

Value  Wire type    Size                            Possible types
0      varint       1 to 10 bytes                   int32, int64, uint32 etc.
1      fixed64      8 bytes (little endian)         fixed64, sfixed64, double
2      ldelim       varint length + length * bytes  string, bytes, (inner) messages
3      start_group  N/A                             N/A
4      end_group    N/A                             N/A
5      fixed32      4 bytes (little endian)         fixed32, sfixed32, float
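The sizes in the table are enough to step over a field's value without interpreting it. The following is a rough sketch under that assumption; skipValue is a hypothetical helper for this page, `pos` must point at the first byte after the tag, and groups (wire types 3 and 4) are simply rejected.

```javascript
// Return the position of the first byte after the value that starts at `pos`.
function skipValue(bytes, pos, wireType) {
  switch (wireType) {
    case 0: // varint: skip all bytes with the msb set, plus one
      while (bytes[pos] & 0x80) pos++;
      return pos + 1;
    case 1: // fixed64
      return pos + 8;
    case 2: { // ldelim: varint length, then that many bytes
      let length = 0, scale = 1, b;
      do {
        b = bytes[pos++];
        length += (b & 0x7f) * scale;
        scale *= 128;
      } while (b & 0x80);
      return pos + length;
    }
    case 5: // fixed32
      return pos + 4;
    default:
      throw new Error("unsupported wire type " + wireType);
  }
}

// A two-byte varint value (a2 03) followed by the next tag (18).
console.log(skipValue([0xa2, 0x03, 0x18], 0, 0)); // 2
```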

Going back to our buffer, we now know that the field uses the ldelim wire type, which means a varint length followed by exactly that number of bytes.

0a	id = 1, wireType = 2
0a	convert to binary: 1010 (msb not set, last byte) => length 10

32 08 08 07 10 00 18 1a 20 00

0a	id = 1, wireType = 2
08	length = 8

22 06 08 02 10 19 18 03

0a	id = 1, wireType = 2 
  	we clearly have a pattern here: this is most likely a repeated field.
09	length = 9

22 07 08 02 10 a2 03 18 20

0a	id = 1, wireType = 2
09	length = 9

22 07 08 02 10 8d 02 ...

Unfortunately, those are all the bytes that were printed to the console, so the next step would be to get the complete buffer and continue here OR to make a guess at what actual type the length delimited chunks above could be. We already know it is most likely a repeated field because the same id is used multiple times.
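The manual walk over the top level can be reproduced with a short script. This is only a sketch: printedBytes holds just the bytes shown in the console output (the real buffer continues), and varintAt and listChunks are names invented for this example. The last chunk is reported as truncated because the printout ends before its 9 bytes are complete.

```javascript
// The bytes exactly as printed to the console (the real buffer is longer).
const printedBytes = Uint8Array.from([
  0x0a, 0x0a, 0x32, 0x08, 0x08, 0x07, 0x10, 0x00, 0x18, 0x1a, 0x20, 0x00,
  0x0a, 0x08, 0x22, 0x06, 0x08, 0x02, 0x10, 0x19, 0x18, 0x03,
  0x0a, 0x09, 0x22, 0x07, 0x08, 0x02, 0x10, 0xa2, 0x03, 0x18, 0x20,
  0x0a, 0x09, 0x22, 0x07, 0x08, 0x02, 0x10, 0x8d, 0x02,
]);

// Read one varint at `pos`; returns [value, nextPos].
function varintAt(bytes, pos) {
  let value = 0, scale = 1, b;
  do {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    b = bytes[pos++];
    value += (b & 0x7f) * scale;
    scale *= 128;
  } while (b & 0x80);
  return [value, pos];
}

// Split a buffer into its top-level length-delimited chunks.
function listChunks(bytes) {
  const chunks = [];
  let pos = 0;
  while (pos < bytes.length) {
    let tag, length;
    [tag, pos] = varintAt(bytes, pos);
    if ((tag & 7) !== 2) throw new Error("expected ldelim at top level");
    [length, pos] = varintAt(bytes, pos);
    const end = Math.min(pos + length, bytes.length);
    chunks.push({
      id: tag >>> 3,
      length,
      truncated: pos + length > bytes.length, // printout ended early
      bytes: bytes.subarray(pos, end),
    });
    pos = end;
  }
  return chunks;
}

for (const c of listChunks(printedBytes)) {
  const hex = Array.from(c.bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
  console.log(`id=${c.id} length=${c.length}${c.truncated ? " (truncated)" : ""}: ${hex}`);
}
// id=1 length=10: 32 08 08 07 10 00 18 1a 20 00
// id=1 length=8: 22 06 08 02 10 19 18 03
// id=1 length=9: 22 07 08 02 10 a2 03 18 20
// id=1 length=9 (truncated): 22 07 08 02 10 8d 02
```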

What could wireType = 2 / ldelim refer to?

There are a couple of types that encode as the ldelim wire type: UTF8 encoded strings, raw bytes, inner messages or packed repeated fields. The chunks above consist mostly of control bytes (00, 07, 08 ...) rather than printable characters, so they do not appear to be strings, and it makes sense to check for inner messages.
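The "does this look like a string" check can be made a bit less subjective by counting printable bytes. This is a crude heuristic sketched for this page, not a real detector: printableRatio is a made-up helper, it only considers ASCII, and text using multi-byte UTF8 characters would score lower than it should.

```javascript
// Fraction of bytes that are printable ASCII (or tab, line feed, carriage return).
function printableRatio(bytes) {
  let printable = 0;
  for (const b of bytes) {
    const isText = (b >= 0x20 && b <= 0x7e) || b === 0x09 || b === 0x0a || b === 0x0d;
    if (isText) printable++;
  }
  return bytes.length ? printable / bytes.length : 0;
}

// First chunk from the buffer: only 32 ("2") and 20 (space) are printable.
const firstChunk = [0x32, 0x08, 0x08, 0x07, 0x10, 0x00, 0x18, 0x1a, 0x20, 0x00];
console.log(printableRatio(firstChunk)); // 0.2

// An actual ASCII string for comparison.
const hello = Array.from("hello", (ch) => ch.charCodeAt(0));
console.log(printableRatio(hello)); // 1
```

A ratio this low suggests the chunk is binary data, which fits the inner message guess.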

Let's start with the first:

32	110 | 010 = id 6, wireType 2
08	length = 8

08 07 10 00 18 1a 20 00

Just as we assumed, it appears that the chunks are inner messages. Again, the chunk inside this inner message does not appear to be a string, so let's assume another level of nesting:

08	1 | 000 = id 1, wireType 0
  	remember: wire type 0 is a varint
07	value 7

10	10 | 000 = id 2, wireType 0 (varint)
00	value 0

18	11 | 000 = id 3, wireType 0 (varint)
1a	value 26

20	100 | 000 = id 4, wireType 0 (varint)
00	value 0

So far the buffer looks ok and appears to match our assumptions on inner messages. Something else must be wrong. Either other inner messages are invalid or there is an error even further down the road.
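The hand decoding of the innermost chunk can be double-checked with a few lines of code. This is a sketch that assumes the chunk contains nothing but varint fields (wire type 0), as it does here; decodeVarintFields is a hypothetical name and it throws as soon as it meets any other wire type.

```javascript
// Decode a chunk that is assumed to contain only varint fields (wire type 0).
function decodeVarintFields(bytes) {
  const fields = [];
  let pos = 0;
  // Local helper: read one varint at the current position and advance.
  const next = () => {
    let value = 0, scale = 1, b;
    do {
      if (pos >= bytes.length) throw new RangeError("truncated varint");
      b = bytes[pos++];
      value += (b & 0x7f) * scale;
      scale *= 128;
    } while (b & 0x80);
    return value;
  };
  while (pos < bytes.length) {
    const tag = next();
    if ((tag & 7) !== 0) throw new Error("field " + (tag >>> 3) + " is not a varint");
    fields.push({ id: tag >>> 3, value: next() });
  }
  return fields;
}

const innermost = [0x08, 0x07, 0x10, 0x00, 0x18, 0x1a, 0x20, 0x00];
for (const f of decodeVarintFields(innermost)) {
  console.log(`id ${f.id} = ${f.value}`);
}
// id 1 = 7
// id 2 = 0
// id 3 = 26
// id 4 = 0
```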

Next steps: Either obtain more data or validate the other inner chunks. You know the drill (now). Happy reverse engineering!