so far, kaitai parses from data to data:
from low-level data (bytes) to high-level data (numbers, strings, ...)

it would be nice to also have a way to parse from data to code:
to "reverse engineer" high-level serialization code
that generates the same bytes as the input data

this would help with writing custom serialization functions,
by refactoring the generated code

it would also help with fuzz-testing the correctness of ksy files,
by feeding known-good data into the pipeline
and checking that the generated code reproduces it byte for byte
example:
# some_format.ksy
meta:
  id: some_format
  endian: be
seq:
  # u1 (one unsigned byte), so that key parses to the integer 0
  - id: key
    type: u1
  - id: value
    type: str
    size: 4
    encoding: UTF-8
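#!/usr/bin/env python3
import some_format

input_data = b"\x00asdf"

# codegen_from_data is the proposed function, it does not exist yet
code = some_format.codegen_from_data(input_data)
with open("editme.py", "w") as f:
    f.write(code)

# exec, not eval: the generated code contains statements (def, import)
namespace = {}
exec(code, namespace)
_io = namespace["write_data"]()
_io.seek(0)
# KaitaiStream has no read(), read_bytes_full() reads to the end of the stream
output_data = _io.read_bytes_full()
assert input_data == output_data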
the generated code would look like:
#!/usr/bin/env python3
import io

# total size of the serialized data in bytes (1 byte key + 4 bytes value)
data_size = 5

def create_data():
    import some_format
    root = some_format.SomeFormat()
    root.key = 0
    root.value = "asdf"
    root._check()
    return root

def write_data():
    import kaitaistruct
    # fixed-size buffer that the serialized bytes are written into
    _io = kaitaistruct.KaitaiStream(io.BytesIO(bytearray(data_size)))
    root = create_data()
    root._write(_io)
    return _io
first draft in kaitai_serialize_codegen.py
example output: codegen_result.py
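
the idea can also be tried without kaitai at all. below is a minimal, self-contained sketch of the "data to code" round trip, using only the standard `struct` module. everything in it is an assumption made for illustration: `SPEC` is a hypothetical hand-written stand-in for the `seq` of some_format.ksy, and `parse`, `codegen_from_data` and `roundtrip` are hypothetical names, not part of kaitai or of the first draft linked above.

```python
import struct

# Hypothetical stand-in for the "seq" of some_format.ksy:
# (field name, struct format code, text encoding or None)
SPEC = [("key", "B", None), ("value", "4s", "UTF-8")]


def parse(data, spec=SPEC):
    """Bytes to high-level values (the direction that already exists)."""
    fmt = ">" + "".join(code for _, code, _ in spec)  # ">" = big-endian
    raw = struct.unpack(fmt, data)
    return {
        name: value.decode(enc) if enc else value
        for (name, _, enc), value in zip(spec, raw)
    }


def codegen_from_data(data, spec=SPEC):
    """Bytes to Python source that rebuilds the same bytes."""
    values = parse(data, spec)
    lines = ["import struct", "", "def create_data():", "    root = {}"]
    for name, _, _ in spec:
        # repr() turns each parsed value into an editable Python literal
        lines.append(f"    root[{name!r}] = {values[name]!r}")
    lines += ["    return root", "", "def write_data():",
              "    root = create_data()", "    out = b''"]
    for name, code, enc in spec:
        fmt = ">" + code
        field = f"root[{name!r}]"
        if enc:
            field += f".encode({enc!r})"
        lines.append(f"    out += struct.pack({fmt!r}, {field})")
    lines += ["    return out", ""]
    return "\n".join(lines)


def roundtrip(data):
    """Run the generated source and return the bytes it writes."""
    namespace = {}
    exec(codegen_from_data(data), namespace)
    return namespace["write_data"]()


if __name__ == "__main__":
    input_data = b"\x00asdf"
    print(codegen_from_data(input_data))
    assert roundtrip(input_data) == input_data
```

the generated source is plain text, so it can be written to a file and edited by hand, and the final assert is the byte-for-byte check described above. a real implementation would walk the parsed kaitai object tree instead of a flat field list.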