serialization: encode complex values #1242

@milahu

before serialization support, kaitai could only decode bytes into values
with serialization, kaitai can also encode values back into bytes

so far, encoding works for simple values (seq fields), but fails for "complex" values that are computed by value instances
example: vlq_base128_be.ksy

seq:
  - id: groups
    type: group
    repeat: until
    repeat-until: not _.has_next
types:
  group:
    seq:
      - id: has_next
        type: b1
      - id: value
        type: b7
instances:
  last:
    value: groups.size - 1
  value:
    value: |
      (groups[last].value
      + (last >= 1 ? (groups[last - 1].value << 7) : 0)
      + (last >= 2 ? (groups[last - 2].value << 14) : 0)
      + (last >= 3 ? (groups[last - 3].value << 21) : 0)
      + (last >= 4 ? (groups[last - 4].value << 28) : 0)
      + (last >= 5 ? (groups[last - 5].value << 35) : 0)
      + (last >= 6 ? (groups[last - 6].value << 42) : 0)
      + (last >= 7 ? (groups[last - 7].value << 49) : 0)).as<u8>
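
for reference, the value instance above does roughly the same as the loop below. this is only a sketch in plain python: the function name vlq_base128_be_decode is made up for illustration, and unlike the .ksy it does not stop at 8 groups.

```python
def vlq_base128_be_decode(data: bytes) -> int:
    """Decode a big-endian base-128 VLQ (first group is most significant)."""
    value = 0
    for i, byte in enumerate(data):
        # low 7 bits are the payload ("value" in the .ksy group type)
        value = (value << 7) | (byte & 0x7F)
        # high bit is "has_next"; when it is clear, this was the last group
        if not byte & 0x80:
            if i != len(data) - 1:
                raise ValueError("trailing bytes after the final group")
            return value
    raise ValueError("empty or truncated VLQ: no group with has_next == 0")


print(vlq_base128_be_decode(b"\xff\x7f"))  # 16383 == 2**14 - 1
```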

in python, (naively) setting value fails with

$ kaitai-struct-compiler kaitai_struct_formats/common/vlq_base128_be.ksy --target python --read-write --no-auto-read
$ python
>>> import vlq_base128_be
>>> i = vlq_base128_be.VlqBase128Be()
>>> i.value = 123
AttributeError: property 'value' of 'VlqBase128Be' object has no setter

... because instances are always read-only

workaround:

# TODO encode int(2**14-1) to vlq_bytes
vlq_bytes = b"\xff\x7f"
i = vlq_base128_be.VlqBase128Be.from_bytes(vlq_bytes)
assert i.value == 2**14-1

possible solution:

i = vlq_base128_be.VlqBase128Be.from_value(2**14-1)
assert i.value == 2**14-1

in the .ksy file, the from_value constructor could be declared like

constructors:
  from_value:
    inputs:
      - value
    outputs:
      bytes: |
        value < 2**7 ? [value] :
        value < 2**14 ? [
          (value >> 7) | 2**7,
          value & (2**7 - 1)
        ] :
        # ...

quick check of the two-byte branch in python:

>>> value = 2**14-1
>>> list(map(lambda b: bin(b).split("b")[1].zfill(8), [(value >> 7) | 2**7, value & (2**7 - 1)]))
['11111111', '01111111']

see also pyvlq.encode
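
the nested ternaries in the proposed constructor generalize to a loop. a minimal sketch in plain python of what from_value would have to compute (the function name vlq_base128_be_encode is hypothetical, it is not part of kaitai or of the generated module):

```python
def vlq_base128_be_encode(value: int) -> bytes:
    """Encode a non-negative integer as a big-endian base-128 VLQ."""
    if value < 0:
        raise ValueError("VLQ can only encode non-negative integers")
    # the least significant group comes last and has has_next == 0
    groups = [value & 0x7F]
    value >>= 7
    while value:
        # every earlier group carries 7 more bits and has has_next == 1
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    # groups were collected least significant first; big-endian wants the reverse
    return bytes(reversed(groups))


print(vlq_base128_be_encode(2**14 - 1))  # b'\xff\x7f'
```

until something like from_value exists, the result could be passed to from_bytes as in the workaround above.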

the outputs dict could also hold temporary variables
needed to produce the final output bytes

every element of bytes is an integer from 0 to 255

todo: cache bytes for serialization
this would be faster than deriving the bytes from the seq values again
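
one way the caching could look, as a standalone sketch: CachedVlq is a hypothetical wrapper, not a class from the kaitai runtime or the compiler output, and the encode_calls counter only exists to make the caching visible.

```python
class CachedVlq:
    """Hypothetical wrapper that keeps the encoded bytes next to the value."""

    def __init__(self, value: int):
        self._value = value
        self._bytes = None      # filled lazily by to_bytes()
        self.encode_calls = 0   # only here to make the caching observable

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, new_value: int) -> None:
        self._value = new_value
        self._bytes = None      # invalidate: cached bytes no longer match

    def to_bytes(self) -> bytes:
        # encode only when there is no valid cached result
        if self._bytes is None:
            self._bytes = self._encode(self._value)
        return self._bytes

    def _encode(self, value: int) -> bytes:
        self.encode_calls += 1
        groups = [value & 0x7F]                   # last group, has_next == 0
        value >>= 7
        while value:
            groups.append((value & 0x7F) | 0x80)  # earlier groups, has_next == 1
            value >>= 7
        return bytes(reversed(groups))


c = CachedVlq(2**14 - 1)
c.to_bytes()
c.to_bytes()            # served from the cache, no second encode
print(c.encode_calls)   # 1
```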

keywords:

  • inverse of instances
  • opposite of instances
  • reverse instances
  • builtin types vs user-defined types
