Curiosity #11

Open
DevAndArtist opened this issue Feb 22, 2019 · 3 comments

@DevAndArtist

Hi @jverkoey, this is not an issue report but a question I'd like to ask. Why do you think we need an extra set of protocols to interact with binary data? I work with a Bluetooth peripheral at my org, and we created a naive implementation and mapping of the custom data layouts for our BLE API on the app side. My long-term goal was to explore a custom decoder/encoder for Codable, both to unify the implementation on battle-tested functionality and to make use of code synthesis as much as possible.

I would also love it if you could bring this topic of binary data up on the Swift forums. I think some community members would love to discuss the general problem with you. I also think this problem area should be solved generally in the stdlib, because one day Swift will likely enter the embedded space, where this type of functionality will be indispensable.

@jverkoey
Owner

jverkoey commented Feb 23, 2019

Solid question, and one I'll answer here but likely need to expand upon in the repo's docs. Here's the sequence of events that led me here:

When I first built https://github.com/jverkoey/MySqlConnector/ I used a naive binary decoding implementation that used iterators of [UInt8]/Data to consume the data from a socket. See an example of this here. Snippet included below:

public struct LengthEncodedString {
  public init?(data: Data, encoding: String.Encoding) throws {
    // Empty data is not a length-encoded string.
    if data.isEmpty {
      return nil
    }

    let integer: LengthEncodedInteger
    do {
      guard let integerOrNil = try LengthEncodedInteger(data: data) else {
        return nil
      }
      integer = integerOrNil
    } catch let error {
      if let lengthEncodedError = error as? LengthEncodedIntegerDecodingError {
        switch lengthEncodedError {
        case .unexpectedEndOfData(let expectedAtLeast):
          throw LengthEncodedStringDecodingError.unexpectedEndOfData(expectedAtLeast: expectedAtLeast)
        }
      }
      throw error
    }

    self.length = UInt64(integer.length) + UInt64(integer.value)

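    // Check bounds before slicing: an out-of-range Data subscript traps at runtime.
    if UInt64(data.count) < UInt64(integer.length) + UInt64(integer.value) {
      throw LengthEncodedStringDecodingError.unexpectedEndOfData(expectedAtLeast: UInt(integer.value))
    }
    let remainingData = data[integer.length..<(integer.length + UInt(integer.value))]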

    guard let string = String(data: remainingData, encoding: encoding) else {
      throw LengthEncodedStringDecodingError.unableToCreateStringWithEncoding(encoding)
    }
    self.value = string
  }
}

This implementation was fast and effective, but managing type conversions, bounds checking, and making data iterators became a fairly repetitive pattern. Swift Codable came to mind as a possible improvement, so I began exploring it in jverkoey/MySqlClient#23. You can see a proof of concept in the first commit of that PR.

In essence, I moved the data iterator into a custom Decoder implementation and updated my payloads to conform to Decodable:

public struct LengthEncodedString: Codable {
  public init(from decoder: Decoder) throws {
    var container = try decoder.unkeyedContainer()

    let length = try container.decode(LengthEncodedInteger.self)
    self.length = UInt64(length.length) + UInt64(length.value)

    let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }
    self.value = String(data: Data(stringData), encoding: .utf8)!
  }
}

Quite a bit simpler now, but in doing so I encountered a few concerns about Codable's applicability to binary data solutions, which I've outlined below.

Swift Codable assumes complex external representations are dictionaries

One of the main benefits of Swift Codable is that you can get encoding and decoding of complex types for free. These for-free implementations rely on CodingKeys that must exist in some manner in the external representation. Binary data unfortunately does not always have a concept of a named key, at least not without completely parsing the data representation, which defeats the purpose of the Codable interface.

While Swift's default behavior can be hacked to our benefit by assuming that each property will be decoded in the order in which it was defined — Mike Ash took this approach — I prefer clearly debuggable code when working with binary formats. There are also enough quirks with binary formats that the assumption of Decodable primitives mapping to binary primitives can fall over pretty quickly (length-encoded strings being a good example).

Aside: I do think there is potential in BinaryCodable to provide some for-free implementations of complex types; my thoughts are outlined here: #4.

So in practice, binary representations written with Codable will almost always have to provide an explicit implementation anyway in order to "opt out" of the keyed external representation assumption. This wasn't a deal-breaker, it just meant binary representations wouldn't benefit from Codable's code generation for complex types (somewhat reducing the value of Codable).
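
To make the keyed assumption concrete, here is a minimal sketch that uses Foundation's JSONDecoder as a stand-in, since it is a decoder everyone has at hand. The `Header` type and its two fields are hypothetical and not taken from any of the projects mentioned here. The point is only that the synthesized `init(from:)` asks for a keyed container, so a positional payload carrying the same values is rejected:

```swift
import Foundation

// Hypothetical two-field payload. The compiler synthesizes CodingKeys
// and init(from:) for it, and that initializer requests a keyed container.
struct Header: Decodable {
  let version: UInt8
  let flags: UInt8
}

let decoder = JSONDecoder()

// A keyed external representation decodes for free.
let keyed = Data(#"{"version": 1, "flags": 2}"#.utf8)
let header = try decoder.decode(Header.self, from: keyed)
print(header.version, header.flags)

// A positional representation holds the same values but has no keys,
// so the synthesized implementation throws instead of decoding.
let positional = Data("[1, 2]".utf8)
var positionalError: Error?
do {
  _ = try decoder.decode(Header.self, from: positional)
} catch {
  positionalError = error
  print("positional decode failed:", error)
}
```

A binary format behaves like the positional case: the fields are identified by their position and width, not by a name the decoder can look up.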

Swift Codable's primitives do not give access to underlying data

This is what ended up being the deal-breaker for me. Let's look again at that length-encoded string implementation using Codable:

public struct LengthEncodedString: Codable {
  public init(from decoder: Decoder) throws {
    var container = try decoder.unkeyedContainer()

    let length = try container.decode(LengthEncodedInteger.self)
    self.length = UInt64(length.length) + UInt64(length.value)

    let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }
    self.value = String(data: Data(stringData), encoding: .utf8)!
  }
}

Particularly this line:

let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }

Swift Codable does not have a primitive of "arbitrary bytes of data", so we're forced to channel all byte encoding/decoding one UInt8 at a time. We could encode/decode one UInt64 at a time, but the implementation then needs to handle lengths that are not multiples of 8 gracefully. Either way, this is a substantial CPU bottleneck for larger blocks of data.

Without a healthy way to work with arbitrary blocks of data, Codable's value dipped from "reasonable, given we don't get free code generation" to "negative, given there is now a significant performance penalty".
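
As a rough illustration of the difference, the sketch below contrasts a byte-at-a-time read with a block read. `ByteCursor` and its methods are hypothetical names invented for this example; they are not part of Codable or BinaryCodable, and the sketch makes no claim about how either is implemented.

```swift
import Foundation

// Hypothetical forward-only cursor over a buffer (illustration only).
struct ByteCursor {
  enum ReadError: Error {
    case unexpectedEndOfData(needed: Int, available: Int)
  }

  private let data: Data
  private var offset: Data.Index

  init(_ data: Data) {
    self.data = data
    self.offset = data.startIndex  // A Data slice may not start at index 0.
  }

  var remaining: Int { data.endIndex - offset }

  // One bounds check and one element read per byte, which is the shape
  // that decode(UInt8.self) forces on the caller.
  mutating func readByte() throws -> UInt8 {
    guard remaining >= 1 else {
      throw ReadError.unexpectedEndOfData(needed: 1, available: remaining)
    }
    defer { offset += 1 }
    return data[offset]
  }

  // One bounds check for the whole block; the result is a slice of the buffer.
  mutating func read(length: Int) throws -> Data {
    guard length >= 0, remaining >= length else {
      throw ReadError.unexpectedEndOfData(needed: length, available: remaining)
    }
    defer { offset += length }
    return data[offset..<(offset + length)]
  }
}

let payload = Data("hello, world".utf8)

// Byte-at-a-time, mirroring the Codable version above.
var slow = ByteCursor(payload)
let bytes = try (0..<5).map { _ in try slow.readByte() }
let slowString = String(decoding: bytes, as: UTF8.self)

// Block read, the primitive that Codable lacks.
var fast = ByteCursor(payload)
let block = try fast.read(length: 5)
let fastString = String(decoding: block, as: UTF8.self)

print(slowString, fastString)
```

Both paths produce the same string; the block read simply does the bounds check and the copy once instead of once per byte.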

Swift Codable does not encourage correctness by default for binary representations

This is a minor point, but one I feel is worth mentioning because, on the whole, I feel Swift is a wonderful language precisely because it encourages correctness.

Swift Codable has three container types: keyed, unkeyed, and singleValue. Binary data does not necessarily benefit from these three layers of abstraction, so in practice all of my binary types were using unkeyed containers to hack the external representation as an array of bytes (using the UInt8 primitive). As such, unkeyed containers are in essence the only "correct" container in Codable for complex binary data, so the availability of incorrect containers was a source of tension for me as I was implementing more complex types.

BinaryCodable's solutions to the above concerns

BinaryCodable takes inspiration from Swift Codable, but makes a few distinct architectural decisions that optimize it for working with binary data:

  1. BinaryCodable is essentially a type-safe, Codable-like equivalent to the C family of file I/O functions, with only fread- and fwrite-like behavior implemented thus far. I may add fseek-like behavior in the future as needed.
  2. Only one container type is provided. This encourages correctness.
  3. There are APIs for encoding and decoding arbitrary blocks of data.
  4. There are APIs for encoding and decoding strings, either terminated or container-bound.
  5. RawRepresentable types do get auto-generated BinaryCodable implementations for free using protocol extensions. Complex types will require some more care and thought.
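
The decisions in this list can be sketched in a few dozen lines. Everything below is a hypothetical illustration: `SketchContainer`, `SketchDecodable`, `SketchError`, and `Command` are names made up for this example, little-endian integers are an assumption, and none of it is BinaryCodable's actual API.

```swift
import Foundation

enum SketchError: Error {
  case unexpectedEndOfData, invalidRawValue, missingTerminator
}

// Hypothetical single container type with fread-like, forward-only reads.
struct SketchContainer {
  private let bytes: [UInt8]
  private var offset = 0

  init(_ bytes: [UInt8]) { self.bytes = bytes }

  // An arbitrary block of bytes.
  mutating func decode(length: Int) throws -> [UInt8] {
    guard length >= 0, bytes.count - offset >= length else {
      throw SketchError.unexpectedEndOfData
    }
    defer { offset += length }
    return Array(bytes[offset..<(offset + length)])
  }

  // Fixed-width unsigned integers, assumed little-endian for this sketch.
  mutating func decode<T: FixedWidthInteger & UnsignedInteger>(_ type: T.Type) throws -> T {
    let raw = try decode(length: MemoryLayout<T>.size)
    var value: T = 0
    for (i, byte) in raw.enumerated() {
      value |= T(byte) << (8 * i)
    }
    return value
  }

  // A terminated string: consume up to and including the terminator byte.
  mutating func decodeString(terminator: UInt8) throws -> String {
    guard let end = bytes[offset...].firstIndex(of: terminator) else {
      throw SketchError.missingTerminator
    }
    defer { offset = end + 1 }
    return String(decoding: bytes[offset..<end], as: UTF8.self)
  }
}

protocol SketchDecodable {
  init(from container: inout SketchContainer) throws
}

// RawRepresentable types get an implementation for free via a protocol extension.
extension SketchDecodable
where Self: RawRepresentable, Self.RawValue: FixedWidthInteger & UnsignedInteger {
  init(from container: inout SketchContainer) throws {
    let raw = try container.decode(RawValue.self)
    guard let value = Self(rawValue: raw) else {
      throw SketchError.invalidRawValue
    }
    self = value
  }
}

// Hypothetical command enum; it needs no hand-written decoding code.
enum Command: UInt8, SketchDecodable {
  case quit = 1
  case query = 3
}

// One command byte, a NUL-terminated name, then a 16-bit little-endian count.
var input: [UInt8] = [3]
input += Array("users".utf8)
input += [0, 0x2A, 0x00]

var container = SketchContainer(input)
let command = try Command(from: &container)
let name = try container.decodeString(terminator: 0)
let count = try container.decode(UInt16.self)
print(command, name, count)
```

With a single container type there is no keyed or single-value variant to pick by mistake, which is the sense in which having one container encourages correctness.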

And finally, this is the BinaryCodable version of the LengthEncodedString implementation:

struct LengthEncodedString: BinaryDecodable {
  init(from decoder: BinaryDecoder) throws {
    var container = decoder.sequentialContainer(maxLength: nil)

    let length = try container.decode(LengthEncodedInteger.self)
    let stringData = try container.decode(length: Int(length.value))
    guard let string = String(data: Data(stringData), encoding: .utf8) else {
      throw BinaryDecodingError.dataCorrupted(.init(debugDescription:
        "Unable to create String representation of data"))
    }
    self.value = string
  }
}

@DevAndArtist
Author

DevAndArtist commented Feb 23, 2019

That is great feedback, thank you for that. Judging from a quick glance at the implementation, I still think we could extend Codable to operate on binary data in a way the whole community would prefer and benefit from, and we would get some more language features for free.

I'm pretty sure that if you brought this discussion to the official Swift forums, we could shape a great proposal together with the community to extend that area of Swift and avoid possible bottlenecks. If this went into the stdlib, you would have even more ways to implement certain things at your disposal, since there you can get more compiler support if it is required to avoid performance penalties.

Such an extension would also spark some discussion about Data, because it's not part of the stdlib, and about whether we could have a superior type for working with binary data. (I've been tripped up so many times by the fact that Data can be a slice of the original Data instance.)
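
For anyone who hasn't hit the Data slice pitfall yet, here is a small sketch of it. The byte values are arbitrary; the behavior shown is that a slice of Data keeps the indices of the instance it was sliced from.

```swift
import Foundation

let original = Data([0x10, 0x20, 0x30, 0x40])

// Slicing keeps the parent's indices; it does not rebase them to zero.
let slice = original[2...]
print(slice.startIndex, slice.count)

// `slice[0]` would trap here, because index 0 lies outside the slice.
// Indexing relative to startIndex is always safe.
let firstViaStartIndex = slice[slice.startIndex]

// Copying the slice into a new Data gives zero-based indices again.
let rebased = Data(slice)
let firstViaZero = rebased[0]
print(firstViaStartIndex == firstViaZero)
```

Any decoder that accepts Data from a caller has to account for this, either by indexing from `startIndex` or by rebasing up front.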

With all that, maybe we would also see more extensions of the stdlib types that provide seamless support for working with binary data. Wouldn't that be great?

That said, your module is not the first that is trying to solve these things in a similar fashion. And since all these solutions kind of overlap (partly) with Codable, maybe it's a great signal to push a general solution and establish a standard in the language itself. :)

@jverkoey
Owner

Love it :) I’ll bring this up in the forums and perhaps take a stab at an evolution doc.
