Curiosity #11
Comments
Solid question, and one I'll answer here but likely need to expand upon in the repo's docs. Here's the sequence of events that led me here:

When I first built https://github.com/jverkoey/MySqlConnector/ I used a naive binary decoding implementation that used iterators of

public struct LengthEncodedString {
  public init?(data: Data, encoding: String.Encoding) throws {
    // Empty data is not a length-encoded string.
    if data.isEmpty {
      return nil
    }
    let integer: LengthEncodedInteger
    do {
      guard let integerOrNil = try LengthEncodedInteger(data: data) else {
        return nil
      }
      integer = integerOrNil
    } catch let error {
      if let lengthEncodedError = error as? LengthEncodedIntegerDecodingError {
        switch lengthEncodedError {
        case .unexpectedEndOfData(let expectedAtLeast):
          throw LengthEncodedStringDecodingError.unexpectedEndOfData(expectedAtLeast: expectedAtLeast)
        }
      }
      throw error
    }
    self.length = UInt64(integer.length) + UInt64(integer.value)
    let remainingData = data[integer.length..<(integer.length + UInt(integer.value))]
    if remainingData.count < integer.value {
      throw LengthEncodedStringDecodingError.unexpectedEndOfData(expectedAtLeast: UInt(integer.value))
    }
    guard let string = String(data: remainingData, encoding: encoding) else {
      throw LengthEncodedStringDecodingError.unableToCreateStringWithEncoding(encoding)
    }
    self.value = string
  }
}

This implementation was fast and effective, but managing type conversions, bounds checking, and making data iterators became a fairly repetitive pattern. Swift Codable came to mind as a possible improvement, so I began exploring it in jverkoey/MySqlClient#23. You can see a proof of concept in the first commit of that PR. In essence, I moved the data iterator into a custom Decoder implementation and updated my payloads to conform to Decodable:

public struct LengthEncodedString: Codable {
  public init(from decoder: Decoder) throws {
    var container = try decoder.unkeyedContainer()
    let length = try container.decode(LengthEncodedInteger.self)
    self.length = UInt64(length.length) + UInt64(length.value)
    let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }
    self.value = String(data: Data(stringData), encoding: .utf8)!
  }
}

Quite a bit simpler now, but in doing so I encountered a few concerns about Codable's applicability to binary data solutions, which I've outlined below.

Swift Codable assumes complex external representations are dictionaries

One of the main benefits of Swift Codable is that you can get encoding and decoding of complex types for free. These for-free implementations rely on CodingKeys that must exist in some manner in the external representation. Binary data unfortunately does not always have a concept of a named key, at least not without completely parsing the data representation, which defeats the purpose of the Codable interface.

While Swift's default behavior can be hacked to our benefit by assuming that each property will be decoded in the order in which it was defined (Mike Ash took this approach), I prefer clearly debuggable code when working with binary formats. There are also enough quirks with binary formats that the assumption of Decodable primitives mapping to binary primitives can fall over pretty quickly (length-encoded strings being a good example).

Aside: I do think there is potential in BinaryCodable to provide some for-free implementations of complex types; my thoughts are outlined here: #4.

So in practice, binary representations written with Codable will almost always have to provide an explicit implementation anyway in order to "opt out" of the keyed external representation assumption. This wasn't a deal-breaker; it just meant binary representations wouldn't benefit from Codable's code generation for complex types (somewhat reducing the value of Codable).

Swift Codable's primitives do not give access to underlying data

This is what ended up being the deal-breaker for me. Let's look again at that length-encoded string implementation using Codable:

public struct LengthEncodedString: Codable {
  public init(from decoder: Decoder) throws {
    var container = try decoder.unkeyedContainer()
    let length = try container.decode(LengthEncodedInteger.self)
    self.length = UInt64(length.length) + UInt64(length.value)
    let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }
    self.value = String(data: Data(stringData), encoding: .utf8)!
  }
}

Particularly this line:

let stringData = try (0..<length.value).map { _ in try container.decode(UInt8.self) }

Swift Codable does not have a primitive of "arbitrary bytes of data", so we're forced to channel all byte encoding/decoding one UInt8 at a time. We could encode/decode one UInt64 at a time, but the implementation then needs to gracefully handle lengths that are not multiples of 8. Either way, this is a substantial CPU bottleneck for larger blocks of data.

Without a healthy way to work with arbitrary blocks of data, Codable's value dipped from "reasonable, given we don't get free code generation" to "negative, given there is now a significant performance penalty".

Swift Codable does not encourage correctness by default for binary representations

This is a minor point, but one I feel is worth mentioning because, on average, I feel Swift is a wonderful language precisely because it encourages correctness.

Swift Codable has three container types: keyed, unkeyed, and singleValue. Binary data does not necessarily benefit from these three layers of abstraction, so in practice all of my binary types were using unkeyed containers to hack the external representation as an array of bytes (using the UInt8 primitive). As such, unkeyed containers are in essence the only "correct" container in Codable for complex binary data, so the availability of incorrect containers was a source of tension for me as I was implementing more complex types.

BinaryCodable's solutions to the above concerns

BinaryCodable takes inspiration from Swift Codable, but makes a few distinct architectural decisions that optimize it for working with binary data:
And finally, this is the BinaryCodable version of the LengthEncodedString implementation:

struct LengthEncodedString: BinaryDecodable {
  init(from decoder: BinaryDecoder) throws {
    var container = decoder.sequentialContainer(maxLength: nil)
    let length = try container.decode(LengthEncodedInteger.self)
    let stringData = try container.decode(length: Int(length.value))
    guard let string = String(data: Data(stringData), encoding: .utf8) else {
      throw BinaryDecodingError.dataCorrupted(.init(debugDescription:
        "Unable to create String representation of data"))
    }
    self.value = string
  }
}
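To make the "arbitrary bytes" point concrete, here is a minimal, self-contained sketch of reading a length-encoded string with one bounds check and one slice, rather than one `UInt8` decode per byte. It is not BinaryCodable's API or the author's implementation: `ByteCursor`, `SketchDecodingError`, `decodeLengthEncodedInteger`, and `decodeLengthEncodedString` are hypothetical names made up for illustration. The prefix rules assume the MySQL protocol's length-encoded integer layout (a first byte below 0xfb is the value itself; 0xfc, 0xfd, and 0xfe announce 2, 3, and 8 little-endian bytes), and this sketch simply rejects 0xfb and 0xff.

```swift
import Foundation

/// Hypothetical error type for this sketch; not part of BinaryCodable.
enum SketchDecodingError: Error, Equatable {
  case unexpectedEndOfData(needed: Int)
  case invalidLengthPrefix(UInt8)
  case invalidText
}

/// Hypothetical cursor over a byte buffer that hands out whole blocks of bytes.
struct ByteCursor {
  private let bytes: [UInt8]
  private(set) var offset = 0

  init(_ data: Data) { self.bytes = [UInt8](data) }

  /// Reads `count` bytes using a single bounds check and a single slice.
  mutating func read(count: Int) throws -> ArraySlice<UInt8> {
    guard count >= 0, count <= bytes.count - offset else {
      throw SketchDecodingError.unexpectedEndOfData(needed: count)
    }
    defer { offset += count }
    return bytes[offset..<(offset + count)]
  }

  /// Reads a little-endian unsigned integer stored in `count` bytes (at most 8).
  mutating func readLittleEndian(count: Int) throws -> UInt64 {
    let slice = try read(count: count)
    // Fold from the most significant (last) byte down to the least significant.
    return slice.reversed().reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
  }
}

/// Decodes a length-encoded integer, assuming the MySQL prefix rules described above.
func decodeLengthEncodedInteger(_ cursor: inout ByteCursor) throws -> UInt64 {
  guard let prefix = try cursor.read(count: 1).first else {
    throw SketchDecodingError.unexpectedEndOfData(needed: 1)
  }
  switch prefix {
  case 0x00...0xfa: return UInt64(prefix)
  case 0xfc: return try cursor.readLittleEndian(count: 2)
  case 0xfd: return try cursor.readLittleEndian(count: 3)
  case 0xfe: return try cursor.readLittleEndian(count: 8)
  default: throw SketchDecodingError.invalidLengthPrefix(prefix) // 0xfb and 0xff
  }
}

/// Decodes a length-encoded UTF-8 string by reading its payload as one block.
func decodeLengthEncodedString(_ cursor: inout ByteCursor) throws -> String {
  let length = try decodeLengthEncodedInteger(&cursor)
  guard let count = Int(exactly: length) else {
    throw SketchDecodingError.unexpectedEndOfData(needed: Int.max)
  }
  let payload = try cursor.read(count: count)
  guard let string = String(bytes: payload, encoding: .utf8) else {
    throw SketchDecodingError.invalidText
  }
  return string
}
```

With a cursor over the bytes `05 68 65 6c 6c 6f`, `decodeLengthEncodedString(&cursor)` reads the one-byte prefix and then the five payload bytes in a single call to `read(count:)`. The bounds check happens before the slice is taken, so truncated input surfaces as a thrown error instead of a trap.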
That is great feedback, thank you for that. Judging from the implementation, which I only had a quick glance at, I still think that we could extend

I'm pretty sure that if you brought this discussion to the official Swift forums, we could shape a great proposal together with the community to extend that area of Swift and avoid possible bottlenecks. If this went into the stdlib, you would have even more ways to implement certain things at your disposal, since there you can have more compiler support if required to avoid performance penalties. Such an extension will also spark some discussion about

With all that, maybe we would also see more extensions of the stdlib types to provide seamless support for working with binary data. Wouldn't that be great?

That said, your module is not the first that is trying to solve these things in a similar fashion. And since all these solutions kind of overlap (partly) with
Love it :) I’ll bring this up in the forums and perhaps take a stab at an evolution doc.
Hi @jverkoey, this is not an issue report but a question I'd like to ask. Why do you think we need an extra set of protocols to interact with binary data? I work with a Bluetooth peripheral at my org, and we created a naive implementation and mapping of the custom data layouts for our BLE API on the app side. My long-term goal was to explore a custom decoder / encoder for Codable, to unify the implementation around battle-tested functionality and to make use of code synthesis as much as possible.

I would also love it if you could bring this topic of binary data up on the Swift forums. I think some community members would love to elaborate with you on the general problem. I also think this problem area should be solved generally in the stdlib, because one day Swift will likely enter the embedded space, where this type of functionality will be indispensable.