VisualChannel #11
-
Stepping back and looking at what the inspirations do: Observable's Plot Bar constrains the input for Bar to only accept categorical or ordinal data, and the use case where I was noodling on how to represent a floating point value was explicitly redirected to a Rect type that accepts quantitative values on both axes. Still binned, mind you, just with the range of each bin being used to provide the lower-left and right corners for the rect, whereas a singular, unique value represents a bar.
In comparison, Vega-Lite's Bar leans more heavily into describing transformations for the data, and having explicit paths to interpret. In particular, specifying an integer and step to "bin" the values explicitly, or an aggregate value "by month", with the month becoming the category that each bar represents.
I'm inclined to lean more to the constraint side (how Observable presents its API) rather than trying to pull in additional functionality with extensive transformation engines being declared as a part of the visualization (Vega-Lite's mechanism). I'd rather lean on developers doing the relevant transformations in Swift, using the functional aspects of the language, and constrain this library to presenting the resulting visualizations. That said, aggregate values and histograms in particular are something that I personally want to display. In the case of a bar mark, Vega-Lite is heavily focused on using it on the result of binned data, where you declare the binning process in the Bar diagram, whereas Observable expects you to have done whatever binning work up front, and then maps the bins into explicit categories (possibly already sorted) to display. My initial sense is that having a combined transform & display concept in the declaration makes it more complex to understand and use.
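As a rough illustration of "do the binning up front in Swift", here is a minimal sketch. The `Sample` type, the `binned` function, and the label format are all hypothetical and not part of this library; the only point is that plain `Dictionary(grouping:)` plus `sorted` and `map` turn quantitative values into sorted categories that a bar mark could consume.

```swift
// Hypothetical sample type, used only for this illustration.
struct Sample {
    let latency: Double
}

/// Groups samples into fixed-width bins and returns one (category, count)
/// pair per non-empty bin, sorted by the bin's lower bound.
func binned(_ samples: [Sample], width: Double) -> [(category: String, count: Int)] {
    // Key each sample by the index of the bin it falls into.
    let groups = Dictionary(grouping: samples) { sample in
        Int((sample.latency / width).rounded(.down))
    }
    return groups
        .sorted { $0.key < $1.key }
        .map { entry -> (category: String, count: Int) in
            let lower = Double(entry.key) * width
            return (category: "\(lower)..<\(lower + width)", count: entry.value.count)
        }
}

let samples = [0.2, 0.7, 1.1, 1.9, 2.5].map(Sample.init(latency:))
let bars = binned(samples, width: 1.0)
// bars now holds three categories with counts 2, 2 and 1.
```

Each resulting category is a unique, already sorted label, which is the shape of input that the constrained (Observable-style) bar expects.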
-
I've stubbed out VisualChannel a bit further, working off the idea that we want this to be a declarative mapping. The initial case I stubbed into place held the idea that VisualChannel would have a reference back to the
It's pretty clear that while this is great, it's not the only means of referencing the data we want to use to apply to a visual property - and in particular, we'd also like to be able to provide a static value in the declaration for the value of a property - effectively "hard coding" that value. I debated extracting this into a protocol, but quickly realized in trying that we still need all three properties if the function signature for writing the data was to remain the same. So rather than making a PropertyVisualChannel and a ConstantVisualChannel that had slightly different signatures for when we wanted to map the value, I cobbled a quick enumeration to encapsulate a "kind of" data and used optionals within the struct, and switch off the enumeration to know which optional to use. This pattern will be terrible if we have a lot of different versions, but might hold up OK for now. The stub so far looks like:
public struct VisualChannel<MarkType: Mark, DataType, PropertyType: TypeOfVisualProperty> {
    let markProperty: WritableKeyPath<MarkType, PropertyType>
    let dataProperty: KeyPath<DataType, PropertyType>?
    let constantValue: PropertyType?
    let kindOfChannel: RefOfConstant
    //var scale: Scale?
    // something like `VisualChannel(\BarMark.width, \.node)` or `VisualChannel(y, 13)`
    // maybe `VisualChannel(\.width, from: \.name)` - does the `from: ` add meaningful semantic context?
    public init(_ markProperty: WritableKeyPath<MarkType, PropertyType>,
                from dataProperty: KeyPath<DataType, PropertyType>) {
        self.markProperty = markProperty
        self.kindOfChannel = .reference
        self.dataProperty = dataProperty
        self.constantValue = nil
        // self.scale = LinearScale(domain: 0...1)
        // We need at least the domain to create it - so we need to know the range of values
        // before we can instantiate a scale if it's not explicitly declared
    }
    // something like `VisualChannel(\BarMark.width, \.node)` or `VisualChannel(y, 13)`
    // maybe `VisualChannel(\.width, from: \.name)` - does the `from: ` add meaningful semantic context?
    public init(_ markProperty: WritableKeyPath<MarkType, PropertyType>,
                value: PropertyType) {
        self.markProperty = markProperty
        self.kindOfChannel = .constant
        self.constantValue = value
        self.dataProperty = nil
        // self.scale = LinearScale(domain: 0...1)
        // We need at least the domain to create it - so we need to know the range of values
        // before we can instantiate a scale if it's not explicitly declared
    }
    func writeScaledValue(d: DataType, m: inout MarkType) {
        let valueFromData: PropertyType
        switch kindOfChannel {
        case .reference:
            guard let dataProperty = self.dataProperty else {
                preconditionFailure("keypath for a reference visual channel was nil")
            }
            valueFromData = d[keyPath: dataProperty]
        case .constant:
            guard let constantValue = constantValue else {
                preconditionFailure("constant value for a constant visual channel was nil")
            }
            valueFromData = constantValue
        }
        // scale the value here...
        m[keyPath: self.markProperty] = valueFromData
    }
}
Then I ran into the tricky bit. How I want to use these is by collecting a set of them together - each with a different concrete type - and iterating over them within
@resultBuilder
struct MarkBuilder<MarkType: Mark> {
    static func buildBlock(_ components: VisualChannel...) -> [VisualChannel] {
        return []
    }
}
The Swift type system has a bit of a hissy fit here - because while the type of the data and the type of That seemed to indicate that what we need is at least a partially type erased VisualChannel so that we could collect them into an array and process against them when we're processing the developer's DSL input. It can be a generic type - but the generic types need to all be identical.
I dug into how to do type erasure, and there are two patterns in use. The simpler pattern uses an abstract class with closures, capturing the closures from the specific type within it in order to fulfill the constraints of the protocol. But this pattern is limited to where you're accepting or returning non-generic types from the protocol. There are a couple of examples of this simpler pattern written about in blog posts (Understanding type erasure in Swift by Donny Wals and Type erasure using closures in Swift by John Sundell). Unfortunately, what we need doesn't fit within those constraints - and while they talk about "how Sequence does it with AnySequence" - it isn't the same technique.
Paul Hudson, however, has two articles in Hacking With Swift+ that walk through the problem space and show the more complex technique: Existentials and Type Erasure - part 1 and Existentials and Type Erasure - part 2. Sorry, both of these are behind a paywall, but his explanation is brilliant. I did also find a free article on this technique: Type Erasure in Swift by Thibault Wittemberg that covers the same pattern and technique, but I found Paul's article to be the easier to fully understand.
The gist of the type erasure pattern leverages both classes and generic structs - and it's how AnySequence is implemented within the Swift standard library. Reading the standard library raw source is doable, but there is a lot of additional complexity - and Sequence itself is a complex protocol - so the ~2400 lines are a bit of a wade.
The gist of this more complex pattern is a multi-step process:
This (much more complex) pattern allows us to have protocol methods that accept, or return, generic types as a part of the protocol, where the simpler "type erasure with closures" mechanism doesn't. So next steps in the implementation are to make those various types so that I can return an (The work in progress is set up as pull request #9)
Oh - found another article that describes the type erasure process: Breaking Down Type Erasure in Swift by Robert Edwards of Big Nerd Ranch. It's also an excellent walk-through of how to do this type erasure technique.
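For readers who can't get at the linked articles, the class-plus-generic-struct pattern discussed above can be sketched roughly as follows. This is a generic illustration of the technique, not the code from pull request #9: `ChannelProtocol`, `AnyChannelBase`, `ChannelBox`, `AnyChannel`, `KeyPathChannel`, `Dot` and `Person` are all hypothetical names. The wrapper stays generic over the mark and data types and erases only the property type.

```swift
// Hypothetical protocol with associated types (illustrative only).
protocol ChannelProtocol {
    associatedtype MarkType
    associatedtype DataType
    func write(from data: DataType, to mark: inout MarkType)
}

// Step 1: an abstract base class, generic over the types we keep visible.
class AnyChannelBase<MarkType, DataType> {
    func write(from data: DataType, to mark: inout MarkType) {
        fatalError("must be overridden by a subclass")
    }
}

// Step 2: a box subclass, generic over the concrete channel, that forwards calls.
final class ChannelBox<Concrete: ChannelProtocol>:
    AnyChannelBase<Concrete.MarkType, Concrete.DataType> {
    let concrete: Concrete
    init(_ concrete: Concrete) { self.concrete = concrete }
    override func write(from data: Concrete.DataType, to mark: inout Concrete.MarkType) {
        concrete.write(from: data, to: &mark)
    }
}

// Step 3: the wrapper struct stores the box through the base-class type,
// so the concrete channel (and its property type) is no longer visible.
struct AnyChannel<MarkType, DataType>: ChannelProtocol {
    private let box: AnyChannelBase<MarkType, DataType>
    init<C: ChannelProtocol>(_ concrete: C)
        where C.MarkType == MarkType, C.DataType == DataType {
        box = ChannelBox(concrete)
    }
    func write(from data: DataType, to mark: inout MarkType) {
        box.write(from: data, to: &mark)
    }
}

// A concrete channel that is additionally generic over the property type P.
struct KeyPathChannel<M, D, P>: ChannelProtocol {
    let markProperty: WritableKeyPath<M, P>
    let dataProperty: KeyPath<D, P>
    func write(from data: D, to mark: inout M) {
        mark[keyPath: markProperty] = data[keyPath: dataProperty]
    }
}

struct Dot { var x: Double = 0; var label: String = "" }
struct Person { let age: Double; let name: String }

// Channels with different property types now fit in one array.
let channels: [AnyChannel<Dot, Person>] = [
    AnyChannel(KeyPathChannel(markProperty: \Dot.x, dataProperty: \Person.age)),
    AnyChannel(KeyPathChannel(markProperty: \Dot.label, dataProperty: \Person.name)),
]
var dot = Dot()
for channel in channels {
    channel.write(from: Person(age: 42, name: "Ada"), to: &dot)
}
```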
-
Using the result builder structure to create a Mark, while handing the specific types in flight, is proving to be quite a challenge. I'm wondering if I'm not framing the problem correctly, or perhaps just looking at it from a slightly off angle, in order to manage lining up and carrying the relevant types and using the same through result builders.
The Swift 5.7 development branch has an evolution of result builders that could make this notably easier: buildPartialBlock - it's implemented and under active review in Swift Evolution as I'm trying to work through how to set this up, so I'm trying not to dig too far into it. There's a pretty amazing example of using it to create a "marble diagram"-like text validation scheme that Swift Async Algorithms is using.
As a side note, I found a gist that includes a variety of type erasure techniques and looks at them from a performance perspective (by Dave Abrahams) - linked from a Swift forum post on efficient type erasure. While I was digging around for how others have tackled this sort of effort, I found the short discussion titled best ways to get type information back after its been erase. In there, Jonathan G. mentions that defining your API first in terms of the actions it takes is the best way to go forward, so that you can capture the type information that you need to preserve while you're doing the erasure in order to later restore it when you need it.
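To make the buildPartialBlock idea concrete, here is a minimal sketch assuming Swift 5.7 or later, where the proposal (SE-0348) shipped. `Channel`, `Writer`, `ChannelBuilder`, `makeWriters`, `Dot` and `Person` are hypothetical names for illustration. Because each `buildPartialBlock` overload is generic over the property type `P`, the builder sees every statement's concrete type and can fold it into a closure (one "action") before that type is lost.

```swift
struct Dot { var x: Double = 0; var label: String = "" }
struct Person { let age: Double; let name: String }

// Hypothetical channel that keeps its property type P as a generic parameter.
struct Channel<M, D, P> {
    let markProperty: WritableKeyPath<M, P>
    let dataProperty: KeyPath<D, P>
}

// The erased form: an action that reads from data and writes to a mark.
typealias Writer<M, D> = (D, inout M) -> Void

@resultBuilder
struct ChannelBuilder<M, D> {
    // Called for the first statement in the block.
    static func buildPartialBlock<P>(first: Channel<M, D, P>) -> [Writer<M, D>] {
        [writer(for: first)]
    }
    // Called once for each following statement, folding it into the result.
    static func buildPartialBlock<P>(accumulated: [Writer<M, D>],
                                     next: Channel<M, D, P>) -> [Writer<M, D>] {
        accumulated + [writer(for: next)]
    }
    // Captures the key paths while P is still known.
    private static func writer<P>(for channel: Channel<M, D, P>) -> Writer<M, D> {
        { data, mark in
            mark[keyPath: channel.markProperty] = data[keyPath: channel.dataProperty]
        }
    }
}

func makeWriters(@ChannelBuilder<Dot, Person> _ content: () -> [Writer<Dot, Person>])
    -> [Writer<Dot, Person>] {
    content()
}

let writers = makeWriters {
    Channel(markProperty: \Dot.x, dataProperty: \Person.age)       // P == Double
    Channel(markProperty: \Dot.label, dataProperty: \Person.name)  // P == String
}

var dot = Dot()
for write in writers {
    write(Person(age: 42, name: "Ada"), &dot)
}
```

This follows the "define the API in terms of actions" advice: the property type is used at the moment the closure is built, so nothing has to be recovered later.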
-
I enabled some basic type erasure for VisualChannel, creating an "AnyVisualChannel" that erased the type of the property that it maps, which leaves VisualChannel still generic over the type of Mark, and the type of input data. This pattern aligned for storing a specific type within a mark that's generic over a specific type of data input. After reading more into type erasure and working through the problem from a "what it does" perspective (trying to frame the problem not as nouns - such as a mapping link - but as verbs - "reads from something, writes to something"), I'm wondering if I erased the one piece that I should have been preserving in a type-erasure construct. The reason to reach for type erasure is to make the types being handed around more "dynamic" while allowing them to be consistently stored, iterated on, processed, etc. The gist of the advice I've seen in type erasure is that if you're using it, grab the specific pieces of type that you're erasing and store them with the erased type, so that you can leverage them again when you need it. And when you break down the mapping into the "actions it does" - it breaks down into "read the value of a property from some data" and "write a value to a property". The piece that ties those two together is the kind of property. This runs into another potential trouble area with using result builders to structure the chart: required mappings or values for specific parts of the chart. Since a One way to handle this is to not show any errors, but also not show any marks, if one of the required fields isn't included within the template of mapping visual properties. The idea being that something like the following results in nothing being drawn: Chart {
    Dot(data) {
        VisualChannel(\.x, \.age)
    }
} That pattern seems hostile to learning how to use the charting though, in particular - no information being returned to the developer writing the pattern saying "Hey, you're missing something mapping to the One solution to this is to add the required properties to the initializer that's used within the result builder declaration - so a developer declaration with Chart {
    Dot(data, x: VisualChannel(\.age), y: VisualChannel(\.height)) {
        ...VisualChannel(\.shape, \.category)... // optional channels within the build block
    }
}
By specifying the property that it applies to in an initializer, the code applying the mapping already has the part that links the "write to this property" information - so it really only needs the "read this kind of property from that data" element. Any optional visual channel would still need the property that it "writes back to" - and this seems to imply that the composition of what the mapping does is better broken down into two specific pieces - one that "reads from", another that "writes to", based on the actions they take.
When we're creating the individual marks, we ultimately need a way to invoke something and get a value - and if we leverage a type erased thing that has the internal bits of type encapsulated and erased, the signature for this gets super simple. For example, the Jumping back for a moment to the issue of required vs. optional parameters for a As a result, the acceptable constraint (barring any creative way to hand feedback to the developer through the compiler) seems to be "Never throw an error while building a template within a result builder DSL structure".
One of the techniques that was highlighted in Becca's talk on DSLs was leveraging the specific types being handed back along with That talk highlighted that we could use the position within the list of options to accept different types at different locations, and we could implicitly expect
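The "reads from" / "writes to" split described above could be sketched roughly like this. All names here (`Reader`, `OptionalChannel`, `makeDots`, `Dot`, `Person`) are hypothetical. Required properties become initializer parameters that only need a reader, so leaving one out is a compile-time error rather than a silently empty chart; optional channels pair a reader with the key path they write back to.

```swift
struct Dot { var x: Double; var y: Double; var shape: String = "circle" }
struct Person { let age: Double; let height: Double; let category: String }

// "Reads from": knows only the data type and the type of value it produces.
struct Reader<D, P> {
    let read: (D) -> P
    init(_ keyPath: KeyPath<D, P>) { read = { $0[keyPath: keyPath] } }
    init(constant: P) { read = { _ in constant } }
}

// Optional channel: adds the "writes to" half and erases the property type P.
struct OptionalChannel<M, D> {
    let apply: (D, inout M) -> Void
    init<P>(_ target: WritableKeyPath<M, P>, _ reader: Reader<D, P>) {
        apply = { data, mark in mark[keyPath: target] = reader.read(data) }
    }
}

// x and y are required parameters; everything else is optional.
func makeDots(_ data: [Person],
              x: Reader<Person, Double>,
              y: Reader<Person, Double>,
              channels: [OptionalChannel<Dot, Person>] = []) -> [Dot] {
    data.map { person in
        var dot = Dot(x: x.read(person), y: y.read(person))
        for channel in channels { channel.apply(person, &dot) }
        return dot
    }
}

let people = [Person(age: 30, height: 170, category: "square")]
let dots = makeDots(people,
                    x: Reader(\Person.age),
                    y: Reader(\Person.height),
                    channels: [OptionalChannel(\Dot.shape, Reader(\Person.category))])
```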
-
While the VisualChannel per explicit parameter can be captured with a concrete type, it makes it slightly more awkward to have additional (optional) visual channels in a general bucket within the The idea being that defining a Chart could look something like: Chart {
    Dot(data: [],
        x: VisualChannel(\.range),
        y: VisualChannel(\.latency).scale(LogScale(0.001,10))
    )
    .xAxis()
    .yAxis()
}
We could alternately forgo that pattern, instead having every property that can be overridden take a VisualChannel and have default values be referenced and overwritten by one of those, versus having a set of flat defaults and then iterating over all the data and applying any of the user-specified visual channels to replace those defaults only when they're specified - the idea being that only when you defined something would it take additional computation to make that mark decision, where many of the defaults would just be fast-path and referencing pre-set data. In a small chart, this isn't likely to be a noticeable difference, but a chart of 100,000 elements (or more) might be a very different story.
-
For the purposes of considering a specific kind of measurement to be included - potentially for formatting tick values along an axis, or highlighting a value, I was noodling the idea of having an extension on VisualChannel:
The idea being that the raw Int, Float, Double, etc data could be explicitly stated at display time to be "this is a distance in meters" or such. In practice, it would mean enabling an additional value transformation to Measurement with a specific unit being applied. I don't think much current data is already using this technique based on my own examples, but it's worth at least considering how to handle the situation where that's already the case. And finally, if the domain of the scale isn't near the base unit of measure, especially for SI units, then it might make sense to have implicit conversion to the SI unit that's relevant to the exponential factor of the domain, or the ability to allow a user to explicitly say "display this as unit X - such as
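As a small illustration of that value transformation, Foundation's `Measurement` and `UnitLength` can already tag raw numbers with a unit and convert between units. The sketch below only shows those two steps; how this would hang off VisualChannel is left open, and the variable names and sample values are made up.

```swift
import Foundation

// Raw values as they might arrive in the data: plain Doubles with no unit attached.
let rawDistances: [Double] = [250, 1_500, 42_195]

// State at display time that these are distances in meters.
let measured = rawDistances.map { Measurement(value: $0, unit: UnitLength.meters) }

// If the domain sits far from the base unit, convert to a better-fitting unit.
let inKilometers = measured.map { $0.converted(to: .kilometers) }

// A formatter can then produce tick labels that keep the provided unit.
let formatter = MeasurementFormatter()
formatter.unitOptions = .providedUnit
let tickLabels = inKilometers.map { formatter.string(from: $0) }
```

The exact label text depends on the current locale, so no specific output is shown here.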
-
Development Log: Having pushed forward the scale implementations to support discrete and continuous scales that can be incrementally configured, there's enough basic functionality there to push forward on enabling Visual Channels again in order to build up values for a mark. I've tried to reset my head around type-constraints for the relevant information, trying to avoid specific type-erasure and managing where that type is defined such that it makes sense from the higher level declarative concepts from Chart -> Mark -> VisualChannel. The Chart project is now tracking the |
-
At the moment, I have different types for different kinds of output for visual channels - there's the continuous visual channel, which in the code is Additionally, it seems like I could handle most all of these with a single VisualChannel type that had some hidden generics buried within the initializer that noted that the input domain was discrete vs. quantitative (one of the floating point types or an integer cast over to floating point). I haven't tackled this yet, and depending on how I'm dealing with the properties inside (scales and such), it might require an "AnyVisualChannel" kind of type erasure to encapsulate the pieces I need - but I'm kind of hoping I can collapse this down to a single type called I wouldn't care all that much except these types are exposed in the declarative API, and |
-
Visual Channel
The VisualChannel doesn't provide a value, but it provides a function that can be called on data that returns the value with a type appropriate to the mark's property.
The appropriate type is defined based on the mark.
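A minimal sketch of "a channel provides a function rather than a value", with hypothetical names (`Channel`, `Person`) and a stand-in linear scale passed as a plain closure. The output type is whatever the mark's property needs; here it is a `Double` position.

```swift
struct Person { let age: Double }

// Hypothetical channel: stores a function from a data element to the
// value type that the mark's property requires.
struct Channel<DataType, Output> {
    let value: (DataType) -> Output

    // Reads a property from the data, then maps it through a scale function.
    init<Input>(_ keyPath: KeyPath<DataType, Input>,
                scale: @escaping (Input) -> Output) {
        value = { data in scale(data[keyPath: keyPath]) }
    }
}

// Map ages in 0...100 onto a 400-point-wide drawing area.
let xChannel = Channel(\Person.age) { (age: Double) -> Double in age / 100 * 400 }
let x = xChannel.value(Person(age: 25))
```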
How Mark ingests a Visual Channel
Each kind of mark (such as bar, dot, and line) has some common properties (fill, color, stroke, and a few others) that we'll want available to use when drawing the individual symbols that the mark produces from data. Common marks frequently have a default value, or perhaps a default value that's overridden by an environment indicator (tint, for example). If a mapping exists for one of these properties, then it should override a mark's default value.
There are other properties (position, x and y values, for a dot or line mark) that can't have reasonable defaults, being required to be mapped in order to produce values.
Side note: as I'm thinking this through, this pattern matches parameters in a function. So one way to express this might be an initializer for a specific Mark, linking the mapping to the property using a parameter in the initializer. Something akin to: BarMark.init(x: VisualChannel(...), y: VisualChannel(...))
To translate that into a result builder, the ordering of a result builder doesn't provide the equivalent of "named parameters" - instead we get a list of parameters in a specific order. Attempting to assert that specific required parameters (such as dot positional parameters x or y) need to come first is kind of icky - or more specifically, likely to be easily misunderstood and mistaken.
One means of potentially dealing with this is by creating a visual channel such that it has a reference to the property of the Mark it's linking to, as well as a reference to the property of the data it's linking from. Using the result builder DSL style formatting, the result might look something like:
(where \.x represents an x property on DotMark, and \.someDataProperty represents the someDataProperty on an instance of ElementType that was provided as a list of values to the initializer)
Or if used with a direct parameter format:
In this case VisualChannel would end up being generic over both the type of Mark and the type of data included within a sequence.
How Mark interprets different kinds of mapped data
A channel has a specific kind of data that the property requires.
For example, the width of a bar mark is expected to be either categorical (the specific category the mark represents), or ordinal (representing the count or position the mark represents).
A categorical value will be compared against a list of all potential categories to derive a relative location, and the relative location is used to calculate the position for the width of the bar - either by its center and deriving the width, or by a start and end pairing that represent the corners of the bar.
How the data is processed depends on the type of data that the mapping points to:
This example brings into question the idea that how we want to treat the data in the mapping can be known entirely from the type of the data being mapped from. Or more specifically, that we might want to cast or transform the data so that it matches the intent of the mark's property. "Which bar does this data map into" is fundamentally categorizing from the data, where "what's the X position for this dot" is about getting an explicit quantitative value.
So the intent of the data we're capturing (categorical, positional, etc) should perhaps be included or derived from the type of property for the respective Mark. One way to handle this would be to constrain the types of allowed input to match that intent (for example, only accepting a "String" input for the categorical input for "which bar is this"). The rather obvious downside is that it forces the developer to do transformations on the data up front and adds cognitive load to using the mark. When an Int type is provided for a bar width, there's an open question of "is this a positional index, or an identifier for a specific category?"
If we have data that has [1,5] in the sequence what does this mean - does that mean we want to draw bars at positions 1 and 5 and space them out, or are there two bars - category "5" and "1" - that are next to each other?
And I realized there's an additional interesting challenge - what if the categorical data isn't unique? How should the mark handle that? For example, in the above scenario, what if the values are [1, 5, 1, 1] - so one appears three times - how is that represented? Do the associated values get summed, visually stacked, or dropped? Or do we have an error condition that an expected invariant isn't correct - that the categories aren't unique?
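Two of those options are cheap to express in plain Swift, which may help when weighing them. The data below is made up for illustration: the categories match the [1, 5, 1, 1] example, and the associated values are invented.

```swift
// Made-up observations: category 1 appears three times.
let observations: [(category: Int, value: Double)] = [
    (1, 10), (5, 4), (1, 2.5), (1, 0.5),
]

// Option 1: sum the values that share a category.
let summed = Dictionary(observations.map { ($0.category, $0.value) },
                        uniquingKeysWith: +)

// Option 2: treat duplicates as a broken invariant and detect them up front.
let categories = observations.map { $0.category }
let hasDuplicates = Set(categories).count != categories.count
```

Stacking would need the individual values kept per category instead, for example with `Dictionary(grouping:by:)`.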