|
| 1 | +# DataSchema |
| 2 | + |
| 3 | +<!-- We def want a livebook for this. So much easier to explain. --> |
| 4 | + |
| 5 | +Data schemas are declarative descriptions of how to create a struct from some input data. You can set up different schemas to handle different kinds of input data. |
| 6 | + |
| 7 | +Let's think of creating a struct as taking some source data and turning it into the desired struct. To do this we need to know at least three things: |
| 8 | + |
| 9 | +1. The keys of the desired struct |
| 10 | +2. The types of the values for each of the keys |
| 11 | +3. Where / how to get the data for each value from the source data. |
| 12 | + |
| 13 | +Turning the source data into the correct type defined by the schema will often require casting, so the type definitions are casting functions. Let's look at a simple field example: |
| 14 | + |
| 15 | +```elixir |
| 16 | +field {:content, "text", &cast_string/1}  |
| 17 | +#         ^        ^           ^ |
| 18 | +# struct field     |           | |
| 19 | +#    path to data in the source| |
| 20 | +#                      casting function |
| 21 | +``` |
| 22 | + |
| 23 | +This says that the source data will have a field called `"text"`. When creating a struct, we get the data under that field and pass it to `cast_string/1`. The result of that function is put in the resulting struct under the key `:content`. |
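
The README does not define `cast_string/1`, so the following is only a minimal sketch of what such a casting function could look like. The module name `CastingSketch` is hypothetical, and returning the bare value (rather than a tagged tuple) is an assumption based on how `to_datetime/1` behaves later in this document.

```elixir
defmodule CastingSketch do
  # Hypothetical casting function: receives the raw value found at the
  # path in the source data and returns the value to store in the struct.
  def cast_string(value) when is_binary(value), do: String.trim(value)

  # Anything that is not already a binary is converted with to_string/1.
  def cast_string(value), do: to_string(value)
end

IO.inspect(CastingSketch.cast_string("  This is a comment  "))
# prints "This is a comment"

IO.inspect(CastingSketch.cast_string(42))
# prints "42"
```

A capture such as `&CastingSketch.cast_string/1` could then be used as the third element of a `field` tuple.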
| 24 | + |
| 25 | +There are 4 kinds of struct fields we could want: |
| 26 | + |
| 27 | +1. `field` - The value will be a cast value from the source data. |
| 28 | +2. `list_of` - The value will be a list of cast values created from the source data. |
| 29 | +3. `has_one` - The value will be created from a nested data schema. |
| 30 | +4. `aggregate` - The value will be a cast value formed from multiple bits of data in the source. |
| 31 | + |
| 32 | +To see this better let's look at a very simple example. Assume our input data looks like this: |
| 33 | + |
| 34 | +```elixir |
| 35 | +source_data = %{ |
| 36 | + "content" => "This is a blog post", |
| 37 | + "comments" => [%{"text" => "This is a comment"},%{"text" => "This is another comment"}], |
| 38 | + "draft" => %{"content" => "This is a draft blog post"}, |
| 39 | + "date" => "2021-11-11", |
| 40 | + "time" => "14:00:00", |
| 41 | + "metadata" => %{ "rating" => 0} |
| 42 | +} |
| 43 | +``` |
| 44 | + |
| 45 | +And now let's assume the struct we wish to make is this one: |
| 46 | + |
| 47 | +```elixir |
| 48 | +%BlogPost{ |
| 49 | + content: "This is a blog post", |
| 50 | + comments: [%Comment{text: "This is a comment"}, %Comment{text: "This is another comment"}], |
| 51 | + draft: %DraftPost{content: "This is a draft blog post"}, |
| 52 | + post_datetime: ~N[2021-11-11 14:00:00] |
| 53 | +} |
| 54 | +``` |
| 55 | + |
| 56 | +As mentioned before, we want to handle different kinds of source data with different schemas. For each kind of source data we need to specify how to access the data for each field type. We do that by providing a data accessor module that implements `DataSchema.DataAccessBehaviour` when we create the schema. |
| 57 | + |
| 58 | +When creating the struct DataSchema will call the relevant function for the field we are creating, passing it the source data and the path to the value(s) in the source. |
| 59 | + |
| 60 | +Given that the source data is a map, we can pass a simple MapAccessor: |
| 61 | + |
| 62 | +```elixir |
| 63 | +defmodule MapAccessor do |
| 64 | + @behaviour DataSchema.DataAccessBehaviour |
| 65 | + |
| 66 | + @impl true |
| 67 | + def field(data, field) do |
| 68 | + Map.get(data, field) |
| 69 | + end |
| 70 | + |
| 71 | + @impl true |
| 72 | + def list_of(data, field) do |
| 73 | + Map.get(data, field) |
| 74 | + end |
| 75 | + |
| 76 | + @impl true |
| 77 | + def has_one(data, field) do |
| 78 | + Map.get(data, field) |
| 79 | + end |
| 80 | + |
| 81 | + @impl true |
| 82 | + def aggregate(data, field) do |
| 83 | + Map.get(data, field) |
| 84 | + end |
| 85 | +end |
| 86 | +``` |
| 87 | + |
| 88 | +So now we can define the following data schemas: |
| 89 | + |
| 90 | +```elixir |
| 91 | +defmodule DraftPost do |
| 92 | + import DataSchema, only: [data_schema: 2] |
| 93 | + |
| 94 | + data_schema([ |
| 95 | + field: {:content, "content", DataSchema.String} |
| 96 | + ], MapAccessor) |
| 97 | +end |
| 98 | + |
| 99 | +defmodule Comment do |
| 100 | + import DataSchema, only: [data_schema: 2] |
| 101 | + |
| 102 | + data_schema([ |
| 103 | + field: {:text, "text", DataSchema.String} |
| 104 | + ], MapAccessor) |
| 105 | + |
| 106 | + def cast(data) do |
| 107 | + DataSchema.to_struct(data, __MODULE__) |
| 108 | + end |
| 109 | +end |
| 110 | + |
| 111 | +defmodule BlogPost do |
| 112 | + import DataSchema, only: [data_schema: 2] |
| 113 | + |
| 114 | + data_schema([ |
| 115 | + field: {:content, "content", DataSchema.String}, |
| 116 | + list_of: {:comments, "comments", Comment}, |
| 117 | + has_one: {:draft, "draft", DraftPost}, |
| 118 | + aggregate: {:post_datetime, %{date: "date", time: "time"}, &BlogPost.to_datetime/1} |
| 119 | + ], MapAccessor) |
| 120 | + |
| 121 | + def to_datetime(%{date: date, time: time}) do |
| 122 | + date = Date.from_iso8601!(date) |
| 123 | + time = Time.from_iso8601!(time) |
| 124 | + {:ok, datetime} = NaiveDateTime.new(date, time) |
| 125 | + datetime |
| 126 | + end |
| 127 | +end |
| 128 | +``` |
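
To make the `aggregate` field less abstract, here is a rough sketch of how such a field might be resolved against map data. This is not DataSchema's actual implementation: the module `AggregateSketch` and its `resolve/3` function are hypothetical, and the sketch assumes each path in the map is looked up independently before the collected values are handed to the casting function.

```elixir
defmodule AggregateSketch do
  # Hypothetical helper: look up every path in `paths` in the source map,
  # keep the results under the same keys, then pass that map to `cast_fun`.
  def resolve(source, paths, cast_fun) when is_map(paths) do
    paths
    |> Map.new(fn {key, path} -> {key, Map.get(source, path)} end)
    |> cast_fun.()
  end

  # Same logic as BlogPost.to_datetime/1 in the schema above.
  def to_datetime(%{date: date, time: time}) do
    date = Date.from_iso8601!(date)
    time = Time.from_iso8601!(time)
    {:ok, datetime} = NaiveDateTime.new(date, time)
    datetime
  end
end

source = %{"date" => "2021-11-11", "time" => "14:00:00"}
paths = %{date: "date", time: "time"}

IO.inspect(AggregateSketch.resolve(source, paths, &AggregateSketch.to_datetime/1))
# prints ~N[2021-11-11 14:00:00]
```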
| 129 | + |
| 130 | +But we can clean this up a bit. Instead of passing `MapAccessor` every time we create a schema, we can define a helper macro that fixes the accessor for us (a form of partial application): |
| 131 | + |
| 132 | +```elixir |
| 133 | +defmodule MapAccessor do |
| 134 | + # ... |
| 135 | + defmacro map_schema(fields) do |
| 136 | + quote do |
| 137 | + require DataSchema |
| 138 | + DataSchema.data_schema(unquote(fields), MapAccessor) |
| 139 | + end |
| 140 | + end |
| 141 | + # ... |
| 142 | +end |
| 143 | +``` |
| 144 | + |
| 145 | +Then change our schema definitions to look like this: |
| 146 | + |
| 147 | +```elixir |
| 148 | +defmodule DraftPost do |
| 149 | + import MapAccessor, only: [map_schema: 1] |
| 150 | + |
| 151 | + map_schema([ |
| 152 | + field: {:content, "content", DataSchema.String} |
| 153 | + ]) |
| 154 | +end |
| 155 | + |
| 156 | +defmodule Comment do |
| 157 | + import MapAccessor, only: [map_schema: 1] |
| 158 | + |
| 159 | + map_schema([ |
| 160 | + field: {:text, "text", DataSchema.String} |
| 161 | + ]) |
| 162 | + |
| 163 | + def cast(data) do |
| 164 | + DataSchema.to_struct(data, __MODULE__) |
| 165 | + end |
| 166 | +end |
| 167 | + |
| 168 | +defmodule BlogPost do |
| 169 | + import MapAccessor, only: [map_schema: 1] |
| 170 | + |
| 171 | + map_schema([ |
| 172 | + field: {:content, "content", DataSchema.String}, |
| 173 | + list_of: {:comments, "comments", Comment}, |
| 174 | + has_one: {:draft, "draft", DraftPost}, |
| 175 | + aggregate: {:post_datetime, %{date: "date", time: "time"}, &BlogPost.to_datetime/1} |
| 176 | + ]) |
| 177 | + |
| 178 | + def to_datetime(%{date: date_string, time: time_string}) do |
| 179 | + date = Date.from_iso8601!(date_string) |
| 180 | + time = Time.from_iso8601!(time_string) |
| 181 | + {:ok, datetime} = NaiveDateTime.new(date, time) |
| 182 | + datetime |
| 183 | + end |
| 184 | +end |
| 185 | +``` |
| 186 | + |
| 187 | +Now we can create our struct from the source data: |
| 188 | + |
| 189 | +```elixir |
| 190 | +source_data = %{ |
| 191 | + "content" => "This is a blog post", |
| 192 | + "comments" => [%{"text" => "This is a comment"},%{"text" => "This is another comment"}], |
| 193 | + "draft" => %{"content" => "This is a draft blog post"}, |
| 194 | + "date" => "2021-11-11", |
| 195 | + "time" => "14:00:00", |
| 196 | + "metadata" => %{ "rating" => 0} |
| 197 | +} |
| 198 | + |
| 199 | +DataSchema.to_struct(source_data, BlogPost) |
| 200 | + |
| 201 | +# => %BlogPost{ |
| 202 | +# content: "This is a blog post", |
| 203 | +# comments: [%Comment{text: "This is a comment"}, %Comment{text: "This is another comment"}], |
| 204 | +# draft: %DraftPost{content: "This is a draft blog post"}, |
| 205 | +# post_datetime: ~N[2021-11-11 14:00:00] |
| 206 | +# } |
| 207 | +``` |
| 208 | + |
| 209 | +### XML Schemas |
| 210 | + |
| 211 | +Now let's imagine instead that our source data was XML. What would it take to support that? First, a new XPath data accessor: |
| 212 | + |
| 213 | +```elixir |
| 214 | +defmodule XpathAccessor do |
| 215 | + @behaviour DataSchema.DataAccessBehaviour |
| 216 | + import SweetXml, only: [sigil_x: 2] |
| 217 | + |
| 218 | + defmacro xpath_schema(fields) do |
| 219 | + quote do |
| 220 | + require DataSchema |
| 221 | + DataSchema.data_schema(unquote(fields), XpathAccessor) |
| 222 | + end |
| 223 | + end |
| 224 | + |
| 225 | + @impl true |
| 226 | + def field(data, path) do |
| 227 | + SweetXml.xpath(data, ~x"#{path}"s) |
| 228 | + end |
| 229 | + |
| 230 | + @impl true |
| 231 | + def list_of(data, path) do |
| 232 | + SweetXml.xpath(data, ~x"#{path}"l) |
| 233 | + end |
| 234 | + |
| 235 | + @impl true |
| 236 | + def has_one(data, path) do |
| 237 | + SweetXml.xpath(data, ~x"#{path}") |
| 238 | + end |
| 239 | + |
| 240 | + @impl true |
| 241 | + def aggregate(data, path) do |
| 242 | + SweetXml.xpath(data, ~x"#{path}"s) |
| 243 | + end |
| 244 | +end |
| 245 | +``` |
| 246 | + |
| 247 | +Our source data looks like this: |
| 248 | + |
| 249 | +```elixir |
| 250 | +source_data = """ |
| 251 | +<Blog date="2021-11-11" time="14:00:00"> |
| 252 | + <Content>This is a blog post</Content> |
| 253 | + <Comments> |
| 254 | + <Comment>This is a comment</Comment> |
| 255 | + <Comment>This is another comment</Comment> |
| 256 | + </Comments> |
| 257 | + <Draft> |
| 258 | + <Content>This is a draft blog post</Content> |
| 259 | + </Draft> |
| 260 | +</Blog> |
| 261 | +""" |
| 262 | +``` |
| 263 | + |
| 264 | +Let's define our schemas like so: |
| 265 | + |
| 266 | +```elixir |
| 267 | +defmodule DraftPost do |
| 268 | + import XpathAccessor, only: [xpath_schema: 1] |
| 269 | + |
| 270 | + xpath_schema([ |
| 271 | + field: {:content, "./Content/text()", DataSchema.String} |
| 272 | + ]) |
| 273 | +end |
| 274 | + |
| 275 | +defmodule Comment do |
| 276 | + import XpathAccessor, only: [xpath_schema: 1] |
| 277 | + |
| 278 | + xpath_schema([ |
| 279 | + field: {:text, "./text()", DataSchema.String} |
| 280 | + ]) |
| 281 | + |
| 282 | + def cast(data) do |
| 283 | + DataSchema.to_struct(data, __MODULE__) |
| 284 | + end |
| 285 | +end |
| 286 | + |
| 287 | +defmodule BlogPost do |
| 288 | + import XpathAccessor, only: [xpath_schema: 1] |
| 289 | + |
| 290 | + xpath_schema([ |
| 291 | + field: {:content, "/Blog/Content/text()", DataSchema.String}, |
| 292 | + list_of: {:comments, "//Comment", Comment}, |
| 293 | + has_one: {:draft, "/Blog/Draft", DraftPost}, |
| 294 | + aggregate: {:post_datetime, %{date: "/Blog/@date", time: "/Blog/@time"}, &BlogPost.to_datetime/1} |
| 295 | + ]) |
| 296 | + |
| 297 | + def to_datetime(%{date: date_string, time: time_string}) do |
| 298 | + date = Date.from_iso8601!(date_string) |
| 299 | + time = Time.from_iso8601!(time_string) |
| 300 | + {:ok, datetime} = NaiveDateTime.new(date, time) |
| 301 | + datetime |
| 302 | + end |
| 303 | +end |
| 304 | +``` |
| 305 | + |
| 306 | +And now we can transform: |
| 307 | + |
| 308 | +```elixir |
| 309 | +DataSchema.to_struct(source_data, BlogPost) |
| 310 | +# => %BlogPost{ |
| 311 | +# comments: [ |
| 312 | +# %Comment{text: "This is a comment"}, |
| 313 | +# %Comment{text: "This is another comment"} |
| 314 | +# ], |
| 315 | +# content: "This is a blog post", |
| 316 | +# draft: %DraftPost{content: "This is a draft blog post"}, |
| 317 | +# post_datetime: ~N[2021-11-11 14:00:00] |
| 318 | +# } |
| 319 | +``` |
| 320 | + |
| 321 | +### JSONPath Schemas |
| 322 | + |
| 323 | +This is left as an exercise for the reader, but hopefully you can see how to extend this idea to JSON data, with JSONPaths in the schemas pointing to the data. |
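
As a starting point, here is a sketch of an accessor for nested, already-decoded JSON maps. It uses a much simpler dotted-path lookup (`"metadata.rating"`) rather than real JSONPath, and the module name `DottedPathAccessor` is hypothetical. The four function names mirror the callbacks implemented by `MapAccessor` above.

```elixir
defmodule DottedPathAccessor do
  # In a real project this module would declare
  # `@behaviour DataSchema.DataAccessBehaviour` and mark each function with
  # `@impl true`; both are left out so the sketch compiles without the library.

  def field(data, path), do: fetch(data, path)
  def list_of(data, path), do: fetch(data, path)
  def has_one(data, path), do: fetch(data, path)
  def aggregate(data, path), do: fetch(data, path)

  # Split "a.b.c" into ["a", "b", "c"] and walk the nested maps.
  # get_in/2 returns nil if any key along the way is missing.
  defp fetch(data, path) do
    get_in(data, String.split(path, "."))
  end
end

source_data = %{
  "content" => "This is a blog post",
  "metadata" => %{"rating" => 0}
}

IO.inspect(DottedPathAccessor.field(source_data, "metadata.rating"))
# prints 0

IO.inspect(DottedPathAccessor.field(source_data, "metadata.missing"))
# prints nil
```

Supporting real JSONPath expressions would mean swapping the body of `fetch/2` for a call into a JSONPath library of your choice.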
| 324 | + |
| 325 | +## Installation |
| 326 | + |
| 327 | +If [available in Hex](https://hex.pm/docs/publish), the package can be installed |
| 328 | +by adding `data_schema` to your list of dependencies in `mix.exs`: |
| 329 | + |
| 330 | +```elixir |
| 331 | +def deps do |
| 332 | + [ |
| 333 | + {:data_schema, "~> 0.1.0"} |
| 334 | + ] |
| 335 | +end |
| 336 | +``` |
| 337 | + |
| 338 | +Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc) |
| 339 | +and published on [HexDocs](https://hexdocs.pm). Once published, the docs can |
| 340 | +be found at [https://hexdocs.pm/data_schema](https://hexdocs.pm/data_schema). |
| 341 | + |