Commit ad51626

Enables DataSchema v1
This allows for different source data types. Adds examples for XML and maps.
0 parents  commit ad51626

12 files changed: +885, -0 lines changed

.formatter.exs

+4
@@ -0,0 +1,4 @@
1+
# Used by "mix format"
2+
[
3+
inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
4+
]

.gitignore

+26
@@ -0,0 +1,26 @@
1+
# The directory Mix will write compiled artifacts to.
2+
/_build/
3+
4+
# If you run "mix test --cover", coverage assets end up here.
5+
/cover/
6+
7+
# The directory Mix downloads your dependencies sources to.
8+
/deps/
9+
10+
# Where third-party dependencies like ExDoc output generated docs.
11+
/doc/
12+
13+
# Ignore .fetch files in case you like to edit your project deps locally.
14+
/.fetch
15+
16+
# If the VM crashes, it generates a dump, let's ignore it too.
17+
erl_crash.dump
18+
19+
# Also ignore archive artifacts (built via "mix archive.build").
20+
*.ez
21+
22+
# Ignore package tarball (built via "mix hex.build").
23+
data_schema-*.tar
24+
25+
# Temporary files, for example, from tests.
26+
/tmp/

README.md

+341
@@ -0,0 +1,341 @@
1+
# DataSchema
2+
3+
<!-- We def want a livebook for this. So much easier to explain. -->
4+
5+
Data schemas are declarative descriptions of how to create a struct from some input data. You can set up different schemas to handle different kinds of input data.
6+
7+
Let's think of creating a struct as taking some source data and turning it into the desired struct. To do this we need to know at least three things:
8+
9+
1. The keys of the desired struct
10+
2. The types of the values for each of the keys
11+
3. Where / how to get the data for each value from the source data.
12+
13+
Turning the source data into the type defined by the schema will often require casting, so the type definitions are casting functions. Let's look at a simple field example:
14+
15+
```elixir
16+
field: {:content, "text", &cast_string/1}
17+
# ^ ^ ^
18+
# struct field | |
19+
# path to data in the source |
20+
# casting function
21+
```
22+
23+
This says that the source data will contain a field called `"text"`. When creating a struct we get the data under that field and pass it to `cast_string/1`. The result of that function is put in the resulting struct under the key `:content`.
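To make those three steps concrete, here is a minimal sketch that performs them by hand on a map. The `CastExample` module and its `cast_string/1` are hypothetical; the README does not define `cast_string/1`, so its trimming behaviour is an assumption made only for illustration.

```elixir
defmodule CastExample do
  # Hypothetical casting function: trims binaries, stringifies anything else.
  def cast_string(value) when is_binary(value), do: String.trim(value)
  def cast_string(value), do: to_string(value)
end

source = %{"text" => "  This is a comment  "}

# 1. look up the value at the path "text"
# 2. pass it to the casting function
# 3. store the result under the struct key :content
content = source |> Map.get("text") |> CastExample.cast_string()
result = %{content: content}

IO.inspect(result)
```

A plain map stands in for the struct here so that the snippet runs without DataSchema installed.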
24+
25+
There are 4 kinds of struct fields we could want:
26+
27+
1. `field` - The value will be a cast value from the source data.
28+
2. `list_of` - The value will be a list of cast values created from the source data.
29+
3. `has_one` - The value will be created from a nested data schema.
30+
4. `aggregate` - The value will be a cast value formed from multiple bits of data in the source.
31+
32+
To see this better let's look at a very simple example. Assume our input data looks like this:
33+
34+
```elixir
35+
source_data = %{
36+
"content" => "This is a blog post",
37+
"comments" => [%{"text" => "This is a comment"},%{"text" => "This is another comment"}],
38+
"draft" => %{"content" => "This is a draft blog post"},
39+
"date" => "2021-11-11",
40+
"time" => "14:00:00",
41+
"metadata" => %{ "rating" => 0}
42+
}
43+
```
44+
45+
And now let's assume the struct we wish to make is this one:
46+
47+
```elixir
48+
%BlogPost{
49+
content: "This is a blog post",
50+
comments: [%Comment{text: "This is a comment"}, %Comment{text: "This is another comment"}],
51+
draft: %DraftPost{content: "This is a draft blog post"},
52+
post_datetime: ~N[2021-11-11 14:00:00]
53+
}
54+
```
55+
56+
As we mentioned before, we want to be able to handle multiple kinds of source data in different schemas. For each type of source data we want to be able to specify how to access the data for each field type. We do that by providing a data accessor module that implements `DataSchema.DataAccessBehaviour` when we create the schema.
57+
58+
When creating the struct, DataSchema will call the accessor function that matches the kind of field we are creating, passing it the source data and the path to the value(s) in the source.
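The dispatch described above can be pictured with a small standalone sketch. It is not DataSchema's implementation: `SketchAccessor` and `SketchBuilder` are hypothetical names, only the `field` and `list_of` kinds are handled, and the result is a plain map rather than a struct.

```elixir
defmodule SketchAccessor do
  # Hypothetical accessor using two of the callback names from the behaviour.
  def field(data, path), do: Map.get(data, path)
  def list_of(data, path), do: Map.get(data, path, [])
end

defmodule SketchBuilder do
  # Hypothetical stand-in for the struct-building step.
  # For each schema entry, call the accessor function named after the
  # field kind, then run the casting function on what it returns.
  def build(data, fields, accessor) do
    Enum.reduce(fields, %{}, fn
      {:field, {key, path, cast}}, acc ->
        Map.put(acc, key, cast.(accessor.field(data, path)))

      {:list_of, {key, path, cast}}, acc ->
        values = accessor.list_of(data, path)
        Map.put(acc, key, Enum.map(values, cast))
    end)
  end
end

source = %{"content" => "hello", "comments" => ["a", "b"]}

fields = [
  field: {:content, "content", &String.upcase/1},
  list_of: {:comments, "comments", &String.upcase/1}
]

built = SketchBuilder.build(source, fields, SketchAccessor)
IO.inspect(built)
```

Because the accessor is just a module passed as an argument, swapping the source format only means swapping that module; the schema entries and the building loop stay the same.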
59+
60+
Given that the source data is a map, we can pass a simple MapAccessor:
61+
62+
```elixir
63+
defmodule MapAccessor do
64+
@behaviour DataSchema.DataAccessBehaviour
65+
66+
@impl true
67+
def field(data, field) do
68+
Map.get(data, field)
69+
end
70+
71+
@impl true
72+
def list_of(data, field) do
73+
Map.get(data, field)
74+
end
75+
76+
@impl true
77+
def has_one(data, field) do
78+
Map.get(data, field)
79+
end
80+
81+
@impl true
82+
def aggregate(data, field) do
83+
Map.get(data, field)
84+
end
85+
end
86+
```
87+
88+
So now we can define the following schemas:
89+
90+
```elixir
91+
defmodule DraftPost do
92+
import DataSchema, only: [data_schema: 2]
93+
94+
data_schema([
95+
field: {:content, "content", DataSchema.String}
96+
], MapAccessor)
97+
end
98+
99+
defmodule Comment do
100+
import DataSchema, only: [data_schema: 2]
101+
102+
data_schema([
103+
field: {:text, "text", DataSchema.String}
104+
], MapAccessor)
105+
106+
def cast(data) do
107+
DataSchema.to_struct(data, __MODULE__)
108+
end
109+
end
110+
111+
defmodule BlogPost do
112+
import DataSchema, only: [data_schema: 2]
113+
114+
data_schema([
115+
field: {:content, "content", DataSchema.String},
116+
list_of: {:comments, "comments", Comment},
117+
has_one: {:draft, "draft", DraftPost},
118+
aggregate: {:post_datetime, %{date: "date", time: "time"}, &BlogPost.to_datetime/1}
119+
], MapAccessor)
120+
121+
def to_datetime(%{date: date, time: time}) do
122+
date = Date.from_iso8601!(date)
123+
time = Time.from_iso8601!(time)
124+
{:ok, datetime} = NaiveDateTime.new(date, time)
125+
datetime
126+
end
127+
end
128+
```
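The casting function for the `aggregate` field can be tried on its own. The sketch below copies `to_datetime/1` into a hypothetical `DatetimeSketch` module and assumes, as the pattern match in `to_datetime/1` suggests, that the function receives a map with the same keys as the path map and the looked-up values in place of the paths.

```elixir
defmodule DatetimeSketch do
  # Same body as BlogPost.to_datetime/1, isolated so it runs without DataSchema.
  def to_datetime(%{date: date, time: time}) do
    date = Date.from_iso8601!(date)
    time = Time.from_iso8601!(time)
    {:ok, datetime} = NaiveDateTime.new(date, time)
    datetime
  end
end

# Assumed shape of the argument: the values found at "date" and "time".
datetime = DatetimeSketch.to_datetime(%{date: "2021-11-11", time: "14:00:00"})
IO.inspect(datetime)
```

Note that `Date.from_iso8601!/1` and `Time.from_iso8601!/1` raise on malformed input, so a bad date string in the source would surface as an exception while building the struct.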
129+
130+
But we can clean this up a bit. Instead of passing `MapAccessor` every time we create a schema, we can define a helper macro that fixes the accessor argument for us, like so:
131+
132+
```elixir
133+
defmodule MapAccessor do
134+
...
135+
defmacro map_schema(fields) do
136+
quote do
137+
require DataSchema
138+
DataSchema.data_schema(unquote(fields), MapAccessor)
139+
end
140+
end
141+
...
142+
end
143+
```
144+
145+
Then change our schema definitions to look like this:
146+
147+
```elixir
148+
defmodule DraftPost do
149+
import MapAccessor, only: [map_schema: 1]
150+
151+
map_schema([
152+
field: {:content, "content", DataSchema.String}
153+
])
154+
end
155+
156+
defmodule Comment do
157+
import MapAccessor, only: [map_schema: 1]
158+
159+
map_schema([
160+
field: {:text, "text", DataSchema.String}
161+
])
162+
163+
def cast(data) do
164+
DataSchema.to_struct(data, __MODULE__)
165+
end
166+
end
167+
168+
defmodule BlogPost do
169+
import MapAccessor, only: [map_schema: 1]
170+
171+
map_schema([
172+
field: {:content, "content", DataSchema.String},
173+
list_of: {:comments, "comments", Comment},
174+
has_one: {:draft, "draft", DraftPost},
175+
aggregate: {:post_datetime, %{date: "date", time: "time"}, &BlogPost.to_datetime/1}
176+
])
177+
178+
def to_datetime(%{date: date_string, time: time_string}) do
179+
date = Date.from_iso8601!(date_string)
180+
time = Time.from_iso8601!(time_string)
181+
{:ok, datetime} = NaiveDateTime.new(date, time)
182+
datetime
183+
end
184+
end
185+
```
186+
187+
Now we can create our struct from the source data:
188+
189+
```elixir
190+
source_data = %{
191+
"content" => "This is a blog post",
192+
"comments" => [%{"text" => "This is a comment"},%{"text" => "This is another comment"}],
193+
"draft" => %{"content" => "This is a draft blog post"},
194+
"date" => "2021-11-11",
195+
"time" => "14:00:00",
196+
"metadata" => %{ "rating" => 0}
197+
}
198+
199+
DataSchema.to_struct(source_data, BlogPost)
200+
201+
# => %BlogPost{
202+
# content: "This is a blog post",
203+
# comments: [%Comment{text: "This is a comment"}, %Comment{text: "This is another comment"}],
204+
# draft: %DraftPost{content: "This is a draft blog post"},
205+
# post_datetime: ~N[2021-11-11 14:00:00]
206+
#}
207+
```
208+
209+
### XML Schemas
210+
211+
Now let's imagine instead that our source data was XML. What would it take to support that? First, a new XPath data accessor:
212+
213+
```elixir
214+
defmodule XpathAccessor do
215+
@behaviour DataSchema.DataAccessBehaviour
216+
import SweetXml, only: [sigil_x: 2]
217+
218+
defmacro xpath_schema(fields) do
219+
quote do
220+
require DataSchema
221+
DataSchema.data_schema(unquote(fields), XpathAccessor)
222+
end
223+
end
224+
225+
@impl true
226+
def field(data, path) do
227+
SweetXml.xpath(data, ~x"#{path}"s)
228+
end
229+
230+
@impl true
231+
def list_of(data, path) do
232+
SweetXml.xpath(data, ~x"#{path}"l)
233+
end
234+
235+
@impl true
236+
def has_one(data, path) do
237+
SweetXml.xpath(data, ~x"#{path}")
238+
end
239+
240+
@impl true
241+
def aggregate(data, path) do
242+
SweetXml.xpath(data, ~x"#{path}"s)
243+
end
244+
end
245+
```
246+
247+
Our source data looks like this:
248+
249+
```elixir
250+
source_data = """
251+
<Blog date="2021-11-11" time="14:00:00">
252+
<Content>This is a blog post</Content>
253+
<Comments>
254+
<Comment>This is a comment</Comment>
255+
<Comment>This is another comment</Comment>
256+
</Comments>
257+
<Draft>
258+
<Content>This is a draft blog post</Content>
259+
</Draft>
260+
</Blog>
261+
"""
262+
```
263+
264+
Let's define our schemas like so:
265+
266+
```elixir
267+
defmodule DraftPost do
268+
import XpathAccessor, only: [xpath_schema: 1]
269+
270+
xpath_schema([
271+
field: {:content, "./Content/text()", DataSchema.String}
272+
])
273+
end
274+
275+
defmodule Comment do
276+
import XpathAccessor, only: [xpath_schema: 1]
277+
278+
xpath_schema([
279+
field: {:text, "./text()", DataSchema.String}
280+
])
281+
282+
def cast(data) do
283+
DataSchema.to_struct(data, __MODULE__)
284+
end
285+
end
286+
287+
defmodule BlogPost do
288+
import XpathAccessor, only: [xpath_schema: 1]
289+
290+
xpath_schema([
291+
field: {:content, "/Blog/Content/text()", DataSchema.String},
292+
list_of: {:comments, "//Comment", Comment},
293+
has_one: {:draft, "/Blog/Draft", DraftPost},
294+
aggregate: {:post_datetime, %{date: "/Blog/@date", time: "/Blog/@time"}, &BlogPost.to_datetime/1}
295+
])
296+
297+
def to_datetime(%{date: date_string, time: time_string}) do
298+
date = Date.from_iso8601!(date_string)
299+
time = Time.from_iso8601!(time_string)
300+
{:ok, datetime} = NaiveDateTime.new(date, time)
301+
datetime
302+
end
303+
end
304+
```
305+
306+
And now we can transform:
307+
308+
```elixir
309+
DataSchema.to_struct(source_data, BlogPost)
310+
# => %BlogPost{
311+
# comments: [
312+
# %Comment{text: "This is a comment"},
313+
# %Comment{text: "This is another comment"}
314+
# ],
315+
# content: "This is a blog post",
316+
# draft: %DraftPost{content: "This is a draft blog post"},
317+
# post_datetime: ~N[2021-11-11 14:00:00]
318+
# }
319+
```
320+
321+
### JSONPath Schemas
322+
323+
This is left as an exercise for the reader, but hopefully you can see how to extend this idea to JSON data, with JSONPaths in the schemas pointing to the data.
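As a smaller step in that direction, here is a sketch of an accessor for already-decoded JSON (nested maps with string keys) where each path is a list of keys rather than a real JSONPath string. `KeyPathAccessor` is a hypothetical name; a real accessor would declare `@behaviour DataSchema.DataAccessBehaviour` and would need a JSONPath library to evaluate actual JSONPath expressions.

```elixir
defmodule KeyPathAccessor do
  # Hypothetical accessor: paths are lists of keys, resolved with get_in/2.
  # The @behaviour line is omitted so the sketch runs without DataSchema.
  def field(data, path), do: get_in(data, path)
  def list_of(data, path), do: get_in(data, path) || []
  def has_one(data, path), do: get_in(data, path)
  def aggregate(data, path), do: get_in(data, path)
end

# Assumed shape of the decoded JSON document.
decoded = %{
  "blog" => %{
    "content" => "This is a blog post",
    "comments" => [%{"text" => "This is a comment"}]
  }
}

content = KeyPathAccessor.field(decoded, ["blog", "content"])
comments = KeyPathAccessor.list_of(decoded, ["blog", "comments"])
missing = KeyPathAccessor.list_of(decoded, ["blog", "tags"])

IO.inspect({content, comments, missing})
```

Returning an empty list for a missing `list_of` path is a design choice in this sketch, not something the README specifies.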
324+
325+
## Installation
326+
327+
If [available in Hex](https://hex.pm/docs/publish), the package can be installed
328+
by adding `data_schema` to your list of dependencies in `mix.exs`:
329+
330+
```elixir
331+
def deps do
332+
[
333+
{:data_schema, "~> 0.1.0"}
334+
]
335+
end
336+
```
337+
338+
Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
339+
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
340+
be found at [https://hexdocs.pm/data_schema](https://hexdocs.pm/data_schema).
341+

lib/data_access_behaviour.ex

+10
@@ -0,0 +1,10 @@
1+
defmodule DataSchema.DataAccessBehaviour do
2+
@moduledoc """
3+
Defines how DataSchema should access data for each given field type.
4+
"""
5+
6+
@callback field(any(), any()) :: any()
7+
@callback list_of(any(), any()) :: any()
8+
@callback has_one(any(), any()) :: any()
9+
@callback aggregate(any(), any()) :: any()
10+
end
