@@ -1,26 +1,26 @@
 
-# DiDa tutorial
+# DistributedData tutorial
 
-The primary purpose of this tutorial is to get a basic grasp of the main `DiDa`
+The primary purpose of this tutorial is to get a basic grasp of the main `DistributedData`
 functions and methodology.
 
 For starting up, let's create a few distributed workers and import the package:
 
 ```julia
-julia> using Distributed, DiDa
+julia> using Distributed, DistributedData
 
 julia> addprocs(3)
 3-element Array{Int64,1}:
  2
  3
  4
 
-julia> @everywhere using DiDa
+julia> @everywhere using DistributedData
 ```
 
 ## Moving the data around
 
-In `DiDa`, the storage of distributed data is done in the "native" Julia way --
+In `DistributedData`, the storage of distributed data is done in the "native" Julia way --
 the data is stored in normal named variables. Each node holds its own data in
 an arbitrary set of variables as "plain data"; content of these variables is
 completely independent among nodes.
@@ -52,7 +52,7 @@ UndefValError: x not defined
 …
 ```
 
-`DiDa` uses *quoting* to allow you to precisely specify the parts of the code
+`DistributedData` uses *quoting* to allow you to precisely specify the parts of the code
 that should be evaluated on the "main" Julia process (the one you interact
 with), and the code that should be evaluated on the remote workers. Basically,
 all quoted code is going to get to the workers without any evaluation; all
@@ -192,7 +192,7 @@ beneficial for implementing advanced parallel algorithms.
 
 Remembering and managing the remote variable names and worker numbers is
 extremely impractical, especially if you need to maintain multiple variables on
-various subsets of all available workers at once. `DiDa` defines a small
+various subsets of all available workers at once. `DistributedData` defines a small
 [`Dinfo`](@ref) data structure that keeps that information for you. Many other
 functions are able to work with `Dinfo` transparently, instead of the "raw"
 symbols and worker lists.
@@ -354,7 +354,7 @@ julia> dmap(Vector(1:length(workers())),
 
 ## Persisting the data
 
-`DiDa` provides support for storing the loaded dataset in each worker's local
+`DistributedData` provides support for storing the loaded dataset in each worker's local
 storage. This is quite beneficial for saving sub-results and various artifacts
 of the computation process for later use, without unnecessarily wasting
 main memory.
@@ -378,9 +378,9 @@ significant overhead.
 
 ## Miscellaneous functions
 
-For convenience, `DiDa` also contains simple implementations of various common
+For convenience, `DistributedData` also contains simple implementations of various common
 utility operations for processing matrix data. These originated in
-flow-cytometry use-cases (which is what `DiDa` was originally built for), but
+flow-cytometry use-cases (which is what `DistributedData` was originally built for), but
 are applicable in many other areas of data analysis:
 
 - [`dselect`](@ref) reduces a matrix to several selected columns (in a