Develop #12

Merged
merged 14 commits into from
Jan 27, 2021
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -4,9 +4,9 @@ name: Documentation
on:
push:
branches:
- master
- develop
tags: '*'
pull_request:
release:
types: [published, created]

5 changes: 3 additions & 2 deletions Project.toml
@@ -1,6 +1,7 @@
name = "DiDa"
name = "DistributedData"
uuid = "f6a0035f-c5ac-4ad0-b410-ad102ced35df"
authors = ["Mirek Kratochvil <[email protected]>"]
authors = ["Mirek Kratochvil <[email protected]>",
"LCSB R3 team <[email protected]>"]
version = "0.1.0"

[deps]
18 changes: 6 additions & 12 deletions README.md
@@ -1,20 +1,20 @@
# <img src="docs/src/assets/logo.svg" alt="DiDa.jl logo" height="32px"> DiDa.jl
# <img src="docs/src/assets/logo.svg" alt="DistributedData.jl logo" height="32px"> DistributedData.jl


| Build status | Documentation |
|:---:|:---:|
| ![CI](https://github.com/LCSB-BioCore/DiDa.jl/workflows/CI/badge.svg?branch=develop) | [![doc](https://img.shields.io/badge/docs-stable-blue)](https://lcsb-biocore.github.io/DiDa.jl/stable/) [![doc](https://img.shields.io/badge/docs-dev-blue)](https://lcsb-biocore.github.io/DiDa.jl/dev/) |
| ![CI](https://github.com/LCSB-BioCore/DistributedData.jl/workflows/CI/badge.svg?branch=develop) | [![doc](https://img.shields.io/badge/docs-stable-blue)](https://lcsb-biocore.github.io/DistributedData.jl/stable/) [![doc](https://img.shields.io/badge/docs-dev-blue)](https://lcsb-biocore.github.io/DistributedData.jl/dev/) |

Simple distributed data manipulation and processing routines for Julia.

This was originally developed for
[`GigaSOM.jl`](https://github.com/LCSB-BioCore/GigaSOM.jl); DiDa.jl package
[`GigaSOM.jl`](https://github.com/LCSB-BioCore/GigaSOM.jl); DistributedData.jl package
contains the separated-out lightweight distributed-processing framework that
was used in `GigaSOM.jl`.

## Why?

DiDa.jl provides a very simple, imperative and straightforward way to move your
DistributedData.jl provides a very simple, imperative and straightforward way to move your
data around a cluster of Julia processes created by the
[`Distributed`](https://docs.julialang.org/en/v1/stdlib/Distributed/) package,
and run computation on the distributed data pieces. The main aim of the package
@@ -43,14 +43,14 @@ same way, but takes the data back from the worker.
You can thus send some random array to a few distributed workers:

```julia
julia> using Distributed, DiDa
julia> using Distributed, DistributedData

julia> addprocs(2)
2-element Array{Int64,1}:
2
3

julia> @everywhere using DiDa
julia> @everywhere using DistributedData

julia> save_at(2, :x, randn(10,10))
Future(2, 1, 4, nothing)
@@ -132,9 +132,3 @@ julia> gather_array(dataset) # download the data from workers to a sing
0.610183 1.12165 0.722438
```

## What does the name `DiDa` mean?

**Di**stributed **Da**ta.

There is no consensus on how to pronounce the shortcut.
6 changes: 6 additions & 0 deletions codecov.yml
@@ -0,0 +1,6 @@
coverage:
  status:
    project:
      default:
        threshold: 15
    patch: off
10 changes: 5 additions & 5 deletions docs/make.jl
@@ -1,10 +1,10 @@
using Documenter, DiDa
using Documenter, DistributedData

makedocs(modules = [DiDa],
makedocs(modules = [DistributedData],
clean = false,
format = Documenter.HTML(prettyurls = !("local" in ARGS)),
sitename = "DiDa.jl",
authors = "The developers of DiDa.jl",
sitename = "DistributedData.jl",
authors = "The developers of DistributedData.jl",
linkcheck = !("skiplinks" in ARGS),
pages = [
"Documentation" => "index.md",
@@ -14,7 +14,7 @@ makedocs(modules = [DiDa],
)

deploydocs(
repo = "github.com/LCSB-BioCore/DiDa.jl.git",
repo = "github.com/LCSB-BioCore/DistributedData.jl.git",
target = "build",
branch = "gh-pages",
devbranch = "develop",
8 changes: 4 additions & 4 deletions docs/src/functions.md
@@ -3,27 +3,27 @@
## Data structures

```@autodocs
Modules = [DiDa]
Modules = [DistributedData]
Pages = ["structs.jl"]
```

## Base functions

```@autodocs
Modules = [DiDa]
Modules = [DistributedData]
Pages = ["base.jl"]
```

## Higher-level array operations

```@autodocs
Modules = [DiDa]
Modules = [DistributedData]
Pages = ["tools.jl"]
```

## Input/Output

```@autodocs
Modules = [DiDa]
Modules = [DistributedData]
Pages = ["io.jl"]
```
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -1,5 +1,5 @@

# DiDa.jl — simple work with distributed data
# DistributedData.jl — simple work with distributed data

This package provides simple Distributed Data manipulation and processing
routines for Julia.
20 changes: 10 additions & 10 deletions docs/src/tutorial.md
@@ -1,26 +1,26 @@

# DiDa tutorial
# DistributedData tutorial

The primary purpose of this tutorial is to get a basic grasp of the main `DiDa`
The primary purpose of this tutorial is to get a basic grasp of the main `DistributedData`
functions and methodology.

For starting up, let's create a few distributed workers and import the package:

```julia
julia> using Distributed, DiDa
julia> using Distributed, DistributedData

julia> addprocs(3)
3-element Array{Int64,1}:
2
3
4

julia> @everywhere using DiDa
julia> @everywhere using DistributedData
```

## Moving the data around

In `DiDa`, the storage of distributed data is done in the "native" Julia way --
In `DistributedData`, the storage of distributed data is done in the "native" Julia way --
the data is stored in normal named variables. Each node holds its own data in
an arbitrary set of variables as "plain data"; content of these variables is
completely independent among nodes.
@@ -52,7 +52,7 @@ UndefValError: x not defined
```

`DiDa` uses *quoting* to allow you to precisely specify the parts of the code
`DistributedData` uses *quoting* to allow you to precisely specify the parts of the code
that should be evaluated on the "main" Julia process (the one you interact
with), and the code that should be evaluated on the remote workers. Basically,
all quoted code is going to get to the workers without any evaluation; all
@@ -192,7 +192,7 @@ beneficial for implementing advanced parallel algorithms.

Remembering and managing the remote variable names and worker numbers is
extremely impractical, especially if you need to maintain multiple variables on
various subsets of all available workers at once. `DiDa` defines a small
various subsets of all available workers at once. `DistributedData` defines a small
[`Dinfo`](@ref) data structure that keeps that information for you. Many other
functions are able to work with `Dinfo` transparently, instead of the "raw"
symbols and worker lists.
@@ -354,7 +354,7 @@ julia> dmap(Vector(1:length(workers())),

## Persisting the data

`DiDa` provides support for storing the loaded dataset in each worker's local
`DistributedData` provides support for storing the loaded dataset in each worker's local
storage. This is quite beneficial for saving sub-results and various artifacts
of the computation process for later use, without unnecessarily wasting
main memory.
@@ -378,9 +378,9 @@ significant overhead.

## Miscellaneous functions

For convenience, `DiDa` also contains simple implementations of various common
For convenience, `DistributedData` also contains simple implementations of various common
utility operations for processing matrix data. These originated in
flow-cytometry use-cases (which is what `DiDa` was originally built for), but
flow-cytometry use-cases (which is what `DistributedData` was originally built for), but
are applicable in many other areas of data analysis:

- [`dselect`](@ref) reduces a matrix to several selected columns (in a
2 changes: 1 addition & 1 deletion src/DiDa.jl → src/DistributedData.jl
@@ -1,4 +1,4 @@
module DiDa
module DistributedData

using Distributed
using Serialization
10 changes: 5 additions & 5 deletions test/base.jl
@@ -14,7 +14,7 @@
end

addprocs(3)
@everywhere using DiDa
@everywhere using DistributedData
W = workers()

@testset "Distributed data transfers -- with workers" begin
@@ -86,16 +86,16 @@
end

@testset "Internal utilities" begin
@test DiDa.tmp_symbol(:test) != :test
@test DiDa.tmp_symbol(:test, prefix = "abc",
@test DistributedData.tmp_symbol(:test) != :test
@test DistributedData.tmp_symbol(:test, prefix = "abc",
suffix = "def") == :abctestdef
@test DiDa.tmp_symbol(Dinfo(:test, W)) != :test
@test DistributedData.tmp_symbol(Dinfo(:test, W)) != :test
end

@testset "Persistent distributed data" begin
di = dtransform(:(), x -> rand(5), W, :test)

files = DiDa.defaultFiles(di.val, di.workers)
files = DistributedData.defaultFiles(di.val, di.workers)
@test allunique(files)

orig = gather_array(di)
4 changes: 2 additions & 2 deletions test/runtests.jl
@@ -1,8 +1,8 @@

using Test
using DiDa, Distributed, Random
using DistributedData, Distributed, Random

@testset "DiDa tests" begin
@testset "DistributedData tests" begin
include("base.jl")
include("tools.jl")
end
2 changes: 1 addition & 1 deletion test/tools.jl
@@ -2,7 +2,7 @@
@testset "High-level tools" begin

W = addprocs(2)
@everywhere using DiDa
@everywhere using DistributedData

Random.seed!(1)
dd = rand(11111, 5)
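
Review note: this rename touches the module name, docs, tests, and CI configuration, so a leftover `DiDa` reference is easy to miss. Below is a minimal sketch of a stale-reference check in plain Julia, using only `Base`. The function name `find_stale_references`, its keyword arguments, and the list of file extensions are hypothetical and not part of this PR.

```julia
# Hypothetical helper (not part of the PR): report every line under `root`
# that still mentions the old package name as a whole word.
function find_stale_references(root::AbstractString;
                               old_name::AbstractString = "DiDa",
                               extensions = (".jl", ".md", ".toml", ".yml"))
    # \b keeps "DiDa" from matching inside a longer identifier.
    pattern = Regex("\\b" * old_name * "\\b")
    hits = Tuple{String,Int,String}[]
    for (dir, _, files) in walkdir(root)
        for file in files
            # Only look at source, docs, and config files.
            any(ext -> endswith(file, ext), extensions) || continue
            path = joinpath(dir, file)
            for (lineno, line) in enumerate(eachline(path))
                occursin(pattern, line) && push!(hits, (path, lineno, line))
            end
        end
    end
    return hits
end

# Print one "path:line: text" entry per remaining mention.
for (path, lineno, line) in find_stale_references(".")
    println(path, ":", lineno, ": ", strip(line))
end
```

Run from the repository root, this lists whatever mentions remain. Some hits may be intentional (for example a note about the former name), so the output is a checklist to review, not a list of errors.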