Commit 8c37d6a

Initial commit of code & data
1 parent b69e907, commit 8c37d6a

10 files changed: 54821 lines added, 2 lines removed

.gitignore

+6
@@ -0,0 +1,6 @@
+bin
+.ipynb_checkpoints
+.classpath
+.project
+.settings
+.gradle

README.md

+15 -2
@@ -1,2 +1,15 @@
-# java-dataframes
-A quick test of a couple of data frame libraries for Java
+# Java dataframes test
+This is the companion repository to the following Medium post: [Doing cool data science in Java: how 3 DataFrame libraries stack up](https://medium.com/@thijser/doing-cool-data-science-in-java-how-3-dataframe-libraries-stack-up-5e6ccb7b437)
+
+## Data
+The data was extracted from [Eurostat](http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=urb_cpop1&lang=en) at the beginning of September 2018. I opened the extracted CSV in LibreOffice and saved it again because the Eurostat output contained some illegal UTF-8 characters that some CSV importers couldn't handle directly.
+
+## Code
+The code for the three libraries is in the `Test{libraryname}.java` files. They all use `CheckResult.java` to do a basic correctness check for the top-growing cities.
+
+The libraries tested fully are:
+* [tablesaw](https://github.com/jtablesaw/tablesaw)
+* [joinery](https://github.com/cardillo/joinery)
+* [morpheus](https://github.com/zavtech/morpheus-core)
+
+As described in the [Medium post](https://medium.com/@thijser/doing-cool-data-science-in-java-how-3-dataframe-libraries-stack-up-5e6ccb7b437), I couldn't find a good way to do the pivot step in [datavec](https://deeplearning4j.org/docs/latest/datavec-overview), but I included the code I wrote up until that point.
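The README explains that the Eurostat export contained illegal UTF-8 byte sequences that some CSV importers rejected, and that the file was repaired by re-saving it in LibreOffice. For illustration only (this is not part of the commit), a file can be checked for such sequences with the JDK's `CharsetDecoder`. When its malformed-input action is `REPORT`, it throws instead of silently substituting characters. The class and method names (`Utf8Check`, `isValidUtf8`, `decodeLenient`) are hypothetical. Note that the lenient path, `new String(bytes, UTF_8)`, replaces each bad sequence with U+FFFD. That is a different fix from re-saving the file and can lose characters.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: checks whether a file's bytes are well-formed UTF-8. */
public class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed or unmappable sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes as UTF-8, substituting U+FFFD for every invalid sequence (lossy). */
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file.csv>");
            return;
        }
        Path csv = Paths.get(args[0]);
        byte[] bytes = Files.readAllBytes(csv);
        System.out.println(csv + " is valid UTF-8: " + isValidUtf8(bytes));
    }
}
```

Run it as `java Utf8Check <path-to-export.csv>` before and after re-saving the file to see whether the re-encoding removed the offending bytes.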
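The README refers to a pivot step and to a correctness check on the top-growing cities. A pivot turns long-format rows (one row per city and year) into a wide layout (one entry per city, one value per year). Below is a minimal plain-Java sketch of that idea, with no DataFrame library involved. It assumes the input has already been parsed into hypothetical `Row(city, year, population)` objects, and it assumes growth means relative change between two chosen years. Neither assumption comes from this commit. The names `PivotSketch`, `Row`, `pivot` and `topGrowing` are illustrative and are not the code in the `Test{libraryname}.java` files or `CheckResult.java`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: pivot long rows to wide, then rank cities by relative growth. */
public class PivotSketch {

    /** One long-format row: a city, a year and a population value. */
    static final class Row {
        final String city;
        final int year;
        final double population;

        Row(String city, int year, double population) {
            this.city = city;
            this.year = year;
            this.population = population;
        }
    }

    /** Pivots rows into city -> (year -> population); a later duplicate overwrites an earlier one. */
    static Map<String, TreeMap<Integer, Double>> pivot(List<Row> rows) {
        Map<String, TreeMap<Integer, Double>> wide = new LinkedHashMap<>();
        for (Row r : rows) {
            wide.computeIfAbsent(r.city, c -> new TreeMap<>()).put(r.year, r.population);
        }
        return wide;
    }

    /**
     * Returns at most n city names ordered by (to - from) / from, descending.
     * Cities missing either year, or with a zero starting value, are skipped.
     */
    static List<String> topGrowing(Map<String, TreeMap<Integer, Double>> wide,
                                   int fromYear, int toYear, int n) {
        Map<String, Double> growth = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Integer, Double>> e : wide.entrySet()) {
            Double from = e.getValue().get(fromYear);
            Double to = e.getValue().get(toYear);
            if (from != null && to != null && from != 0.0) {
                growth.put(e.getKey(), (to - from) / from);
            }
        }
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(growth.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, sorted.size()); i++) {
            result.add(sorted.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up values for illustration, not Eurostat figures.
        List<Row> rows = Arrays.asList(
                new Row("A", 2010, 100), new Row("A", 2015, 120),
                new Row("B", 2010, 200), new Row("B", 2015, 210),
                new Row("C", 2010, 50), new Row("C", 2015, 75));
        System.out.println(topGrowing(pivot(rows), 2010, 2015, 2)); // prints [C, A]
    }
}
```

A `TreeMap` per city keeps the years sorted, which mirrors the column order a DataFrame pivot would produce. The growth ranking then only needs map lookups instead of a second pass over the long rows.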

build.gradle

+23
@@ -0,0 +1,23 @@
+apply plugin: 'java'
+apply plugin: 'eclipse'
+
+sourceCompatibility = 1.8
+
+repositories {
+    mavenCentral()
+    jcenter()
+}
+
+dependencies {
+    compile 'tech.tablesaw:tablesaw-core:0.25.2'
+
+    compile 'joinery:joinery-dataframe:1.9'
+    // For the CSV import joinery needs this dependency too:
+    compile 'org.apache.poi:poi:3.17'
+
+    compile 'com.zavtech:morpheus-core:0.9.21'
+
+    compile 'org.datavec:datavec-api:1.0.0-beta2'
+    compile 'org.datavec:datavec-local:1.0.0-beta2'
+}
+
