Commit c0fb058

Merge pull request #64 from ropensci/0.3.1
0.3.1
2 parents 20762c5 + 827848a commit c0fb058

8 files changed: 91 additions, 34 deletions
DESCRIPTION

Lines changed: 23 additions & 5 deletions
@@ -1,6 +1,6 @@
 Package: git2rdata
 Title: Store and Retrieve Data.frames in a Git Repository
-Version: 0.3.0
+Version: 0.3.1
 Authors@R:
     c(person(given = "Thierry",
              family = "Onkelinx",
@@ -25,11 +25,29 @@ Authors@R:
       person(given = "Research Institute for Nature and Forest",
              role = c("cph", "fnd"),
              email = "[email protected]"))
-Description: Make versioning of data.frame easy and efficient using git
-    repositories.
+Description: The git2rdata package is an R package for writing and reading
+    dataframes as plain text files.  A metadata file stores important
+    information.  1) Storing metadata allows to maintain the classes of
+    variables.  By default, git2rdata optimizes the data for file storage.
+    The optimization is most effective on data containing factors.  The
+    optimization makes the data less human readable.  The user can turn
+    this off when they prefer a human readable format over smaller files.
+    Details on the implementation are available in vignette("plain_text",
+    package = "git2rdata").  2) Storing metadata also allows smaller row
+    based diffs between two consecutive commits.  This is a useful feature
+    when storing data as plain text files under version control.  Details
+    on this part of the implementation are available in
+    vignette("version_control", package = "git2rdata").  Although we
+    envisioned git2rdata with a git workflow in mind, you can use it in
+    combination with other version control systems like subversion or
+    mercurial.  3) git2rdata is a useful tool in a reproducible and
+    traceable workflow.  vignette("workflow", package = "git2rdata") gives
+    a toy example.  4) vignette("efficiency", package = "git2rdata")
+    provides some insight into the efficiency of file storage, git
+    repository size and speed for writing and reading.  Please cite using
+    <doi:10.5281/zenodo.1485309>.
 License: GPL-3
-URL: https://github.com/ropensci/git2rdata,
-    https://doi.org/10.5281/zenodo.1485309
+URL: https://ropensci.github.io/git2rdata/
 BugReports: https://github.com/ropensci/git2rdata/issues
 Depends:
     R (>= 3.5.0)

NEWS.md

Lines changed: 7 additions & 3 deletions
@@ -1,3 +1,7 @@
+# git2rdata 0.3.1
+
+* Use `icuSetCollate()` to define a standardised sorting.
+
 # git2rdata 0.3.0

 ## New features
@@ -14,7 +18,7 @@

 # git2rdata 0.2.2

-* Use the [checklist](https://inbo.github.io/checklist) package for CI.
+* Use the [checklist](https://packages.inbo.be/checklist/) package for CI.

 # git2rdata 0.2.1

@@ -32,8 +36,8 @@

 * Calculation of data hash has changed (#53).
   You must use `upgrade_data()` to read data stored by an older version.
-* `is_git2rdata()` and `upgrade_data()` do not test equality in data hashes
-  anymore (but `read_vc()` still does).
+* `is_git2rdata()` and `upgrade_data()` no longer test equality in data
+  hashes (but `read_vc()` still does).
 * `write_vc()` and `read_vc()` fail when `file` is a location outside of `root`
   (#50).
 * Reordering factor levels requires `strict = TRUE`.

R/datahash.R

Lines changed: 6 additions & 10 deletions
@@ -50,22 +50,18 @@ datahash <- function(file) {
 #' @noRd
 #' @return a named vector with the old locale
 set_c_locale <- function() {
-  old_ctype <- Sys.getlocale(category = "LC_CTYPE")
-  old_collate <- Sys.getlocale(category = "LC_COLLATE")
-  old_time <- Sys.getlocale(category = "LC_TIME")
-  Sys.setlocale(category = "LC_CTYPE", locale = "C")
-  Sys.setlocale(category = "LC_COLLATE", locale = "C")
-  Sys.setlocale(category = "LC_TIME", locale = "C")
-  return(c(ctype = old_ctype, collate = old_collate, time = old_time))
+  icuSetCollate(
+    locale = "en_GB", case_first = "lower", normalization = "on",
+    case_level = "on"
+  )
+  return(c())
 }

 #' Reset the old locale
 #' @param locale the output of `set_c_locale()`
 #' @return invisible `NULL`
 #' @noRd
 set_local_locale <- function(locale) {
-  Sys.setlocale(category = "LC_CTYPE", locale = locale["ctype"])
-  Sys.setlocale(category = "LC_COLLATE", locale = locale["collate"])
-  Sys.setlocale(category = "LC_TIME", locale = locale["time"])
+  icuSetCollate(locale = "default")
   return(invisible(NULL))
 }
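
The pattern in `set_c_locale()` and `set_local_locale()` (set a fixed ICU collator, sort, then reset it) can be sketched as a single helper. This is a minimal illustration, not code from the package: `sort_standardised()` is a hypothetical name, and the `capabilities("ICU")` guard is an assumption added because the CRAN comments in this commit report a build without ICU. No particular ordering is asserted here, because R does not use ICU for collation when the session runs in a C or POSIX locale.

```r
# Hypothetical helper, not part of git2rdata: sort a character vector with
# the collator settings used in this commit, then reset the collator.
sort_standardised <- function(x) {
  if (!capabilities("ICU")) {
    # Assumption: without ICU we simply fall back to the session locale.
    warning("ICU is not available; sorting with the current locale instead")
    return(sort(x))
  }
  # Reset the collator when the function exits, even if sort() fails.
  on.exit(icuSetCollate(locale = "default"), add = TRUE)
  icuSetCollate(
    locale = "en_GB", case_first = "lower", normalization = "on",
    case_level = "on"
  )
  sort(x)
}

sorted <- sort_standardised(c("b", "A", "a", "B"))
print(sorted)
```

Registering the reset with `on.exit()` keeps the change to the collator local to the call, which is the same intent as pairing `set_c_locale()` with `set_local_locale()`.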

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -138,10 +138,10 @@ Please use the output of `citation("git2rdata")`
138138

139139
## Folder Structure
140140

141-
- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://github.com/klutometis/roxygen) format
141+
- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://CRAN.R-project.org/package=roxygen2) format
142142
- `man`: The help files in [Rd](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Rd-format) format
143143
- `inst/efficiency`: pre-calculated data to speed up `vignette("efficiency", package = "git2rdata")`
144-
- `testthat`: R scripts with unit tests using the [testthat](http://testthat.r-lib.org/) framework
144+
- `testthat`: R scripts with unit tests using the [testthat](https://CRAN.R-project.org/package=testthat) framework
145145
- `vignettes`: source code for the vignettes describing the package
146146
- `man-roxygen`: templates for documentation in Roxygen format
147147
- `pkgdown`: source files for the `git2rdata` [website](https://ropensci.github.io/git2rdata/)

cran-comments.md

Lines changed: 28 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
## Test environments
22
* local
3-
* ubuntu 18.04.3 LTS, R 3.6.1
4-
* travis-ci
5-
* trusty, oldrel
6-
* xenial, release and devel
7-
* osx, release
8-
* AppVeyor
9-
* Windows Server 2012 R2 x64, R 3.6.1 Patched
3+
* ubuntu 18.04.5 LTS, R 4.0.3
4+
* github actions
5+
* macOS-latest, release
6+
* windows-latest, release
7+
* ubuntu 20.04, devel
8+
* ubuntu 16.04, oldrel
9+
* checklist package: ubuntu 20.04.1, R 4.0.3
1010
* r-hub
1111
* Windows Server 2008 R2 SP1, R-devel, 32/64 bit
1212
* Ubuntu Linux 16.04 LTS, R-release, GCC
@@ -15,3 +15,24 @@
1515
## R CMD check results
1616

1717
0 errors | 0 warnings | 0 note
18+
19+
r-hub gave a few false positive notes
20+
21+
* Windows Server 2008 R2 SP1, R-devel, 32/64 bit
22+
23+
```
24+
Possibly mis-spelled words in DESCRIPTION:
25+
rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
26+
workflow (41:37, 44:15, 44:36)
27+
```
28+
29+
* Fedora Linux, R-devel, clang, gfortran
30+
31+
```
32+
Possibly mis-spelled words in DESCRIPTION:
33+
rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
34+
```
35+
36+
Ubuntu Linux 16.04 LTS, R-release, GCC failed on r-hub because ICU is not
37+
available on that build.
38+

man/git2rdata-package.Rd

Lines changed: 22 additions & 4 deletions
Some generated files are not rendered by default.

tests/testthat/test_b_special.R

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ expect_is(
 )
 expect_equal(
   names(output)[1],
-  "9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+  "1d135a85dc9beff3223d6c79f0d8975b559afca7"
 )
 old_locale <- git2rdata:::set_c_locale()
 dso <- ds[order(ds$a), , drop = FALSE] # nolint
@@ -64,7 +64,7 @@ expect_equal(
 )
 expect_equal(
   names(output)[1],
-  "9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+  "1d135a85dc9beff3223d6c79f0d8975b559afca7"
 )
 expect_identical(
   names(output),

vignettes/split_by.Rmd

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ We add an `index.tsv` containing the combinations of the `split_by` variables an
 This hash becomes the base name of the partial data files.

 Splitting the dataframe into smaller files makes them easier to handle in version control system.
-The overall size depends on the amount of replication in the dataframe.
+The total size depends on the amount of replication in the dataframe.
 More on that in the next section.

 ## When to Split the Dataframe
