Description
DB: MySQL 5.6.27
Package version: RMySQL_0.10.11
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Steps to reproduce:
- Write large data frame > 500K rows & 15 columns to table with existing schema (append = TRUE)
- Typically 300-400K rows are written, but not the entire data frame; the number of rows written varies from run to run.
- Batching with a single connection seems unreliable, but batching with separate connections works reliably, as in the example below -- so this smells like a connection buffer / timeout issue.
- May be an old issue: http://stackoverflow.com/questions/22358623/r-dbwritetable-append-not-all-data
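The steps above can be sketched as a minimal reproduction. This is an illustrative sketch, not a tested script: the connection parameters (host, user, password, dbname) and the table name `repro_table` are placeholders, and it requires a live MySQL server to run.

```r
## Minimal reproduction sketch -- connection parameters are placeholders
library(DBI)
library(RMySQL)

conn <- dbConnect(RMySQL::MySQL(),
                  host = "localhost", user = "user",
                  password = "password", dbname = "testdb")

## Large data frame: > 500K rows, 15 columns
data <- as.data.frame(replicate(15, runif(600000)))

dbWriteTable(conn, name = "repro_table", value = data,
             append = TRUE, row.names = FALSE)

## Compare row counts: on affected setups the table ends up
## with fewer rows than the data frame
written <- dbGetQuery(conn, "SELECT COUNT(*) AS n FROM repro_table")$n
written == nrow(data)  ## expected TRUE, observed FALSE

dbDisconnect(conn)
```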
Functioning workaround:
Batch the write, opening a new connection per batch, e.g.:
## Workaround for MySQL driver issue: write in batches,
## opening a fresh connection for each batch
batchsize <- 100000
if (nrow(data) >= batchsize) {
  ## Split rows into chunks of at most `batchsize` rows
  batches <- split(data, (seq_len(nrow(data)) - 1) %/% batchsize)
  lapply(batches, function(x) {
    ## Make a separate connection for each batch
    conn2 <- RMySQL::dbConnect( ... connectionparams ... )
    on.exit(DBI::dbDisconnect(conn2))  ## disconnect even if the write fails
    RMySQL::dbWriteTable(conn = conn2,
                         name = tablename,
                         value = x,
                         append = TRUE,
                         row.names = FALSE)
  })
} else { ... write normally ... }