Consider VRK's building address and voting district data #13

@muuankarski

Väestörekisterikeskus (VRK, the Population Register Centre) publishes annual data covering all buildings in Finland. The data is a zipped, semicolon-delimited file with the .OPT extension and has 3.6 million rows. It can be read and processed in R (slowly) with the following code:

# 2019
library(dplyr)
library(sf)

tmpfile <- tempfile(fileext = ".zip")
tmpdir <- tempdir()
# mode = "wb" keeps the zip archive intact on Windows
download.file("https://www.avoindata.fi/data/dataset/cf9208dc-63a9-44a2-9312-bbd2c3952596/resource/ae13f168-e835-4412-8661-355ea6c4c468/download/suomi_osoitteet_2019-05-15.zip",
              destfile = tmpfile, mode = "wb")
unzip(zipfile = tmpfile,
      exdir = tmpdir)

# Semicolon-delimited file without a header row
opt <- read.csv(file.path(tmpdir, "Suomi_osoitteet_2019-05-15.OPT"),
                sep = ";",
                stringsAsFactors = FALSE,
                header = FALSE)

names(opt) <- c("rakennustu","sijaintiku",
                "sijaintima","rakennusty",
                "CoordY","CoordX",
                "osoitenume", "katunimi_f",
                "katunimi_s", "katunumero",
                "postinumer", "vaalipiirikoodi",
                "vaalipiirinimi","tyhja",
                "idx", "date")
if (FALSE) { # subsetting just to make conversions faster
  opt_orig <- as_tibble(opt)
  opt <- slice_sample(opt_orig, n = 2000)
}

# Text columns are Windows-1252 encoded; convert them to UTF-8
for (col in c("katunimi_f", "katunimi_s", "katunumero", "vaalipiirinimi")) {
  opt[[col]] <- iconv(opt[[col]], from = "windows-1252", to = "UTF-8")
}

# Build point geometries directly with sf (EPSG:3067 = ETRS89 / TM35FIN);
# this avoids the deprecated "+init=epsg:3067" syntax of sp::CRS()
shape <- st_as_sf(opt,
                  coords = c("CoordX", "CoordY"),
                  crs = 3067,
                  remove = FALSE)

# Project the spatial data to lat/lon
# shape <- st_transform(shape, crs = 4326)

# Inspect the first coordinates instead of printing all 3.6 million rows
head(st_coordinates(shape))

# shape %>% select(rakennustu) %>% plot()

saveRDS(shape, file = "./sf19_buildings.RDS")
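The code above lets `read.csv` guess every column type, so code-like columns such as `postinumer` (postal code) or `sijaintiku` (municipality code) may be read as integers and lose their leading zeros (`"00100"` would become `100`). A minimal sketch that declares the column types up front is shown below. It assumes the same 16-column layout as the names above, and it assumes that every column except the two coordinates is best kept as text; `read_opt()` is a hypothetical helper, not part of geofi. The `read.table` documentation also notes that supplying `colClasses` reduces memory use, which matters at 3.6 million rows.

```r
# Hypothetical helper: read the VRK .OPT file with explicit column types.
read_opt <- function(path) {
  opt_names <- c("rakennustu", "sijaintiku", "sijaintima", "rakennusty",
                 "CoordY", "CoordX", "osoitenume", "katunimi_f",
                 "katunimi_s", "katunumero", "postinumer", "vaalipiirikoodi",
                 "vaalipiirinimi", "tyhja", "idx", "date")
  # Assumption: everything is text except the two coordinate columns.
  # Reading codes as character keeps leading zeros (e.g. postal code "00100").
  opt_classes <- rep("character", length(opt_names))
  opt_classes[opt_names %in% c("CoordY", "CoordX")] <- "numeric"
  read.csv(path,
           sep = ";",
           header = FALSE,
           col.names = opt_names,
           colClasses = opt_classes,
           stringsAsFactors = FALSE)
}
```

The Windows-1252 to UTF-8 conversion of the street name columns is still needed after this step.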

Any ideas on how to incorporate this into geofi? It would be useful, for instance, for geocoding sensitive addresses.
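To illustrate the geocoding use case, here is a minimal sketch of an offline lookup that matches addresses to building coordinates by street name, street number and postal code, so no address has to be sent to an external service. `geocode_local()` and the `street`/`number`/`postcode` input columns are hypothetical names. It assumes the building table has the column names used above, and it simply takes the first building when several share one address.

```r
# Hypothetical helper: geocode addresses offline against the VRK building table.
geocode_local <- function(addresses, buildings) {
  # Normalised join key: trimmed, lower-cased street name + number + postal code.
  # Note: tolower() may leave non-ASCII letters (ä, ö, å) unchanged in some locales.
  make_key <- function(street, number, postcode) {
    paste(tolower(trimws(street)), trimws(number), trimws(postcode), sep = "|")
  }
  building_key <- make_key(buildings$katunimi_f, buildings$katunumero,
                           buildings$postinumer)
  # match() returns the first matching building, or NA when nothing matches
  idx <- match(make_key(addresses$street, addresses$number, addresses$postcode),
               building_key)
  addresses$CoordX <- buildings$CoordX[idx]
  addresses$CoordY <- buildings$CoordY[idx]
  addresses
}
```

The returned coordinates stay in EPSG:3067, and input row order is preserved because `match()` is used instead of `merge()`.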

However, this would require storage, since the data should be preprocessed. Do you think this is suitable data for geofi, and should we create a data repository such as geofi_data?
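One possible shape for the preprocessing step is sketched below, under the assumption that a data repository would serve one small file per municipality so users download only what they need. `write_by_municipality()`, the `buildings_<code>.rds` file naming and the chosen column subset are all hypothetical; the sketch keeps only the columns a geocoder would need and writes xz-compressed RDS files split by `sijaintiku`.

```r
# Hypothetical preprocessing step for a data repository such as geofi_data:
# keep only the columns needed for geocoding and write one compressed
# RDS file per municipality code (sijaintiku).
write_by_municipality <- function(opt, outdir) {
  keep <- c("sijaintiku", "katunimi_f", "katunimi_s", "katunumero",
            "postinumer", "CoordX", "CoordY")
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  # split() returns a named list with one data frame per municipality code
  parts <- split(opt[, keep], opt$sijaintiku)
  paths <- file.path(outdir, paste0("buildings_", names(parts), ".rds"))
  for (i in seq_along(parts)) {
    saveRDS(parts[[i]], paths[i], compress = "xz")
  }
  invisible(paths)
}
```

Splitting by municipality is only one option; splitting by postal code area or storing a single compressed file would also work, depending on how geofi would fetch the data.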

Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions