-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME.Rmd
More file actions
109 lines (77 loc) · 3.5 KB
/
README.Rmd
File metadata and controls
109 lines (77 loc) · 3.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# addressr
<!-- badges: start -->
<!-- badges: end -->
The goal of addressr is to standardize address cleaning for various datasets used by the Eviction Lab.
## Installation
You can install the development version of addressr from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("EvictionLab/addressr")
```
## Usage
Addresses can be difficult to work with and messy in many different ways.
Here is an example of a relatively clean table:
```{r message=FALSE}
library(dplyr)
address_table <- dplyr::tribble(
~address, ~city, ~state,
"456 Jersey Avenue #102", "Montclair", "NJ",
"123-125 N Street Rd", "Cincinnati", "OH",
"789 Pirate Cv East DBA ELAB LLC", "Memphis", "TN",
"3548 1ST ST FL 1", "St. Louis", "MO",
)
```
Geocoders can be thrown off by various things, such as address ranges, unit numbers, directionals, and street endings.
The `clean_address()` function streamlines address cleaning and outputs data into a standard format
```{r message=FALSE}
library(addressr)
cleaned_addresses <- address_table |> clean_address(address) |> janitor::remove_empty("cols")
cleaned_addresses
```
By default, the function returns all address components. You can also select the output:
```{r}
address_table |> clean_address(address, output = c("clean_address", "short_address", "street_number", "unit", "extra"))
```
`clean_address` will return the street number, pre-direction, street name, street suffix and post-direction. `short_address` will return the street number and street name.
The package can also separate rows with a street range or multiple addresses:
```{r}
address_table <- address_table |>
add_row(address = "928-928 S Montgomery Ave 1500 Jefferson Rd", city = "New York", state = "NY") |>
add_row(address = "1500-1502 1550 Ptree Rd", city = "Atlanta", state = "GA")
address_table |> clean_address(address, separate_street_range = TRUE, separate_multi_address = TRUE)
```
If there is a common pattern that is not removed with the function, you can use `extract_remove_squish()`, to pre-clean the data.
```{r, message=FALSE}
address_table |>
add_row(address = "246 S Bend St Unit 530 3 Bedroom") |>
extract_remove_squish(address, other, "\\d Bedroom")
```
To switch a column from the abbreviated spelling to the long format, there are two functions: `switch_abbreviation` and `str_replace_names`
- `switch_abbreviation()` for abbreviations included in the `address_abbreviations` dataset: `r stringr::str_flatten_comma((unique(address_abbreviations$type)), last = ", and ")`.
- `all_street_suffixes` has a one-to-many relationship and should only be used `long-to-short` to standardize spellings into the same official abbreviation (AV, AVE, AVN to AVE)
- `official_street_suffixes` has a one-to-one relationship between short and long spellings. It can be used either `short-to-long` or `long-to-short` (assuming all endings are in the official format).
```{r}
address_table |>
clean_address(address) |>
mutate(
street_suffix_short = switch_abbreviation(street_suffix, "official_street_suffixes", "long-to-short"),
.keep = "used"
)
```
- `str_replace_names()` to replace any vector with another vector of the same length
```{r}
address_table |>
mutate(state = str_replace_names(state, state.abb, state.name))
```