#!/bin/sh
#
# Normalize a shopping title string, suitable for alphabetizing and comparing.
#
# Example:
#
# ```shell
# $ echo "The Shops at Alpha Bravo Center" | normalize-shop-title
# Alpha Bravo
# ```
#
# This script reads lines from standard input and writes each one, normalized, to standard output.
#
# Step 1: run generic `normalize-title` items.
#
# * Smush whitespace runs to one space.
# * Strip leading/trailing whitespace.
# * Strip parentheticals.
# * Strip extraneous characters: ™ © ®
# * Strip leading stop words: A, The, etc.
# * Strip trailing stop words after a comma.
#
# Step 2: run shopping-related items.
#
# * Strip leading/trailing ornaments: "Mall at", "Shops by", etc.
#
# Ornament words:
#
# * Center, Centre
# * Court
# * Food Court
# * Inc, Incorporated
# * Mall
# * Outlet
# * Premium Outlets
# * Restaurant
# * Shop, Shoppe
# * Shopping Center
# * Square
# * Store
# * Theater, Theatre
# * Prepositions: at, by, for, in, near, of
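#
# More examples, as a sketch: the expected output below was traced by hand
# through the substitutions, and assumes GNU sed (the `i` flag on `s` is a
# GNU extension) with this script on PATH as `normalize-shop-title`.
#
# ```shell
# $ echo "Mall of Alpha Bravo" | normalize-shop-title
# Alpha Bravo
# $ echo "Alpha Bravo Food Court" | normalize-shop-title
# Alpha Bravo
# $ echo "Alpha Bravo Theatre, The" | normalize-shop-title
# Alpha Bravo
# ```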
#
# ## Tracking
#
# * Command: normalize-shop-title
# * Version: 1.2.0
# * Created: 2017-05-27
# * Updated: 2017-06-17
# * License: GPL
# * Contact: Joel Parker Henderson ([email protected])
sed -E " \
s/\([^)]*\)//g; \
s/[™©®]//g; \
s/[[:space:]]+/ /g; \
s/^[[:space:]]+//; s/[[:space:]]+$//; \
s/^[- ]*(A|An|The)[- ,]+//i; \
s/[- ,]+(A|An|The)[- ,]*$//i; \
s/^[- ]*(Centers?|Centres?|Courts?|Food[- ]Courts?|Inc\.?|Incorporated|Malls?|Outlets?|Premium[- ]Outlets?|Restaurants?|Shops?|Shoppes?|Shopping[- ]Centers?|Squares?|Stores?|Theaters?|Theatres?)[- ]+(at|by|for|in|near|of)[- ]+//gi; \
s/[- ,]+(Centers?|Centres?|Courts?|Food[- ]Courts?|Inc\.?|Incorporated|Malls?|Outlets?|Premium[- ]Outlets?|Restaurants?|Shops?|Shoppes?|Shopping[- ]Centers?|Squares?|Stores?|Theaters?|Theatres?)[- ]+(at|by|for|in|near|of)[- ]*$//gi; \
s/^[- ]*(Centers?|Centres?|Courts?|Food[- ]Courts?|Inc\.?|Incorporated|Malls?|Outlets?|Premium[- ]Outlets?|Restaurants?|Shops?|Shoppes?|Shopping[- ]Centers?|Squares?|Stores?|Theaters?|Theatres?)[- ]+//gi; \
s/[- ,]+(Centers?|Centres?|Courts?|Food[- ]Courts?|Inc\.?|Incorporated|Malls?|Outlets?|Premium[- ]Outlets?|Restaurants?|Shops?|Shoppes?|Shopping[- ]Centers?|Squares?|Stores?|Theaters?|Theatres?)[- ]*$//gi; \
"