Instructions to obtain Structured-THP datasets #8

@airalcorn2

Could you please provide additional details on how to obtain the 911-Calls and Earthquake datasets used in your paper? The CSV found at the provided webpage has 663,522 calls, all of which are in the EMS, Fire, or Traffic categories. For the 75 most frequent ZIP codes in this dataset, there are 582,045 total calls, which is considerably more than the 290,293 listed in Table 1 (see the code below).

import pandas as pd

df = pd.read_csv("911.csv")
print(len(df))  # 663522

# Count calls whose title contains one of the three category prefixes.
cats = ["EMS: ", "Fire: ", "Traffic: "]
in_cats = 0
for title in df["title"]:
    for cat in cats:
        if cat in title:
            in_cats += 1
            break

print(in_cats)  # 663522

# Count calls per ZIP code, most frequent first.
zip_calls = (
    df.groupby("zip")
    .size()
    .reset_index(name="n_calls")
    .sort_values("n_calls", ascending=False)
)
# Total calls across the 75 most frequent ZIP codes.
print(zip_calls["n_calls"].head(75).sum())  # 582045

The paper also states that:

An undirected edge exists between two vertices if their zipcodes are within 10 of each other.

Does this mean two vertices were considered neighbors if abs(ZIP_{1} - ZIP_{2}) <= 10?
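
To make the interpretation I am asking about concrete, here is a minimal sketch of how the graph would be built under that reading. It assumes that ZIP codes are compared as plain integers, that the threshold is inclusive, and that there are no self-loops; none of this is confirmed by the paper. The function name `zip_adjacency` and the example ZIP codes are hypothetical and only for illustration.

```python
def zip_adjacency(zip_codes, max_diff=10):
    """Build a symmetric 0/1 adjacency matrix for a list of ZIP codes.

    Assumed rule (not confirmed by the paper): vertices i and j are
    neighbors if abs(zip_i - zip_j) <= max_diff. Self-loops are omitted,
    which is also an assumption.
    """
    zips = [int(z) for z in zip_codes]
    n = len(zips)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(zips[i] - zips[j]) <= max_diff:
                # Undirected edge: set both directions.
                adj[i][j] = 1
                adj[j][i] = 1
    return adj


# Illustrative ZIP codes only; not taken from the paper's vertex set.
example_zips = [19401, 19403, 19446, 19464]
adj = zip_adjacency(example_zips)
for row in adj:
    print(row)
```

Under this reading, only 19401 and 19403 would be connected in the example, since every other pair differs by more than 10.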

For the Earthquake dataset, the provided website is in Chinese and seems to host a number of datasets. Could you provide precise instructions on where to find the specific earthquake dataset used in your paper?
