ISO Country Codes

We've referenced countries and country codes in many past datasets, but we've never looked closely at the ISO 3166 standard that defines these codes.

Wikipedia says:

ISO 3166 is a standard published by the International Organization for Standardization (ISO) that defines codes for the names of countries, dependent territories, special areas of geographical interest, and their principal subdivisions (e.g., provinces or states). The official name of the standard is Codes for the representation of names of countries and their subdivisions.

The dataset this week comes from the {ISOcodes} R package. It consists of three tables:

  • countries: Country codes from ISO 3166-1.
  • country_subdivisions: Country subdivision code from ISO 3166-2.
  • former_countries: Code for formerly used names of countries from ISO 3166-3.

Tip: Try the quick_map() function in the {countries} package to produce maps colored by country.

Some questions to consider:

  • When did ISO 3166-3 begin to log the date withdrawn as a full date, rather than just a year?
  • Which countries have the most subdivisions identified by the International Organization for Standardization (ISO)?
  • Is there a pattern to which countries have sub-subdivisions (subdivisions with a parent) and which don't?
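As a starting point for the second and third questions, a grouped count is often enough. The sketch below runs on a small hand-made stand-in for country_subdivisions; the rows and the AA/BB codes are illustrative and not taken from the dataset. It assumes only the parent and alpha_2 columns described in the data dictionary.

```r
# Toy stand-in for country_subdivisions; rows are illustrative only.
subdivisions <- tibble::tibble(
  code    = c("AA-01", "AA-02", "AA-03", "BB-01", "BB-02"),
  parent  = c(NA, "AA-01", "AA-01", NA, NA),
  alpha_2 = c("AA", "AA", "AA", "BB", "BB")
)

subdivision_summary <- subdivisions |>
  dplyr::group_by(alpha_2) |>
  dplyr::summarise(
    n_subdivisions = dplyr::n(),
    # A non-missing parent marks a sub-subdivision.
    n_with_parent = sum(!is.na(parent)),
    .groups = "drop"
  ) |>
  dplyr::mutate(has_sub_subdivisions = n_with_parent > 0) |>
  dplyr::arrange(dplyr::desc(n_subdivisions))

subdivision_summary
```

Swapping the toy tibble for the real country_subdivisions table should give one row per country, sorted by subdivision count.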

You can use this code to explore past datasets that have mentioned countries and/or country codes:

# install.packages("pak")
# pak::pak("r4ds/ttmeta")
ttmeta::tt_datasets_metadata |> 
  dplyr::mutate(
    has_country = purrr::map_lgl(variable_details, function(var_details) {
      "country_code" %in% tolower(var_details$variable) ||
        any(stringr::str_detect(tolower(var_details$variable), "country"))
    })
  ) |> 
  dplyr::filter(has_country)

Thank you to Jon Harmon for curating this week's dataset.

The Data

# Option 1: tidytuesdayR package 
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2024-11-12')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 46)

countries <- tuesdata$countries
country_subdivisions <- tuesdata$country_subdivisions
former_countries <- tuesdata$former_countries

# Option 2: Read directly from GitHub

countries <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-11-12/countries.csv')
country_subdivisions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-11-12/country_subdivisions.csv')
former_countries <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-11-12/former_countries.csv')
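One thing worth checking when reading these files with readr: Namibia's alpha_2 code is NA, and readr::read_csv() treats the string "NA" as a missing value by default. The sketch below reproduces the issue on a two-row inline CSV (not the real file) and shows one possible workaround using the na argument. If a file also writes genuinely missing values as NA, this setting keeps those as the literal text "NA" too, so whether it suits the real files is an assumption to verify.

```r
# Two-row inline CSV used only for illustration.
csv_text <- "alpha_2,alpha_3\nNA,NAM\nNO,NOR\n"

# Default parsing: the string "NA" becomes a missing value.
default_read <- readr::read_csv(I(csv_text), col_types = "cc")

# Treat only empty fields as missing, so "NA" survives as text.
strict_read <- readr::read_csv(I(csv_text), col_types = "cc", na = "")

is.na(default_read$alpha_2[1])
strict_read$alpha_2
```

A more targeted alternative is to read with the defaults and then restore the code only where alpha_3 is "NAM".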

How to Participate

  • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
  • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
  • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
  • Submit your own dataset!

Data Dictionary

countries.csv

| variable | class | description |
|:---|:---|:---|
| alpha_2 | character | 2-letter country code. |
| alpha_3 | character | 3-letter country code. |
| numeric | integer | 3-digit country code. |
| name | character | Name of the country (in English). |
| official_name | character | Official name of the country (in English). |
| common_name | character | Alternate common name for the country (in English). |
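Because numeric is stored as an integer, leading zeros are not kept (for example, Afghanistan's code 004 is stored as 4). A minimal sketch for restoring the 3-digit form for display or joins, using base R's sprintf() on a few example codes:

```r
# Example numeric codes: Afghanistan (4), Austria (40), United States (840).
numeric_codes <- c(4L, 40L, 840L)

# Pad with leading zeros to a fixed width of three digits.
padded <- sprintf("%03d", numeric_codes)
padded
```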

country_subdivisions.csv

| variable | class | description |
|:---|:---|:---|
| code | character | Code for the subdivision, consisting of a country's alpha_2 code, then a dash, then a code for this subdivision. |
| name | character | Name of this subdivision. |
| type | character | Type of subdivision, such as "Province", "Municipality", or "District". |
| parent | character | Code for the subdivision in which this subdivision is found, if it is not a direct subdivision of the country. |
| alpha_2 | character | The parent country's alpha_2 code (extracted from code). |
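The alpha_2 column makes it straightforward to attach country names to subdivisions. The sketch below uses two small toy tables with made-up rows; only the column names come from the data dictionary. Both tables have a name column, so the country's name is renamed (to the hypothetical country_name) before joining.

```r
# Toy stand-ins for the real tables; all values are made up.
countries <- tibble::tibble(
  alpha_2 = c("AA", "BB"),
  name    = c("Country A", "Country B")
)
subdivisions <- tibble::tibble(
  code    = c("AA-01", "BB-07"),
  name    = c("Region One", "Region Seven"),
  alpha_2 = c("AA", "BB")
)

# Rename the country's name column to avoid a clash, then join on alpha_2.
labelled <- subdivisions |>
  dplyr::left_join(
    countries |> dplyr::select(alpha_2, country_name = name),
    by = "alpha_2"
  )

labelled
```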

former_countries.csv

| variable | class | description |
|:---|:---|:---|
| alpha_4 | character | 4-letter country code. Only used for these former countries. |
| alpha_3 | character | 3-letter country code. |
| numeric | character | 3-digit country code. |
| name | character | Name of the former country (in English). |
| date_withdrawn | character | Year or date on which the code was withdrawn from use. |
| comment | character | An optional comment explaining why the code was withdrawn. |
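Since date_withdrawn mixes years and full dates in one character column, a first step is usually to tell the two apart. The sketch below assumes a year looks like YYYY and a full date like YYYY-MM-DD; that format, and the example values, are assumptions to check against the real column.

```r
# Hypothetical example values; check the real column for its actual formats.
date_withdrawn <- c("1990", "2006-06-05", "1977")

# Classify each entry by shape.
is_year_only <- grepl("^[0-9]{4}$", date_withdrawn)
is_full_date <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", date_withdrawn)

# Under the assumed formats, the year is always the first four characters.
year_withdrawn <- as.integer(substr(date_withdrawn, 1, 4))

data.frame(date_withdrawn, is_year_only, is_full_date, year_withdrawn)
```

Sorting by year_withdrawn and looking at where is_full_date first becomes TRUE is one way to approach the question of when full dates began to be logged.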

Cleaning Script

# Mostly clean data from the ISOcodes package

# install.packages("ISOcodes")
library(ISOcodes)
library(tidyverse)
library(janitor)

countries <- 
  ISOcodes::ISO_3166_1 |> 
  tibble::as_tibble() |> 
  dplyr::mutate(Numeric = as.integer(Numeric)) |> 
  janitor::clean_names()
country_subdivisions <- 
  ISOcodes::ISO_3166_2 |> 
  tibble::as_tibble() |> 
  janitor::clean_names() |> 
  dplyr::mutate(
    alpha_2 = stringr::str_extract(code, "^[^-]+(?=-)")
  )
former_countries <-
  ISOcodes::ISO_3166_3 |> 
  tibble::as_tibble() |> 
  janitor::clean_names()
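# The pattern used for alpha_2 above, "^[^-]+(?=-)", reads as: from the start
# of the string, take one or more non-dash characters, but only if a dash
# follows (the lookahead checks for the dash without including it).
# A small sketch on made-up codes, including one with no dash:

```r
# Made-up codes for illustration; "NODASH" has no dash to anchor on.
codes <- c("AA-01", "BB-X7", "NODASH")

extracted <- stringr::str_extract(codes, "^[^-]+(?=-)")
extracted
```

# Strings without a dash yield a missing value rather than the whole string.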