SaviR Version 0.2 API Changes
Source: vignettes/savir_version_0.2_api_changes.Rmd
Background
SaviR version 0.2 brings some major revisions to the SaviR API design, some complementary and some breaking. I’ll highlight a few of them below.
Standardization of geography columns across datasets
Black box warning
This will have a pretty substantial impact on all products, and will likely break most existing downstream code…
Motivation
All data pull functions use different naming schemes for ISO country codes and country names. It’s never apparent which column to join on, and after joining, it’s very difficult to tell which column came from which dataset. I’m attempting to standardize all ISO code columns to either iso2code (two-letter codes) or id (three-letter codes, i.e. iso3code), depending on their value.
There should be no guesswork involved when joining disparate datasets (it’s always id or iso2code, and they should join without specifying a by argument).
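To make the no-guesswork point concrete, here is a minimal sketch on toy data. The tables meta, cases, and vax and all of their values are hypothetical stand-ins, not SaviR objects; only the key column names (id, iso2code, date) follow the convention described above. With dplyr, a join that omits by falls back to the column names the two tables share:

```r
library(dplyr)

# Hypothetical stand-in for a metadata table keyed by id and iso2code
meta <- tibble(
  id         = c("ABW", "AFG"),
  iso2code   = c("AW", "AF"),
  who_region = c("AMRO", "EMRO")
)

# Hypothetical stand-in for case data keyed by iso2code
cases <- tibble(
  iso2code  = c("AW", "AF"),
  date      = as.Date(c("2021-03-29", "2021-03-29")),
  new_cases = c(3L, 7L)
)

# Hypothetical stand-in for vaccine data keyed by id
vax <- tibble(
  id                 = c("ABW", "AFG"),
  date               = as.Date(c("2021-03-29", "2021-03-29")),
  total_vaccinations = c(100, 200)
)

# No `by` argument: dplyr joins on the shared column names and
# prints a message naming the columns it used.
joined <- meta %>%
  left_join(cases) %>% # shared column: iso2code
  left_join(vax)       # shared columns: id, date

print(joined)
```

Because the key columns carry the same names everywhere, the only overlap between any two tables is the intended key, which is what makes the natural join safe here.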
Impact
get_covid_df()
- country_code -> iso2code
- Removing who_region, region
  - These are both provided in onetable, and get_covid_df() is never used without joining to this metadata table
get_vax(), get_vax_manufacturers()
- iso_code -> id
- location -> owid_country
  - Still not ideal, but it should be apparent where that column comes from
- URL for get_vax_manufacturers() changed
onetable
#> # A tibble: 237 × 10
#> id iso2code state_region who_region who_region_desc who_country
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 ABW AW NA AMRO Americas Aruba
#> 2 AFG AF South and Central Asia EMRO Eastern Medite… Afghanistan
#> 3 AGO AO Sub-Saharan Africa AFRO Africa Angola
#> 4 AIA AI NA AMRO Americas Anguilla
#> 5 ALB AL Europe and Eurasia EURO Europe Albania
#> 6 AND AD Europe and Eurasia EURO Europe Andorra
#> 7 ARE AE Near East (Middle East… EMRO Eastern Medite… United Ara…
#> 8 ARG AR Western Hemisphere AMRO Americas Argentina
#> 9 ARM AM Europe and Eurasia EURO Europe Armenia
#> 10 ASM AS NA WPRO Western Pacific American S…
#> # ℹ 227 more rows
#> # ℹ 4 more variables: incomelevel_value <chr>, population <dbl>,
#> # eighteenplus <dbl>, geometry <MULTIPOLYGON [m]>
head(get_covid_df("WHO"))
#> # A tibble: 6 × 8
#> date iso2code country new_cases cumulative_cases new_deaths
#> <date> <chr> <chr> <int> <int> <int>
#> 1 2020-01-05 AD Andorra 0 0 0
#> 2 2020-01-05 AE United Arab Emirates 0 0 0
#> 3 2020-01-05 AF Afghanistan 0 0 0
#> 4 2020-01-05 AG Antigua and Barbuda 0 0 0
#> 5 2020-01-05 AI Anguilla 0 0 0
#> 6 2020-01-05 AL Albania 0 0 0
#> # ℹ 2 more variables: cumulative_deaths <int>, source <chr>
head(get_vax())
#> # A tibble: 6 × 17
#> owid_country id date total_vaccinations people_vaccinated
#> <chr> <chr> <date> <dbl> <dbl>
#> 1 Aruba ABW 2021-03-29 25766 15600
#> 2 Aruba ABW 2021-03-30 25766 15600
#> 3 Aruba ABW 2021-03-31 25766 15600
#> 4 Aruba ABW 2021-04-01 25766 15600
#> 5 Aruba ABW 2021-04-02 25766 15600
#> 6 Aruba ABW 2021-04-03 25766 15600
#> # ℹ 12 more variables: people_fully_vaccinated <dbl>, total_boosters <dbl>,
#> # daily_vaccinations_raw <int>, daily_vaccinations <int>,
#> # total_vaccinations_per_hundred <dbl>, people_vaccinated_per_hundred <dbl>,
#> # people_fully_vaccinated_per_hundred <dbl>,
#> # total_boosters_per_hundred <dbl>, daily_vaccinations_per_million <int>,
#> # daily_people_vaccinated <int>, daily_people_vaccinated_per_hundred <dbl>,
#> # daily_vaccinations_per_hundred <dbl>
New “Starting Block” function get_combined_table()
Many scripts pulled and joined the metadata, case, and vaccine datasets in different ways, and it was never clear to most users exactly how these data should be put together.
No one is forced to use it, but there is now an automated way to:
- Pull metadata, case + death data, and vaccine data
- Join all together
- Filter to only WHO source, or WHO + JHU source
- Keep (or remove) geometry column for mapping
get_combined_table() takes two arguments:
- type: one of “WHO” or “Both”, based on which source you’d like
- geometry: TRUE/FALSE (default: TRUE) based on whether you’d like geometry or not.
who_data <- get_combined_table("WHO")
# is identical to the following sequence:
# (which still works, but is unnecessary)
# onetable %>%
# select(-geometry) %>% # In the case that geometry = FALSE
# right_join(get_covid_df(), by = "iso2code") %>%
# filter(source == "WHO") %>% # In the case of type = "WHO"
# # filter(!(country == "China" & source == "WHO")) %>% # In the case of type = "Both"
# calc_add_risk() %>%
# left_join(get_vax(), by = c("id", "date"))
head(who_data)
#> # A tibble: 6 × 56
#> id iso2code state_region who_region who_region_desc who_country
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 ABW AW NA AMRO Americas Aruba
#> 2 ABW AW NA AMRO Americas Aruba
#> 3 ABW AW NA AMRO Americas Aruba
#> 4 ABW AW NA AMRO Americas Aruba
#> 5 ABW AW NA AMRO Americas Aruba
#> 6 ABW AW NA AMRO Americas Aruba
#> # ℹ 50 more variables: incomelevel_value <chr>, population <dbl>,
#> # eighteenplus <dbl>, date <date>, country <chr>, new_cases <int>,
#> # cumulative_cases <int>, new_deaths <int>, cumulative_deaths <int>,
#> # source <chr>, new_cases_copy <dbl>, new_deaths_copy <dbl>,
#> # cumulative_cases_copy <dbl>, cumulative_deaths_copy <dbl>, weekdate <date>,
#> # new_cases_7dav <dbl>, new_deaths_7dav <dbl>, daily_case_incidence <dbl>,
#> # daily_death_incidence <dbl>, week_case <dbl>, prev_week_case <dbl>, …
Vaccine carry-forward
We were running into issues where vaccination data were sometimes carried forward, but not always.
Since this is generally the behavior we want, I’ve applied it within SaviR using a new function, calc_vax_carryforward().
calc_vax_carryforward()
This function is used internally in get_vax() and get_combined_table() to carry forward the following columns:
total_vaccinations
people_vaccinated
people_fully_vaccinated
total_boosters
total_vaccinations_per_hundred
people_vaccinated_per_hundred
people_fully_vaccinated_per_hundred
total_boosters_per_hundred
Passing columns to calc_vax_carryforward() overrides that behavior, but there isn’t presently a need for that.
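For readers unfamiliar with the term, carry-forward here means filling a missing value with the most recently reported value for the same country. The following is a minimal sketch of that idea on toy data using tidyr::fill(). It illustrates the concept only and is not necessarily how calc_vax_carryforward() is implemented; the data frame and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical vaccine data with gaps (NA) on days with no report
vax <- tibble(
  id   = c("ABW", "ABW", "ABW", "AFG", "AFG", "AFG"),
  date = as.Date(c("2021-03-29", "2021-03-30", "2021-03-31",
                   "2021-03-29", "2021-03-30", "2021-03-31")),
  total_vaccinations = c(25766, NA, NA, NA, 100, NA)
)

carried <- vax %>%
  arrange(id, date) %>% # fill order must follow time within each country
  group_by(id) %>%      # never carry values across countries
  fill(total_vaccinations, .direction = "down") %>%
  ungroup()

print(carried)
```

Note that the leading NA for AFG stays missing: a downward fill has nothing earlier to carry forward from, and grouping by id keeps the last ABW value from leaking into it.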
get_vax_dates()
Because vaccine data are carried forward, the old way of computing the date the vaccine data were last updated is no longer possible. Instead, I’ve created a function which computes those dates automatically, get_vax_dates().
get_vax_dates() takes no arguments, but returns a data frame with 1 row per country detailing when each vaccine metric was last updated.
vax_dates <- get_vax_dates()
head(vax_dates)
#> owid_country id total_doses_date partial_date fully_date booster_date
#> <char> <char> <Date> <Date> <Date> <Date>
#> 1: Afghanistan AFG 2023-12-31 2023-12-31 2023-12-31 2023-12-31
#> 2: Albania ALB 2023-09-10 2023-09-10 2023-09-10 2023-09-10
#> 3: Algeria DZA 2022-09-04 2022-09-04 2022-09-04 2022-09-04
#> 4: Andorra AND 2023-09-24 2023-09-24 2023-09-24 2023-09-24
#> 5: Angola AGO 2023-12-31 2023-12-31 2023-12-31 2023-12-31
#> 6: Anguilla AIA 2023-03-10 2023-03-17 2023-03-17 2023-03-03
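As a rough illustration of what a "last updated" date means once values are carried forward, the sketch below takes the latest date with a reported (non-missing) value per country from data that have not yet been carried forward. This is an assumption about the general approach, not the actual get_vax_dates() implementation. The toy data are hypothetical, and the mapping of total_doses_date to total_vaccinations and partial_date to people_vaccinated is assumed from the column names.

```r
library(dplyr)

# Hypothetical vaccine data before any carry-forward is applied
raw_vax <- tibble(
  id   = c("ABW", "ABW", "ABW", "AFG", "AFG", "AFG"),
  date = as.Date(c("2021-03-29", "2021-03-30", "2021-03-31",
                   "2021-03-29", "2021-03-30", "2021-03-31")),
  total_vaccinations = c(25766, NA, NA, 10, 20, NA),
  people_vaccinated  = c(15600, 15700, NA, NA, 5, NA)
)

# For each country, the latest date on which each metric was reported
last_dates <- raw_vax %>%
  group_by(id) %>%
  summarise(
    total_doses_date = max(date[!is.na(total_vaccinations)]),
    partial_date     = max(date[!is.na(people_vaccinated)]),
    .groups = "drop"
  )

print(last_dates)
```

Running the same summary on carried-forward data would report the final date in the series for every metric, which is why the dates have to be derived from the reported values instead.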