Background

SaviR version 0.2 brings some major revisions to the SaviR API design, some complementary and some breaking. I’ll highlight a few of them below.

Standardization of geography columns across datasets

Black box warning

This will have a pretty substantial impact on all products, and will likely break most existing downstream code…

Motivation

All data pull functions use different naming schemes for ISO country codes and country names. It’s never apparent which column to join on, and after joining, it’s very difficult to reason about what came from where. I’m attempting to standardize all ISO code columns to either iso2code (two-letter codes) or id (three-letter codes, i.e. iso3code), depending on their value.

There should be no guesswork involved when joining disparate datasets (it’s always id or iso2code, and they should join without specifying a by argument).
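
As a minimal sketch of what that looks like in practice, the toy tables below stand in for a metadata table and a vaccine table. The values are made up for illustration and are not SaviR data. With no by argument, dplyr joins on every column name the two tables share, which here is only id.

```r
library(dplyr)

# Toy stand-ins for a metadata table and a vaccine table (illustrative values only)
meta <- tibble(
  id = c("ABW", "AFG"),
  iso2code = c("AW", "AF"),
  who_region = c("AMRO", "EMRO")
)
vax <- tibble(
  id = c("ABW", "ABW", "AFG"),
  date = as.Date(c("2021-03-29", "2021-03-30", "2021-03-29")),
  total_vaccinations = c(100, 150, 20)
)

# No `by` argument: dplyr joins on all shared column names (here, just `id`)
# and prints a message naming the columns it used
joined <- left_join(vax, meta)
```

Note that a join without by uses all shared columns, so if two tables also share a non-key column (a country name, say), it is safer to spell out by explicitly.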

Impact

owid_testing_meta, get_owid_testing_meta()
  • iso_code -> id
get_testing() (and all inner functions)
  • iso_code -> id
get_covid_df()
  • country_code -> iso2code
  • Removing who_region, region
    • These are both provided in onetable, and get_covid_df() is never used without joining to this metadata table
get_vax(), get_vax_manufacturers()
  • iso_code -> id
  • location -> owid_country
    • Still not ideal, but it should be apparent where that column comes from
  • URL for get_vax_manufacturers() changed
onetable
#> # A tibble: 237 × 10
#>    id    iso2code state_region            who_region who_region_desc who_country
#>    <chr> <chr>    <chr>                   <chr>      <chr>           <chr>      
#>  1 ABW   AW       NA                      AMRO       Americas        Aruba      
#>  2 AFG   AF       South and Central Asia  EMRO       Eastern Medite… Afghanistan
#>  3 AGO   AO       Sub-Saharan Africa      AFRO       Africa          Angola     
#>  4 AIA   AI       NA                      AMRO       Americas        Anguilla   
#>  5 ALB   AL       Europe and Eurasia      EURO       Europe          Albania    
#>  6 AND   AD       Europe and Eurasia      EURO       Europe          Andorra    
#>  7 ARE   AE       Near East (Middle East… EMRO       Eastern Medite… United Ara…
#>  8 ARG   AR       Western Hemisphere      AMRO       Americas        Argentina  
#>  9 ARM   AM       Europe and Eurasia      EURO       Europe          Armenia    
#> 10 ASM   AS       NA                      WPRO       Western Pacific American S…
#> # ℹ 227 more rows
#> # ℹ 4 more variables: incomelevel_value <chr>, population <dbl>,
#> #   eighteenplus <dbl>, geometry <MULTIPOLYGON [m]>
head(get_covid_df("WHO"))
#> # A tibble: 6 × 8
#>   date       iso2code country              new_cases cumulative_cases new_deaths
#>   <date>     <chr>    <chr>                    <int>            <int>      <int>
#> 1 2020-01-05 AD       Andorra                      0                0          0
#> 2 2020-01-05 AE       United Arab Emirates         0                0          0
#> 3 2020-01-05 AF       Afghanistan                  0                0          0
#> 4 2020-01-05 AG       Antigua and Barbuda          0                0          0
#> 5 2020-01-05 AI       Anguilla                     0                0          0
#> 6 2020-01-05 AL       Albania                      0                0          0
#> # ℹ 2 more variables: cumulative_deaths <int>, source <chr>
head(get_vax())
#> # A tibble: 6 × 17
#>   owid_country id    date       total_vaccinations people_vaccinated
#>   <chr>        <chr> <date>                  <dbl>             <dbl>
#> 1 Aruba        ABW   2021-03-29              25766             15600
#> 2 Aruba        ABW   2021-03-30              25766             15600
#> 3 Aruba        ABW   2021-03-31              25766             15600
#> 4 Aruba        ABW   2021-04-01              25766             15600
#> 5 Aruba        ABW   2021-04-02              25766             15600
#> 6 Aruba        ABW   2021-04-03              25766             15600
#> # ℹ 12 more variables: people_fully_vaccinated <dbl>, total_boosters <dbl>,
#> #   daily_vaccinations_raw <int>, daily_vaccinations <int>,
#> #   total_vaccinations_per_hundred <dbl>, people_vaccinated_per_hundred <dbl>,
#> #   people_fully_vaccinated_per_hundred <dbl>,
#> #   total_boosters_per_hundred <dbl>, daily_vaccinations_per_million <int>,
#> #   daily_people_vaccinated <int>, daily_people_vaccinated_per_hundred <dbl>,
#> #   daily_vaccinations_per_hundred <dbl>
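
For downstream code that still has data saved under the old column names, one possible migration aid is a small renaming helper. This is a sketch only: rename_legacy_geo() and legacy_names are hypothetical names, not part of SaviR, and the mapping simply mirrors the renames listed above. It will not restore who_region or region, which now come from onetable.

```r
library(dplyr)

# Old -> new names taken from the list of changes above; format is new = "old"
legacy_names <- c(
  id = "iso_code",
  iso2code = "country_code",
  owid_country = "location"
)

# Hypothetical helper (not part of SaviR): rename whichever legacy columns
# are present and silently skip the ones that are not
rename_legacy_geo <- function(df) {
  rename(df, any_of(legacy_names))
}

# Toy data frame in the old get_vax() layout (illustrative values only)
old_vax <- tibble(
  location = "Aruba",
  iso_code = "ABW",
  total_vaccinations = 25766
)

new_vax <- rename_legacy_geo(old_vax)
names(new_vax)
```

Using any_of() rather than all_of() means the same helper can be applied to tables from different pull functions without erroring on columns that a given table never had.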

New “Starting Block” function get_combined_table()

Many scripts pulled and joined the metadata, case and death data, and vaccine data differently, and it was never clear to most people exactly how to do it.

No one is forced to use it, but there is now an automated way to:

  • Pull metadata, case + death data, and vaccine data
  • Join all together
  • Filter to only WHO source, or WHO + JHU source
  • Keep (or remove) geometry column for mapping

get_combined_table() takes two arguments:

  • type: one of “WHO” or “Both”, based on which source you’d like
  • geometry: TRUE/FALSE (default: TRUE) based on whether you’d like geometry or not.
who_data <- get_combined_table("WHO")
# is identical to the following sequence:
# (which still works, but is unnecessary)
# onetable %>%
#   select(-geometry) %>% # In the case that geometry = FALSE
#   right_join(get_covid_df(), by = "iso2code") %>%
#   filter(source == "WHO") %>% # In the case of type = "WHO"
#   # filter(!(country == "China" & source == "WHO")) %>% # In the case of type = "Both"
#   calc_add_risk() %>%
#   left_join(get_vax(), by = c("id", "date"))

head(who_data)
#> # A tibble: 6 × 56
#>   id    iso2code state_region who_region who_region_desc who_country
#>   <chr> <chr>    <chr>        <chr>      <chr>           <chr>      
#> 1 ABW   AW       NA           AMRO       Americas        Aruba      
#> 2 ABW   AW       NA           AMRO       Americas        Aruba      
#> 3 ABW   AW       NA           AMRO       Americas        Aruba      
#> 4 ABW   AW       NA           AMRO       Americas        Aruba      
#> 5 ABW   AW       NA           AMRO       Americas        Aruba      
#> 6 ABW   AW       NA           AMRO       Americas        Aruba      
#> # ℹ 50 more variables: incomelevel_value <chr>, population <dbl>,
#> #   eighteenplus <dbl>, date <date>, country <chr>, new_cases <int>,
#> #   cumulative_cases <int>, new_deaths <int>, cumulative_deaths <int>,
#> #   source <chr>, new_cases_copy <dbl>, new_deaths_copy <dbl>,
#> #   cumulative_cases_copy <dbl>, cumulative_deaths_copy <dbl>, weekdate <date>,
#> #   new_cases_7dav <dbl>, new_deaths_7dav <dbl>, daily_case_incidence <dbl>,
#> #   daily_death_incidence <dbl>, week_case <dbl>, prev_week_case <dbl>, …

Vaccine carry-forward

We were running into issues where vaccination data were sometimes carried forward, but not always.

Since this is generally the behavior we want, I’ve applied it within SaviR using a new function, calc_vax_carryforward().

calc_vax_carryforward()

This function is used internally in get_vax() and get_combined_table() to carry forward the following columns:

  • total_vaccinations
  • people_vaccinated
  • people_fully_vaccinated
  • total_boosters
  • total_vaccinations_per_hundred
  • people_vaccinated_per_hundred
  • people_fully_vaccinated_per_hundred
  • total_boosters_per_hundred

Passing columns to calc_vax_carryforward() overrides that default set, but there isn’t presently a need for that.
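
To make "carry-forward" concrete, here is a minimal sketch of the idea using tidyr::fill() on a toy table. It is not SaviR's actual implementation, and the data are made up: within each country, a missing value is replaced by the most recent earlier value, and leading gaps stay missing.

```r
library(dplyr)
library(tidyr)

# Toy cumulative vaccine data with reporting gaps (illustrative values only)
raw <- tibble(
  id = c("ABW", "ABW", "ABW", "AFG", "AFG"),
  date = as.Date(c(
    "2021-03-29", "2021-03-30", "2021-03-31",
    "2021-03-29", "2021-03-30"
  )),
  total_vaccinations = c(100, NA, 150, NA, 20)
)

# Within each country, fill each NA with the last observed value before it.
# An NA with no earlier observation (AFG's first row) is left as NA.
carried <- raw %>%
  arrange(id, date) %>%
  group_by(id) %>%
  fill(total_vaccinations, .direction = "down") %>%
  ungroup()
```

Grouping by id before filling matters: without it, the last value of one country would leak into the first rows of the next.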

get_vax_dates()

Because vaccine data are carried forward, the old way of computing the date the vaccine data were last updated is no longer possible. Instead, I’ve created a function which computes those dates automatically, get_vax_dates().

get_vax_dates() takes no arguments, but returns a data frame with 1 row per country detailing when each vaccine metric was last updated.

vax_dates <- get_vax_dates()

head(vax_dates)
#>    owid_country     id total_doses_date partial_date fully_date booster_date
#>          <char> <char>           <Date>       <Date>     <Date>       <Date>
#> 1:  Afghanistan    AFG       2023-11-26   2023-11-26 2023-11-26   2023-11-26
#> 2:      Albania    ALB       2023-09-10   2023-09-10 2023-09-10   2023-09-10
#> 3:      Algeria    DZA       2022-09-04   2022-09-04 2022-09-04   2022-09-04
#> 4:      Andorra    AND       2023-09-24   2023-09-24 2023-09-24   2023-09-24
#> 5:       Angola    AGO       2023-11-19   2023-11-19 2023-11-19   2023-11-19
#> 6:     Anguilla    AIA       2023-03-10   2023-03-17 2023-03-17   2023-03-03
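
As a rough illustration of what a "last updated" date means here, the sketch below takes a toy table that has not been carried forward and finds, per country, the latest date with a reported value. This is an assumption about the general approach, not a description of how get_vax_dates() is implemented, and the data are made up. Only the output column name total_doses_date is borrowed from the output above.

```r
library(dplyr)

# Toy vaccine data before any carry-forward (illustrative values only)
raw <- tibble(
  id = c("ABW", "ABW", "ABW", "AFG", "AFG", "AFG"),
  date = as.Date(c(
    "2021-03-29", "2021-03-30", "2021-03-31",
    "2021-03-29", "2021-03-30", "2021-03-31"
  )),
  total_vaccinations = c(100, NA, 150, NA, 20, NA)
)

# Latest date per country on which a value was actually reported
last_updated <- raw %>%
  filter(!is.na(total_vaccinations)) %>%
  group_by(id) %>%
  summarise(total_doses_date = max(date), .groups = "drop")
```

Run on carried-forward data, the same computation would just return the last date in the table for every country, which is why the dates need to be derived separately.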