Scoring FluSight submissions using scoringutils

library(forecasttools)
library(scoringutils)
library(dplyr)
library(ggplot2)
library(knitr)

In this vignette, we use forecasttools to capture the current state of the FluSight forecast hub (https://github.com/cdcepi/FluSight-forecast-hub) and then score the forecasts with a proper scoring rule. We do the scoring with scoringutils.

Generating a table of forecasts against truth data

First, we create a table of quantile forecasts formatted to work with scoringutils functions using hub_to_scorable_quantiles(). Generally, we expect users to use hub_to_scorable_quantiles() with a local path to the forecast repository which updates from GitHub by default. In this case, we download a copy of the Hub from GitHub.

hub_url <- "https://github.com/cdcepi/FluSight-forecast-hub"
hub_path <- fs::path(withr::local_tempdir(), "flusight-hub")
download_hub(
  hub_url = hub_url,
  hub_path = hub_path,
  force = TRUE
)

For reproducibility, in this vignette we will examine the Hub as of a specific git commit, 6ae6919.
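The download call above fetches the Hub's latest state, so pinning it to a commit needs an extra step. The following is a minimal sketch of one way to do that, assuming the downloaded copy is a git clone with full history and that the git command-line tool is on the PATH. The helper checkout_commit() and its dry_run argument are hypothetical and are not part of forecasttools.

```r
# Hypothetical helper (not part of forecasttools): check out a specific
# commit in a local clone by calling the git command-line tool.
checkout_commit <- function(repo_path, commit, dry_run = FALSE) {
  # `git -C <path>` runs the command as if started in <path>
  args <- c("-C", repo_path, "checkout", "--quiet", commit)
  if (dry_run) {
    # Return the arguments without touching the repository
    return(args)
  }
  status <- system2("git", args)
  if (status != 0) {
    stop("git checkout failed with exit status ", status)
  }
  invisible(args)
}

# Inspect the command that would be run, using a placeholder path
dry_run_args <- checkout_commit("path/to/flusight-hub", "6ae6919", dry_run = TRUE)
print(paste("git", paste(dry_run_args, collapse = " ")))

# With a real clone at hub_path, the call would be:
# checkout_commit(hub_path, "6ae6919")
```

Checking out a commit this way leaves the clone in a detached HEAD state, which is fine for read-only scoring.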

As of 6ae6919, the FluSight Hub accepted quantile forecasts for a single “target” quantity: epiweekly incident influenza hospital admissions. It provides a time series of that data in a file named target-data/target-hospital-admissions.csv.

The schema for forecasts is standardized across Hubverse Hubs, but the schema for target data is not (yet). For that reason hub_to_scorable_quantiles() asks you to provide:

target_data_rel_path: a path to the target data you want relative to the Hub root directory.
obs_date_col: the name of the column in the target data table that corresponds to the target_end_date for forecasts. Default "date".
obs_value_col: the name of the column in the target data table that corresponds to observed values of the target quantity. Default "value".
id_cols: any additional ID columns besides the dates that should be used to join the target data to the forecast data. Default c("target", "location").
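To make the role of these arguments concrete, here is a toy sketch of the kind of join they describe, written with dplyr. It is an illustration only, not the implementation of hub_to_scorable_quantiles(): the data are invented, and the use of an inner join is an assumption.

```r
library(dplyr)

# Invented forecasts in hub-style columns: one row per model, location,
# target date, and quantile level.
forecasts <- tibble(
  model_id = "toy-model",
  location = c("06", "06", "06", "US", "US", "US"),
  target_end_date = as.Date("2024-01-06"),
  output_type_id = rep(c(0.25, 0.5, 0.75), times = 2),
  value = c(90, 100, 110, 900, 1000, 1100)
)

# Invented target data using the default column names "date" and "value"
target_data <- tibble(
  date = as.Date(c("2024-01-06", "2024-01-06", "2024-01-13")),
  location = c("06", "US", "US"),
  value = c(120, 950, 1010)
)

obs_date_col <- "date"
obs_value_col <- "value"
id_cols <- "location"

# Rename the observation columns so they line up with the forecast table
observations <- target_data |>
  select(
    target_end_date = all_of(obs_date_col),
    observed = all_of(obs_value_col),
    all_of(id_cols)
  )

# Match each forecast row to its observation by date and the ID columns
# (inner join assumed here; forecasts without an observation are dropped)
joined <- forecasts |>
  rename(
    model = model_id,
    predicted = value,
    quantile_level = output_type_id
  ) |>
  inner_join(observations, by = c("target_end_date", id_cols))

print(joined)
```

Each forecast row now carries the observed value for its date and location, which is the shape scoringutils needs in order to score quantile forecasts.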

For the FluSight hub target data, the date column is "date" and the value column is "value", so we will leave those defaults. Forecasts are stratified by "location" but not by target, so we’ll use "location" as the only additional ID column.

target_data_path <- fs::path(
  "target-data",
  "target-hospital-admissions",
  ext = "csv"
)
forecast_and_target <- hub_to_scorable_quantiles(
  hub_path,
  target_data_rel_path = target_data_path,
  id_cols = "location"
)
#> ℹ Updating superseded URL `Infectious-Disease-Modeling-hubs` to `hubverse-org`
#> ℹ Updating superseded URL `Infectious-Disease-Modeling-hubs` to `hubverse-org`
#> New names:
#> • `` -> `...1`
#> Rows: 6148 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (2): location, location_name
#> dbl  (3): ...1, value, weekly_rate
#> date (1): date
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

hub_to_scorable_quantiles() outputs a scoringutils object, specifically the output of scoringutils::as_forecast_quantile(). Note that while hubData::collect_hub() identifies individual models with a model_id column, hub_to_scorable_quantiles() renames this to the scoringutils standard model column.

There were 39 different models that had been submitted to FluSight as of the commit examined in this vignette:

unique(forecast_and_target$model)
#>  [1] "CADPH-FluCAT_Ensemble"    "CEPH-Rtrend_fluH"        
#>  [3] "CMU-TimeSeries"           "CU-ensemble"             
#>  [5] "FluSight-baseline"        "FluSight-ensemble"       
#>  [7] "FluSight-lop_norm"        "GH-model"                
#>  [9] "GT-FluFNP"                "ISU_NiemiLab-ENS"        
#> [11] "ISU_NiemiLab-NLH"         "ISU_NiemiLab-SIR"        
#> [13] "JHU_CSSE-CSSE_Ensemble"   "LUcompUncertLab-chimera" 
#> [15] "LosAlamos_NAU-CModel_Flu" "MIGHTE-Nsemble"          
#> [17] "MOBS-GLEAM_FLUH"          "NIH-Flu_ARIMA"           
#> [19] "NU_UCSD-GLEAM_AI_FLUH"    "PSI-PROF"                
#> [21] "PSI-PROF_beta"            "SGroup-RandomForest"     
#> [23] "SigSci-CREG"              "SigSci-TSENS"            
#> [25] "Stevens-GBR"              "UGA_flucast-Copycat"     
#> [27] "UGA_flucast-INFLAenza"    "UGA_flucast-OKeeffe"     
#> [29] "UGuelph-CompositeCurve"   "UGuelphensemble-GRYPHON" 
#> [31] "UM-DeepOutbreak"          "UMass-flusion"           
#> [33] "UMass-trends_ensemble"    "UNC_IDD-InfluPaint"      
#> [35] "UVAFluX-Ensemble"         "VTSanghani-Ensemble"     
#> [37] "cfa-flumech"              "cfarenewal-cfaepimlight" 
#> [39] "fjordhest-ensemble"

There are 53 locations with available forecasts: the 50 states, the District of Columbia, Puerto Rico, and the US as a whole. They are stored as two-character codes (two-digit FIPS codes, plus "US" for the national level), but we can recode them as the more familiar USPS-style two-letter abbreviations via us_loc_code_to_abbr():

unique(forecast_and_target$location)
#>  [1] "06" "01" "02" "04" "05" "08" "09" "10" "11" "12" "13" "15" "16" "17" "18"
#> [16] "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
#> [31] "34" "35" "36" "37" "38" "39" "40" "41" "42" "44" "45" "46" "47" "48" "49"
#> [46] "50" "51" "53" "54" "55" "56" "72" "US"

forecast_and_target <- forecast_and_target |>
  mutate(location = us_loc_code_to_abbr(location))

unique(forecast_and_target$location)
#>  [1] "CA" "AL" "AK" "AZ" "AR" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
#> [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
#> [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
#> [46] "VT" "VA" "WA" "WV" "WI" "WY" "PR" "US"

Tabular scoring of forecasts

scoringutils provides various forecast evaluation metrics, including interval scores, skill relative to a chosen baseline, and coverage of prediction intervals at different nominal levels. Here we show the metrics for national (US) forecasts by model for all forecast dates so far, rounded to two significant figures:

chosen_location <- "US"

forecast_and_target |>
  filter(location == !!chosen_location) |>
  score() |>
  summarise_scores(
    by = "model",
    relative_skill = TRUE,
    baseline = "FluSight-ensemble"
  ) |>
  summarise_scores(
    by = "model",
    fun = signif,
    digits = 2
  ) |>
  kable()

model	wis	overprediction	underprediction	dispersion	bias	interval_coverage_50	interval_coverage_90	ae_median
CEPH-Rtrend_fluH	1500	210	800	490	-0.340	0.540	0.840	2400
CMU-TimeSeries	1900	230	740	900	-0.220	0.600	0.950	2900
CU-ensemble	1700	570	680	480	-0.220	0.510	0.780	2500
FluSight-baseline	1700	610	870	210	0.110	0.086	0.640	2400
FluSight-ensemble	1200	270	510	440	-0.190	0.560	0.920	1900
FluSight-lop_norm	1200	200	410	590	-0.180	0.640	0.990	1900
GH-model	7400	0	7200	170	-0.970	0.022	0.089	7800
GT-FluFNP	2700	580	1800	360	-0.300	0.260	0.450	3600
ISU_NiemiLab-ENS	1900	190	1400	400	-0.520	0.330	0.570	2600
ISU_NiemiLab-NLH	1600	140	1100	310	-0.380	0.410	0.610	2100
ISU_NiemiLab-SIR	2600	590	1500	510	-0.480	0.310	0.550	3600
JHU_CSSE-CSSE_Ensemble	900	160	310	440	-0.059	0.560	0.960	1400
LUcompUncertLab-chimera	1700	690	770	260	-0.280	0.200	0.510	2300
LosAlamos_NAU-CModel_Flu	7000	5200	1600	190	-0.190	0.048	0.210	7700
MIGHTE-Nsemble	1300	310	600	380	-0.120	0.530	0.830	2000
MOBS-GLEAM_FLUH	1200	140	490	610	-0.330	0.630	0.940	1900
NIH-Flu_ARIMA	2200	100	710	1400	-0.160	0.570	0.930	2100
NU_UCSD-GLEAM_AI_FLUH	1900	520	580	760	-0.130	0.590	0.890	2900
PSI-PROF	1300	340	380	620	0.045	0.540	0.860	2100
PSI-PROF_beta	1800	430	650	730	0.073	0.540	0.840	2700
SGroup-RandomForest	1500	89	800	600	-0.210	0.590	0.940	2300
SigSci-CREG	1100	370	370	320	-0.130	0.320	0.780	1700
SigSci-TSENS	1600	370	690	530	-0.100	0.560	0.850	2300
Stevens-GBR	2300	60	1800	440	-0.530	0.270	0.490	3100
UGA_flucast-Copycat	1700	190	910	570	-0.280	0.470	0.870	2600
UGA_flucast-INFLAenza	1700	180	1100	380	-0.071	0.350	0.810	2500
UGA_flucast-OKeeffe	430	0	320	110	-0.680	0.270	0.800	700
UGuelph-CompositeCurve	3100	1800	770	490	0.031	0.081	0.550	4400
UGuelphensemble-GRYPHON	1800	530	860	440	-0.150	0.320	0.850	2700
UM-DeepOutbreak	2000	220	440	1300	-0.081	0.720	0.830	2100
UMass-flusion	1100	240	320	510	-0.033	0.590	0.990	1700
UMass-trends_ensemble	1900	680	840	350	-0.038	0.320	0.590	2500
UNC_IDD-InfluPaint	2800	1700	850	240	-0.180	0.160	0.380	3600
UVAFluX-Ensemble	1700	850	450	400	-0.084	0.520	0.750	2300
VTSanghani-Ensemble	2200	800	1100	340	-0.100	0.230	0.490	3000
cfa-flumech	2400	1500	430	460	0.079	0.310	0.650	3400
cfarenewal-cfaepimlight	1500	420	690	410	-0.230	0.440	0.830	2400
fjordhest-ensemble	1400	370	450	570	-0.180	0.580	0.940	2200
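
The wis column above is the weighted interval score, a proper scoring rule for quantile forecasts in which lower values are better. As a rough illustration of what it measures, the sketch below computes it for a single toy forecast as the average of twice the quantile (pinball) loss across quantile levels. For a symmetric set of quantile levels that includes the median, this average matches the usual interval-based definition. The helper wis_from_quantiles() is hypothetical and written for this example; it is not a scoringutils function, and the numbers are invented.

```r
# Hypothetical helper (not a scoringutils function): weighted interval
# score for one observation and one set of predictive quantiles.
wis_from_quantiles <- function(observed, predicted, quantile_level) {
  stopifnot(length(predicted) == length(quantile_level))
  # 1 where the observation falls at or below the predicted quantile
  indicator <- as.numeric(observed <= predicted)
  # Twice the pinball loss at each quantile level
  quantile_scores <- 2 * (indicator - quantile_level) * (predicted - observed)
  mean(quantile_scores)
}

quantile_level <- c(0.25, 0.5, 0.75)
predicted <- c(90, 100, 110)

# Observation above the 50% interval: penalized for underprediction
wis_miss <- wis_from_quantiles(120, predicted, quantile_level)

# Observation equal to the predicted median: only interval width is penalized
wis_hit <- wis_from_quantiles(100, predicted, quantile_level)

print(c(miss = wis_miss, hit = wis_hit))
```

In the table, wis is the sum of the overprediction, underprediction, and dispersion columns, up to rounding, so a large underprediction component paired with a strongly negative bias points to forecasts that were systematically too low.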