Scoring Flusight submissions using scoringutils
Source: vignettes/scoring-flu-forecasts.Rmd
In this vignette, we use forecasttools to capture the current state of the FluSight forecast hub, and then score the forecasts according to a proper scoring rule. We do the scoring with scoringutils.
Generating a table of forecasts against truth data.
First, we use hub_to_scorable_quantiles() to create a table of quantile forecasts formatted to work with scoringutils functions. Generally, we expect users to call hub_to_scorable_quantiles() with a local path to the forecast repository, which updates from GitHub by default. In this case, we download a copy of the Hub from GitHub.
hub_url <- "https://github.com/cdcepi/FluSight-forecast-hub"
hub_path <- fs::path(withr::local_tempdir(), "flusight-hub")
download_hub(
  hub_url = hub_url,
  hub_path = hub_path,
  force = TRUE
)
For reproducibility, in this vignette we will examine the Hub as of a specific git commit, 6ae6919.
As of 6ae6919, the FluSight Hub accepted quantile forecasts for a single “target” quantity: epiweekly incident influenza hospital admissions. It provides a time series of that data in a file named target-data/target-hospital-admissions.csv.
The schema for forecasts is standardized across Hubverse Hubs, but the schema for target data is not (yet). For that reason, hub_to_scorable_quantiles() asks you to provide:
- target_data_rel_path: a path to the target data you want, relative to the Hub root directory.
- obs_date_col: the name of the column in the target data table that corresponds to the target_end_date for forecasts. Default "date".
- obs_value_col: the name of the column in the target data table that corresponds to observed values of the target quantity. Default "value".
- id_cols: any additional ID columns besides the dates that should be used to join the target data to the forecast data. Default c("target", "location").
For the FluSight hub target data, the date column is "date" and the value column is "value", so we will leave those defaults. The target data are stratified by "location" but not by target, so we will use "location" as the only additional ID column.
target_data_path <- fs::path(
  "target-data",
  "target-hospital-admissions",
  ext = "csv"
)
forecast_and_target <- hub_to_scorable_quantiles(
  hub_path,
  target_data_rel_path = target_data_path,
  id_cols = "location"
)
#> ℹ Updating superseded URL `Infectious-Disease-Modeling-hubs` to `hubverse-org`
#> ℹ Updating superseded URL `Infectious-Disease-Modeling-hubs` to `hubverse-org`
#> New names:
#> Rows: 6148 Columns: 6
#> ── Column specification
#> ──────────────────────────────────────────────────────── Delimiter: "," chr
#> (2): location, location_name dbl (3): ...1, value, weekly_rate date (1): date
#> ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
#> Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> • `` -> `...1`
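To make the roles of obs_date_col, obs_value_col, and id_cols more concrete, here is a minimal base R sketch of the kind of join described above. It is not the implementation of hub_to_scorable_quantiles(): the toy tables, their values, and the renaming steps are assumptions made for illustration only.

```r
# Toy forecast table: one model, two locations, three quantile levels each.
# All values are invented for illustration.
forecasts <- data.frame(
  model_id = "toy-model",
  location = rep(c("06", "US"), each = 3),
  target_end_date = as.Date("2024-01-06"),
  output_type_id = rep(c(0.25, 0.5, 0.75), times = 2),
  value = c(90, 100, 110, 900, 1000, 1100)
)

# Toy target data using the FluSight column names "date" and "value".
target <- data.frame(
  location = c("06", "US", "US"),
  date = as.Date(c("2024-01-06", "2024-01-06", "2024-01-13")),
  value = c(95, 1200, 1300)
)

obs_date_col <- "date"
obs_value_col <- "value"
id_cols <- "location"

# Keep only the ID, date, and value columns, and rename the latter two so
# they line up with the forecast table without clashing with its "value".
obs <- target[, c(id_cols, obs_date_col, obs_value_col)]
names(obs) <- c(id_cols, "target_end_date", "observed")

# Join on the ID columns plus the date. In this sketch the join is an inner
# join, so observations with no matching forecast are dropped.
scorable <- merge(forecasts, obs, by = c(id_cols, "target_end_date"))

# Rename to column names in the style scoringutils uses.
names(scorable)[names(scorable) == "value"] <- "predicted"
names(scorable)[names(scorable) == "model_id"] <- "model"
names(scorable)[names(scorable) == "output_type_id"] <- "quantile_level"

print(scorable)
```

Each forecast row ends up paired with the single observation that shares its location and target end date, which is the shape a quantile scoring function needs.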
hub_to_scorable_quantiles() outputs a scoringutils object, specifically the output of scoringutils::as_forecast_quantile(). Note that while hubData::collect_hub() identifies individual models with a model_id column, hub_to_scorable_quantiles() renames this to the scoringutils standard model column.
As of the commit examined in this vignette, 39 different models had been submitted to FluSight:
unique(forecast_and_target$model)
#> [1] "CADPH-FluCAT_Ensemble" "CEPH-Rtrend_fluH"
#> [3] "CMU-TimeSeries" "CU-ensemble"
#> [5] "FluSight-baseline" "FluSight-ensemble"
#> [7] "FluSight-lop_norm" "GH-model"
#> [9] "GT-FluFNP" "ISU_NiemiLab-ENS"
#> [11] "ISU_NiemiLab-NLH" "ISU_NiemiLab-SIR"
#> [13] "JHU_CSSE-CSSE_Ensemble" "LUcompUncertLab-chimera"
#> [15] "LosAlamos_NAU-CModel_Flu" "MIGHTE-Nsemble"
#> [17] "MOBS-GLEAM_FLUH" "NIH-Flu_ARIMA"
#> [19] "NU_UCSD-GLEAM_AI_FLUH" "PSI-PROF"
#> [21] "PSI-PROF_beta" "SGroup-RandomForest"
#> [23] "SigSci-CREG" "SigSci-TSENS"
#> [25] "Stevens-GBR" "UGA_flucast-Copycat"
#> [27] "UGA_flucast-INFLAenza" "UGA_flucast-OKeeffe"
#> [29] "UGuelph-CompositeCurve" "UGuelphensemble-GRYPHON"
#> [31] "UM-DeepOutbreak" "UMass-flusion"
#> [33] "UMass-trends_ensemble" "UNC_IDD-InfluPaint"
#> [35] "UVAFluX-Ensemble" "VTSanghani-Ensemble"
#> [37] "cfa-flumech" "cfarenewal-cfaepimlight"
#> [39] "fjordhest-ensemble"
There are 53 locations for which forecasts are available: the 50 states, the District of Columbia, Puerto Rico, and the US as a whole. They are stored as two-character codes, but we can recode them as the more familiar USPS-style two-letter abbreviations via us_loc_code_to_abbr():
unique(forecast_and_target$location)
#> [1] "06" "01" "02" "04" "05" "08" "09" "10" "11" "12" "13" "15" "16" "17" "18"
#> [16] "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
#> [31] "34" "35" "36" "37" "38" "39" "40" "41" "42" "44" "45" "46" "47" "48" "49"
#> [46] "50" "51" "53" "54" "55" "56" "72" "US"
forecast_and_target <- forecast_and_target |>
  mutate(location = us_loc_code_to_abbr(location))
unique(forecast_and_target$location)
#> [1] "CA" "AL" "AK" "AZ" "AR" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
#> [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
#> [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
#> [46] "VT" "VA" "WA" "WV" "WI" "WY" "PR" "US"
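As a rough illustration of what this kind of recoding involves, the base R sketch below maps codes to abbreviations with a named character vector. It is not how us_loc_code_to_abbr() is implemented; the lookup table and the recode_location() helper are hypothetical and cover only a handful of the 53 locations.

```r
# Hypothetical lookup table: names are location codes, values are
# abbreviations. Only a few locations are included here.
code_to_abbr <- c("01" = "AL", "06" = "CA", "72" = "PR", "US" = "US")

recode_location <- function(codes) {
  # Index the named vector by code; codes missing from the table give NA.
  unname(code_to_abbr[codes])
}

recoded <- recode_location(c("06", "01", "72", "US", "99"))
print(recoded)
```

Returning NA for unknown codes, rather than passing them through unchanged, makes unexpected locations easy to spot before scoring.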
Tabular scoring of forecasts
scoringutils provides various forecast evaluation metrics, including interval scores, skill relative to a chosen baseline, and coverage at different prediction interval levels. Here we show the metrics for national (US) forecasts by model, across all forecast dates so far, rounded to two significant figures:
chosen_location <- "US"
forecast_and_target |>
  filter(location == !!chosen_location) |>
  score() |>
  summarise_scores(
    by = "model",
    relative_skill = TRUE,
    baseline = "FluSight-ensemble"
  ) |>
  summarise_scores(
    by = "model",
    fun = signif,
    digits = 2
  ) |>
  kable()
model | wis | overprediction | underprediction | dispersion | bias | interval_coverage_50 | interval_coverage_90 | ae_median |
---|---|---|---|---|---|---|---|---|
CEPH-Rtrend_fluH | 1500 | 210 | 800 | 490 | -0.340 | 0.540 | 0.840 | 2400 |
CMU-TimeSeries | 1900 | 230 | 740 | 900 | -0.220 | 0.600 | 0.950 | 2900 |
CU-ensemble | 1700 | 570 | 680 | 480 | -0.220 | 0.510 | 0.780 | 2500 |
FluSight-baseline | 1700 | 610 | 870 | 210 | 0.110 | 0.086 | 0.640 | 2400 |
FluSight-ensemble | 1200 | 270 | 510 | 440 | -0.190 | 0.560 | 0.920 | 1900 |
FluSight-lop_norm | 1200 | 200 | 410 | 590 | -0.180 | 0.640 | 0.990 | 1900 |
GH-model | 7400 | 0 | 7200 | 170 | -0.970 | 0.022 | 0.089 | 7800 |
GT-FluFNP | 2700 | 580 | 1800 | 360 | -0.300 | 0.260 | 0.450 | 3600 |
ISU_NiemiLab-ENS | 1900 | 190 | 1400 | 400 | -0.520 | 0.330 | 0.570 | 2600 |
ISU_NiemiLab-NLH | 1600 | 140 | 1100 | 310 | -0.380 | 0.410 | 0.610 | 2100 |
ISU_NiemiLab-SIR | 2600 | 590 | 1500 | 510 | -0.480 | 0.310 | 0.550 | 3600 |
JHU_CSSE-CSSE_Ensemble | 900 | 160 | 310 | 440 | -0.059 | 0.560 | 0.960 | 1400 |
LUcompUncertLab-chimera | 1700 | 690 | 770 | 260 | -0.280 | 0.200 | 0.510 | 2300 |
LosAlamos_NAU-CModel_Flu | 7000 | 5200 | 1600 | 190 | -0.190 | 0.048 | 0.210 | 7700 |
MIGHTE-Nsemble | 1300 | 310 | 600 | 380 | -0.120 | 0.530 | 0.830 | 2000 |
MOBS-GLEAM_FLUH | 1200 | 140 | 490 | 610 | -0.330 | 0.630 | 0.940 | 1900 |
NIH-Flu_ARIMA | 2200 | 100 | 710 | 1400 | -0.160 | 0.570 | 0.930 | 2100 |
NU_UCSD-GLEAM_AI_FLUH | 1900 | 520 | 580 | 760 | -0.130 | 0.590 | 0.890 | 2900 |
PSI-PROF | 1300 | 340 | 380 | 620 | 0.045 | 0.540 | 0.860 | 2100 |
PSI-PROF_beta | 1800 | 430 | 650 | 730 | 0.073 | 0.540 | 0.840 | 2700 |
SGroup-RandomForest | 1500 | 89 | 800 | 600 | -0.210 | 0.590 | 0.940 | 2300 |
SigSci-CREG | 1100 | 370 | 370 | 320 | -0.130 | 0.320 | 0.780 | 1700 |
SigSci-TSENS | 1600 | 370 | 690 | 530 | -0.100 | 0.560 | 0.850 | 2300 |
Stevens-GBR | 2300 | 60 | 1800 | 440 | -0.530 | 0.270 | 0.490 | 3100 |
UGA_flucast-Copycat | 1700 | 190 | 910 | 570 | -0.280 | 0.470 | 0.870 | 2600 |
UGA_flucast-INFLAenza | 1700 | 180 | 1100 | 380 | -0.071 | 0.350 | 0.810 | 2500 |
UGA_flucast-OKeeffe | 430 | 0 | 320 | 110 | -0.680 | 0.270 | 0.800 | 700 |
UGuelph-CompositeCurve | 3100 | 1800 | 770 | 490 | 0.031 | 0.081 | 0.550 | 4400 |
UGuelphensemble-GRYPHON | 1800 | 530 | 860 | 440 | -0.150 | 0.320 | 0.850 | 2700 |
UM-DeepOutbreak | 2000 | 220 | 440 | 1300 | -0.081 | 0.720 | 0.830 | 2100 |
UMass-flusion | 1100 | 240 | 320 | 510 | -0.033 | 0.590 | 0.990 | 1700 |
UMass-trends_ensemble | 1900 | 680 | 840 | 350 | -0.038 | 0.320 | 0.590 | 2500 |
UNC_IDD-InfluPaint | 2800 | 1700 | 850 | 240 | -0.180 | 0.160 | 0.380 | 3600 |
UVAFluX-Ensemble | 1700 | 850 | 450 | 400 | -0.084 | 0.520 | 0.750 | 2300 |
VTSanghani-Ensemble | 2200 | 800 | 1100 | 340 | -0.100 | 0.230 | 0.490 | 3000 |
cfa-flumech | 2400 | 1500 | 430 | 460 | 0.079 | 0.310 | 0.650 | 3400 |
cfarenewal-cfaepimlight | 1500 | 420 | 690 | 410 | -0.230 | 0.440 | 0.830 | 2400 |
fjordhest-ensemble | 1400 | 370 | 450 | 570 | -0.180 | 0.580 | 0.940 | 2200 |
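For readers unfamiliar with the wis, dispersion, overprediction, and underprediction columns, the following base R sketch computes them by hand for a single made-up forecast. It follows the standard definition of the weighted interval score (WIS) for a symmetric set of quantile levels that includes the median; it is not the scoringutils implementation, and the function names and numbers are hypothetical.

```r
# Pinball (quantile) loss for one or more quantile levels.
pinball <- function(observed, predicted, level) {
  ifelse(
    observed >= predicted,
    level * (observed - predicted),
    (1 - level) * (predicted - observed)
  )
}

# WIS written as twice the mean pinball loss across the quantile levels.
wis_from_quantiles <- function(observed, predicted, levels) {
  mean(2 * pinball(observed, predicted, levels))
}

# The same score split into dispersion, overprediction, and underprediction.
# Quantiles are paired from the outside in to form central intervals.
wis_components <- function(observed, predicted, levels) {
  ord <- order(levels)
  predicted <- predicted[ord]
  levels <- levels[ord]
  k <- (length(levels) - 1) / 2 # number of central prediction intervals
  lower <- predicted[seq_len(k)]
  upper <- rev(predicted)[seq_len(k)]
  alpha <- 2 * levels[seq_len(k)] # e.g. 0.1 for the 90% interval
  med <- predicted[k + 1]

  # Interval widths weighted by alpha / 2.
  dispersion <- sum(alpha / 2 * (upper - lower))
  # Penalties for the observation falling below or above each interval,
  # plus half the absolute error of the median.
  over <- sum(pmax(lower - observed, 0)) + 0.5 * max(med - observed, 0)
  under <- sum(pmax(observed - upper, 0)) + 0.5 * max(observed - med, 0)

  c(
    dispersion = dispersion,
    overprediction = over,
    underprediction = under
  ) / (k + 0.5)
}

# Made-up forecast: median 1000, 50% interval 950-1050, 90% interval 800-1200.
levels <- c(0.05, 0.25, 0.5, 0.75, 0.95)
predicted <- c(800, 950, 1000, 1050, 1200)
observed <- 1300

wis <- wis_from_quantiles(observed, predicted, levels)
components <- wis_components(observed, predicted, levels)
print(wis)
print(components)
```

In this example the observation lies above every predicted quantile, so the score of 218 is made up of a dispersion of 18, an overprediction of 0, and an underprediction of 200. A pattern like this, where underprediction dominates, is what a strongly negative bias looks like for a single forecast.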