Scoring FluSight submissions using scoringutils
In this vignette, we use forecasttools to capture the current state of the FluSight forecast hub (https://github.com/cdcepi/FluSight-forecast-hub). We then score the forecasts according to a proper scoring rule, the weighted interval score (WIS). We do the scoring with scoringutils.
Generating a table of forecasts against truth data
First, we use hub_to_scorable_quantiles() to create a table of quantile forecasts formatted to work with scoringutils functions. Generally, we expect users to call hub_to_scorable_quantiles() with the path to a local clone of the forecast hub repository that is kept up to date with GitHub. Here, we instead download a fresh copy of the Hub from GitHub into a temporary directory.
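```r
library(forecasttools)
library(dplyr)
library(scoringutils)

hub_url <- "https://github.com/cdcepi/FluSight-forecast-hub"
# Download the Hub into a temporary directory managed by withr
hub_path <- fs::path(withr::local_tempdir(), "flusight-hub")
download_hub(
  hub_url = hub_url,
  hub_path = hub_path,
  force = TRUE
)
```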
For reproducibility, in this vignette we examine the Hub as of a specific git commit, b311e92.
The schema for both forecasts and “target data” is standardized across Hubverse hubs. There are two standard formats for Hubverse target data: “timeseries” format and “oracle output” format. To work with hub_to_scorable_quantiles(), a forecast hub must provide “oracle output” format target data.
As of b311e92, the FluSight Hub accepted quantile forecasts for a single “target” quantity: epiweekly incident influenza hospital admissions (the target “wk inc flu hosp”). It provided oracle output target data in a file called oracle-output.csv.
hub_to_scorable_quantiles() just needs to be pointed at the local copy of the Hub. It extracts the available quantile forecasts (via hubData::connect_hub()), combines them with the associated target data from the “oracle output” (via hubData::connect_target_oracle_output()), and produces a scoringutils-ready table via scoringutils::as_forecast_quantile():
```r
forecast_and_target <- hub_to_scorable_quantiles(hub_path)
#> ℹ Some rows containing NA values may be removed. This is fine if not
#> unexpected.
tail(forecast_and_target)
#>                 model reference_date          target horizon target_end_date
#>                <char>         <Date>          <char>   <int>          <Date>
#> 1: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 2: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 3: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 4: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 5: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 6: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#>    location output_type quantile_level predicted observed
#>      <char>      <char>          <num>     <num>    <num>
#> 1:       US    quantile          0.800   2934.46     1324
#> 2:       US    quantile          0.850   3301.63     1324
#> 3:       US    quantile          0.900   3916.49     1324
#> 4:       US    quantile          0.950   4582.54     1324
#> 5:       US    quantile          0.975   5290.78     1324
#> 6:       US    quantile          0.990   5901.82     1324
```
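hub_to_scorable_quantiles() performs a left join of the forecast data to the target data, so the table also includes forecasts for which no evaluation data are available yet; for those rows, the observed column is NA. scoringutils handles this: as the message printed above indicates, rows containing NA values are dropped when scoring, so forecasts without an observation are simply left out of the scores.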
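A toy illustration of what the left join does. The two 2025-06-21 rows are copied from the output above; the 2025-06-28 rows are hypothetical forecasts for a week that has no observation yet. The NA rows are dropped explicitly with dplyr::filter() here, which makes the loss visible before scoring rather than relying on scoringutils to drop them.

```r
library(dplyr)

# Illustrative forecasts: the 2025-06-21 rows come from the output above,
# the 2025-06-28 rows are hypothetical.
forecasts <- data.frame(
  model = "fjordhest-ensemble",
  location = "US",
  target_end_date = as.Date(
    c("2025-06-21", "2025-06-21", "2025-06-28", "2025-06-28")
  ),
  quantile_level = c(0.8, 0.9, 0.8, 0.9),
  predicted = c(2934.46, 3916.49, 3000, 4000)
)

# One observed value per location and date; in this toy example nothing
# has been observed yet for 2025-06-28.
oracle <- data.frame(
  location = "US",
  target_end_date = as.Date("2025-06-21"),
  observed = 1324
)

# Left join keeps every forecast; unmatched rows get observed = NA
joined <- left_join(forecasts, oracle, by = c("location", "target_end_date"))
sum(is.na(joined$observed))  # 2

# Keep only the rows that can actually be scored
scorable <- filter(joined, !is.na(observed))
nrow(scorable)  # 2
```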
Note that while the hubData package identifies individual models with a model_id column, hub_to_scorable_quantiles() renames this column to model, the scoringutils standard column name for models.
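To make the steps described above concrete, here is a rough sketch of a comparable pipeline written directly against hubData, dplyr, and scoringutils, including the model_id to model rename. It is an illustration, not forecasttools' implementation. It assumes that quantile levels are stored as strings in output_type_id, that the oracle output can be reduced to a single oracle_value per location, target, and target_end_date, and that date columns have matching types in both tables. The function name sketch_scorable_quantiles() is hypothetical.

```r
# Rough sketch of the steps hub_to_scorable_quantiles() is described as
# performing; NOT forecasttools' implementation. Assumes quantile levels are
# stored as strings in `output_type_id` and that the oracle output holds a
# single `oracle_value` per location, target, and target_end_date.
sketch_scorable_quantiles <- function(hub_path) {
  forecasts <- hubData::connect_hub(hub_path) |>
    dplyr::filter(output_type == "quantile") |>
    dplyr::collect()

  oracle <- hubData::connect_target_oracle_output(hub_path) |>
    dplyr::collect() |>
    dplyr::distinct(location, target, target_end_date, oracle_value)

  forecasts |>
    # left join: forecasts without an observation keep observed = NA
    dplyr::left_join(
      oracle,
      by = c("location", "target", "target_end_date")
    ) |>
    dplyr::mutate(quantile_level = as.numeric(output_type_id)) |>
    # drop output_type_id so it is not treated as part of the forecast unit
    dplyr::select(-output_type_id) |>
    # hubverse column names -> scoringutils column names
    dplyr::rename(
      model = model_id,
      predicted = value,
      observed = oracle_value
    ) |>
    scoringutils::as_forecast_quantile()
}
```

Calling sketch_scorable_quantiles(hub_path) should give a table broadly similar to forecast_and_target, though column order and edge-case handling may differ. Dropping output_type_id matters: scoringutils infers the forecast unit from the remaining columns, and keeping a per-quantile column there would make every row its own forecast.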
As of b311e92, 65 different models had submitted quantile forecasts to FluSight:
```r
unique(forecast_and_target$model)
#>  [1] "CADPH-FluCAT_Ensemble"      "CEPH-Rtrend_fluH"          
#>  [3] "CFA_Pyrenew-Pyrenew_HE_Flu" "CFA_Pyrenew-Pyrenew_H_Flu" 
#>  [5] "CMU-TimeSeries"             "CMU-climate_baseline"      
#>  [7] "CU-ensemble"                "FluSight-base_seasonal"    
#>  [9] "FluSight-baseline"          "FluSight-ensemble"         
#> [11] "FluSight-lop_norm"          "FluSight-trained_mean"     
#> [13] "FluSight-trained_med"       "GH-model"                  
#> [15] "GT-FluFNP"                  "Gatech-ensemble_point"     
#> [17] "Gatech-ensemble_prob"       "Google_SAI-FluBoostQR"     
#> [19] "ISU_NiemiLab-ENS"           "ISU_NiemiLab-GPE"          
#> [21] "ISU_NiemiLab-NLH"           "ISU_NiemiLab-SIR"          
#> [23] "JHUAPL-DMD"                 "JHUAPL-Morris"             
#> [25] "JHU_CSSE-CSSE_Ensemble"     "LUcompUncertLab-chimera"   
#> [27] "LosAlamos_NAU-CModel_Flu"   "MDPredict-SIRS"            
#> [29] "MIGHTE-Joint"               "MIGHTE-Nsemble"            
#> [31] "MOBS-GLEAM_FLUH"            "Metaculus-cp"              
#> [33] "NEU_ISI-AdaptiveEnsemble"   "NEU_ISI-FluBcast"          
#> [35] "NIH-Flu_ARIMA"              "NU_UCSD-GLEAM_AI_FLUH"     
#> [37] "OHT_JHU-nbxd"               "PSI-PROF"                  
#> [39] "PSI-PROF_beta"              "SGroup-RandomForest"       
#> [41] "SigSci-CREG"                "SigSci-TSENS"              
#> [43] "Stevens-GBR"                "Stevens-ILIForecast"       
#> [45] "UGA_CEID-Walk"              "UGA_flucast-Copycat"       
#> [47] "UGA_flucast-INFLAenza"      "UGA_flucast-OKeeffe"       
#> [49] "UGA_flucast-Scenariocast"   "UGuelph-CompositeCurve"    
#> [51] "UGuelphensemble-GRYPHON"    "UI_CompEpi-EpiGen"         
#> [53] "UM-DeepOutbreak"            "UMass-AR2"                 
#> [55] "UMass-flusion"              "UMass-trends_ensemble"     
#> [57] "UNC_IDD-InfluPaint"         "UVAFluX-CESGCN"            
#> [59] "UVAFluX-Ensemble"           "UVAFluX-OptimWISE"         
#> [61] "VTSanghani-Ensemble"        "VTSanghani-PRIME"          
#> [63] "cfa-flumech"                "cfarenewal-cfaepimlight"   
#> [65] "fjordhest-ensemble"
```
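The model count alone does not say how many forecasts each model contributed, and that can vary between models, which matters when comparing average scores. A small helper along these lines counts distinct forecasts per model, taking a forecast to be one unique combination of reference_date, location, and horizon (an assumption about what counts as "one forecast"). The name forecasts_per_model() is hypothetical, and the toy input below is illustrative only.

```r
library(dplyr)

# Hypothetical helper: count distinct forecasts per model, where a forecast
# is taken to be one (reference_date, location, horizon) combination.
forecasts_per_model <- function(tbl) {
  tbl |>
    distinct(model, reference_date, location, horizon) |>
    count(model, name = "n_forecasts", sort = TRUE)
}

# Toy input with the same column names as forecast_and_target
toy <- data.frame(
  model = c("A", "A", "A", "B"),
  reference_date = as.Date(
    c("2025-05-31", "2025-05-31", "2025-05-24", "2025-05-31")
  ),
  location = "US",
  horizon = c(0L, 0L, 1L, 0L),
  quantile_level = c(0.25, 0.75, 0.5, 0.5)
)

# Model A has 2 distinct forecasts, model B has 1
forecasts_per_model(toy)
```

Applied to the real table, this would be forecasts_per_model(forecast_and_target).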
There are 53 locations with available forecasts: the 50 states, the District of Columbia, Puerto Rico, and the United States as a whole. They are stored as two-digit FIPS codes (plus “US” for the national level), but we can recode them as the more familiar USPS-style two-letter abbreviations via us_loc_code_to_abbr():
```r
unique(forecast_and_target$location)
#>  [1] "06" "01" "02" "04" "05" "08" "09" "10" "11" "12" "13" "15" "16" "17" "18"
#> [16] "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
#> [31] "34" "35" "36" "37" "38" "39" "40" "41" "42" "44" "45" "46" "47" "48" "49"
#> [46] "50" "51" "53" "54" "55" "56" "72" "US"
forecast_and_target <- forecast_and_target |>
  mutate(location = us_loc_code_to_abbr(location))
unique(forecast_and_target$location)
#>  [1] "CA" "AL" "AK" "AZ" "AR" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
#> [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
#> [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
#> [46] "VT" "VA" "WA" "WV" "WI" "WY" "PR" "US"
```
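The idea behind this recoding can be sketched with a plain named vector used as a lookup table. The lookup below is hypothetical and deliberately partial (five pairs read off the output above); us_loc_code_to_abbr() itself handles all 53 locations, and this sketch makes no claim about how it is implemented. It also shows why codes must stay character strings: a code that has lost its leading zero no longer matches.

```r
# Hypothetical, partial lookup illustrating the recoding that
# us_loc_code_to_abbr() performs; the pairs are read off the output above.
# Codes are kept as character strings so that leading zeros ("06") survive.
loc_lookup <- c("01" = "AL", "06" = "CA", "11" = "DC", "72" = "PR", "US" = "US")

toy_loc_code_to_abbr <- function(codes) {
  abbr <- unname(loc_lookup[codes])
  if (anyNA(abbr)) {
    stop("unknown location code(s): ",
         paste(codes[is.na(abbr)], collapse = ", "))
  }
  abbr
}

toy_loc_code_to_abbr(c("06", "72", "US"))
#> [1] "CA" "PR" "US"

# A code without its leading zero ("6" instead of "06") is not found,
# so toy_loc_code_to_abbr("6") raises an error.
```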
Tabular scoring of forecasts
scoringutils provides various forecast evaluation metrics, including the weighted interval score (WIS) and its components, skill relative to a chosen baseline, and the empirical coverage of central prediction intervals (here, the 50% and 90% intervals). Below, we summarise these metrics by model for national (US) forecasts over all forecast dates available as of b311e92, rounding to two significant figures:
```r
chosen_location <- "US"
forecast_and_target |>
  filter(location == !!chosen_location) |>
  score() |>
  # average each metric over all forecasts made by each model
  summarise_scores(by = "model") |>
  # round the per-model averages to two significant figures
  summarise_scores(
    by = "model",
    fun = signif,
    digits = 2
  ) |>
  knitr::kable()
```
model | wis | overprediction | underprediction | dispersion | bias | interval_coverage_50 | interval_coverage_90 | ae_median |
---|---|---|---|---|---|---|---|---|
CEPH-Rtrend_fluH | 2600 | 110.00 | 1800 | 700 | -0.3800 | 0.570 | 0.84 | 3700 |
CFA_Pyrenew-Pyrenew_HE_Flu | 220 | 66.00 | 21 | 130 | 0.1500 | 0.590 | 1.00 | 310 |
CFA_Pyrenew-Pyrenew_H_Flu | 2700 | 460.00 | 1300 | 910 | -0.4000 | 0.470 | 0.84 | 3700 |
CMU-TimeSeries | 3700 | 120.00 | 2400 | 1100 | -0.3400 | 0.560 | 0.86 | 5000 |
CMU-climate_baseline | 7100 | 17.00 | 5400 | 1600 | -0.6300 | 0.320 | 0.77 | 11000 |
CU-ensemble | 2900 | 350.00 | 2000 | 630 | -0.2600 | 0.490 | 0.77 | 4000 |
FluSight-baseline | 4000 | 1400.00 | 2200 | 410 | 0.0640 | 0.200 | 0.69 | 5300 |
FluSight-ensemble | 2600 | 230.00 | 1700 | 680 | -0.2300 | 0.480 | 0.85 | 3800 |
FluSight-lop_norm | 2500 | 150.00 | 1400 | 940 | -0.2400 | 0.570 | 0.93 | 3700 |
FluSight-trained_mean | 3300 | 600.00 | 1300 | 1400 | -0.0830 | 0.600 | 0.95 | 4900 |
FluSight-trained_med | 3200 | 200.00 | 1900 | 1100 | -0.1700 | 0.650 | 0.90 | 4500 |
GH-model | 9200 | 0.00 | 9100 | 73 | -1.0000 | 0.000 | 0.00 | 9400 |
GT-FluFNP | 2600 | 490.00 | 1700 | 370 | -0.2500 | 0.260 | 0.51 | 3400 |
Google_SAI-FluBoostQR | 180 | 0.00 | 140 | 37 | -0.8400 | 0.062 | 0.62 | 290 |
ISU_NiemiLab-ENS | 2200 | 190.00 | 1600 | 420 | -0.5100 | 0.330 | 0.56 | 3000 |
ISU_NiemiLab-GPE | 4700 | 730.00 | 3000 | 910 | -0.1600 | 0.470 | 0.74 | 6400 |
ISU_NiemiLab-NLH | 1800 | 140.00 | 1300 | 330 | -0.3800 | 0.350 | 0.60 | 2400 |
ISU_NiemiLab-SIR | 2900 | 680.00 | 1700 | 520 | -0.4100 | 0.270 | 0.48 | 4000 |
JHUAPL-DMD | 8500 | 4500.00 | 1900 | 2000 | -0.1700 | 0.600 | 0.79 | 12000 |
JHUAPL-Morris | 23000 | 1300.00 | 18000 | 3500 | -0.6000 | 0.083 | 0.83 | 35000 |
JHU_CSSE-CSSE_Ensemble | 3700 | 680.00 | 1600 | 1400 | -0.0870 | 0.570 | 0.89 | 5500 |
LUcompUncertLab-chimera | 4500 | 440.00 | 3500 | 550 | -0.3800 | 0.280 | 0.52 | 5600 |
LosAlamos_NAU-CModel_Flu | 6000 | 3100.00 | 1900 | 1000 | 0.0034 | 0.200 | 0.48 | 7500 |
MDPredict-SIRS | 3200 | 470.00 | 2100 | 660 | -0.0660 | 0.460 | 0.81 | 4500 |
MIGHTE-Joint | 6300 | 310.00 | 5400 | 600 | -0.4200 | 0.340 | 0.52 | 7700 |
MIGHTE-Nsemble | 3400 | 280.00 | 2700 | 460 | -0.2800 | 0.360 | 0.67 | 4500 |
MOBS-GLEAM_FLUH | 3300 | 130.00 | 2500 | 610 | -0.5500 | 0.280 | 0.62 | 4600 |
Metaculus-cp | 9200 | 0.48 | 7800 | 1400 | -0.6700 | 0.360 | 0.67 | 12000 |
NEU_ISI-AdaptiveEnsemble | 3500 | 68.00 | 2800 | 600 | -0.2800 | 0.320 | 0.70 | 5100 |
NEU_ISI-FluBcast | 5000 | 41.00 | 4500 | 460 | -0.7400 | 0.240 | 0.55 | 6400 |
NIH-Flu_ARIMA | 3000 | 370.00 | 1600 | 980 | -0.2000 | 0.390 | 0.83 | 3800 |
NU_UCSD-GLEAM_AI_FLUH | 2200 | 580.00 | 780 | 790 | -0.0810 | 0.430 | 0.88 | 3500 |
OHT_JHU-nbxd | 5700 | 430.00 | 4000 | 1200 | -0.2200 | 0.490 | 0.77 | 8300 |
PSI-PROF | 2700 | 230.00 | 1700 | 780 | -0.1400 | 0.480 | 0.81 | 3900 |
PSI-PROF_beta | 3000 | 350.00 | 1700 | 930 | -0.0790 | 0.550 | 0.82 | 4400 |
SGroup-RandomForest | 1800 | 130.00 | 1000 | 610 | -0.1800 | 0.460 | 0.92 | 2800 |
SigSci-CREG | 1100 | 360.00 | 380 | 320 | -0.0370 | 0.300 | 0.74 | 1700 |
SigSci-TSENS | 3200 | 720.00 | 1800 | 650 | -0.1300 | 0.500 | 0.76 | 4400 |
Stevens-GBR | 2400 | 110.00 | 1900 | 420 | -0.4300 | 0.220 | 0.45 | 3200 |
Stevens-ILIForecast | 8400 | 150.00 | 7900 | 300 | -0.7500 | 0.057 | 0.15 | 9300 |
UGA_CEID-Walk | 6700 | 1300.00 | 4100 | 1400 | -0.0220 | 0.520 | 0.74 | 9400 |
UGA_flucast-Copycat | 2600 | 260.00 | 1400 | 910 | -0.1900 | 0.520 | 0.87 | 4000 |
UGA_flucast-INFLAenza | 3500 | 460.00 | 2400 | 620 | -0.0570 | 0.320 | 0.72 | 4900 |
UGA_flucast-OKeeffe | 530 | 0.00 | 420 | 110 | -0.7500 | 0.170 | 0.75 | 840 |
UGA_flucast-Scenariocast | 3600 | 800.00 | 950 | 1800 | -0.2400 | 0.590 | 1.00 | 5800 |
UGuelph-CompositeCurve | 3300 | 1300.00 | 1100 | 940 | -0.0630 | 0.300 | 0.62 | 4500 |
UGuelphensemble-GRYPHON | 2900 | 940.00 | 1200 | 680 | 0.0130 | 0.330 | 0.82 | 4100 |
UI_CompEpi-EpiGen | 5600 | 150.00 | 4700 | 780 | -0.4700 | 0.560 | 0.68 | 6600 |
UM-DeepOutbreak | 4200 | 320.00 | 2200 | 1600 | -0.1700 | 0.670 | 0.77 | 5000 |
UMass-AR2 | 4500 | 560.00 | 3000 | 870 | -0.2800 | 0.520 | 0.71 | 5900 |
UMass-flusion | 2500 | 270.00 | 1600 | 560 | -0.1400 | 0.490 | 0.87 | 3400 |
UMass-trends_ensemble | 3300 | 970.00 | 1600 | 650 | -0.0900 | 0.390 | 0.64 | 4400 |
UNC_IDD-InfluPaint | 3200 | 870.00 | 2100 | 220 | -0.4000 | 0.100 | 0.26 | 3900 |
UVAFluX-CESGCN | 31000 | 0.00 | 31000 | 120 | -1.0000 | 0.000 | 0.00 | 31000 |
UVAFluX-Ensemble | 3600 | 810.00 | 1600 | 1200 | -0.1300 | 0.460 | 0.68 | 4900 |
UVAFluX-OptimWISE | 380 | 41.00 | 56 | 280 | -0.1500 | 0.800 | 1.00 | 490 |
VTSanghani-Ensemble | 2600 | 930.00 | 1300 | 370 | -0.0570 | 0.170 | 0.38 | 3500 |
VTSanghani-PRIME | 6900 | 880.00 | 4600 | 1400 | -0.3100 | 0.420 | 0.61 | 8800 |
cfa-flumech | 2400 | 1400.00 | 530 | 460 | 0.0940 | 0.260 | 0.61 | 3500 |
cfarenewal-cfaepimlight | 1900 | 450.00 | 930 | 470 | -0.2800 | 0.310 | 0.77 | 2900 |
fjordhest-ensemble | 2700 | 270.00 | 1700 | 810 | -0.3000 | 0.470 | 0.86 | 4000 |
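The wis, dispersion, overprediction, and underprediction columns are related. For a forecast with median $m$ and $K$ central prediction intervals $[l_k, u_k]$ at nominal levels $1 - \alpha_k$, scored against the observation $y$, the standard definition of the WIS (Bracher et al., 2021) is given below, with $(x)_+ = \max(x, 0)$. We assume that scoringutils' quantile-based WIS is equivalent to this when the quantiles form symmetric central intervals, as they do here.

$$
\mathrm{IS}_{\alpha}(F, y) = (u - l) + \frac{2}{\alpha}(l - y)\,\mathbf{1}(y < l) + \frac{2}{\alpha}(y - u)\,\mathbf{1}(y > u)
$$

$$
\mathrm{WIS}(F, y) = \frac{1}{K + 1/2}\left(w_0\,|y - m| + \sum_{k=1}^{K} w_k\,\mathrm{IS}_{\alpha_k}(F, y)\right),
\qquad w_0 = \frac{1}{2},\quad w_k = \frac{\alpha_k}{2}
$$

Since $w_k\,\mathrm{IS}_{\alpha_k} = \frac{\alpha_k}{2}(u_k - l_k) + (l_k - y)_+ + (y - u_k)_+$ and $w_0|y - m| = \frac{1}{2}\left((m - y)_+ + (y - m)_+\right)$, the WIS splits exactly into three parts:

$$
\mathrm{WIS}
= \underbrace{\frac{1}{K + 1/2}\sum_{k=1}^{K}\frac{\alpha_k}{2}(u_k - l_k)}_{\text{dispersion}}
+ \underbrace{\frac{1}{K + 1/2}\left(\tfrac{1}{2}(m - y)_+ + \sum_{k=1}^{K}(l_k - y)_+\right)}_{\text{overprediction}}
+ \underbrace{\frac{1}{K + 1/2}\left(\tfrac{1}{2}(y - m)_+ + \sum_{k=1}^{K}(y - u_k)_+\right)}_{\text{underprediction}}
$$

Because each column in the table is an average over forecasts, and the mean of a sum is the sum of the means, the decomposition also holds for the per-model averages. For CEPH-Rtrend_fluH, for example, 110 + 1800 + 700 = 2610, which agrees with the reported WIS of 2600. The match is only approximate because each column was rounded to two significant figures separately.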
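The table above has no relative-skill column. In scoringutils 2.x, whose metric names (such as interval_coverage_50) the table uses, relative skill comes from pairwise comparisons between models and is added with add_relative_skill() to the unsummarised scores, before summarise_scores(). Below is a minimal sketch, assuming scoringutils >= 2.0. relative_skill_table() is a hypothetical helper name, not part of scoringutils or forecasttools.

```r
library(dplyr)
library(scoringutils)

# Hypothetical helper (assumes scoringutils >= 2.0): per-model summary of
# scores for one location, including relative skill against a baseline.
relative_skill_table <- function(scorable, chosen_location, baseline) {
  scorable |>
    filter(location == !!chosen_location) |>
    score() |>
    # relative skill from pairwise comparisons, added before summarising
    add_relative_skill(compare = "model", baseline = baseline) |>
    # average each metric (including relative skill) by model
    summarise_scores(by = "model") |>
    # round to two significant figures, as in the table above
    summarise_scores(by = "model", fun = signif, digits = 2)
}
```

For the US table above, this would be called as relative_skill_table(forecast_and_target, "US", "FluSight-ensemble") |> knitr::kable(). As we understand the pairwise method, a scaled relative skill below 1 indicates a lower (better) WIS than the baseline model, after accounting for which forecasts each pair of models have in common.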