Scoring FluSight submissions using scoringutils
Source: vignettes/scoring-flu-forecasts.Rmd
library(forecasttools)
library(scoringutils)
library(dplyr)
library(ggplot2)
library(knitr)
In this vignette, we use forecasttools to capture the current state of the FluSight forecast hub (https://github.com/cdcepi/FluSight-forecast-hub), and then score the forecasts according to a proper scoring rule. We do the scoring with scoringutils.
Generating a table of forecasts against truth data
First, we create a table of forecast predictions formatted to work with scoringutils functions using hub_to_scorable_quantiles(). Generally, we expect users to call hub_to_scorable_quantiles() with a local path to the forecast repository, which updates from GitHub by default. In this case, we download the hub first.
hub_url <- "https://github.com/cdcepi/FluSight-forecast-hub"
hub_path <- fs::path(withr::local_tempdir(), "flusight-hub")
download_hub(
  hub_url = hub_url,
  hub_path = hub_path,
  force = TRUE
)
forecast_and_target <- hub_to_scorable_quantiles(hub_path)
#> ℹ Updating superseded URL `Infectious-Disease-Modeling-hubs` to `hubverse-org`
#> ℹ Updating superseded URL `Infectious-Disease-Modeling-hubs` to `hubverse-org`
#> New names:
#> • `` -> `...1`
#> Rows: 6148 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (2): location, location_name
#> dbl  (3): ...1, value, weekly_rate
#> date (1): date
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
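To make the idea of a "scorable" quantile table concrete, here is a minimal base R sketch of a long-format table with one row per quantile level. The column names quantile_level, predicted, and observed follow the scoringutils 2.x convention, but they are an assumption about what hub_to_scorable_quantiles() returns; the model name, date, and values are invented for illustration.

```r
# Toy long-format quantile forecast table (illustrative only).
# Column names are assumed, not taken from hub_to_scorable_quantiles().
quantile_levels <- c(0.05, 0.25, 0.5, 0.75, 0.95)

toy_forecasts <- data.frame(
  model_id = "example-model",               # hypothetical model name
  location = "US",
  target_end_date = as.Date("2024-01-06"),  # invented date
  quantile_level = quantile_levels,
  predicted = c(800, 950, 1000, 1100, 1300),
  observed = 1050                           # repeated on every row
)

# One row per model, location, date, and quantile level
print(toy_forecasts)

# Predicted quantiles should be non-decreasing in the quantile level
stopifnot(!is.unsorted(toy_forecasts$predicted))
```

Each forecast is then identified by the non-value columns (here model_id, location, and target_end_date), and the observed value is repeated alongside every predicted quantile.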
There are 39 different models that have been submitted to FluSight.
unique(forecast_and_target$model_id)
#> [1] "CADPH-FluCAT_Ensemble" "CEPH-Rtrend_fluH"
#> [3] "CMU-TimeSeries" "CU-ensemble"
#> [5] "FluSight-baseline" "FluSight-ensemble"
#> [7] "FluSight-lop_norm" "GH-model"
#> [9] "GT-FluFNP" "ISU_NiemiLab-ENS"
#> [11] "ISU_NiemiLab-NLH" "ISU_NiemiLab-SIR"
#> [13] "JHU_CSSE-CSSE_Ensemble" "LUcompUncertLab-chimera"
#> [15] "LosAlamos_NAU-CModel_Flu" "MIGHTE-Nsemble"
#> [17] "MOBS-GLEAM_FLUH" "NIH-Flu_ARIMA"
#> [19] "NU_UCSD-GLEAM_AI_FLUH" "PSI-PROF"
#> [21] "PSI-PROF_beta" "SGroup-RandomForest"
#> [23] "SigSci-CREG" "SigSci-TSENS"
#> [25] "Stevens-GBR" "UGA_flucast-Copycat"
#> [27] "UGA_flucast-INFLAenza" "UGA_flucast-OKeeffe"
#> [29] "UGuelph-CompositeCurve" "UGuelphensemble-GRYPHON"
#> [31] "UM-DeepOutbreak" "UMass-flusion"
#> [33] "UMass-trends_ensemble" "UNC_IDD-InfluPaint"
#> [35] "UVAFluX-Ensemble" "VTSanghani-Ensemble"
#> [37] "cfa-flumech" "cfarenewal-cfaepimlight"
#> [39] "fjordhest-ensemble"
There are 53 locations for which forecasts are available: the 50 states, the District of Columbia, Puerto Rico, and the US as a whole. They are stored as two-character codes (two-digit FIPS codes, plus "US" for the national total), but we can re-code them as the more familiar USPS-style two-letter abbreviations via us_loc_code_to_abbr():
unique(forecast_and_target$location)
#> [1] "06" "01" "02" "04" "05" "08" "09" "10" "11" "12" "13" "15" "16" "17" "18"
#> [16] "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
#> [31] "34" "35" "36" "37" "38" "39" "40" "41" "42" "44" "45" "46" "47" "48" "49"
#> [46] "50" "51" "53" "54" "55" "56" "72" "US"
forecast_and_target <- forecast_and_target |>
  mutate(location = us_loc_code_to_abbr(location))
unique(forecast_and_target$location)
#> [1] "CA" "AL" "AK" "AZ" "AR" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
#> [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
#> [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
#> [46] "VT" "VA" "WA" "WV" "WI" "WY" "PR" "US"
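Conceptually, this kind of re-coding is a lookup from code to abbreviation. The sketch below illustrates that idea with a named character vector in base R. It is not how us_loc_code_to_abbr() is implemented; the code_to_abbr table is hypothetical and covers only a handful of the pairs shown in the output above.

```r
# Hypothetical partial lookup table: location code -> USPS abbreviation.
# Only a few entries are included for illustration.
code_to_abbr <- c(
  "01" = "AL", "02" = "AK", "06" = "CA", "72" = "PR", "US" = "US"
)

# "99" is deliberately not in the table
codes <- c("06", "01", "72", "US", "99")

# Indexing a named vector by name performs the lookup;
# codes missing from the table come back as NA
abbrs <- unname(code_to_abbr[codes])
print(abbrs)
```

Returning NA for unknown codes makes unmatched locations easy to spot before scoring, for example with anyNA(abbrs).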
Tabular scoring of forecasts
scoringutils provides various forecast evaluation metrics, including interval scores, skill relative to a chosen baseline, and coverage of prediction intervals at different nominal levels. Here we show the metrics for the US national forecasts by model for all forecast dates so far, rounded to two significant figures:
chosen_location <- "US"
forecast_and_target |>
  filter(location == !!chosen_location) |>
  score() |>
  summarise_scores(
    by = "model_id",
    relative_skill = TRUE,
    baseline = "FluSight-ensemble"
  ) |>
  summarise_scores(
    by = "model_id",
    fun = signif,
    digits = 2
  ) |>
  kable()
model_id | wis | overprediction | underprediction | dispersion | bias | interval_coverage_50 | interval_coverage_90 | ae_median |
---|---|---|---|---|---|---|---|---|
CEPH-Rtrend_fluH | 1500 | 210 | 800 | 490 | -0.340 | 0.540 | 0.840 | 2400 |
CMU-TimeSeries | 1900 | 230 | 740 | 900 | -0.220 | 0.600 | 0.950 | 2900 |
CU-ensemble | 1700 | 570 | 680 | 480 | -0.220 | 0.510 | 0.780 | 2500 |
FluSight-baseline | 1700 | 610 | 870 | 210 | 0.110 | 0.086 | 0.640 | 2400 |
FluSight-ensemble | 1200 | 270 | 510 | 440 | -0.190 | 0.560 | 0.920 | 1900 |
FluSight-lop_norm | 1200 | 200 | 410 | 590 | -0.180 | 0.640 | 0.990 | 1900 |
GH-model | 7400 | 0 | 7200 | 170 | -0.970 | 0.022 | 0.089 | 7800 |
GT-FluFNP | 2700 | 580 | 1800 | 360 | -0.300 | 0.260 | 0.450 | 3600 |
ISU_NiemiLab-ENS | 1900 | 190 | 1400 | 400 | -0.520 | 0.330 | 0.570 | 2600 |
ISU_NiemiLab-NLH | 1600 | 140 | 1100 | 310 | -0.380 | 0.410 | 0.610 | 2100 |
ISU_NiemiLab-SIR | 2600 | 590 | 1500 | 510 | -0.480 | 0.310 | 0.550 | 3600 |
JHU_CSSE-CSSE_Ensemble | 900 | 160 | 310 | 440 | -0.059 | 0.560 | 0.960 | 1400 |
LUcompUncertLab-chimera | 1700 | 690 | 770 | 260 | -0.280 | 0.200 | 0.510 | 2300 |
LosAlamos_NAU-CModel_Flu | 7000 | 5200 | 1600 | 190 | -0.190 | 0.048 | 0.210 | 7700 |
MIGHTE-Nsemble | 1300 | 310 | 600 | 380 | -0.120 | 0.530 | 0.830 | 2000 |
MOBS-GLEAM_FLUH | 1200 | 140 | 490 | 610 | -0.330 | 0.630 | 0.940 | 1900 |
NIH-Flu_ARIMA | 2200 | 100 | 710 | 1400 | -0.160 | 0.570 | 0.930 | 2100 |
NU_UCSD-GLEAM_AI_FLUH | 1900 | 520 | 580 | 760 | -0.130 | 0.590 | 0.890 | 2900 |
PSI-PROF | 1300 | 340 | 380 | 620 | 0.045 | 0.540 | 0.860 | 2100 |
PSI-PROF_beta | 1800 | 430 | 650 | 730 | 0.073 | 0.540 | 0.840 | 2700 |
SGroup-RandomForest | 1500 | 89 | 800 | 600 | -0.210 | 0.590 | 0.940 | 2300 |
SigSci-CREG | 1100 | 370 | 370 | 320 | -0.130 | 0.320 | 0.780 | 1700 |
SigSci-TSENS | 1600 | 370 | 690 | 530 | -0.100 | 0.560 | 0.850 | 2300 |
Stevens-GBR | 2300 | 60 | 1800 | 440 | -0.530 | 0.270 | 0.490 | 3100 |
UGA_flucast-Copycat | 1700 | 190 | 910 | 570 | -0.280 | 0.470 | 0.870 | 2600 |
UGA_flucast-INFLAenza | 1700 | 180 | 1100 | 380 | -0.071 | 0.350 | 0.810 | 2500 |
UGA_flucast-OKeeffe | 430 | 0 | 320 | 110 | -0.680 | 0.270 | 0.800 | 700 |
UGuelph-CompositeCurve | 3100 | 1800 | 770 | 490 | 0.031 | 0.081 | 0.550 | 4400 |
UGuelphensemble-GRYPHON | 1800 | 530 | 860 | 440 | -0.150 | 0.320 | 0.850 | 2700 |
UM-DeepOutbreak | 2000 | 220 | 440 | 1300 | -0.081 | 0.720 | 0.830 | 2100 |
UMass-flusion | 1100 | 240 | 320 | 510 | -0.033 | 0.590 | 0.990 | 1700 |
UMass-trends_ensemble | 1900 | 680 | 840 | 350 | -0.038 | 0.320 | 0.590 | 2500 |
UNC_IDD-InfluPaint | 2800 | 1700 | 850 | 240 | -0.180 | 0.160 | 0.380 | 3600 |
UVAFluX-Ensemble | 1700 | 850 | 450 | 400 | -0.084 | 0.520 | 0.750 | 2300 |
VTSanghani-Ensemble | 2200 | 800 | 1100 | 340 | -0.100 | 0.230 | 0.490 | 3000 |
cfa-flumech | 2400 | 1500 | 430 | 460 | 0.079 | 0.310 | 0.650 | 3400 |
cfarenewal-cfaepimlight | 1500 | 420 | 690 | 410 | -0.230 | 0.440 | 0.830 | 2400 |
fjordhest-ensemble | 1400 | 370 | 450 | 570 | -0.180 | 0.580 | 0.940 | 2200 |
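To show how the wis, dispersion, overprediction, and underprediction columns relate to one another, here is a minimal base R sketch of the weighted interval score for a single quantile forecast, using the standard interval-based definition. It is not the scoringutils implementation: wis_sketch() is a hypothetical helper, the forecast values are invented, and the sketch assumes a symmetric set of quantile levels that includes the median.

```r
# Sketch of the weighted interval score (WIS) for one forecast.
# Hypothetical helper; assumes symmetric quantile levels with a median.
wis_sketch <- function(observed, predicted, quantile_level) {
  ord <- order(quantile_level)
  predicted <- predicted[ord]
  quantile_level <- quantile_level[ord]

  # Locate the median
  is_median <- abs(quantile_level - 0.5) < 1e-8
  stopifnot(sum(is_median) == 1)
  med <- predicted[is_median]

  # Pair each lower quantile with its mirrored upper quantile
  lower_idx <- which(quantile_level < 0.5 - 1e-8)
  upper_idx <- rev(which(quantile_level > 0.5 + 1e-8))
  stopifnot(length(lower_idx) == length(upper_idx))
  stopifnot(isTRUE(all.equal(
    quantile_level[lower_idx], 1 - quantile_level[upper_idx]
  )))

  lower <- predicted[lower_idx]
  upper <- predicted[upper_idx]
  alpha <- 2 * quantile_level[lower_idx]  # 1 - nominal interval coverage
  k <- length(alpha)
  denom <- k + 0.5                        # the median counts with weight 1/2

  # Width penalty for each central interval
  dispersion <- sum(alpha / 2 * (upper - lower)) / denom
  # Penalty when the forecast sits above the observation
  overprediction <-
    (0.5 * max(med - observed, 0) + sum(pmax(lower - observed, 0))) / denom
  # Penalty when the forecast sits below the observation
  underprediction <-
    (0.5 * max(observed - med, 0) + sum(pmax(observed - upper, 0))) / denom

  c(
    wis = dispersion + overprediction + underprediction,
    overprediction = overprediction,
    underprediction = underprediction,
    dispersion = dispersion
  )
}

# Invented forecast: 50% interval [950, 1100], 90% interval [800, 1300]
toy_scores <- wis_sketch(
  observed = 1050,
  predicted = c(800, 950, 1000, 1100, 1300),
  quantile_level = c(0.05, 0.25, 0.5, 0.75, 0.95)
)
print(toy_scores)
# wis = 35, overprediction = 0, underprediction = 10, dispersion = 25
```

In this toy case the observation falls inside both intervals, so the only non-dispersion penalty comes from the median lying below the observed value. The three components always sum to the WIS, which is why a model such as GH-model in the table above, with near-zero overprediction and very large underprediction, can be read as forecasting consistently too low.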