
In this vignette, we use forecasttools to capture the current state of the FluSight forecast hub (https://github.com/cdcepi/FluSight-forecast-hub), and then score the forecasts according to a proper scoring rule. We do the scoring with scoringutils.

Generating a table of forecasts against truth data

First, we create a table of quantile forecasts formatted to work with scoringutils functions using hub_to_scorable_quantiles(). Generally, we expect users to call hub_to_scorable_quantiles() with a local path to a clone of the forecast hub repository, kept up to date from GitHub. In this case, we download a copy of the Hub from GitHub with download_hub().

hub_url <- "https://github.com/cdcepi/FluSight-forecast-hub"
hub_path <- fs::path(withr::local_tempdir(), "flusight-hub")
download_hub(
  hub_url = hub_url,
  hub_path = hub_path,
  force = TRUE
)

For reproducibility, in this vignette we will examine the Hub as of a specific git commit, b311e92.
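
To pin the downloaded copy to that commit, one option is to check it out with command-line git. This is a minimal sketch, not something download_hub() does for you; it assumes a system installation of git is available on the PATH.

# check out the pinned commit in the downloaded copy of the Hub
# (sketch; assumes `git` is available on the system PATH)
system2("git", c("-C", hub_path, "checkout", "b311e92"))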

The schema for both forecasts and “target data” is standardized across Hubverse hubs. There are two standard formats for Hubverse target data: “timeseries” format and “oracle output” format. To work with hub_to_scorable_quantiles(), a forecast hub must provide “oracle output” format target data.

As of b311e92, the FluSight Hub accepted quantile forecasts for a single “target” quantity: epiweekly incident influenza hospital admissions. It provided oracle output target data in a file called oracle-output.csv.
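
If you want to inspect the target data directly before scoring, you can connect to it with hubData. This is a sketch assuming the standard Hubverse target data layout described above:

# connect to the Hub's oracle output target data and preview it
# (sketch; assumes the standard Hubverse target data layout)
oracle <- hubData::connect_target_oracle_output(hub_path) |>
  dplyr::collect()

head(oracle)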

hub_to_scorable_quantiles() just needs to be pointed at the local copy of the Hub, and it will extract available quantile forecasts (via hubData::connect_hub()), combine them with the appropriate associated target data fetched from the “oracle output” (via hubData::connect_target_oracle_output()), and produce a scoringutils-ready table via scoringutils::as_forecast_quantile():

forecast_and_target <- hub_to_scorable_quantiles(hub_path)
#>  Some rows containing NA values may be removed. This is fine if not
#>   unexpected.

tail(forecast_and_target)
#>                 model reference_date          target horizon target_end_date
#>                <char>         <Date>          <char>   <int>          <Date>
#> 1: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 2: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 3: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 4: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 5: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#> 6: fjordhest-ensemble     2025-05-31 wk inc flu hosp       3      2025-06-21
#>    location output_type quantile_level predicted observed
#>      <char>      <char>          <num>     <num>    <num>
#> 1:       US    quantile          0.800   2934.46     1324
#> 2:       US    quantile          0.850   3301.63     1324
#> 3:       US    quantile          0.900   3916.49     1324
#> 4:       US    quantile          0.950   4582.54     1324
#> 5:       US    quantile          0.975   5290.78     1324
#> 6:       US    quantile          0.990   5901.82     1324

hub_to_scorable_quantiles() performs a left join of the forecast data to the target data, so the table includes forecasts for which no evaluation data is yet available; for those rows, the observed column is set to NA. scoringutils handles this.
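
For example, to see how many forecast rows currently lack an observed value (a quick sketch using the dplyr verbs already attached above):

# count distinct forecasts that do not yet have evaluation data
# (these rows come from the left join described above)
forecast_and_target |>
  filter(is.na(observed)) |>
  distinct(model, reference_date, target_end_date, location) |>
  nrow()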

Note that while the hubData package identifies individual models with a model_id column, hub_to_scorable_quantiles() renames this to the scoringutils standard column name for models: model.

There were 65 different models that had submitted quantile forecasts to FluSight as of b311e92.

unique(forecast_and_target$model)
#>  [1] "CADPH-FluCAT_Ensemble"      "CEPH-Rtrend_fluH"          
#>  [3] "CFA_Pyrenew-Pyrenew_HE_Flu" "CFA_Pyrenew-Pyrenew_H_Flu" 
#>  [5] "CMU-TimeSeries"             "CMU-climate_baseline"      
#>  [7] "CU-ensemble"                "FluSight-base_seasonal"    
#>  [9] "FluSight-baseline"          "FluSight-ensemble"         
#> [11] "FluSight-lop_norm"          "FluSight-trained_mean"     
#> [13] "FluSight-trained_med"       "GH-model"                  
#> [15] "GT-FluFNP"                  "Gatech-ensemble_point"     
#> [17] "Gatech-ensemble_prob"       "Google_SAI-FluBoostQR"     
#> [19] "ISU_NiemiLab-ENS"           "ISU_NiemiLab-GPE"          
#> [21] "ISU_NiemiLab-NLH"           "ISU_NiemiLab-SIR"          
#> [23] "JHUAPL-DMD"                 "JHUAPL-Morris"             
#> [25] "JHU_CSSE-CSSE_Ensemble"     "LUcompUncertLab-chimera"   
#> [27] "LosAlamos_NAU-CModel_Flu"   "MDPredict-SIRS"            
#> [29] "MIGHTE-Joint"               "MIGHTE-Nsemble"            
#> [31] "MOBS-GLEAM_FLUH"            "Metaculus-cp"              
#> [33] "NEU_ISI-AdaptiveEnsemble"   "NEU_ISI-FluBcast"          
#> [35] "NIH-Flu_ARIMA"              "NU_UCSD-GLEAM_AI_FLUH"     
#> [37] "OHT_JHU-nbxd"               "PSI-PROF"                  
#> [39] "PSI-PROF_beta"              "SGroup-RandomForest"       
#> [41] "SigSci-CREG"                "SigSci-TSENS"              
#> [43] "Stevens-GBR"                "Stevens-ILIForecast"       
#> [45] "UGA_CEID-Walk"              "UGA_flucast-Copycat"       
#> [47] "UGA_flucast-INFLAenza"      "UGA_flucast-OKeeffe"       
#> [49] "UGA_flucast-Scenariocast"   "UGuelph-CompositeCurve"    
#> [51] "UGuelphensemble-GRYPHON"    "UI_CompEpi-EpiGen"         
#> [53] "UM-DeepOutbreak"            "UMass-AR2"                 
#> [55] "UMass-flusion"              "UMass-trends_ensemble"     
#> [57] "UNC_IDD-InfluPaint"         "UVAFluX-CESGCN"            
#> [59] "UVAFluX-Ensemble"           "UVAFluX-OptimWISE"         
#> [61] "VTSanghani-Ensemble"        "VTSanghani-PRIME"          
#> [63] "cfa-flumech"                "cfarenewal-cfaepimlight"   
#> [65] "fjordhest-ensemble"

There are 53 locations for which forecasts are available: states, territories, the District of Columbia, and the US as a whole. They are stored as two-digit location codes (plus "US" for the national level), but we can re-code them as the more familiar USPS-style two-letter abbreviations via us_loc_code_to_abbr():

unique(forecast_and_target$location)
#>  [1] "06" "01" "02" "04" "05" "08" "09" "10" "11" "12" "13" "15" "16" "17" "18"
#> [16] "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
#> [31] "34" "35" "36" "37" "38" "39" "40" "41" "42" "44" "45" "46" "47" "48" "49"
#> [46] "50" "51" "53" "54" "55" "56" "72" "US"

forecast_and_target <- forecast_and_target |>
  mutate(location = us_loc_code_to_abbr(location))

unique(forecast_and_target$location)
#>  [1] "CA" "AL" "AK" "AZ" "AR" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
#> [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
#> [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
#> [46] "VT" "VA" "WA" "WV" "WI" "WY" "PR" "US"

Tabular scoring of forecasts

scoringutils provides various forecast evaluation metrics, including interval scores, skill relative to a chosen baseline, and coverage of different prediction intervals. Here we show the metrics for US overall forecasts by model for all forecast dates so far, rounded to two significant figures.

chosen_location <- "US"

forecast_and_target |>
  filter(location == !!chosen_location) |>
  score() |>
  summarise_scores(
    by = "model",
    relative_skill = TRUE,
    baseline = "FluSight-ensemble"
  ) |>
  summarise_scores(
    by = "model",
    fun = signif,
    digits = 2
  ) |>
  kable()
| model | wis | overprediction | underprediction | dispersion | bias | interval_coverage_50 | interval_coverage_90 | ae_median |
|:------|----:|---------------:|----------------:|-----------:|-----:|---------------------:|---------------------:|----------:|
| CEPH-Rtrend_fluH | 2600 | 110.00 | 1800 | 700 | -0.3800 | 0.570 | 0.84 | 3700 |
| CFA_Pyrenew-Pyrenew_HE_Flu | 220 | 66.00 | 21 | 130 | 0.1500 | 0.590 | 1.00 | 310 |
| CFA_Pyrenew-Pyrenew_H_Flu | 2700 | 460.00 | 1300 | 910 | -0.4000 | 0.470 | 0.84 | 3700 |
| CMU-TimeSeries | 3700 | 120.00 | 2400 | 1100 | -0.3400 | 0.560 | 0.86 | 5000 |
| CMU-climate_baseline | 7100 | 17.00 | 5400 | 1600 | -0.6300 | 0.320 | 0.77 | 11000 |
| CU-ensemble | 2900 | 350.00 | 2000 | 630 | -0.2600 | 0.490 | 0.77 | 4000 |
| FluSight-baseline | 4000 | 1400.00 | 2200 | 410 | 0.0640 | 0.200 | 0.69 | 5300 |
| FluSight-ensemble | 2600 | 230.00 | 1700 | 680 | -0.2300 | 0.480 | 0.85 | 3800 |
| FluSight-lop_norm | 2500 | 150.00 | 1400 | 940 | -0.2400 | 0.570 | 0.93 | 3700 |
| FluSight-trained_mean | 3300 | 600.00 | 1300 | 1400 | -0.0830 | 0.600 | 0.95 | 4900 |
| FluSight-trained_med | 3200 | 200.00 | 1900 | 1100 | -0.1700 | 0.650 | 0.90 | 4500 |
| GH-model | 9200 | 0.00 | 9100 | 73 | -1.0000 | 0.000 | 0.00 | 9400 |
| GT-FluFNP | 2600 | 490.00 | 1700 | 370 | -0.2500 | 0.260 | 0.51 | 3400 |
| Google_SAI-FluBoostQR | 180 | 0.00 | 140 | 37 | -0.8400 | 0.062 | 0.62 | 290 |
| ISU_NiemiLab-ENS | 2200 | 190.00 | 1600 | 420 | -0.5100 | 0.330 | 0.56 | 3000 |
| ISU_NiemiLab-GPE | 4700 | 730.00 | 3000 | 910 | -0.1600 | 0.470 | 0.74 | 6400 |
| ISU_NiemiLab-NLH | 1800 | 140.00 | 1300 | 330 | -0.3800 | 0.350 | 0.60 | 2400 |
| ISU_NiemiLab-SIR | 2900 | 680.00 | 1700 | 520 | -0.4100 | 0.270 | 0.48 | 4000 |
| JHUAPL-DMD | 8500 | 4500.00 | 1900 | 2000 | -0.1700 | 0.600 | 0.79 | 12000 |
| JHUAPL-Morris | 23000 | 1300.00 | 18000 | 3500 | -0.6000 | 0.083 | 0.83 | 35000 |
| JHU_CSSE-CSSE_Ensemble | 3700 | 680.00 | 1600 | 1400 | -0.0870 | 0.570 | 0.89 | 5500 |
| LUcompUncertLab-chimera | 4500 | 440.00 | 3500 | 550 | -0.3800 | 0.280 | 0.52 | 5600 |
| LosAlamos_NAU-CModel_Flu | 6000 | 3100.00 | 1900 | 1000 | 0.0034 | 0.200 | 0.48 | 7500 |
| MDPredict-SIRS | 3200 | 470.00 | 2100 | 660 | -0.0660 | 0.460 | 0.81 | 4500 |
| MIGHTE-Joint | 6300 | 310.00 | 5400 | 600 | -0.4200 | 0.340 | 0.52 | 7700 |
| MIGHTE-Nsemble | 3400 | 280.00 | 2700 | 460 | -0.2800 | 0.360 | 0.67 | 4500 |
| MOBS-GLEAM_FLUH | 3300 | 130.00 | 2500 | 610 | -0.5500 | 0.280 | 0.62 | 4600 |
| Metaculus-cp | 9200 | 0.48 | 7800 | 1400 | -0.6700 | 0.360 | 0.67 | 12000 |
| NEU_ISI-AdaptiveEnsemble | 3500 | 68.00 | 2800 | 600 | -0.2800 | 0.320 | 0.70 | 5100 |
| NEU_ISI-FluBcast | 5000 | 41.00 | 4500 | 460 | -0.7400 | 0.240 | 0.55 | 6400 |
| NIH-Flu_ARIMA | 3000 | 370.00 | 1600 | 980 | -0.2000 | 0.390 | 0.83 | 3800 |
| NU_UCSD-GLEAM_AI_FLUH | 2200 | 580.00 | 780 | 790 | -0.0810 | 0.430 | 0.88 | 3500 |
| OHT_JHU-nbxd | 5700 | 430.00 | 4000 | 1200 | -0.2200 | 0.490 | 0.77 | 8300 |
| PSI-PROF | 2700 | 230.00 | 1700 | 780 | -0.1400 | 0.480 | 0.81 | 3900 |
| PSI-PROF_beta | 3000 | 350.00 | 1700 | 930 | -0.0790 | 0.550 | 0.82 | 4400 |
| SGroup-RandomForest | 1800 | 130.00 | 1000 | 610 | -0.1800 | 0.460 | 0.92 | 2800 |
| SigSci-CREG | 1100 | 360.00 | 380 | 320 | -0.0370 | 0.300 | 0.74 | 1700 |
| SigSci-TSENS | 3200 | 720.00 | 1800 | 650 | -0.1300 | 0.500 | 0.76 | 4400 |
| Stevens-GBR | 2400 | 110.00 | 1900 | 420 | -0.4300 | 0.220 | 0.45 | 3200 |
| Stevens-ILIForecast | 8400 | 150.00 | 7900 | 300 | -0.7500 | 0.057 | 0.15 | 9300 |
| UGA_CEID-Walk | 6700 | 1300.00 | 4100 | 1400 | -0.0220 | 0.520 | 0.74 | 9400 |
| UGA_flucast-Copycat | 2600 | 260.00 | 1400 | 910 | -0.1900 | 0.520 | 0.87 | 4000 |
| UGA_flucast-INFLAenza | 3500 | 460.00 | 2400 | 620 | -0.0570 | 0.320 | 0.72 | 4900 |
| UGA_flucast-OKeeffe | 530 | 0.00 | 420 | 110 | -0.7500 | 0.170 | 0.75 | 840 |
| UGA_flucast-Scenariocast | 3600 | 800.00 | 950 | 1800 | -0.2400 | 0.590 | 1.00 | 5800 |
| UGuelph-CompositeCurve | 3300 | 1300.00 | 1100 | 940 | -0.0630 | 0.300 | 0.62 | 4500 |
| UGuelphensemble-GRYPHON | 2900 | 940.00 | 1200 | 680 | 0.0130 | 0.330 | 0.82 | 4100 |
| UI_CompEpi-EpiGen | 5600 | 150.00 | 4700 | 780 | -0.4700 | 0.560 | 0.68 | 6600 |
| UM-DeepOutbreak | 4200 | 320.00 | 2200 | 1600 | -0.1700 | 0.670 | 0.77 | 5000 |
| UMass-AR2 | 4500 | 560.00 | 3000 | 870 | -0.2800 | 0.520 | 0.71 | 5900 |
| UMass-flusion | 2500 | 270.00 | 1600 | 560 | -0.1400 | 0.490 | 0.87 | 3400 |
| UMass-trends_ensemble | 3300 | 970.00 | 1600 | 650 | -0.0900 | 0.390 | 0.64 | 4400 |
| UNC_IDD-InfluPaint | 3200 | 870.00 | 2100 | 220 | -0.4000 | 0.100 | 0.26 | 3900 |
| UVAFluX-CESGCN | 31000 | 0.00 | 31000 | 120 | -1.0000 | 0.000 | 0.00 | 31000 |
| UVAFluX-Ensemble | 3600 | 810.00 | 1600 | 1200 | -0.1300 | 0.460 | 0.68 | 4900 |
| UVAFluX-OptimWISE | 380 | 41.00 | 56 | 280 | -0.1500 | 0.800 | 1.00 | 490 |
| VTSanghani-Ensemble | 2600 | 930.00 | 1300 | 370 | -0.0570 | 0.170 | 0.38 | 3500 |
| VTSanghani-PRIME | 6900 | 880.00 | 4600 | 1400 | -0.3100 | 0.420 | 0.61 | 8800 |
| cfa-flumech | 2400 | 1400.00 | 530 | 460 | 0.0940 | 0.260 | 0.61 | 3500 |
| cfarenewal-cfaepimlight | 1900 | 450.00 | 930 | 470 | -0.2800 | 0.310 | 0.77 | 2900 |
| fjordhest-ensemble | 2700 | 270.00 | 1700 | 810 | -0.3000 | 0.470 | 0.86 | 4000 |