Read in the dataset of incident case counts
read_data.Rd
Each row of the table corresponds to a single facility's cases for a reference-date/report-date/disease tuple. We want to aggregate these counts to the level of geographic aggregate/report-date/reference-date/disease.
Usage
read_data(
data_path,
disease = c("COVID-19", "Influenza", "test"),
state_abb,
report_date,
max_reference_date,
min_reference_date
)
Arguments
- data_path
The path to the local file. The path may contain a glob, and the data must be in Parquet format.
- disease
One of "COVID-19" or "Influenza". The usage signature also lists "test" as an accepted value.
- state_abb
A two-letter uppercase state abbreviation, or "US" for the US overall
- report_date
The desired single report date
- max_reference_date, min_reference_date
The last and first reference dates, respectively (both inclusive), of the timeseries
Details
We handle two distinct cases for geographic aggregates:
1. A single state: Subset to facilities in that state only and aggregate up to the state level
2. The US overall: Aggregate over all facilities without any subsetting
Note that we do not apply exclusions here. The exclusions are applied later, after the aggregations. That means that for the US overall, we aggregate over points that might be excluded at the state level. Our recourse in this case is to exclude the US overall aggregate point.
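The two geographic cases can be illustrated with a minimal base R sketch on a toy table. This is not the implementation of read_data(): the helper aggregate_counts() and every column name (facility, state, disease, report_date, reference_date, count, geo_value) are assumptions made for illustration only.

```r
# Toy facility-level table. All column names are hypothetical.
facility_counts <- data.frame(
  facility       = c("A", "B", "C", "A"),
  state          = c("CA", "CA", "NY", "CA"),
  disease        = "COVID-19",
  report_date    = as.Date("2024-01-10"),
  reference_date = as.Date(
    c("2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02")
  ),
  count          = c(3L, 5L, 2L, 4L)
)

# Hypothetical helper: sum facility counts up to one geographic aggregate.
aggregate_counts <- function(df, state_abb) {
  # Case 1: a single state, so subset to its facilities first.
  # Case 2: "US", so keep every facility without subsetting.
  if (state_abb != "US") {
    df <- df[df$state == state_abb, , drop = FALSE]
  }
  # Sum counts within each disease/report-date/reference-date tuple.
  out <- aggregate(
    count ~ disease + report_date + reference_date,
    data = df,
    FUN = sum
  )
  out$geo_value <- state_abb
  out
}

ca <- aggregate_counts(facility_counts, "CA")
us <- aggregate_counts(facility_counts, "US")
```

In this sketch the "US" rows sum over every facility, including any that a later exclusion step might drop at the state level, which is the situation the note above describes.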