The algorithm fits rolling binomial models to a daily time series of percentages or proportions in order to classify the overall trend during the baseline period as significantly increasing, significantly decreasing, or stable.
classify_trend(
df,
t = date,
data_count = dataCount,
all_count = allCount,
B = 12
)
df: A data frame, data frame extension (e.g. a tibble), or a lazy data frame.
t: Name of the column of type Date containing the dates (default is date).
data_count: Name of the column with counts of positive encounters (default is dataCount).
all_count: Name of the column with total counts of encounters (default is allCount).
B: Baseline parameter. The baseline length is the number of days to which each binomial model is fit (default is 12).
A data frame. The first B rows within each group will be missing.
The test statistic and p-value are extracted from each individual model and are used in the following classification scheme:
p-value < 0.01 and sign(test_statistic) > 0 ~ "Significant Increase"
p-value < 0.01 and sign(test_statistic) < 0 ~ "Significant Decrease"
p-value >= 0.01 ~ "Stable"
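The three rules above can be sketched in base R as follows. This is only an illustration, not the package's internal code: the helper name `label_trend` and its `alpha` argument are hypothetical, and returning NA when either input is missing is an assumption.

```r
# Hypothetical helper (not part of the package): map test statistics and
# p-values to trend labels using the three rules listed above.
label_trend <- function(statistic, p_value, alpha = 0.01) {
  # Start with NA everywhere; rows with a missing input keep NA (assumption)
  label <- rep(NA_character_, length(p_value))
  ok <- !is.na(statistic) & !is.na(p_value)

  label[ok & p_value >= alpha] <- "Stable"
  label[ok & p_value < alpha & sign(statistic) > 0] <- "Significant Increase"
  label[ok & p_value < alpha & sign(statistic) < 0] <- "Significant Decrease"
  label
}

# Usage with made-up values
labels <- label_trend(
  statistic = c(3.2, -3.2, 1.0, NA),
  p_value   = c(0.001, 0.001, 0.32, NA)
)
print(labels)
```

The function is vectorized, so it can be applied to whole columns of statistics and p-values at once.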
If the baseline period contains fewer than 10 encounters/counts, no model is fit and NA is returned for both the test statistic and the p-value.
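For intuition, a single baseline-window fit might look like the sketch below. It rests on several assumptions that the text above does not confirm: the model is a binomial `glm()` of positive versus non-positive counts on time, the test statistic is the z value of the time coefficient, and the 10-count threshold is applied to the sum of the positive counts. The helper name `fit_window` and its `min_count` argument are hypothetical.

```r
# Hypothetical helper (not part of the package): fit one binomial model to a
# single baseline window and return the slope's test statistic and p-value.
fit_window <- function(t, data_count, all_count, min_count = 10) {
  # Assumption: the sparsity check uses the sum of positive counts
  if (sum(data_count) < min_count) {
    return(c(statistic = NA_real_, p_value = NA_real_))
  }

  # Assumption: successes/failures regressed on time with a logit link
  fit <- glm(
    cbind(data_count, all_count - data_count) ~ as.numeric(t),
    family = binomial()
  )

  # Row 2 of the coefficient table is the time slope
  coefs <- summary(fit)$coefficients
  c(
    statistic = unname(coefs[2, "z value"]),
    p_value   = unname(coefs[2, "Pr(>|z|)"])
  )
}

# Usage with made-up data: a 12-day window with a clearly rising proportion
t <- seq.Date(as.Date("2020-01-01"), by = 1, length.out = 12)
all_count <- rep(200, 12)
data_count <- seq(10, 65, by = 5)
res <- fit_window(t, data_count, all_count)

# A window with too few positive counts returns NA for both values
res_sparse <- fit_window(t, rep(0, 12), all_count)
```

Repeating this fit over successive B-day windows would give one statistic and p-value per day, which is what the classification scheme consumes.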
# Example 1
df <- data.frame(
date = seq.Date(as.Date("2020-01-01"), as.Date("2020-12-31"), by = 1),
dataCount = floor(runif(366, min = 0, max = 101)),
allCount = floor(runif(366, min = 101, max = 500))
)
df_trend <- classify_trend(df)
head(df_trend)
if (FALSE) {
# Example 2 with Data from NSSP-ESSENCE
library(dplyr) # provides %>%, mutate() and filter() used below
library(ggplot2)
library(ggthemes)
myProfile <- Credentials$new(askme("Enter your username:"), askme())
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/timeSeries?
endDate=20Nov20&percentParam=ccddCategory&datasource=va_hosp&startDate=22Aug20
&medicalGroupingSystem=essencesyndromes&userId=2362&aqtTarget=TimeSeries
&ccddCategory=cli%20cc%20with%20cli%20dd%20and%20coronavirus%20dd%20v2
&geographySystem=hospitalstate&detector=probregv2&timeResolution=daily&hasBeenE=1
&stratVal=&multiStratVal=geography&graphOnly=true&numSeries=0&graphOptions=multipleSmall
&seriesPerYear=false&nonZeroComposite=false&removeZeroSeries=true&sigDigits=true
&startMonth=January&stratVal=&multiStratVal=geography&graphOnly=true&numSeries=0
&graphOptions=multipleSmall&seriesPerYear=false&startMonth=January&nonZeroComposite=false"
url <- url %>% gsub("\n", "", .)
api_data <- myProfile$get_api_data(url)
df <- api_data$timeSeriesData
data_trend <- classify_trend(df, data_count = dataCount, all_count = allCount)
# Visualize Montana State trend
pal <- c("#FF0000", "#1D8AFF", "#FFF70E", "grey90")
data_trend %>%
mutate(percent = data_count / all_count * 100) %>%
filter(title == "Montana") %>%
ggplot(., aes(x = t, y = percent)) +
geom_line(color = pal[2], alpha = 0.5) +
geom_hline(yintercept = -0.4, size = 4.5, color = "white") +
geom_segment(aes(x = t, xend = max(t), y = -0.4, yend = -0.4, color = trend_classification), size = 3) +
scale_color_manual(values = pal, name = "Trend Classification") +
theme_few() +
labs(
title = "Percent of Emergency Department Visits with Diagnosed COVID-19",
subtitle = "August 22nd, 2020 to November 20th, 2020",
x = "Date",
y = "Percent"
)
}