The surveytable package can be used with different types of data. Here are some examples.
Unweighted data
Unweighted data is stored in a data.frame or a similar object. One example of such a similar object is a tibble (tbl), which can be produced by the tibble package. data.frames and similar objects do not contain information about survey design variables. Thus, surveytable treats these objects as unweighted data, with each observation having a weight of 1.
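To make the "weight of 1" idea concrete, here is a minimal base-R sketch. It does not call surveytable, and the toy data frame and its column names are made up for illustration. When every observation has a weight of 1, weighted counts coincide with plain counts:

```r
# Toy data; `specialty` and `w` are hypothetical names, not from any real survey.
df <- data.frame(
  specialty = c("Primary", "Medical", "Surgical", "Primary", "Primary", "Medical")
)
df$w <- 1  # every observation gets a weight of 1

unweighted <- table(df$specialty)               # plain counts per level
weighted   <- xtabs(w ~ specialty, data = df)   # sum of weights per level

# The two tabulations agree level by level.
all(as.vector(unweighted) == as.vector(weighted))
```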
The example below illustrates how to use surveytable with unweighted data. We
- create a tibble with unweighted data;
- tell surveytable to work with this object; and
- tabulate the SPECCAT (physician specialty) variable from these data.
library(surveytable)
library(tibble)
mytbl = as_tibble(namcs2019sv_df)
set_survey(mytbl)
#> * mytbl: the survey is unweighted.
Survey info {mytbl (unweighted)} | ||
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mytbl (unweighted)} | |||||||||
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250. |
Complex survey
A complex survey is defined by its data as well as its survey design variables. In R, a complex survey is stored in a survey object. This object, in addition to containing the survey data, also contains information about the survey design variables. These include variables that specify such things as:
- cluster IDs, also known as primary sampling units (PSUs);
- cluster sampling probabilities;
- strata;
- finite population correction; and
- sampling weights.
You can convert a data.frame or a similar object to a survey object using the survey::svydesign() command. Before using this command, you should consult the documentation for the survey that you are analyzing to find out what the survey design variables are.
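As a separate sketch of how design variables map onto svydesign() arguments, the snippet below uses the api example data that ships with the survey package rather than NAMCS. In that data, dnum identifies clusters, stype identifies strata, pw holds sampling weights, and fpc holds the finite population correction. For your own survey, the corresponding names must come from its documentation.

```r
library(survey)
data(api)  # loads apistrat, apiclus1, and related example data frames

# Stratified sample: no clusters (ids = ~1), strata given by school type.
dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# One-stage cluster sample: clusters are school districts (dnum).
dclus1 <- svydesign(ids = ~dnum, weights = ~pw,
                    fpc = ~fpc, data = apiclus1)

# Estimated population counts of schools by type, with standard errors.
svytotal(~stype, dstrat)
```

Each argument corresponds to one of the design variables listed above; arguments that a given survey does not use (for example, strata in the cluster sample) are simply omitted.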
The example below illustrates how to use surveytable with a complex survey. We
- create a survey object;
- tell surveytable to work with this object; and
- tabulate the SPECCAT variable from the survey.
library(surveytable)
mysurvey = survey::svydesign(ids = ~ CPSUM
, strata = ~ CSTRATM
, weights = ~ PATWT
, data = namcs2019sv_df)
set_survey(mysurvey)
Survey info {mysurvey} | ||
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mysurvey} | |||||||||
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250. |
Spark-based complex survey
If you are working with big data, that data might be stored in a database or a similar system, such as Apache Spark. surveytable can work with a survey whose data lives in a database.
The example below illustrates how to use surveytable with a Spark-based complex survey. We
- connect to Spark;
- copy some data into a Spark DataFrame;
- create a Spark-based survey object;
- tell surveytable to work with this object;
- tabulate the SPECCAT variable from the survey; and finally
- disconnect from Spark.
Note that, for this example, we are using a "local"
Spark connection – how you connect to Spark depends on your setup.
library(surveytable)
library(sparklyr)
library(dplyr)
sc = spark_connect(master = "local")
#> * Using Spark: 3.5.5
mysparkdf = copy_to(sc, namcs2019sv_df)
mysurvey = survey::svydesign(ids = ~CPSUM, strata = ~CSTRATM
, weights = ~PATWT, data = mysparkdf)
set_survey(mysurvey)
Survey info {mysurvey} | ||
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
SPECCAT {mysurvey} | |||||||||
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250. |
spark_disconnect_all()
#> [1] 1
Complex survey with replicate weights
Some surveys, instead of specifying survey design variables, specify replicate weights. They might do this, for example, for privacy reasons.
You can convert a data.frame or a similar object to a survey object that uses replicate weights using the survey::svrepdesign() command.
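To illustrate what replicate weights look like, the sketch below starts from a design that does have design variables and derives jackknife replicate weights from it with survey::as.svrepdesign(). It uses the api example data from the survey package, not NAMCS, so it only illustrates the concept. A survey that publishes replicate weights would supply them directly, and you would pass them to svrepdesign() with the settings given in that survey's documentation.

```r
library(survey)
data(api)  # example data shipped with the survey package

# A cluster design described by design variables (dnum = cluster ID).
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Convert it to a replicate-weights design using the JK1 jackknife.
rclus1 <- as.svrepdesign(dclus1, type = "JK1")

# Matrix of replicate weights: one row per observation, one column per replicate.
repw <- weights(rclus1, type = "analysis")
dim(repw)

# The point estimates agree; the second standard error is computed
# from the replicates rather than from the design variables.
svymean(~api00, dclus1)
svymean(~api00, rclus1)
```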
The example below illustrates how to use surveytable with a complex survey that uses replicate weights. We
- create fake replicate and sampling weights, for use in this example;
- create a replicate weights-based survey object;
- tell surveytable to work with this object; and
- tabulate the SPECCAT variable from the survey.
library(surveytable)
mydata = namcs2019sv_df
nr = nrow(mydata)
set.seed(42)
for (ii in 1:20) {
mydata[,paste0("fake_repw", ii)] = runif(nr, 10, 1000)
}
mydata$fake_w = runif(nr, 10, 1000)
mysurvey = survey::svrepdesign(
repweights = "fake_repw*"
, weights = ~fake_w
, data = mydata
)
set_survey(mysurvey)
Survey info {mysurvey} | ||
Variables | Observations | Design |
---|---|---|
tab("SPECCAT")
Type of specialty (Primary, Medical, Surgical) {mysurvey} | |||||||||
Level | n | Number | SE | LL | UL | Percent | SE | LL | UL |
---|---|---|---|---|---|---|---|---|---|
N = 8250. |