This example uses the National Study of Long-Term Care Providers
(NSLTCP) Residential Care Community (RCC) Services User (SU) 2018 Public
Use File (PUF) to replicate the estimates from a report called Residential
Care Community Resident Characteristics: United States, 2018. “The
survey used a sample of residential care community residents, obtained
from a frame that was constructed from lists of licensed residential
care communities acquired from the state licensing agencies in each of
the 50 states and the District of Columbia.”
The RCC SU 2018 survey comes with the surveytable package, for use in examples, in an object called rccsu2018.
Begin
Begin by loading the surveytable package.
library(surveytable)
Now, specify the survey that you’d like to analyze.
set_survey(rccsu2018)
Survey info {RCC SU 2018 PUF}

| Variables | Observations | Design |
| --- | --- | --- |
| 81 | 904 | Stratified Independent Sampling design svydesign(ids = ~1, strata = ~pufstrata2 + su_facid, fpc = ~pufpopfac2, weights = ~suwt, data = d1) |

Check the survey name, survey design variables, and the number of
observations to verify that it all looks correct.
For this example, we do want to turn on certain NCHS-specific
options, such as identifying low-precision estimates. If you do not care
about identifying low-precision estimates, you can skip this command. To
turn on the NCHS-specific options:
set_opts(mode = "NCHS")
## * Mode: NCHS.
Alternatively, you can combine these two commands into a single
command, like so:
set_survey(rccsu2018, mode = "NCHS")
## * Mode: NCHS.
Survey info {RCC SU 2018 PUF}

| Variables | Observations | Design |
| --- | --- | --- |
| 81 | 904 | Stratified Independent Sampling design svydesign(ids = ~1, strata = ~pufstrata2 + su_facid, fpc = ~pufpopfac2, weights = ~suwt, data = d1) |

This figure shows the percentage of residents by sex, race /
ethnicity, and age group.
Sex.
Resident’s gender {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Male | 272 | 299 | 24 | 255 | 352 | 32.6 | 2.5 | 27.7 | 37.7 |
| Female | 632 | 619 | 26 | 570 | 673 | 67.4 | 2.5 | 62.3 | 72.3 |

Race / ethnicity.
Variables beginning with ‘race’ {RCC SU 2018 PUF}

| Variable | Class | Long name |
| --- | --- | --- |
| raceeth2 | factor | Resident’s race/ethnicity |

Resident’s race/ethnicity {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL | Flags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| White | 816 | 821 | 23 | 776 | 868 | 89.3 | 1.8 | 85.4 | 92.6 |  |
| Black | 40 | 54 | 14 | 31 | 95 | 5.9 | 1.5 | 3.3 | 9.6 |  |
| Hispanic | 23 | 18 | 5 | 9 | 34 | 1.9 | 0.6 | 1.0 | 3.4 |  |
| Other | 25 | 26 | 8 | 12 | 55 | 2.8 | 0.9 | 1.3 | 5.3 | Cx |

In the published figure, the Hispanic and Other categories have been merged into a single category called “Another race or ethnicity”. We can do that using the var_collapse() function.
var_collapse("raceeth2"
, "Another race or ethnicity"
, c("Hispanic", "Other"))
tab("raceeth2")
Resident’s race/ethnicity {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| White | 816 | 821 | 23 | 776 | 868 | 89.3 | 1.8 | 85.4 | 92.6 |
| Black | 40 | 54 | 14 | 31 | 95 | 5.9 | 1.5 | 3.3 | 9.6 |
| Another race or ethnicity | 48 | 44 | 10 | 27 | 70 | 4.8 | 1.1 | 2.9 | 7.3 |

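To see what collapsing does to the underlying factor, here is a minimal base R sketch on a toy vector. It is not a description of how var_collapse() is implemented; the vector race and its values are made up for illustration.

```r
# Toy factor standing in for raceeth2 (values are illustrative, not survey data)
race <- factor(c("White", "Black", "Hispanic", "Other", "White", "Other"),
               levels = c("White", "Black", "Hispanic", "Other"))

# Assigning the same label to several levels merges them into a single level
levels(race)[levels(race) %in% c("Hispanic", "Other")] <- "Another race or ethnicity"

# The merged level now holds the observations from both original levels
table(race)
```

The merged level keeps the position of the first level that was replaced, so the remaining levels stay in their original order.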
Age group.
Variables beginning with ‘age’ {RCC SU 2018 PUF}

| Variable | Class | Long name |
| --- | --- | --- |
| age2 | numeric | Resident’s age |

age2 is a numeric variable. We need to create a categorical variable based on this numeric variable. This is done using the var_cut() function.
var_cut("Age", "age2"
, c(-Inf, 64, 74, 84, Inf)
, c("Under 65", "65-74", "75-84", "85 and over") )
tab("Age")
Age {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Under 65 | 75 | 69 | 11 | 49 | 96 | 7.5 | 1.2 | 5.2 | 10.3 |
| 65-74 | 98 | 111 | 17 | 82 | 151 | 12.1 | 1.8 | 8.8 | 16.1 |
| 75-84 | 221 | 235 | 22 | 195 | 282 | 25.5 | 2.2 | 21.3 | 30.2 |
| 85 and over | 510 | 504 | 26 | 456 | 557 | 54.9 | 2.6 | 49.7 | 60.0 |

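The break points passed to var_cut() decide which ages fall on which side of each boundary. The sketch below uses base R's cut() on made-up ages, on the assumption that var_cut() bins the same way as cut() with its default right-closed intervals. That assumption is not stated in this document, so treat the sketch as an illustration of the binning, not of the package internals.

```r
# Illustrative ages chosen to sit on both sides of each break (not survey data)
age <- c(50, 64, 65, 74, 75, 84, 85, 101)

# Default right-closed intervals: (-Inf,64], (64,74], (74,84], (84,Inf]
age_group <- cut(age,
                 breaks = c(-Inf, 64, 74, 84, Inf),
                 labels = c("Under 65", "65-74", "75-84", "85 and over"))

# Show each age next to the group it lands in
data.frame(age, age_group)
```

With right-closed intervals, an age of exactly 64 lands in "Under 65" and an age of exactly 65 lands in "65-74", which is why the breaks are 64, 74, and 84 and not 65, 75, and 85.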
This figure shows the percentage of residents with Medicaid, overall and by age group.
tab("medicaid2")
Used Medicaid to pay for services {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 674 | 674 | 24 | 628 | 723 | 73.3 | 2.1 | 68.9 | 77.4 |
| TRUE | 143 | 160 | 18 | 128 | 201 | 17.5 | 1.9 | 13.9 | 21.5 |
| <N/A> | 87 | 85 | 11 | 64 | 111 | 9.2 | 1.2 | 6.9 | 11.9 |

As we can see, for some observations, the value of this variable is unknown (it’s missing or NA). The above command calculates percentages based on all observations, including the ones with missing (NA) values. However, in the published figure, the percentages are based on the knowns only. To exclude the NAs from the calculation, use the drop_na argument:
tab("medicaid2", drop_na = TRUE)
Used Medicaid to pay for services (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 674 | 674 | 24 | 628 | 723 | 80.8 | 2.1 | 76.4 | 84.7 |
| TRUE | 143 | 160 | 18 | 128 | 201 | 19.2 | 2.1 | 15.3 | 23.6 |

Note that the table title alerts you to the fact that you are using
known values only.
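The difference between the two Medicaid tables is only the denominator. The base R sketch below recomputes the percentages from the weighted counts (in thousands) printed in those tables. The variable names are made up for illustration, and because the printed counts are rounded, the results approximate the published percentages and need not match them to the last digit.

```r
# Weighted counts in thousands, copied from the "Number (000)" column above
counts <- c(no = 674, yes = 160, unknown = 85)

# All observations: the unknowns are part of the denominator
pct_all <- 100 * counts / sum(counts)

# Knowns only: drop the unknowns before computing the denominator
known <- counts[c("no", "yes")]
pct_known <- 100 * known / sum(known)

round(pct_all, 1)
round(pct_known, 1)
```

Dropping the unknowns leaves the estimated counts unchanged but raises every percentage, since the same numerators are divided by a smaller total.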
By age group:
Used Medicaid to pay for services (Age = Under 65) (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL | Flags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 31 | 30 | 8 | 17 | 56 | 49.8 | 9.5 | 30.3 | 69.4 | Px |
| TRUE | 35 | 31 | 8 | 18 | 52 | 50.2 | 9.5 | 30.6 | 69.7 | Px |

Used Medicaid to pay for services (Age = 65-74) (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL | Flags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 53 | 55 | 11 | 37 | 83 | 62 | 7.6 | 45.4 | 76.7 | Px |
| TRUE | 30 | 34 | 8 | 20 | 58 | 38 | 7.6 | 23.3 | 54.6 | Px |

Used Medicaid to pay for services (Age = 75-84) (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 163 | 167 | 18 | 134 | 208 | 79.1 | 4.5 | 68.6 | 87.3 |
| TRUE | 33 | 44 | 11 | 26 | 75 | 20.9 | 4.5 | 12.7 | 31.4 |

Used Medicaid to pay for services (Age = 85 and over) (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 427 | 421 | 23 | 378 | 469 | 89.1 | 2.2 | 83.8 | 93.1 |
| TRUE | 45 | 52 | 11 | 33 | 81 | 10.9 | 2.2 | 6.9 | 16.2 |

Note that according to the NCHS presentation criteria, some of the
percentages should be suppressed.
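The Px flags in the age-group tables above line up with a check on confidence interval width. The sketch below is a simplified illustration under an assumption that is not stated in this document: that a percentage is suppressed when its interval is at least 30 percentage points wide, or when it is more than 5 points wide and wider than 130% of the estimate, which is the interval-width part of the NCHS data presentation standards for proportions. The sample-size and degrees-of-freedom checks are omitted, and this is not a description of what surveytable does internally. The data frame name is made up, and the numbers are the TRUE rows from the tables above.

```r
# Percent and confidence limits for the TRUE rows, copied from the tables above
medicaid_true <- data.frame(
  age = c("Under 65", "65-74", "75-84", "85 and over"),
  pct = c(50.2, 38, 20.9, 10.9),
  ll  = c(30.6, 23.3, 12.7, 6.9),
  ul  = c(69.7, 54.6, 31.4, 16.2)
)

# Absolute width in percentage points, and width relative to the estimate
width     <- medicaid_true$ul - medicaid_true$ll
rel_width <- 100 * width / medicaid_true$pct

# Assumed rule: too wide in absolute terms, or moderately wide and too wide in relative terms
medicaid_true$suppress <- width >= 30 | (width > 5 & rel_width > 130)

medicaid_true
```

Under this assumed rule, the two youngest age groups are marked for suppression and the two oldest are not, which agrees with where the Px flag appears above.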
(Figure 3 is slightly more involved, so we’ll do it next.)
- This figure shows the percentage of residents who have one of a
select set of chronic conditions.
- In addition, it shows the distribution of residents by the number of
conditions.
Here’s a table for high blood pressure.
tab("hbp")
Resident diagnosed with high blood pressure {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 397 | 404 | 25 | 357 | 457 | 44.0 | 2.5 | 38.9 | 49.1 |
| TRUE | 481 | 498 | 26 | 449 | 552 | 54.2 | 2.6 | 49.0 | 59.3 |
| <N/A> | 26 | 17 | 4 | 10 | 28 | 1.8 | 0.4 | 1.1 | 2.9 |

Once again, unknown values (NA) are present, while the figure is based on knowns only. Therefore, we will again use the drop_na argument:
tab("hbp", "alz", "depress", "arth", "diabetes", "heartdise", "osteo"
, "copd", "stroke", "cancer"
, drop_na = TRUE)
Resident diagnosed with high blood pressure (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 397 | 404 | 25 | 357 | 457 | 44.8 | 2.6 | 39.7 | 50.0 |
| TRUE | 481 | 498 | 26 | 449 | 552 | 55.2 | 2.6 | 50.0 | 60.3 |

Resident diagnosed with Alzheimer’s/dementia (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 538 | 598 | 26 | 549 | 651 | 66.3 | 2.1 | 62.0 | 70.5 |
| TRUE | 340 | 304 | 19 | 268 | 344 | 33.7 | 2.1 | 29.5 | 38.0 |

Resident diagnosed with depression (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 629 | 654 | 24 | 609 | 703 | 72.5 | 2.1 | 68.1 | 76.6 |
| TRUE | 249 | 248 | 20 | 211 | 292 | 27.5 | 2.1 | 23.4 | 31.9 |

Resident diagnosed with arthritis (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 683 | 717 | 26 | 668 | 770 | 79.5 | 2 | 75.3 | 83.3 |
| TRUE | 195 | 185 | 18 | 152 | 224 | 20.5 | 2 | 16.7 | 24.7 |

Resident diagnosed with diabetes (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 719 | 718 | 23 | 675 | 765 | 79.6 | 2.1 | 75.3 | 83.6 |
| TRUE | 159 | 184 | 20 | 148 | 227 | 20.4 | 2.1 | 16.4 | 24.7 |

Resident diagnosed with heart disease (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 739 | 746 | 25 | 697 | 798 | 82.7 | 1.8 | 78.7 | 86.2 |
| TRUE | 139 | 156 | 17 | 126 | 193 | 17.3 | 1.8 | 13.8 | 21.3 |

Resident diagnosed with osteoporosis (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 766 | 794 | 24 | 749 | 842 | 88 | 1.4 | 84.9 | 90.7 |
| TRUE | 112 | 108 | 13 | 85 | 137 | 12 | 1.4 | 9.3 | 15.1 |

Resident diagnosed with COPD (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 779 | 806 | 24 | 759 | 856 | 89.4 | 1.6 | 85.9 | 92.3 |
| TRUE | 99 | 96 | 14 | 71 | 129 | 10.6 | 1.6 | 7.7 | 14.1 |

Resident diagnosed with stroke (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 789 | 807 | 23 | 764 | 853 | 89.5 | 1.5 | 86.1 | 92.4 |
| TRUE | 89 | 94 | 14 | 70 | 128 | 10.5 | 1.5 | 7.6 | 13.9 |

Resident diagnosed with cancer (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FALSE | 806 | 824 | 23 | 780 | 871 | 91.4 | 1.6 | 87.7 | 94.2 |
| TRUE | 72 | 78 | 14 | 53 | 114 | 8.6 | 1.6 | 5.8 | 12.3 |

Advanced variable editing
surveytable provides a number of functions to create or modify survey variables. We saw a couple of these above: var_collapse() and var_cut().
Occasionally, you might need to do advanced variable editing. Here’s how:
Every survey object has an element called variables. This is a data frame where the survey’s variables are located.
class(rccsu2018$variables)
## [1] "data.frame"
- Create a new variable in the variables data frame (which is part of the survey object).
- Call set_survey() again. Any time you modify the variables data frame, call set_survey().
- Tabulate the new variable.
We go through these steps to count how many chronic
conditions were present.
# Start every resident at zero conditions
rccsu2018$variables$num_cc = 0
for (vr in c("hbp", "alz", "depress", "arth", "diabetes", "heartdise", "osteo"
    , "copd", "stroke", "cancer")) {
  # which() keeps only TRUE, so FALSE and NA (unknown) do not add to the count
  idx = which(rccsu2018$variables[,vr])
  rccsu2018$variables$num_cc[idx] = rccsu2018$variables$num_cc[idx] + 1
}
set_survey(rccsu2018, mode = "NCHS")
## * Mode: NCHS.
Survey info {RCC SU 2018 PUF}

| Variables | Observations | Design |
| --- | --- | --- |
| 82 | 904 | Stratified Independent Sampling design svydesign(ids = ~1, strata = ~pufstrata2 + su_facid, fpc = ~pufpopfac2, weights = ~suwt, data = d1) |

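The counting loop above can also be written without a loop. The base R sketch below runs both forms on a small made-up data frame of logical indicators (the name conditions and its values are illustrative, not survey data) and shows that they agree, including in how they treat NA.

```r
# Toy logical indicators; NA stands for an unknown diagnosis
conditions <- data.frame(
  hbp     = c(TRUE, FALSE, NA, TRUE),
  alz     = c(TRUE, FALSE, TRUE, NA),
  depress = c(FALSE, FALSE, TRUE, TRUE)
)

# Loop form, as in the text: which() returns only the TRUE positions
num_loop <- rep(0, nrow(conditions))
for (vr in names(conditions)) {
  idx <- which(conditions[, vr])
  num_loop[idx] <- num_loop[idx] + 1
}

# Vectorized form: TRUE counts as 1, and na.rm = TRUE skips unknowns
num_vec <- rowSums(conditions, na.rm = TRUE)

cbind(num_loop, num_vec)
```

In both forms an unknown diagnosis contributes nothing, so a resident with some NA values is counted only on the conditions known to be present.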
num_cc is a numeric variable with the number of chronic conditions. The published figure uses a categorical variable which is based on this numeric variable. Use var_cut(), which converts numeric variables to categorical (factor) variables.
var_cut("Number of chronic conditions", "num_cc"
, c(-Inf, 0, 1, 3, 10, Inf)
, c("0", "1", "2-3", "4-10", "??"))
tab("Number of chronic conditions")
Number of chronic conditions {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 121 | 140 | 19 | 106 | 184 | 15.2 | 2.0 | 11.5 | 19.6 |
| 1 | 189 | 180 | 17 | 148 | 218 | 19.5 | 1.9 | 16.0 | 23.5 |
| 2-3 | 446 | 444 | 23 | 400 | 492 | 48.3 | 2.4 | 43.5 | 53.1 |
| 4-10 | 148 | 156 | 17 | 125 | 194 | 16.9 | 1.8 | 13.5 | 20.8 |

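The last break (Inf) and its "??" label act as a catch-all for values above 10, which should not occur when only ten conditions are counted. No "??" row appears in the table above, which is what one would expect if no such values exist. The sketch below shows the same idea with base R's cut() on made-up counts, assuming the binning matches cut() with right-closed intervals.

```r
# Illustrative counts; with ten conditions, the largest possible value is 10
num_cc <- c(0, 1, 2, 3, 4, 10)

num_cc_group <- cut(num_cc,
                    breaks = c(-Inf, 0, 1, 3, 10, Inf),
                    labels = c("0", "1", "2-3", "4-10", "??"))

# A non-zero count under "??" would signal a value outside the expected range
table(num_cc_group)
```

Keeping the catch-all bin costs nothing when the data are clean, and it makes an out-of-range value visible instead of silently turning it into NA.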
- This figure shows the percentage of residents who need help with one
of the activities of daily living (ADLs).
- In addition, it shows the distribution of residents by the number of
ADLs with which they need help.
Here’s a table for bathhlp (help with bathing):
tab("bathhlp")
Type of assistance resident needs to bathe {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL | Flags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MISSING | 22 | 10 | 2 | 6 | 17 | 1.1 | 0.3 | 0.6 | 2.1 |  |
| NEED HELP OR SUPERVISION FROM ANOTHER PERSON | 551 | 581 | 25 | 534 | 633 | 63.3 | 2.3 | 58.7 | 67.7 |  |
| USE OF AN ASSISTIVE DEVICE | 11 | 7 | 2 | 3 | 15 | 0.7 | 0.3 | 0.3 | 1.5 | Cx |
| BOTH | 127 | 113 | 15 | 87 | 148 | 12.4 | 1.6 | 9.4 | 15.9 |  |
| NEED NO ASSISTANCE | 193 | 207 | 18 | 173 | 247 | 22.5 | 2.0 | 18.7 | 26.6 |  |

This variable has multiple levels.
- Several of these levels correspond to a resident needing help.
- One level ("NEED NO ASSISTANCE") = does not need help.
- One level ("MISSING") = unknown.
We want to show (resident needing help) as a percentage of knowns only (that is, excluding the unknowns).
To do this, convert the variable to having 2 levels (needs help / does not need help) plus NA (for unknown); then use the drop_na argument to base percentages on knowns only.
for (vr in c("bathhlp", "walkhlp", "dreshlp", "transhlp", "toilhlp", "eathlp")) {
  # Merge the three "needs help" levels into a single level
  var_collapse(vr
    , "Needs assistance"
    , c("NEED HELP OR SUPERVISION FROM ANOTHER PERSON"
      , "USE OF AN ASSISTIVE DEVICE"
      , "BOTH"))
  # Recode "MISSING" as NA so that drop_na can exclude it
  var_collapse(vr, NA, "MISSING")
}
tab("bathhlp", "walkhlp", "dreshlp", "transhlp", "toilhlp", "eathlp", drop_na = TRUE)
Type of assistance resident needs to bathe (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Needs assistance | 689 | 702 | 25 | 654 | 752 | 77.2 | 2 | 73.1 | 81.1 |
| NEED NO ASSISTANCE | 193 | 207 | 18 | 173 | 247 | 22.8 | 2 | 18.9 | 26.9 |

Type of assistance resident needs for locomotion (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Needs assistance | 622 | 625 | 24 | 578 | 675 | 68.9 | 2.3 | 64.2 | 73.4 |
| NEED NO ASSISTANCE | 253 | 281 | 22 | 241 | 329 | 31.1 | 2.3 | 26.6 | 35.8 |

Type of assistance resident needs to dress (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Needs assistance | 527 | 561 | 25 | 513 | 614 | 61.7 | 2.3 | 57.1 | 66.2 |
| NEED NO ASSISTANCE | 355 | 348 | 22 | 308 | 393 | 38.3 | 2.3 | 33.8 | 42.9 |

Type of assistance resident needs to transfer in/out of chair (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Needs assistance | 463 | 464 | 24 | 418 | 515 | 51 | 2.4 | 46.1 | 55.8 |
| NEED NO ASSISTANCE | 420 | 446 | 24 | 400 | 496 | 49 | 2.4 | 44.2 | 53.9 |

Type of assistance resident needs to use bathroom (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Needs assistance | 437 | 443 | 24 | 398 | 493 | 48.7 | 2.4 | 43.8 | 53.5 |
| NEED NO ASSISTANCE | 447 | 467 | 25 | 421 | 518 | 51.3 | 2.4 | 46.5 | 56.2 |

Type of assistance resident needs to eat (knowns only) {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Needs assistance | 257 | 240 | 21 | 200 | 286 | 26.3 | 2.3 | 21.9 | 31.1 |
| NEED NO ASSISTANCE | 628 | 671 | 26 | 622 | 724 | 73.7 | 2.3 | 68.9 | 78.1 |

Now, go through the “advanced variable editing” steps, very similar to Figure 4, to count the number of ADLs with which each resident needs help.
# Start every resident at zero ADLs
rccsu2018$variables$num_adl = 0
for (vr in c("bathhlp", "walkhlp", "dreshlp", "transhlp", "toilhlp", "eathlp")) {
  # Rows where the resident needs some form of help with this ADL
  idx = which(rccsu2018$variables[,vr] %in%
    c("NEED HELP OR SUPERVISION FROM ANOTHER PERSON"
      , "USE OF AN ASSISTIVE DEVICE"
      , "BOTH"))
  rccsu2018$variables$num_adl[idx] = rccsu2018$variables$num_adl[idx] + 1
}
set_survey(rccsu2018, mode = "NCHS")
## * Mode: NCHS.
Survey info {RCC SU 2018 PUF}

| Variables | Observations | Design |
| --- | --- | --- |
| 83 | 904 | Stratified Independent Sampling design svydesign(ids = ~1, strata = ~pufstrata2 + su_facid, fpc = ~pufpopfac2, weights = ~suwt, data = d1) |

For generating the figure, create a categorical variable based on num_adl, which is numeric.
var_cut("Number of ADLs", "num_adl"
, c(-Inf, 0, 2, 6, Inf)
, c("0", "1-2", "3-6", "??"))
tab("Number of ADLs")
Number of ADLs {RCC SU 2018 PUF}

| Level | n | Number (000) | SE (000) | LL (000) | UL (000) | Percent | SE | LL | UL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 131 | 114 | 12 | 92 | 142 | 12.4 | 1.3 | 9.9 | 15.4 |
| 1-2 | 218 | 249 | 22 | 209 | 297 | 27.1 | 2.2 | 22.8 | 31.8 |
| 3-6 | 555 | 555 | 25 | 508 | 606 | 60.4 | 2.4 | 55.6 | 65.1 |
