Creating summary tables
Overview
Teaching: 45 min
Exercises: 10 min
Questions
How can I create a ‘Table 1’ style table?
How can I export tables and data?
Objectives
Make a new variable from existing ones, e.g. a log transform.
Create a categorical (e.g. 0/1) variable from a continuous one.
Stratify a summary table by a grouping variable.
One of the most useful tools in public health research is a table that compares baseline characteristics between groups. For example, a clinical trial may have a treatment group and a placebo group. Before comparing the groups to estimate the effect of treatment, we want to check that their baseline characteristics are similar, so that we are comparing apples to apples.
Because this is usually the first table in an epidemiological paper, it is often referred to as “Table 1”. As Inskip et al. (2017) describe it:
The first table in many papers gives an overview of the study population and its characteristics, usually giving numbers and percentages of the study population in different categories (e.g. by sex, educational attainment, smoking status) and summaries of measured characteristics (continuous variables) of the participants (e.g. age, height, body mass index).
Inskip, H., Ntani, G., Westbury, L. et al. Getting started with tables. Arch Public Health 75, 14 (2017) doi:10.1186/s13690-017-0180-1
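To make the two kinds of Table 1 cells concrete, here is a minimal base-R sketch: mean (SD) for a continuous variable and n (%) for a categorical one. The helpers `mean_sd()` and `n_pct()` are hypothetical names used only for illustration, and the values are the first ten LDL and Smoker entries printed by `str(analysis_swan_df)` below. Percentages here use only non-missing values as the denominator, which is consistent with the tables below (320 smokers is reported as 13.9%, not as 320/2424 = 13.2%).

```r
# First ten LDL and Smoker values, as printed by str(analysis_swan_df)
ldl <- c(137, 90, 136, 154, 130, 129, 93, 137, 103, 128)
smoker <- factor(c("No", "No", NA, "No", "No", "No", "Yes", "No", "No", NA),
                 levels = c("No", "Yes"))

# Hypothetical helper: "mean (SD)" cell for a continuous variable
mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

# Hypothetical helper: "n (%)" cells for a categorical variable;
# table() drops NA by default, so percentages use non-missing values only
n_pct <- function(x) {
  counts <- table(x)
  pct <- 100 * counts / sum(counts)
  setNames(sprintf("%d (%.1f)", as.integer(counts), as.numeric(pct)),
           names(counts))
}

mean_sd(ldl)   # "123.70 (21.12)"
n_pct(smoker)  # No = "7 (87.5)", Yes = "1 (12.5)"
```

CreateTableOne() produces these cells for every variable at once, so in practice you would not write such helpers yourself.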
We will use the analysis dataset analysis_swan_df; str() lists its variables:
str(analysis_swan_df)
'data.frame': 2424 obs. of 16 variables:
$ SWANID : int 10046 10056 10126 10153 10196 10245 10484 10514 10522 10532 ...
$ Age : int 58 57 54 57 52 54 56 52 52 49 ...
$ RACE : Factor w/ 5 levels "Black","Chinese",..: 2 4 1 3 2 4 1 4 4 4 ...
$ BMI : num 35.6 19.8 26.4 31.6 22.3 ...
$ Glucose : int 116 89 82 85 80 88 111 101 83 91 ...
$ Smoker : Factor w/ 2 levels "No","Yes": 1 1 NA 1 1 1 2 1 1 NA ...
$ LDL : int 137 90 136 154 130 129 93 137 103 128 ...
$ HDL : int 48 78 57 55 59 83 47 39 65 41 ...
$ CRP : num 8.7 0.5 1.5 2.7 0.3 1.3 7.4 1.3 1.1 1.5 ...
$ DBP : int 72 62 80 68 62 64 68 70 58 70 ...
$ SBP : int 134 96 102 108 94 94 130 124 102 118 ...
$ Exercise : Factor w/ 2 levels "No","Yes": 2 2 NA 2 2 1 2 2 2 NA ...
$ log_CRP : num 2.163 -0.693 0.405 0.993 -1.204 ...
$ Chol_Ratio : num 3.85 2.15 3.39 3.8 3.2 ...
$ BMI_cat : Factor w/ 6 levels "Normal","Underweight",..: 5 1 3 4 1 3 4 4 1 4 ...
$ bp_category: Factor w/ 4 levels "Normal","Elevated",..: 3 1 3 1 1 1 3 2 1 1 ...
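How log_CRP and Chol_Ratio were derived is not shown here, but the printed values suggest one way. log_CRP matches the natural log of CRP (log(8.7) ≈ 2.163), and Chol_Ratio matches (LDL + HDL) / HDL, e.g. (137 + 48) / 48 ≈ 3.85. The sketch below recreates both on the first five rows shown by str(); treat the Chol_Ratio formula as an assumption inferred from the printed values, not as the dataset's documented definition.

```r
# First five rows of the source variables, as printed by str(analysis_swan_df)
toy <- data.frame(
  CRP = c(8.7, 0.5, 1.5, 2.7, 0.3),
  LDL = c(137, 90, 136, 154, 130),
  HDL = c(48, 78, 57, 55, 59)
)

# Log transform: log() is the natural log in R (CRP must be > 0)
toy$log_CRP <- log(toy$CRP)

# A new variable computed from two others
# (assumed formula, inferred from the printed Chol_Ratio values)
toy$Chol_Ratio <- (toy$LDL + toy$HDL) / toy$HDL

round(toy$log_CRP, 3)     # 2.163 -0.693 0.405 0.993 -1.204
round(toy$Chol_Ratio, 2)  # 3.85 2.15 3.39 3.80 3.20
```

With dplyr loaded, mutate(analysis_swan_df, log_CRP = log(CRP)) would add the same kind of column to the full data frame.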
Creating Table 1 - Demographics:
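library(tableone)
# Variables to summarize, in the order they should appear in the table
varlist <- c('Age', 'RACE', 'BMI', 'Glucose', 'Smoker', 'LDL', 'HDL', 'CRP', 'bp_category', 'Exercise', 'Chol_Ratio')
# Variables to summarize as counts and percentages
factorvarlist <- c('RACE', 'Smoker', 'bp_category', 'Exercise')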
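BMI_cat is a categorical version of the continuous BMI. Its codes printed by str() (5 1 3 4 1 for BMIs of 35.6, 19.8, 26.4, 31.6 and 22.3) are consistent with the WHO adult cut-offs of 18.5, 25, 30, 35 and 40, with Normal as the first (reference) level. The sketch below assumes those cut-offs and uses cut(); it also builds a hypothetical 0/1 indicator, `obese`. This matters for factorvarlist: tableone documents factorVars as numerically coded variables to be handled as categorical (factors are treated as categorical anyway), so a 0/1 integer such as `obese` would be summarized as a mean (SD) unless it is listed there.

```r
# First five BMI values, as printed by str(analysis_swan_df)
bmi <- c(35.6, 19.8, 26.4, 31.6, 22.3)

# Assumed WHO adult cut-offs; right = FALSE makes the intervals [a, b),
# so a BMI of exactly 25 falls in "Pre-obese"
bmi_cat <- cut(bmi,
               breaks = c(-Inf, 18.5, 25, 30, 35, 40, Inf),
               labels = c("Underweight", "Normal", "Pre-obese",
                          "Obesity I", "Obesity II", "Obesity III"),
               right = FALSE)

# Make "Normal" the first (reference) level, matching the level order in str()
bmi_cat <- relevel(bmi_cat, ref = "Normal")

# Hypothetical 0/1 indicator derived from the same continuous variable
obese <- as.integer(bmi >= 30)

levels(bmi_cat)      # "Normal" "Underweight" "Pre-obese" "Obesity I" "Obesity II" "Obesity III"
as.integer(bmi_cat)  # 5 1 3 4 1
obese                # 1 0 0 1 0
```

On the full data, such an indicator would need to be added to both varlist and factorvarlist to appear as n (%).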
For the whole sample (overall values):
# Wrapping the call in parentheses prints the resulting table
(CreateTableOne(data = analysis_swan_df,
                vars = varlist, factorVars = factorvarlist))
Overall
n 2424
Age (mean (SD)) 51.97 (2.68)
RACE (%)
Black 720 (29.7)
Chinese 207 ( 8.5)
Japanese 261 (10.8)
Caucasian 1197 (49.4)
Hispanic 39 ( 1.6)
BMI (mean (SD)) 28.89 (7.35)
Glucose (mean (SD)) 94.26 (31.12)
Smoker = Yes (%) 320 (13.9)
LDL (mean (SD)) 121.11 (33.54)
HDL (mean (SD)) 58.20 (14.91)
CRP (mean (SD)) 4.07 (6.79)
bp_category (%)
Normal 1105 (51.8)
Elevated 269 (12.6)
Hypertension Stage 1 456 (21.4)
Hypertension Stage 2+ 302 (14.2)
Exercise = Yes (%) 1520 (70.9)
Chol_Ratio (mean (SD)) 3.21 (0.86)
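CRP's SD (6.79) is larger than its mean (4.07), a sign of a right-skewed variable for which mean (SD) can mislead; the median [IQR] is often reported instead. tableone can switch a variable to this summary when printing, via the nonnormal argument of print(), e.g. print(tab, nonnormal = "CRP"), where tab is the object returned by CreateTableOne(). The base-R sketch below shows the kind of summary that produces, on the first ten CRP values from str(); median_iqr() is a hypothetical helper.

```r
# First ten CRP values, as printed by str(analysis_swan_df)
crp <- c(8.7, 0.5, 1.5, 2.7, 0.3, 1.3, 7.4, 1.3, 1.1, 1.5)

# Hypothetical helper: "median [Q1, Q3]" cell for a skewed variable
median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.5, 0.25, 0.75), na.rm = TRUE, names = FALSE)
  sprintf("%.2f [%.2f, %.2f]", q[1], q[2], q[3])
}

mean(crp)        # 2.63: pulled up by the two large values
median_iqr(crp)  # "1.40 [1.15, 2.40]"
```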
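kableone() passes the same table to knitr::kable(), which renders it as a Markdown table (useful in R Markdown reports):
kableone(CreateTableOne(data = analysis_swan_df,
                        vars = varlist, factorVars = factorvarlist))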
| | Overall |
---|---|
n | 2424 |
Age (mean (SD)) | 51.97 (2.68) |
RACE (%) | |
Black | 720 (29.7) |
Chinese | 207 ( 8.5) |
Japanese | 261 (10.8) |
Caucasian | 1197 (49.4) |
Hispanic | 39 ( 1.6) |
BMI (mean (SD)) | 28.89 (7.35) |
Glucose (mean (SD)) | 94.26 (31.12) |
Smoker = Yes (%) | 320 (13.9) |
LDL (mean (SD)) | 121.11 (33.54) |
HDL (mean (SD)) | 58.20 (14.91) |
CRP (mean (SD)) | 4.07 (6.79) |
bp_category (%) | |
Normal | 1105 (51.8) |
Elevated | 269 (12.6) |
Hypertension Stage 1 | 456 (21.4) |
Hypertension Stage 2+ | 302 (14.2) |
Exercise = Yes (%) | 1520 (70.9) |
Chol_Ratio (mean (SD)) | 3.21 (0.86) |
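To export a table instead of printing it, the tableone documentation captures the formatted table with print(tab, quote = FALSE, noSpaces = TRUE, printToggle = FALSE), which returns a character matrix, and passes that matrix to write.csv(). The sketch below shows only the export step in base R, on a hand-built matrix holding a few rows of the overall table above (with real data the matrix would come from print()). It writes to a temporary file so nothing is left behind.

```r
# A few rows of the overall Table 1 above, as a character matrix like the
# one print(tab, printToggle = FALSE) returns (row names are the row labels)
tab_mat <- matrix(
  c("2424", "51.97 (2.68)", "28.89 (7.35)", "320 (13.9)"),
  ncol = 1,
  dimnames = list(c("n", "Age (mean (SD))", "BMI (mean (SD))", "Smoker = Yes (%)"),
                  "Overall")
)

# Export to CSV; the row names become the first column of the file
out_file <- tempfile(fileext = ".csv")
write.csv(tab_mat, file = out_file)

# Read it back to check what was written
back <- read.csv(out_file, row.names = 1, check.names = FALSE)
back["Age (mean (SD))", "Overall"]  # "51.97 (2.68)"
```

The CSV file can then be opened in a spreadsheet program or copied into a manuscript.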
For each BMI category (stratified with strata = "BMI_cat"):
kableone(CreateTableOne(data = analysis_swan_df,
vars = varlist, factorVars = factorvarlist, strata = "BMI_cat"))
| | Normal | Underweight | Pre-obese | Obesity I | Obesity II | Obesity III | p | test |
---|---|---|---|---|---|---|---|---|
n | 725 | 30 | 590 | 372 | 228 | 174 | ||
Age (mean (SD)) | 51.87 (2.66) | 52.43 (2.70) | 52.13 (2.69) | 52.01 (2.65) | 52.25 (2.70) | 51.66 (2.73) | 0.119 | |
RACE (%) | | | | | | | <0.001 | |
Black | 85 (11.7) | 5 (16.7) | 161 (27.3) | 149 (40.1) | 124 (54.4) | 91 (52.3) | ||
Chinese | 136 (18.8) | 7 (23.3) | 45 ( 7.6) | 12 ( 3.2) | 2 ( 0.9) | 2 ( 1.1) | ||
Japanese | 156 (21.5) | 7 (23.3) | 52 ( 8.8) | 13 ( 3.5) | 1 ( 0.4) | 3 ( 1.7) | ||
Caucasian | 346 (47.7) | 11 (36.7) | 319 (54.1) | 195 (52.4) | 93 (40.8) | 74 (42.5) | ||
Hispanic | 2 ( 0.3) | 0 ( 0.0) | 13 ( 2.2) | 3 ( 0.8) | 8 ( 3.5) | 4 ( 2.3) | ||
BMI (mean (SD)) | 22.28 (1.63) | 17.71 (0.61) | 27.29 (1.40) | 32.27 (1.41) | 37.36 (1.40) | 45.51 (5.06) | <0.001 | |
Glucose (mean (SD)) | 86.37 (14.82) | 82.41 (7.64) | 91.65 (24.46) | 100.47 (44.17) | 104.16 (38.16) | 111.05 (41.06) | <0.001 | |
Smoker = Yes (%) | 82 (11.6) | 3 (10.0) | 83 (14.6) | 45 (12.7) | 36 (16.4) | 19 (11.4) | 0.387 | |
LDL (mean (SD)) | 118.39 (31.96) | 111.55 (34.78) | 123.07 (34.22) | 124.73 (34.24) | 126.41 (34.66) | 114.68 (34.37) | <0.001 | |
HDL (mean (SD)) | 65.33 (14.73) | 71.03 (14.75) | 57.95 (14.22) | 52.62 (12.98) | 50.90 (11.82) | 50.42 (10.72) | <0.001 | |
CRP (mean (SD)) | 1.77 (5.63) | 0.84 (1.71) | 2.99 (4.17) | 5.48 (8.31) | 7.47 (7.25) | 9.58 (7.86) | <0.001 | |
bp_category (%) | | | | | | | <0.001 | |
Normal | 505 (69.8) | 28 (93.3) | 296 (50.3) | 158 (42.6) | 64 (28.1) | 51 (29.5) | ||
Elevated | 60 ( 8.3) | 1 ( 3.3) | 70 (11.9) | 63 (17.0) | 41 (18.0) | 29 (16.8) | ||
Hypertension Stage 1 | 107 (14.8) | 1 ( 3.3) | 146 (24.8) | 86 (23.2) | 64 (28.1) | 47 (27.2) | ||
Hypertension Stage 2+ | 52 ( 7.2) | 0 ( 0.0) | 77 (13.1) | 64 (17.3) | 59 (25.9) | 46 (26.6) | ||
Exercise = Yes (%) | 555 (79.3) | 21 (70.0) | 406 (73.0) | 244 (70.1) | 127 (58.8) | 79 (49.1) | <0.001 | |
Chol_Ratio (mean (SD)) | 2.92 (0.75) | 2.67 (0.71) | 3.25 (0.84) | 3.49 (0.88) | 3.56 (0.90) | 3.35 (0.89) | <0.001 |
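In the stratified table, categorical percentages are computed within each BMI category (column percentages): for example, 85 of the 725 participants with normal BMI are Black, i.e. 11.7%. By default, CreateTableOne() tests categorical variables with chisq.test() and continuous variables with oneway.test() (one-way ANOVA); the empty test column indicates these defaults were used. The base-R sketch below rebuilds the RACE rows from the counts in the table above and reproduces the column percentages and the p < 0.001 for RACE.

```r
# RACE counts by BMI category, copied from the stratified table above
race_by_bmi <- matrix(
  c( 85,   5, 161, 149, 124,  91,   # Black
    136,   7,  45,  12,   2,   2,   # Chinese
    156,   7,  52,  13,   1,   3,   # Japanese
    346,  11, 319, 195,  93,  74,   # Caucasian
      2,   0,  13,   3,   8,   4),  # Hispanic
  nrow = 5, byrow = TRUE,
  dimnames = list(RACE = c("Black", "Chinese", "Japanese", "Caucasian", "Hispanic"),
                  BMI_cat = c("Normal", "Underweight", "Pre-obese",
                              "Obesity I", "Obesity II", "Obesity III"))
)

# Column sums give the n of each stratum (725, 30, 590, 372, 228, 174)
colSums(race_by_bmi)

# Column percentages: each BMI category sums to 100%
pct <- round(100 * prop.table(race_by_bmi, margin = 2), 1)
pct["Black", "Normal"]  # 11.7

# Chi-squared test of RACE vs BMI_cat; the warning about small expected
# counts (e.g. Hispanic, Underweight) is suppressed here for brevity
p_race <- suppressWarnings(chisq.test(race_by_bmi))$p.value
p_race < 0.001  # TRUE
```

With the full data, table(analysis_swan_df$RACE, analysis_swan_df$BMI_cat) would build the same counts directly, dropping participants with a missing BMI category.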
Key Points
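Use the tableone package (CreateTableOne(), kableone()) to create a ‘Table 1’, overall or stratified by a grouping variable.
Use the stargazer package to make beautiful tables.
Use R functions to export tables and data in different formats.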