This lesson is still being designed and assembled (Pre-Alpha version)

Creating summary tables

Overview

Teaching: 45 min
Exercises: 10 min
Questions
  • How can I create a ‘Table 1’ style table?

  • How can I export tables and data?

Objectives
  • Make a new variable from existing ones (e.g. a log transform)

  • Create a categorical variable (e.g. a 0/1 indicator) from a continuous one

  • Stratify summary tables by a grouping variable

One of the most useful tools in public health research is a table that compares baseline characteristics between groups. In a clinical trial, for example, we may have a treatment group and a placebo group. Before comparing the two groups to estimate the effect of treatment, we want to check that their baseline characteristics are similar, so that we are comparing apples to apples.

The first table in many epidemiology studies is referred to as “Table 1”.

The first table in many papers gives an overview of the study population and its characteristics, usually giving numbers and percentages of the study population in different categories (e.g. by sex, educational attainment, smoking status) and summaries of measured characteristics (continuous variables) of the participants (e.g. age, height, body mass index).

Inskip, H., Ntani, G., Westbury, L. et al. Getting started with tables. Arch Public Health 75, 14 (2017) doi:10.1186/s13690-017-0180-1
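The two kinds of summary described in the quotation, counts with percentages for categorical variables and summaries such as the mean (SD) for continuous ones, can be computed directly in base R. The sketch below uses a small made-up data frame; `toy`, `fmt_n_pct()` and `fmt_mean_sd()` are hypothetical names for illustration, not functions from any package. As in the tables later in this lesson, percentages use the non-missing values as the denominator.

```r
# Hypothetical toy data: one categorical and one continuous variable
toy <- data.frame(
  smoker = factor(c("No", "Yes", "No", "No", NA), levels = c("No", "Yes")),
  age    = c(50, 52, 54, 56, 58)
)

# Count and percentage for each level of a factor, formatted as "n (%)".
# table() drops NA by default, so percentages are of non-missing values.
fmt_n_pct <- function(x) {
  counts <- table(x)
  pct    <- 100 * as.numeric(counts) / sum(counts)
  setNames(sprintf("%d (%.1f)", as.integer(counts), pct), names(counts))
}

# Mean and standard deviation of a numeric variable, formatted as "mean (SD)"
fmt_mean_sd <- function(x) {
  sprintf("%.2f (%.2f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
}

fmt_n_pct(toy$smoker)
fmt_mean_sd(toy$age)
```

Here the missing smoker value is excluded, so "No" is 3 of 4 (75.0%) rather than 3 of 5.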
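The examples in this lesson use the analysis_swan_df data frame. str() lists its variables, including the derived variables log_CRP, Chol_Ratio, BMI_cat and bp_category:

str(analysis_swan_df)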
'data.frame':	2424 obs. of  16 variables:
 $ SWANID     : int  10046 10056 10126 10153 10196 10245 10484 10514 10522 10532 ...
 $ Age        : int  58 57 54 57 52 54 56 52 52 49 ...
 $ RACE       : Factor w/ 5 levels "Black","Chinese",..: 2 4 1 3 2 4 1 4 4 4 ...
 $ BMI        : num  35.6 19.8 26.4 31.6 22.3 ...
 $ Glucose    : int  116 89 82 85 80 88 111 101 83 91 ...
 $ Smoker     : Factor w/ 2 levels "No","Yes": 1 1 NA 1 1 1 2 1 1 NA ...
 $ LDL        : int  137 90 136 154 130 129 93 137 103 128 ...
 $ HDL        : int  48 78 57 55 59 83 47 39 65 41 ...
 $ CRP        : num  8.7 0.5 1.5 2.7 0.3 1.3 7.4 1.3 1.1 1.5 ...
 $ DBP        : int  72 62 80 68 62 64 68 70 58 70 ...
 $ SBP        : int  134 96 102 108 94 94 130 124 102 118 ...
 $ Exercise   : Factor w/ 2 levels "No","Yes": 2 2 NA 2 2 1 2 2 2 NA ...
 $ log_CRP    : num  2.163 -0.693 0.405 0.993 -1.204 ...
 $ Chol_Ratio : num  3.85 2.15 3.39 3.8 3.2 ...
 $ BMI_cat    : Factor w/ 6 levels "Normal","Underweight",..: 5 1 3 4 1 3 4 4 1 4 ...
 $ bp_category: Factor w/ 4 levels "Normal","Elevated",..: 3 1 3 1 1 1 3 2 1 1 ...
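The lesson does not show how the derived variables were created. The printed values are consistent with `log_CRP` being the natural log of `CRP` (log(8.7) ≈ 2.163), `Chol_Ratio` being (LDL + HDL) / HDL (first row: (137 + 48) / 48 ≈ 3.85), and `bp_category` following the 2017 ACC/AHA blood-pressure cut-offs. These formulas are inferred from the output above, not taken from the original data-preparation code. A minimal sketch on the first three participants, where `swan_head` is a hypothetical stand-in for `analysis_swan_df`:

```r
# First three participants, copied from the str() output above
swan_head <- data.frame(
  CRP = c(8.7, 0.5, 1.5),
  LDL = c(137, 90, 136),
  HDL = c(48, 78, 57),
  SBP = c(134, 96, 102),
  DBP = c(72, 62, 80)
)

# Log transform: log() in R is the natural logarithm
swan_head$log_CRP <- log(swan_head$CRP)

# Ratio built from two existing variables (formula inferred from printed values)
swan_head$Chol_Ratio <- (swan_head$LDL + swan_head$HDL) / swan_head$HDL

# Blood-pressure category from SBP and DBP, assuming the 2017 ACC/AHA cut-offs.
# Conditions are checked from most to least severe.
bp_cat <- with(swan_head,
  ifelse(SBP >= 140 | DBP >= 90, "Hypertension Stage 2+",
  ifelse(SBP >= 130 | DBP >= 80, "Hypertension Stage 1",
  ifelse(SBP >= 120, "Elevated", "Normal"))))
swan_head$bp_category <- factor(bp_cat,
  levels = c("Normal", "Elevated", "Hypertension Stage 1", "Hypertension Stage 2+"))

swan_head
```

The third participant (102/80 mmHg) falls in Stage 1 because of the diastolic value alone, which matches the category shown by str().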

Creating Table 1 - Demographics:

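library(tableone)

# Variables to summarise in Table 1 (continuous and categorical)
varlist <- c('Age', 'RACE', 'BMI', 'Glucose', 'Smoker', 'LDL', 'HDL', 'CRP', 'bp_category', 'Exercise', 'Chol_Ratio')

# Variables to summarise as counts and percentages
factorvarlist <- c('RACE', 'Smoker', 'bp_category', 'Exercise')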

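In tableone, `factorVars` is intended for numerically coded categorical variables (for example a 0/1 indicator); columns that are already factors are summarised as categorical automatically, so listing them is harmless but not required. If you do want the list, it can be built from the column types rather than typed by hand. A sketch with a hypothetical stand-in data frame `df`:

```r
# Hypothetical stand-in with a mix of numeric and factor columns
df <- data.frame(
  Age    = c(58, 57, 54),
  RACE   = factor(c("Chinese", "Caucasian", "Black")),
  BMI    = c(35.6, 19.8, 26.4),
  Smoker = factor(c("No", "No", NA), levels = c("No", "Yes"))
)
varlist <- c("Age", "RACE", "BMI", "Smoker")

# Keep the names in varlist whose columns are factors
factorvarlist <- varlist[vapply(df[varlist], is.factor, logical(1))]
factorvarlist
```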
Overall values (all participants):

# Wrapping the call in parentheses prints the resulting TableOne object
(CreateTableOne(data = analysis_swan_df,
                vars = varlist, factorVars = factorvarlist))
                          
                           Overall       
  n                          2424        
  Age (mean (SD))           51.97 (2.68) 
  RACE (%)                               
     Black                    720 (29.7) 
     Chinese                  207 ( 8.5) 
     Japanese                 261 (10.8) 
     Caucasian               1197 (49.4) 
     Hispanic                  39 ( 1.6) 
  BMI (mean (SD))           28.89 (7.35) 
  Glucose (mean (SD))       94.26 (31.12)
  Smoker = Yes (%)            320 (13.9) 
  LDL (mean (SD))          121.11 (33.54)
  HDL (mean (SD))           58.20 (14.91)
  CRP (mean (SD))            4.07 (6.79) 
  bp_category (%)                        
     Normal                  1105 (51.8) 
     Elevated                 269 (12.6) 
     Hypertension Stage 1     456 (21.4) 
     Hypertension Stage 2+    302 (14.2) 
  Exercise = Yes (%)         1520 (70.9) 
  Chol_Ratio (mean (SD))     3.21 (0.86) 
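For CRP the standard deviation (6.79) is larger than the mean (4.07), which suggests a right-skewed distribution; for such variables the median and interquartile range are often reported instead. tableone's print method accepts a `nonnormal` argument for this (e.g. `print(tab, nonnormal = "CRP")`, where `tab` is a saved CreateTableOne() result). The base R sketch below, with hypothetical values and a hypothetical helper `fmt_median_iqr()`, shows what that summary computes:

```r
# Hypothetical right-skewed values: most small, a few large
crp <- c(0.3, 0.5, 1.1, 1.3, 1.3, 1.5, 1.5, 2.7, 7.4, 8.7)

# Median and interquartile range, formatted as "median [Q1, Q3]"
fmt_median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  sprintf("%.2f [%.2f, %.2f]", q[2], q[1], q[3])
}

fmt_median_iqr(crp)
sprintf("%.2f (%.2f)", mean(crp), sd(crp))  # mean (SD) for comparison
```

The two large values pull the mean well above the median, while the median [IQR] describes the typical participant.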
# kableone() formats the same table with knitr::kable(), e.g. for R Markdown reports
kableone(CreateTableOne(data = analysis_swan_df,
                        vars = varlist, factorVars = factorvarlist))
|                        |Overall        |
|:-----------------------|:--------------|
|n                       |2424           |
|Age (mean (SD))         |51.97 (2.68)   |
|RACE (%)                |               |
|Black                   |720 (29.7)     |
|Chinese                 |207 ( 8.5)     |
|Japanese                |261 (10.8)     |
|Caucasian               |1197 (49.4)    |
|Hispanic                |39 ( 1.6)      |
|BMI (mean (SD))         |28.89 (7.35)   |
|Glucose (mean (SD))     |94.26 (31.12)  |
|Smoker = Yes (%)        |320 (13.9)     |
|LDL (mean (SD))         |121.11 (33.54) |
|HDL (mean (SD))         |58.20 (14.91)  |
|CRP (mean (SD))         |4.07 (6.79)    |
|bp_category (%)         |               |
|Normal                  |1105 (51.8)    |
|Elevated                |269 (12.6)     |
|Hypertension Stage 1    |456 (21.4)     |
|Hypertension Stage 2+   |302 (14.2)     |
|Exercise = Yes (%)      |1520 (70.9)    |
|Chol_Ratio (mean (SD))  |3.21 (0.86)    |

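For each BMI category: passing the grouping variable to the strata argument gives one column per category, plus a p column testing for differences between the categories.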
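`BMI_cat` is already present in `analysis_swan_df`, and the lesson does not show how it was made. Its labels and the group means below are consistent with the WHO adult BMI classes (underweight < 18.5, normal 18.5 to < 25, pre-obese 25 to < 30, obesity I, II and III starting at 30, 35 and 40), so a minimal sketch assuming those cut-offs uses `cut()`. The same continuous variable can also be turned into a 0/1 indicator; `obese` is a hypothetical name:

```r
bmi <- c(35.6, 19.8, 26.4, 31.6, 22.3)  # first five BMI values from str() output

# WHO-style BMI classes; right = FALSE makes each interval [lower, upper)
BMI_cat <- cut(bmi,
               breaks = c(-Inf, 18.5, 25, 30, 35, 40, Inf),
               right  = FALSE,
               labels = c("Underweight", "Normal", "Pre-obese",
                          "Obesity I", "Obesity II", "Obesity III"))

# Make "Normal" the first (reference) level, matching the level order in str()
BMI_cat <- relevel(BMI_cat, ref = "Normal")

# A 0/1 indicator from the same continuous variable
obese <- as.integer(bmi >= 30)

data.frame(bmi, BMI_cat, obese)
```

The resulting categories (Obesity II, Normal, Pre-obese, Obesity I, Normal) match the codes 5 1 3 4 1 shown by str().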

# strata = "BMI_cat" summarises each BMI category in its own column
kableone(CreateTableOne(data = analysis_swan_df,
                        vars = varlist, factorVars = factorvarlist,
                        strata = "BMI_cat"))
|                       |Normal         |Underweight    |Pre-obese      |Obesity I      |Obesity II     |Obesity III    |p      |test |
|:----------------------|:--------------|:--------------|:--------------|:--------------|:--------------|:--------------|:------|:----|
|n                      |725            |30             |590            |372            |228            |174            |       |     |
|Age (mean (SD))        |51.87 (2.66)   |52.43 (2.70)   |52.13 (2.69)   |52.01 (2.65)   |52.25 (2.70)   |51.66 (2.73)   |0.119  |     |
|RACE (%)               |               |               |               |               |               |               |<0.001 |     |
|Black                  |85 (11.7)      |5 (16.7)       |161 (27.3)     |149 (40.1)     |124 (54.4)     |91 (52.3)      |       |     |
|Chinese                |136 (18.8)     |7 (23.3)       |45 ( 7.6)      |12 ( 3.2)      |2 ( 0.9)       |2 ( 1.1)       |       |     |
|Japanese               |156 (21.5)     |7 (23.3)       |52 ( 8.8)      |13 ( 3.5)      |1 ( 0.4)       |3 ( 1.7)       |       |     |
|Caucasian              |346 (47.7)     |11 (36.7)      |319 (54.1)     |195 (52.4)     |93 (40.8)      |74 (42.5)      |       |     |
|Hispanic               |2 ( 0.3)       |0 ( 0.0)       |13 ( 2.2)      |3 ( 0.8)       |8 ( 3.5)       |4 ( 2.3)       |       |     |
|BMI (mean (SD))        |22.28 (1.63)   |17.71 (0.61)   |27.29 (1.40)   |32.27 (1.41)   |37.36 (1.40)   |45.51 (5.06)   |<0.001 |     |
|Glucose (mean (SD))    |86.37 (14.82)  |82.41 (7.64)   |91.65 (24.46)  |100.47 (44.17) |104.16 (38.16) |111.05 (41.06) |<0.001 |     |
|Smoker = Yes (%)       |82 (11.6)      |3 (10.0)       |83 (14.6)      |45 (12.7)      |36 (16.4)      |19 (11.4)      |0.387  |     |
|LDL (mean (SD))        |118.39 (31.96) |111.55 (34.78) |123.07 (34.22) |124.73 (34.24) |126.41 (34.66) |114.68 (34.37) |<0.001 |     |
|HDL (mean (SD))        |65.33 (14.73)  |71.03 (14.75)  |57.95 (14.22)  |52.62 (12.98)  |50.90 (11.82)  |50.42 (10.72)  |<0.001 |     |
|CRP (mean (SD))        |1.77 (5.63)    |0.84 (1.71)    |2.99 (4.17)    |5.48 (8.31)    |7.47 (7.25)    |9.58 (7.86)    |<0.001 |     |
|bp_category (%)        |               |               |               |               |               |               |<0.001 |     |
|Normal                 |505 (69.8)     |28 (93.3)      |296 (50.3)     |158 (42.6)     |64 (28.1)      |51 (29.5)      |       |     |
|Elevated               |60 ( 8.3)      |1 ( 3.3)       |70 (11.9)      |63 (17.0)      |41 (18.0)      |29 (16.8)      |       |     |
|Hypertension Stage 1   |107 (14.8)     |1 ( 3.3)       |146 (24.8)     |86 (23.2)      |64 (28.1)      |47 (27.2)      |       |     |
|Hypertension Stage 2+  |52 ( 7.2)      |0 ( 0.0)       |77 (13.1)      |64 (17.3)      |59 (25.9)      |46 (26.6)      |       |     |
|Exercise = Yes (%)     |555 (79.3)     |21 (70.0)      |406 (73.0)     |244 (70.1)     |127 (58.8)     |79 (49.1)      |<0.001 |     |
|Chol_Ratio (mean (SD)) |2.92 (0.75)    |2.67 (0.71)    |3.25 (0.84)    |3.49 (0.88)    |3.56 (0.90)    |3.35 (0.89)    |<0.001 |     |
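An empty test column means tableone used its default tests: a chi-squared test on the category-by-level contingency table for categorical variables, and a one-way ANOVA (oneway.test() with equal variances) for continuous variables. Variables passed to the nonnormal argument of print() switch to the Kruskal-Wallis test, and variables passed to exact switch to Fisher's exact test, which may be worth considering here given small cells such as the 30 underweight participants. A base R sketch of the two default tests on hypothetical simulated data (`toy`, `p_hdl`, `p_smoker` are illustrative names):

```r
# Hypothetical toy data: three groups of 20 participants
set.seed(1)
toy <- data.frame(
  group  = factor(rep(c("A", "B", "C"), each = 20)),
  hdl    = c(rnorm(20, 65, 14), rnorm(20, 58, 14), rnorm(20, 52, 13)),
  smoker = factor(sample(c("No", "Yes"), 60, replace = TRUE))
)

# Continuous variable: one-way ANOVA across groups (equal variances assumed)
p_hdl <- oneway.test(hdl ~ group, data = toy, var.equal = TRUE)$p.value

# Categorical variable: chi-squared test of the group-by-level table
p_smoker <- chisq.test(table(toy$group, toy$smoker))$p.value

c(HDL = p_hdl, Smoker = p_smoker)
```

Each p-value tests whether the variable differs across all groups at once; it does not say which groups differ.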

Key Points

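  • Use the tableone package (CreateTableOne(), kableone()) to build Table 1 style summaries; the stargazer package is another option for formatting tables

  • Use R functions such as write.csv() to export tables and data in different formats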
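For a TableOne object, print() with printToggle = FALSE, quote = FALSE and noSpaces = TRUE returns a character matrix that can be passed to write.csv(). The sketch below uses only base R on a small hypothetical summary table (values copied from the overall Table 1) so it runs without extra packages; tempfile() stands in for a real output path.

```r
# Hypothetical summary table (values copied from the overall Table 1 above)
summary_tab <- data.frame(
  Variable = c("n", "Age (mean (SD))", "BMI (mean (SD))"),
  Overall  = c("2424", "51.97 (2.68)", "28.89 (7.35)"),
  stringsAsFactors = FALSE
)

# CSV opens in spreadsheet software; tempfile() avoids overwriting local files
out_csv <- tempfile(fileext = ".csv")
write.csv(summary_tab, out_csv, row.names = FALSE)

# Tab-separated text is another common format
out_tsv <- tempfile(fileext = ".tsv")
write.table(summary_tab, out_tsv, sep = "\t", row.names = FALSE, quote = FALSE)

# R's own binary format keeps column types exactly
out_rds <- tempfile(fileext = ".rds")
saveRDS(summary_tab, out_rds)

# Reading the files back confirms the round trip
read.csv(out_csv, stringsAsFactors = FALSE)
identical(readRDS(out_rds), summary_tab)
```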