This lesson is still being designed and assembled (Pre-Alpha version)

Merging and Joining Data

Overview

Teaching: 45 min
Exercises: 10 min
Questions
  • How can I combine data from different sources?

Objectives
  • Learn how to stack data frames (ex. different years’ data)

  • Learn how to join - inner, outer, etc.

  • Explore after a join to verify results as expected

Introduction goes here

Very often, studies record different types of data in different tables. For example, there may be one table with demographic information about participants; there might be another table with lab measurements associated with the participants.

Discussion:

When might be some scenarios where it might be preferable not to record all of the study information in a single table?

For the next several lessons, we’ll be working with a data set from the Study of Women’s Health Across the Nation (SWAN). SWAN is a multi-site longitudinal, epidemiologic study. The study, which began in 1994 and is still ongoing, examines the physical, biological, psychological, and social changes in women during their middle years.

Citation: 
Sutton-Tyrell, Kim, Selzer, Faith, Sowers, MaryFran R. (Mary Francis Roy), Finkelstein, Joel, Powell, Lynda, Gold, Ellen B., … Matthews, Karen. Study of Women’s Health Across the Nation (SWAN), 2002-2004: Visit 06 Dataset. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2018-11-13. https://doi.org/10.3886/ICPSR31181.v2

The SWAN study collected data in 3-year cycles. We’ll be working with the 2002-2004 cycle data. Although the study collected data on hundreds of variables across thousands of women, we’ll focus our analysis on the following smaller set of variables:

Variable Description
SWANID Participant’s Study Identifier
AGE6 Age
RACE Race at Screener
BMI6 Body Mass Index (BMI)
LDLRESU6 Low-density lipoprotein cholesterol (estimated) mg/dl
HDLRESU6 High density lipoprotein cholesterol mg/dl
GLUCRES6 Glucose mg/dl
CRPRESU6 C-reactive protein (CRP) mg/l
DIABP16 Diastolic blood pressure mmHg
SYSBP16 Systolic blood pressure mmHg
EXERCIS6 During the past 12 months, have you exercised?
SMOKERE6 Since your last study visit, have you smoked cigarettes regularly (at least one cigarette a day)?
STATUS6 Menopausal Status

STATUS6 | Menopausal Status : 1 = Post by BSO (Bilateral Salpingo Oophorectomy) 2 = Natural Post 3 = Late Peri 4 = Early Peri 5 = Pre 6 = Pregnant/breastfeeding 7 = Unknown due to HT use 8 = Unknown due to hysterectomy

RACE Race at Screener 1: Black/African American 2: Chinese/Chinese American 3: Japanese/Japanese American 4: Caucasian/ White Non-Hispanic 5: Hispanic

In our case, we have two tables, a demographic table and a table with lab measurements.

We’ll start by reading each of them into a separate data frame.

swan_demographic <- read.csv('data/SWAN_2002_2004/DS0001/swan_demo.csv')
swan_lab <- read.csv('data/SWAN_2002_2004/DS0001/swan_lab.csv')

Let’s take a peek at each data frame:

head(swan_demographic)
  SWANID VISIT INTDAY6 AGE6 LANGINT6 RACE
1  10046     6    2310   58        1    2
2  10056     6    2199   57        1    4
3  10126     6    2213   54        1    1
4  10153     6    2201   57        1    3
5  10196     6    2213   52        1    2
6  10245     6    2296   54        1    4

swan_lab is very wide, but feel free to peek at it using head() and/or by clicking on it in RStudio’s Environment pane. Notice that when there are many variables, RStudio provides buttons for paginating left and right through the variables.

Let’s look at the variable names (column names):

colnames(swan_demographic)
[1] "SWANID"   "VISIT"    "INTDAY6"  "AGE6"     "LANGINT6" "RACE"    
colnames(swan_lab)
  [1] "SWANID"    "VISIT"     "PREGNAN6"  "PREVBLO6"  "EATDRIN6"  "STRTPER6" 
  [7] "BLEDAY6"   "BLDRWAT6"  "BLDDRAW6"  "SPEDAY6"   "ANTICO16"  "ACOATW16" 
 [13] "ANTICO26"  "ACOATW26"  "HEART16"   "HARTTW16"  "HEART26"   "HARTTW26" 
 [19] "CHOLST16"  "CHOLTW16"  "CHOLST26"  "CHOLTW26"  "BP16"      "BPTW16"   
 [25] "BP26"      "BPTW26"    "DIURET16"  "DIURTW16"  "DIURET26"  "DIURTW26" 
 [31] "THYROI16"  "THYRTW16"  "THYROI26"  "THYRTW26"  "INSULN16"  "INSUTW16" 
 [37] "INSULN26"  "INSUTW26"  "NERVS16"   "NERVTW16"  "NERVS26"   "NERVTW26" 
 [43] "STEROI16"  "STERTW16"  "STEROI26"  "STERTW26"  "FERTIL16"  "FRTLTW16" 
 [49] "FERTIL26"  "FRTLTW26"  "BCP16"     "BCPTWI16"  "BCP26"     "BCPTWI26" 
 [55] "BCREAS6"   "BCRES_S6"  "ESTROG16"  "ESTRTW16"  "ESTROG26"  "ESTRTW26" 
 [61] "ESTRDA16"  "ESTRDA26"  "ESTRNJ16"  "EINJTW16"  "ESTRNJ26"  "EINJTW26" 
 [67] "COMBIN16"  "COMBTW16"  "COMBIN26"  "COMBTW26"  "PROGES16"  "PROGTW16" 
 [73] "PROGES26"  "PROGTW26"  "PROGDA16"  "PROGDA26"  "OSTEPR16"  "OSTETW16" 
 [79] "OSTEPR26"  "OSTETW26"  "ARTHRT16"  "ARTHTW16"  "ARTHRT26"  "ARTHTW26" 
 [85] "OTHMED16"  "OTHRTW16"  "OTHMED26"  "OTHRTW26"  "OTHMED36"  "OTHRTW36" 
 [91] "OTHMED46"  "OTHRTW46"  "OTHMED56"  "OTHRTW56"  "OTHMED66"  "OTHRTW66" 
 [97] "OTHMED76"  "OTHRTW76"  "OTHMED86"  "OTHRTW86"  "OTHMED96"  "OTHRTW96" 
[103] "OTHME106"  "OTHTW106"  "OTHME116"  "OTHTW116"  "OTHME126"  "OTHTW126" 
[109] "OTHME136"  "OTHTW136"  "OTHME146"  "OTHTW146"  "OTHME156"  "OTHTW156" 
[115] "ESTLSTV6"  "REDUHAR6"  "OSTEOPO6"  "MENOSYM6"  "YOUNGLK6"  "HCPADVI6" 
[121] "FRNADVI6"  "IMPRMEM6"  "REGPERI6"  "HORMOTH6"  "DONTKNO6"  "HORMDAY6" 
[127] "PRBBLEE6"  "HAVEPER6"  "LIKEFEL6"  "SIDEEFF6"  "CANCER6"   "ADVISTO6" 
[133] "EXPENSI6"  "NOLIKE6"   "NOREMEB6"  "DNTKNOW6"  "STOPOTH6"  "NOREASO6" 
[139] "REGVITA6"  "ONCEADA6"  "ANTIOXI6"  "VITCOMB6"  "VTMOTH16"  "VTMOTH26" 
[145] "VTMOTH36"  "VTMOTH46"  "VITAMNA6"  "BETACAR6"  "VITAMNC6"  "VITAMND6" 
[151] "VITAMNE6"  "CALCTUM6"  "IRON6"     "ZINC6"     "SELENIU6"  "VTMSING6" 
[157] "SVTMOT16"  "SVTMOT26"  "SVTMOT36"  "SVTMOT46"  "SVTMOT56"  "SVTMOT66" 
[163] "SVTMOT76"  "SVTMOT86"  "SVTMOT96"  "SVTMO106"  "PAIN16"    "PAINTW16" 
[169] "PAIN26"    "PAINTW26"  "SLEEP16"   "SLEPTW16"  "SLEEP26"   "SLEPTW26" 
[175] "OTC16"     "OTCTW16"   "OTC26"     "OTCTW26"   "OTC36"     "OTCTW36"  
[181] "OTC46"     "OTCTW46"   "ALCHL246"  "SOYYSNO6"  "SOYPROT6"  "CEREACA6" 
[187] "BREADCA6"  "ORANGCA6"  "ANEMIA6"   "DIABETE6"  "HIGHBP6"   "HBCHOLE6" 
[193] "MIGRAIN6"  "STROKE6"   "OSTEOAR6"  "THYROID6"  "HEARTAT6"  "ANGINA6"  
[199] "OSTEOPR6"  "CANCERS6"  "SITESPE6"  "TAMOXIF6"  "CHEMOTH6"  "BROKEBO6" 
[205] "BONES16"   "HAPPEN16"  "BONES26"   "HAPPEN26"  "BONES36"   "HAPPEN36" 
[211] "DANDC6"    "NUMDAND6"  "HYSTERE6"  "HYSTDAY6"  "OOPHORE6"  "ONEOVAR6" 
[217] "ABLATIN6"  "UTERPRO6"  "THYRREM6"  "ENDO6"     "ENDODIF6"  "PELVCPN6" 
[223] "DIFPELV6"  "PROLAPS6"  "DIFPROL6"  "PELVCNC6"  "DIFCANC6"  "ABBLEED6" 
[229] "DIFBLED6"  "FIBRUTR6"  "DIFFIBR6"  "PRVIDER6"  "PROFDEG6"  "SPECIFY6" 
[235] "PROVSPC6"  "SPECIAL6"  "PROVTIM6"  "BLEEDNG6"  "BLD3MON6"  "LMPDAY6"  
[241] "DESCPER6"  "LENGCYL6"  "PRGNANT6"  "OUTCOME6"  "BRSTFEE6"  "QLTYLIF6" 
[247] "LISTEN6"   "TAKETOM6"  "CONFIDE6"  "HELPSIC6"  "CONTROL6"  "ABILITY6" 
[253] "YOURWAY6"  "PILING6"   "BOTHER6"   "APPETIT6"  "BLUES6"    "GOOD6"    
[259] "KEEPMIN6"  "DEPRESS6"  "EFFORT6"   "HOPEFUL6"  "FAILURE6"  "FEARFUL6" 
[265] "RESTLES6"  "HAPPY6"    "TALKLES6"  "LONELY6"   "UNFRNDL6"  "ENJOY6"   
[271] "CRYING6"   "SAD6"      "DISLIKE6"  "GETGOIN6"  "SEPEXP6"   "SEPTHR6"  
[277] "SEPINJ6"   "SEPFAM6"   "SEPLEV6"   "SEPSAF6"   "SEPCON6"   "SEPEAC6"  
[283] "SEPFUT6"   "SEPGOV6"   "SEPCNF6"   "SEPANX6"   "CHNGJOB6"  "JOB6"     
[289] "CHANGHR6"  "STRTIM16"  "STPTIM16"  "ROTAT16"   "STRTIM26"  "STPTIM26" 
[295] "ROTAT26"   "STRTIM36"  "STPTIM36"  "ROTAT36"   "HOURSPA6"  "VOLUNTE6" 
[301] "VLNTHR16"  "VLNTHR26"  "VLNTHR36"  "MARITAL6"  "CHGHHLD6"  "HOUSEHL6" 
[307] "RELAT16"   "SEX16"     "AGE16"     "RELAT26"   "SEX26"     "AGE26"    
[313] "RELAT36"   "SEX36"     "AGE36"     "RELAT46"   "SEX46"     "AGE46"    
[319] "RELAT56"   "SEX56"     "AGE56"     "RELAT66"   "SEX66"     "AGE66"    
[325] "RELAT76"   "SEX76"     "AGE76"     "RELAT86"   "SEX86"     "AGE86"    
[331] "RELAT96"   "SEX96"     "AGE96"     "RELAT106"  "SEX106"    "AGE106"   
[337] "RELAT116"  "SEX116"    "AGE116"    "RELAT126"  "SEX126"    "AGE126"   
[343] "FIBROID6"  "FORMINT6"  "STATUS6"   "SAADAY6"   "FORMSAA6"  "LANGSAA6" 
[349] "OVERHLT6"  "HOSPSTA6"  "MDTALK6"   "NERVES6"   "PAPSMEA6"  "BRSTEXA6" 
[355] "MAMOGRA6"  "MEDICDY6"  "INSURDR6"  "INSDRCO6"  "INSURRX6"  "INSRXCO6" 
[361] "INSURHO6"  "INSHOCO6"  "HLTHSER6"  "INSURAN6"  "NOTAFFR6"  "NOTRANS6" 
[367] "NOPROVI6"  "TOOBUSY6"  "NOTRUST6"  "BETTROF6"  "FAILOTH6"  "SMOKERE6" 
[373] "AVCIGDA6"  "DRNKBEE6"  "GLASBEE6"  "GLASWIN6"  "GLASLIQ6"  "HLTHAYR6" 
[379] "V_ACTI6"   "M_ACTI6"   "LIFTING6"  "CLIMBS6"   "CLIMB1_6"  "BENDING6" 
[385] "WALKM6"    "WALKS6"    "WALK1_6"   "BATHING6"  "PHYCTDW6"  "PHYACCO6" 
[391] "PHYLIMI6"  "PHYDFCL6"  "EMOCTDW6"  "EMOACCO6"  "EMOCARE6"  "INTERFR6" 
[397] "BODYPAI6"  "PAINTRF6"  "PEP6"      "NERV4WK6"  "CHER4WK6"  "CALM4WK6" 
[403] "ENERGY6"   "BLUE4WK6"  "WORNOUT6"  "HAPY4WK6"  "TIRED6"    "SOCIAL6"  
[409] "HEALSIC6"  "HEALTHY6"  "HEALWOR6"  "HEALEXC6"  "CARING6"   "MEALS6"   
[415] "ROUTNCH6"  "MODERAT6"  "VIGOROU6"  "PHYSACT6"  "WATCHTV6"  "WALKBIK6" 
[421] "SWEATPA6"  "SPORTS6"   "SPOREX16"  "RATEIN16"  "MTHSAC16"  "HRSACT16" 
[427] "OTHSPOR6"  "SPOREX26"  "RATEIN26"  "MTHSAC26"  "HRSACT26"  "PHYSWOR6" 
[433] "WORKTIR6"  "WRKACTA6"  "WRKACTB6"  "WRKACTC6"  "WRKACTD6"  "WRKACTE6" 
[439] "WRKACTF6"  "WRKACTG6"  "STIFF6"    "ACHES6"    "COLDSWE6"  "NITESWE6" 
[445] "VAGINDR6"  "FEELBLU6"  "DIZZY6"    "IRRITAB6"  "NRVOUS6"   "FORGET6"  
[451] "MOODCHG6"  "HARTRAC6"  "FEARFULA6" "HDACHE6"   "HOTFLAS6"  "BRSTPAI6" 
[457] "TRBLSLE6"  "WAKEUP6"   "WAKEARL6"  "SLEEPQL6"  "GETUPUR6"  "INVOLEA6" 
[463] "DAYSLEA6"  "COUGHLE6"  "URGEVOI6"  "OTHRLEA6"  "NUTRIRE6"  "HERBREM6" 
[469] "PSYCMET6"  "PHYSMET6"  "FOLKMED6"  "OTHRTHE6"  "SPECOTH6"  "STARTNE6" 
[475] "WORKTRB6"  "QUITJOB6"  "WORKLOA6"  "PRTUNEM6"  "MONEYPR6"  "WORSREL6" 
[481] "RELATEN6"  "SERIPRO6"  "CHILDMO6"  "RESPCAR6"  "LEGALPR6"  "CRELDIE6" 
[487] "CLOSDIE6"  "SELFVIO6"  "FAMLVIO6"  "PHYSILL6"  "MAJEVEN6"  "EMPLYPA6" 
[493] "REWRDJO6"  "STRSSJO6"  "CRNTCAR6"  "RWRDCAR6"  "STRSCAR6"  "CRNTMAR6" 
[499] "RWRDREL6"  "STRSREL6"  "CHILDRE6"  "REWRDMO6"  "STRSSMO6"  "INCOME6"  
[505] "HOW_HAR6"  "LOSSINC6"  "FINAN1Y6"  "FINAN2Y6"  "FINAN3Y6"  "HOMEOWN6" 
[511] "HOMESPE6"  "LADERCO6"  "LADERUS6"  "INTRPAN6"  "DISIPAN6"  "EXCIPAN6" 
[517] "UPSEPAN6"  "STROPAN6"  "GUILPAN6"  "SCARPAN6"  "HOSTPAN6"  "ENTHPAN6" 
[523] "PROUPAN6"  "IRRIPAN6"  "ALERPAN6"  "ASHAPAN6"  "INSPPAN6"  "NERVPAN6" 
[529] "DETEPAN6"  "ATTEPAN6"  "JITTPAN6"  "ACTIPAN6"  "AFRAPAN6"  "STRENEX6" 
[535] "MODEREX6"  "ACUPUNC6"  "ACUPHAR6"  "ACUPOST6"  "ACUPMEN6"  "ACUPLOO6" 
[541] "ACUPMEM6"  "ACUPPER6"  "ACUPGEN6"  "ACUPWGH6"  "ACUPADV6"  "ACUPOTH6" 
[547] "ACUPSPE6"  "BCOHOSH6"  "BCOHHAR6"  "BCOHOST6"  "BCOHMEN6"  "BCOHLOO6" 
[553] "BCOHMEM6"  "BCOHPER6"  "BCOHGEN6"  "BCOHWGH6"  "BCOHADV6"  "BCOHOTH6" 
[559] "BCOHSPE6"  "DHEA6"     "DHEAHAR6"  "DHEAOST6"  "DHEAMEN6"  "DHEALOO6" 
[565] "DHEAMEM6"  "DHEAPER6"  "DHEAGEN6"  "DHEAWGH6"  "DHEAADV6"  "DHEAOTH6" 
[571] "DHEASPE6"  "DQUAI6"    "DQUAHAR6"  "DQUAOST6"  "DQUAMEN6"  "DQUALOO6" 
[577] "DQUAMEM6"  "DQUAPER6"  "DQUAGEN6"  "DQUAWGH6"  "DQUAADV6"  "DQUAOTH6" 
[583] "DQUASPE6"  "DIETNUT6"  "DIETHAR6"  "DIETOST6"  "DIETMEN6"  "DIETLOO6" 
[589] "DIETMEM6"  "DIETPER6"  "DIETGEN6"  "DIETWGH6"  "DIETADV6"  "DIETOTH6" 
[595] "DIETSPE6"  "HEALENR6"  "HEALHAR6"  "HEALOST6"  "HEALMEN6"  "HEALLOO6" 
[601] "HEALMEM6"  "HEALPER6"  "HEALGEN6"  "HEALWGH6"  "HEALADV6"  "HEALOTH6" 
[607] "HEALSPE6"  "EXERCIS6"  "EXERHAR6"  "EXEROST6"  "EXERMEN6"  "EXERLOO6" 
[613] "EXERMEM6"  "EXERPER6"  "EXERGEN6"  "EXERWGH6"  "EXERADV6"  "EXEROTH6" 
[619] "EXERSPE6"  "FLAXSEE6"  "FLAXHAR6"  "FLAXOST6"  "FLAXMEN6"  "FLAXLOO6" 
[625] "FLAXMEM6"  "FLAXPER6"  "FLAXGEN6"  "FLAXWGH6"  "FLAXADV6"  "FLAXOTH6" 
[631] "FLAXSPE6"  "GINKGO6"   "GINKHAR6"  "GINKOST6"  "GINKMEN6"  "GINKLOO6" 
[637] "GINKMEM6"  "GINKPER6"  "GINKGEN6"  "GINKWGH6"  "GINKADV6"  "GINKOTH6" 
[643] "GINKSPE6"  "GINSENG6"  "GINSHAR6"  "GINSOST6"  "GINSMEN6"  "GINSLOO6" 
[649] "GINSMEM6"  "GINSPER6"  "GINSGEN6"  "GINSWGH6"  "GINSADV6"  "GINSOTH6" 
[655] "GINSSPE6"  "GLUSAMI6"  "GLUSHAR6"  "GLUSOST6"  "GLUSMEN6"  "GLUSLOO6" 
[661] "GLUSMEM6"  "GLUSPER6"  "GLUSGEN6"  "GLUSWGH6"  "GLUSADV6"  "GLUSOTH6" 
[667] "GLUSSPE6"  "MYAMPRO6"  "MYAMHAR6"  "MYAMOST6"  "MYAMMEN6"  "MYAMLOO6" 
[673] "MYAMMEM6"  "MYAMPER6"  "MYAMGEN6"  "MYAMWGH6"  "MYAMADV6"  "MYAMOTH6" 
[679] "MYAMSPE6"  "MSM6"      "MSMHAR6"   "MSMOST6"   "MSMMEN6"   "MSMLOO6"  
[685] "MSMMEM6"   "MSMPER6"   "MSMGEN6"   "MSMWGH6"   "MSMADV6"   "MSMOTH6"  
[691] "MSMSPE6"   "PRAYER6"   "PRAYHAR6"  "PRAYOST6"  "PRAYMEN6"  "PRAYLOO6" 
[697] "PRAYMEM6"  "PRAYPER6"  "PRAYGEN6"  "PRAYWGH6"  "PRAYADV6"  "PRAYOTH6" 
[703] "PRAYSPE6"  "SAME6"     "SAMEHAR6"  "SAMEOST6"  "SAMEMEN6"  "SAMELOO6" 
[709] "SAMEMEM6"  "SAMEPER6"  "SAMEGEN6"  "SAMEWGH6"  "SAMEADV6"  "SAMEOTH6" 
[715] "SAMESPE6"  "SELFHEL6"  "SELFHAR6"  "SELFOST6"  "SELFMEN6"  "SELFLOO6" 
[721] "SELFMEM6"  "SELFPER6"  "SELFGEN6"  "SELFWGH6"  "SELFADV6"  "SELFOTH6" 
[727] "SELFSPE6"  "SOYSUPP6"  "SOYHAR6"   "SOYOST6"   "SOYMEN6"   "SOYLOO6"  
[733] "SOYMEM6"   "SOYPER6"   "SOYGEN6"   "SOYWGH6"   "SOYADV6"   "SOYOTH6"  
[739] "SOYSPE6"   "WORTSTJ6"  "WORTHAR6"  "WORTOST6"  "WORTMEN6"  "WORTLOO6" 
[745] "WORTMEM6"  "WORTPER6"  "WORTGEN6"  "WORTWGH6"  "WORTADV6"  "WORTOTH6" 
[751] "WORTSPE6"  "TAICHI6"   "TAIHAR6"   "TAIOST6"   "TAIMEN6"   "TAILOO6"  
[757] "TAIMEM6"   "TAIPER6"   "TAIGEN6"   "TAIWGH6"   "TAIADV6"   "TAIOTH6"  
[763] "TAISPE6"   "WVITAMI6"  "WVITHAR6"  "WVITOST6"  "WVITMEN6"  "WVITLOO6" 
[769] "WVITMEM6"  "WVITPER6"  "WVITGEN6"  "WVITWGH6"  "WVITADV6"  "WVITOTH6" 
[775] "WVITSPE6"  "YOGA6"     "YOGAHAR6"  "YOGAOST6"  "YOGAMEN6"  "YOGALOO6" 
[781] "YOGAMEM6"  "YOGAPER6"  "YOGAGEN6"  "YOGAWGH6"  "YOGAADV6"  "YOGAOTH6" 
[787] "YOGASPE6"  "OTHALT6"   "OTHALTS6"  "OTHHAR6"   "OTHOST6"   "OTHMEN6"  
[793] "OTHLOO6"   "OTHMEM6"   "OTHPER6"   "OTHGEN6"   "OTHWGH6"   "OTHADV6"  
[799] "OTHALTR6"  "WHYOTHA6"  "FLGSAAV6"  "SABDAY6"   "LANGSAB6"  "IMPORSE6" 
[805] "DESIRSE6"  "ENGAGSE6"  "NOPARTN6"  "PARTPRO6"  "PHYSPRO6"  "NOSEXOT6" 
[811] "SATISFY6"  "KISSING6"  "TOUCHIN6"  "ORALSEX6"  "INTCOUR6"  "AROUSED6" 
[817] "PELVIC6"   "LUBRICN6"  "ABLECLM6"  "IMPCLMX6"  "SATISFD6"  "FREQUEN6" 
[823] "MEN6MOS6"  "MASTURB6"  "FLGSABV6"  "PHYDAY6"   "PULSE6"    "SYSBP16"  
[829] "DIABP16"   "SYSBP26"   "DIABP26"   "HEIGHT6"   "HTMETHO6"  "WEIGHT6"  
[835] "SCALE6"    "WAIST6"    "WASTMEA6"  "HIP6"      "HIPMEAS6"  "FLGPHYV6" 
[841] "BMI6"      "COGDAY6"   "LANGCOG6"  "IMEDTHR6"  "IMEDCH16"  "IMEDHOU6" 
[847] "IMEDFIR6"  "IMEDFMN6"  "IMEDCLM6"  "IMEDCH26"  "IMEDRES6"  "IMEDMIN6" 
[853] "IMEDINJ6"  "IMEDEVR6"  "IMEDWEL6"  "TOTIDE16"  "SDMTSTA6"  "SDMTADM6" 
[859] "SDMTPRA6"  "SDMTATM6"  "SDMTCOR6"  "DIGIT1A6"  "DIGIT1B6"  "DIGIT2A6" 
[865] "DIGIT2B6"  "DIGIT3A6"  "DIGIT3B6"  "DIGIT4A6"  "DIGIT4B6"  "DIGIT5A6" 
[871] "DIGIT5B6"  "DIGIT6A6"  "DIGIT6B6"  "DIGTOT6"   "DLAYTHR6"  "DLAYCH16" 
[877] "DLAYHOU6"  "DLAYFIR6"  "DLAYFMN6"  "DLAYCLM6"  "DLAYCH26"  "DLAYRES6" 
[883] "DLAYMIN6"  "DLAYINJ6"  "DLAYEVR6"  "DLAYWEL6"  "TOTIDE26"  "FLGCOGV6" 
[889] "HRMDAY6"   "CYCDAY6"   "DHAS6"     "FSH6"      "SHBG6"     "T6"       
[895] "E2AVE6"    "FLGCV6"    "FLGDIF6"   "CVRDAY6"   "FLAGSER6"  "CHOLRES6" 
[901] "TRIGRES6"  "LDLRESU6"  "HDLRESU6"  "GLUCRES6"  "INSURES6"  "PAIRESU6" 
[907] "TPARESU6"  "LPARESU6"  "APOARES6"  "APOBRES6"  "CRPRESU6"  "SPSCDAY6" 
[913] "SPSCTIM6"  "SPSCMOD6"  "HPSCDAY6"  "HPSCTIM6"  "HPSCMOD6"  "SPBMDT6"  
[919] "HPBMDT6"   "BMDFLG6"   "BIODAY6"   "LANGBIO6"  "AICDPUM6"  "EXER12H6" 
[925] "EAT5HR6"   "ALCO24H6"  "METJEWL6"  "COMPBIA6"  "SIDE6"     "CONDRAW6" 
[931] "CONDFRZ6"  "IMPERAW6"  "IMPEFRZ6"  "PBFBIA6"   "SKELMM6"   "TBWNHAN6" 
[937] "FFMNHAN6"  "TBFNHAN6"  "PBFNHAN6"  "MISSCON6"  "MISSPHY6"  "FLAGSRP6" 
[943] "FLGBIOV6"  "CAMDAY6"  

What we’d ultimately like is to match up the observations in the lab table with the demographic data, only keeping columns of potential interest.

We’ll use the merge() function to join the data frames together. The function’s parameters refer to the “left” data frame as “x”, and the “right” data frame as “y”.

The merge() function allows us to make a number of choices:

swan_df <- merge(x = swan_demographic, y = swan_lab, by.x="SWANID", by.y="SWANID",
                   all.x=FALSE, all.y=TRUE)

Let’s take a peek at a few of the columns in swan_df, the data frame resulting from the merge.

head(colnames(swan_df), n=10)
 [1] "SWANID"   "VISIT.x"  "INTDAY6"  "AGE6"     "LANGINT6" "RACE"    
 [7] "VISIT.y"  "PREGNAN6" "PREVBLO6" "EATDRIN6"

We notice something unusual here. swan_df has columns named VISIT.x and VISIT.y. Sometimes both of the data frames in a merge have some columns with the same names. merge() keeps both columns, but to keep them distinct, it appends .x to the variable from the “x” data frame, and .y from the “y” data frame.

HERE WE NEED TO CONSIDER DPLYR FOR JOINING. #TODO

It might be a good idea to write out an intermediate data file with what we have so far, because this is the data we’ll be working with from this point forward. In a common format like CSV, you might wish to share the file with others at this point.

#TODO: Say why we have to use row.names=FALSE

write.csv(swan_df, file = 'data_out/swan_merge.csv', row.names = FALSE)

outcome variable. –>

Key Points

  • Use merge() to stack data

  • Use merge() to join data

  • Understand joins