Introduction to R
Data frames are much like matrices: they have rows and columns, and hence two dimensions. Unlike a matrix, though, each column can store a different type of value. The first column could be numeric, the second a character variable, the third a factor. Data frames are the natural way to store data where each row corresponds to a unit, such as a person, and each column represents a measurement on those units.
Suppose the file music.dat contains these lines:

    CAD3004,Frank Black, Frank Black, 15, CD
    Col4851,Weather Report, Sweetnighter, 6, CD
    Rep2257,Neil Young, Decade I, 19, CD
    Rep4335,Neil Young, Weld, 12, CD
    Chp1432,Red Hot Chili Peppers, Mother's Milk, 13, Tape
    EMI1233,Primus, The Brown Album, 12, Tape
    Atl4500,Led Zeppelin, Led Zep 3, 11, CD

We use read.table() to read this file into an object. We give it the filename, and in this case we have to tell it that fields are separated by commas. We also pass row.names=1, so the first field supplies the row names, and quote="", so the apostrophe in Mother's Milk is not treated as a quote character:
    > music <- read.table("music.dat",sep=",",row.names=1,quote="")
    > music
                               V2              V3 V4   V5
    CAD3004           Frank Black     Frank Black 15   CD
    Col4851        Weather Report    Sweetnighter  6   CD
    Rep2257            Neil Young        Decade I 19   CD
    Rep4335            Neil Young            Weld 12   CD
    Chp1432 Red Hot Chili Peppers   Mother's Milk 13 Tape
    EMI1233                Primus The Brown Album 12 Tape
    Atl4500          Led Zeppelin       Led Zep 3 11   CD

Notice how some columns are numbers, and some are text. You can't do that with an ordinary matrix.
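To try the example without creating music.dat, the same lines can be passed through the text argument of read.table(). This is a minimal sketch, not the original author's code: the variable name music_lines is an assumption, as are strip.white=TRUE (which drops the space after each comma) and stringsAsFactors=TRUE (which reproduces the factor columns shown on this page under R 4.0.0 and later, where text columns are otherwise read as character vectors).

```r
# Inline copy of the sample data, one string per line of music.dat.
# 'music_lines' is a name chosen for this sketch.
music_lines <- c(
  "CAD3004,Frank Black, Frank Black, 15, CD",
  "Col4851,Weather Report, Sweetnighter, 6, CD",
  "Rep2257,Neil Young, Decade I, 19, CD",
  "Rep4335,Neil Young, Weld, 12, CD",
  "Chp1432,Red Hot Chili Peppers, Mother's Milk, 13, Tape",
  "EMI1233,Primus, The Brown Album, 12, Tape",
  "Atl4500,Led Zeppelin, Led Zep 3, 11, CD"
)

music <- read.table(
  text = music_lines,       # read from the strings instead of a file
  sep = ",",                # fields are separated by commas
  row.names = 1,            # first field becomes the row names
  quote = "",               # keep the apostrophe in "Mother's Milk" literal
  strip.white = TRUE,       # assumption: trim the blank after each comma
  stringsAsFactors = TRUE   # assumption: read text columns as factors
)

# One line per column, showing its type.
str(music)
```

Leaving out header=TRUE is what produces the default column names V2 to V5; V1 is missing because it was used for the row names.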
A data frame can be indexed like a matrix, by column or by row:

    > music[,3]
    [1] 15  6 19 12 13 12 11
    > music[2,]
                        V2           V3 V4 V5
    Col4851 Weather Report Sweetnighter  6 CD
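A few more indexing forms may be useful alongside the two shown above. The sketch below rebuilds the same table with data.frame() so that it runs on its own; the result names (tracks, second, long_albums) are assumptions made for illustration.

```r
# Rebuild the sample table directly so the example is self-contained.
music <- data.frame(
  V2 = c("Frank Black", "Weather Report", "Neil Young", "Neil Young",
         "Red Hot Chili Peppers", "Primus", "Led Zeppelin"),
  V3 = c("Frank Black", "Sweetnighter", "Decade I", "Weld",
         "Mother's Milk", "The Brown Album", "Led Zep 3"),
  V4 = c(15, 6, 19, 12, 13, 12, 11),
  V5 = factor(c("CD", "CD", "CD", "CD", "Tape", "Tape", "CD")),
  row.names = c("CAD3004", "Col4851", "Rep2257", "Rep4335",
                "Chp1432", "EMI1233", "Atl4500"),
  stringsAsFactors = FALSE
)

# A single column by position comes back as a plain vector.
tracks <- music[, 3]

# drop = FALSE keeps the result as a one-column data frame.
tracks_df <- music[, 3, drop = FALSE]

# A row can be selected by position or by its row name.
second <- music[2, ]
second_by_name <- music["Col4851", ]

# A logical condition selects rows; a character vector selects columns.
long_albums <- music[music$V4 > 12, c("V2", "V3")]
print(long_albums)
```

Selecting a row always returns a data frame, because the values in a row can have different types.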
A single column can also be extracted by name with $:

    > music$V5
    [1] CD   CD   CD   CD   Tape Tape CD
    Levels: CD Tape

You can give the columns more sensible names by assigning a character vector to names():
    > names(music) <- c('Artist','Title','Ntracks','Format')
    > music$Title
    [1] Frank Black     Sweetnighter    Decade I        Weld
    [5] Mother's Milk   The Brown Album Led Zep 3
    Levels: Decade I Frank Black Led Zep 3 Mother's Milk Sweetnighter The Brown Album Weld

Note that the first column, the catalogue number, isn't really part of the data frame, and so doesn't have a names() entry. It is stored as the row names, which you can get or set using row.names():
    > row.names(music)
    [1] "CAD3004" "Col4851" "Rep2257" "Rep4335" "Chp1432" "EMI1233" "Atl4500"
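The page shows row.names() being read; setting works the same way as with names(). The sketch below uses the first three albums only, and the replacement catalogue number "COL-4851" is a hypothetical value invented for illustration.

```r
# First three albums from the sample data, catalogue numbers as row names.
music <- data.frame(
  Artist = c("Frank Black", "Weather Report", "Neil Young"),
  Title  = c("Frank Black", "Sweetnighter", "Decade I"),
  row.names = c("CAD3004", "Col4851", "Rep2257"),
  stringsAsFactors = FALSE
)

# names() lists the columns only; the catalogue number is not among them.
print(names(music))

# row.names() returns the catalogue numbers.
print(row.names(music))

# Replace one row name ("COL-4851" is a made-up value for this sketch).
row.names(music)[2] <- "COL-4851"

# The new row name can be used for indexing straight away.
print(music["COL-4851", "Title"])
```

Row names must be unique within a data frame, which is what makes them usable as an index like this.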
New columns can be added with cbind():

    > music <- cbind(music,Rate=c(7,6,9,10,9,8,8))   # Marks out of 10
    > music
                           Artist           Title Ntracks Format Rate
    CAD3004           Frank Black     Frank Black      15     CD    7
    Col4851        Weather Report    Sweetnighter       6     CD    6
    Rep2257            Neil Young        Decade I      19     CD    9
    Rep4335            Neil Young            Weld      12     CD   10
    Chp1432 Red Hot Chili Peppers   Mother's Milk      13   Tape    9
    EMI1233                Primus The Brown Album      12   Tape    8
    Atl4500          Led Zeppelin       Led Zep 3      11     CD    8
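Assigning to a new name with $ is a common alternative to cbind() for adding a single column. The sketch below compares the two on the first three albums; the names with_cbind and with_dollar are assumptions made for illustration.

```r
# First three albums from the sample data.
music <- data.frame(
  Artist  = c("Frank Black", "Weather Report", "Neil Young"),
  Ntracks = c(15, 6, 19),
  row.names = c("CAD3004", "Col4851", "Rep2257"),
  stringsAsFactors = FALSE
)

# Marks out of 10 for these three albums, as given in the text.
rate <- c(7, 6, 9)

# Option 1: cbind() returns a new data frame with the extra column.
with_cbind <- cbind(music, Rate = rate)

# Option 2: assign to a column name that does not exist yet.
with_dollar <- music
with_dollar$Rate <- rate

# Both routes should give the same table.
print(all.equal(with_cbind, with_dollar))
print(with_cbind)
```

In both cases the new vector needs one value per row, in the same order as the rows.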
The text columns were read in as factors. Artist names and titles are free text rather than categories, so they can be converted to character vectors with as.character():

    > is.factor(music$Title)
    [1] TRUE
    > music$Title <- as.character(music$Title)
    > music$Artist <- as.character(music$Artist)
    > music
                           Artist           Title Ntracks Format Rate
    CAD3004           Frank Black     Frank Black      15     CD    7
    Col4851        Weather Report    Sweetnighter       6     CD    6
    Rep2257            Neil Young        Decade I      19     CD    9
    Rep4335            Neil Young            Weld      12     CD   10
    Chp1432 Red Hot Chili Peppers   Mother's Milk      13   Tape    9
    EMI1233                Primus The Brown Album      12   Tape    8
    Atl4500          Led Zeppelin       Led Zep 3      11     CD    8
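The printed table looks the same before and after the conversion, so it helps to check the column classes directly. The sketch below uses three albums from the sample data and passes stringsAsFactors=TRUE explicitly, an assumption needed to get factor columns on R 4.0.0 and later; the loop over column names is one possible way to convert several columns at once.

```r
# Three albums from the sample data, with text columns forced to factors.
music <- data.frame(
  Artist = c("Neil Young", "Neil Young", "Red Hot Chili Peppers"),
  Title  = c("Decade I", "Weld", "Mother's Milk"),
  Format = c("CD", "CD", "Tape"),
  stringsAsFactors = TRUE   # assumption: mimic the behaviour shown above
)

# Before: every text column is a factor.
classes_before <- sapply(music, class)
print(classes_before)

# Convert the free-text columns; Format stays a factor on purpose.
for (col in c("Artist", "Title")) {
  music[[col]] <- as.character(music[[col]])
}

# After: Artist and Title are character, Format is still a factor.
classes_after <- sapply(music, class)
print(classes_after)
```

Format is left as a factor here because it takes only a small set of repeated values, which is the situation factors are meant for.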