![]() |
Introduction
To R |
> system("head skull.dat") MaxBreadth BaseHeight BaseLength NasalHeight Year 131 138 89 49 -4000 125 131 92 48 -4000 131 132 99 50 -4000 119 132 96 44 -4000 136 143 100 54 -4000 138 137 89 56 -4000 139 130 108 48 -4000 125 136 93 48 -4000 131 134 102 51 -4000
> skull <- read.table('skull.dat',head=T) > dim(skull) [1] 150 5 > skull[1:5,] First five rows... MaxBreadth BaseHeight BaseLength NasalHeight Year 1 131 138 89 49 -4000 2 125 131 92 48 -4000 3 131 132 99 50 -4000 4 119 132 96 44 -4000 5 136 143 100 54 -4000We want to investigate the relation between the BaseHeight of the skull, and the BaseLength.
> plot(skull$BaseH,skull$BaseL)and we want to fit a model that relates the Base Length to the Base Height.
> lm(skull$BaseL ~ skull$BaseH) Can abbreviate names... Call: lm(formula = skull$BaseL ~ skull$BaseH) Coefficients: (Intercept) skull$BaseH 58.3103 0.2878Note how lm() is called with lm(y ~ x) - this we read as y depends on x. This is the way that many models are constructed - with the quantity you are measuring (the response) as the y value and the quantities you control (the predictors) as the x variables. Of course in this situation its just two sets of measurements... Anyway...
> sfit <- lm(skull$BaseL~skull$BaseH) > is.list(sfit) [1] TRUE > names(sfit) [1] "coefficients" "residuals" "effects" "rank" [5] "fitted.values" "assign" "qr" "df.residual" [9] "xlevels" "call" "terms" "model"These are all important results from the fitted model. Perhaps the most important is the coefficients - you can use these to add the fitted model to the scatter plot:
> plot(skull$BaseH,skull$BaseL) > sfit$coef (Intercept) skull$BaseH 58.31026 0.2878212 > abline(sfit$coeff)The abline() function takes a vector of length two and draws a line with that slope and intercept.
Other components of the fit are useful for diagnostic purposes. For example you can look at the plot of the fitted values against the residuals:
> plot(sfit$fitted,sfit$resid)this can show deviations from the data fitting a linear model well, if there is significant curvature in the plot.
You can also do a histogram of the residuals to look for outliers - if the data really comes from a linear model plus gaussian noise then the residuals should come from a normal distribution:
> hist(sfit$resid)
> class(sfit) [1] "lm"The fact that sfit is an object of the class lm implies that diagnostic plots as those seen above can be obtained simply by typing:
> plot(sfit)This is possible because plot() is a generic function: the type of the plot produced is dependent on the type or class of the argument.