# Help for exercise 3 in the R-exercises
# QUESTION b)
# Generate 100 standard normally distributed observations of x1, x2, and e with cor(x1,x2) = 0.5.
# This gives 100 observations of y = x1 + x2 + e.
# Fit the model with x1 and x2 and the model without x2:
rho<-0.5
x1<-rnorm(100)
x2<-rho*x1+sqrt(1-rho^2)*rnorm(100)
e<-rnorm(100)
y<-x1+x2+e
summary(lm(y~x1+x2))
summary(lm(y~x1))
# Look at the estimates for the two models. How do they agree with the theory in question a?
# Discuss the implications this result may have if an important covariate is left out in a regression analysis.
# Check if the estimate in the model without x2 agrees with the formula given in the introduction to the exercise:
cor(x1,x2)
koef<-lm(y~x1+x2)$coef
koef
koef[2]+koef[3]*cor(x1,x2)*(sd(x2)/sd(x1))
lm(y~x1)$coef
# Make sure that you understand the computations! Is the result in agreement with the theory?
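# As a rough check of what the theory predicts (a sketch, assuming the same formula
# as used above, beta1 + beta2*cor(x1,x2)*(sd(x2)/sd(x1)), with the true values
# beta1 = beta2 = 1): both x1 and x2 have standard deviation close to 1, so the
# slope in the model without x2 should be roughly 1 + 1*0.5*1 = 1.5:
1+1*cor(x1,x2)*(sd(x2)/sd(x1))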
# QUESTION c)
# Generate new data with cor(x1,x2) = 0.95 and y = x1 - x2 + e.
# Fit linear regression models with each of the covariates separately and with both the covariates:
rho<-0.95
x1<-rnorm(100)
x2<-rho*x1+sqrt(1-rho^2)*rnorm(100)
e<-rnorm(100)
y<-x1-x2+e
summary(lm(y~x1))
summary(lm(y~x2))
summary(lm(y~x1+x2))
# Look at the results for the regression models with only one covariate and for the model with both covariates.
# Discuss the implications these results may have if an important covariate is left out in a regression analysis.
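# One way to put the three fits side by side is to compare the estimated coefficients
# together with their standard errors (a sketch; summary()$coefficients returns the
# estimate, standard error, t-value and p-value for each term):
summary(lm(y~x1))$coefficients
summary(lm(y~x2))$coefficients
summary(lm(y~x1+x2))$coefficients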
# QUESTION d)
# Do 100 simulations of the situation in question b:
rho<-0.50
koefsim<-numeric(0)
for (i in 1:100)
{
  x1<-rnorm(100)
  x2<-rho*x1+sqrt(1-rho^2)*rnorm(100)
  e<-rnorm(100)
  y<-x1+x2+e
  koefsim<-rbind(koefsim,lm(y~x1+x2)$coef[2:3])
}
plot(koefsim[,1],koefsim[,2])
cor(koefsim)
# What is the correlation between the estimates?
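# (A hint from the theory: the covariance matrix of the estimates is proportional to the
# inverse of t(X)%*%X, so the correlation between the two slope estimates should be
# approximately -cor(x1,x2), i.e. about -0.5 here.)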
# Repeat the simulations with rho=0.90, rho=-0.50, and rho=-0.90. What do you see?
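# A sketch of one way to repeat the simulation study for other values of rho
# (the function name simkoef is just a suggestion, not part of the exercise):
simkoef<-function(rho,nsim=100,n=100)
{
  koefsim<-numeric(0)
  for (i in 1:nsim)
  {
    x1<-rnorm(n)
    x2<-rho*x1+sqrt(1-rho^2)*rnorm(n)
    e<-rnorm(n)
    y<-x1+x2+e
    koefsim<-rbind(koefsim,lm(y~x1+x2)$coef[2:3])
  }
  koefsim
}
# For example:
# cor(simkoef(0.90))
# cor(simkoef(-0.50))
# cor(simkoef(-0.90))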