```{r data generation, echo = FALSE, results = "hide"}
## data-generating process: constant vs. linear mean, homoscedastic normal errors
type <- sample(c("constant", "linear"), size = 1, prob = c(0.35, 0.65))
d <- data.frame(x = runif(100, -1, 1))
a <- 0
b <- if(type == "constant") 0 else sample(c(-1, 1), 1) * runif(1, 0.6, 0.9)
d$y <- a + b * d$x + rnorm(100, sd = 0.25)
write.csv(d, "linreg.csv", row.names = FALSE, quote = FALSE)

## model and interpretation
m <- lm(y ~ x, data = d)
ahat <- coef(m)[1]
bhat <- coef(m)[2]
bpvl <- summary(m)$coefficients[2, 4]
bsol <- c(bpvl >= 0.05, (bpvl < 0.05) & (bhat > 0), (bpvl < 0.05) & (bhat < 0))
bint <- c("`x` and `y` are not significantly correlated",
  "`y` increases significantly with `x`",
  "`y` decreases significantly with `x`")
bint <- bint[bsol]
```

Question
========

**Theory:** Consider a linear regression of `y` on `x`. Which estimation technique is usually used to estimate it (three-letter abbreviation)?

##ANSWER1##

This estimator is the best linear unbiased estimator (BLUE) under the assumptions of the Gauss-Markov theorem. Which of the following properties are required for the errors of the linear regression model under these assumptions?

##ANSWER2##

**Application:** Using the data provided in [linreg.csv](linreg.csv), estimate a linear regression of `y` on `x`. What are the estimated parameters?

Intercept: ##ANSWER3##

Slope: ##ANSWER4##

In terms of significance at the 5% level:

##ANSWER5##

Answerlist
----------
* 
* independent
* zero expectation
* normally distributed
* identically distributed
* homoscedastic
* 
* 
* `x` and `y` are not significantly correlated
* `y` increases significantly with `x`
* `y` decreases significantly with `x`

Solution
========

**Theory:** Linear regression models are typically estimated by ordinary least squares (OLS).
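
As a short reminder of what OLS computes, assume the simple model $y_i = a + b x_i + \varepsilon_i$ for $i = 1, \dots, n$ (the error notation $\varepsilon_i$ is introduced here for illustration; $a$ and $b$ are the intercept and slope as in the data-generating code). OLS chooses $\hat a$ and $\hat b$ to minimize the sum of squared residuals $\sum_{i=1}^{n} (y_i - a - b x_i)^2$, and setting both partial derivatives to zero gives the standard closed form:

$$
\hat b = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
\hat a = \bar y - \hat b \, \bar x,
$$

where $\bar x$ and $\bar y$ are the sample means. These are the estimates that `lm()` reports for `(Intercept)` and `x` (stored as `ahat` and `bhat` in the data-generation chunk); the slope is only defined if not all $x_i$ are equal.
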
The Gauss-Markov theorem establishes certain optimality properties: namely, if the errors have zero expectation, constant variance (homoscedasticity), and no autocorrelation, and if the regressors are exogenous and not linearly dependent, then the OLS estimator is the best linear unbiased estimator (BLUE).

**Application:** The estimated coefficients, along with their significance, are reported in the summary of the fitted regression model, showing that `r bint` (at the 5% level).

```{r regression, echo = FALSE, comment = NA}
summary(m)
```

**Code:** The analysis can be replicated in R using the following code.

```
## data
d <- read.csv("linreg.csv")

## regression
m <- lm(y ~ x, data = d)
summary(m)

## visualization
plot(y ~ x, data = d)
abline(m)
```

Meta-information
================
exname: Linear regression
extype: cloze
exsolution: OLS|01001|`r fmt(ahat, 3)`|`r fmt(bhat, 3)`|`r mchoice2string(bsol)`
exclozetype: string|mchoice|num|num|schoice
extol: 0.01