```{r data generation, echo = FALSE, results = "hide"}
## data-generating process: linear vs. quadratic, homoscedastic vs. heteroscedastic
type <- sample(c("constant", "linear", "quadratic", "heteroscedastic"), size = 1, prob = c(0.25, 0.35, 0.2, 0.2))
d <- data.frame(x = runif(100, -1, 1))
a <- 0
b <- if(type == "constant") 0 else sample(c(-1, 1), 1) * runif(1, 0.6, 0.9)
c <- if(type != "quadratic") 0 else sample(c(-1, 1), 1) * runif(1, 0.3, 0.6)
s <- if(type != "heteroscedastic") 0.25 else exp(-1.5 + sign(b) * d$x)
d$y <- a + b * d$x + c * d$x^2 + rnorm(100, sd = s)
write.csv(d, "linreg.csv", row.names = FALSE, quote = FALSE)
## model
m <- lm(y ~ x, data = d)
ahat <- coef(m)[1]
bhat <- coef(m)[2]
bpvl <- summary(m)$coefficients[2, 4]
bsol <- c(bpvl >= 0.05, (bpvl < 0.05) & (bhat > 0), (bpvl < 0.05) & (bhat < 0))
## interpretation
bint <- c("`x` and `y` are not significantly correlated", "`y` increases significantly with `x`", "`y` decreases significantly with `x`")
bint <- bint[bsol]
typeint <- switch(type,
"quadratic" = "the true relationship between `y` and `x` is not linear but quadratic (and hence errors do not have zero expectation)",
"heteroscedastic" = "the errors are heteroscedastic with increasing variance along with the mean",
"the assumptions of the Gauss-Markov theorem are reasonably well fulfilled"
)
```
Question
========
**Theory:** Consider a linear regression of `y` on `x`. It is usually estimated with
which estimation technique (three-letter abbreviation)?
##ANSWER1##
This estimator yields the best linear unbiased estimator (BLUE) under the assumptions
of the Gauss-Markov theorem. Which of the following properties are required for the
errors of the linear regression model under these assumptions?
##ANSWER2##
**Application:** Using the data provided in [linreg.csv](linreg.csv) estimate a
linear regression of `y` on `x`. What are the estimated parameters?
Intercept: ##ANSWER3##
Slope: ##ANSWER4##
In terms of significance at 5% level:
##ANSWER5##
**Interpretation:** Consider various diagnostic plots for the fitted linear
regression model. Do you think the assumptions of the Gauss-Markov theorem are
fulfilled? What are the consequences?
##ANSWER6##
**Code:** Please upload your code script that reads the data, fits the regression model,
extracts the quantities of interest, and generates the diagnostic plots.
##ANSWER7##
Answerlist
----------
*
* independent
* zero expectation
* normally distributed
* identically distributed
* homoscedastic
*
*
* `x` and `y` are not significantly correlated
* `y` increases significantly with `x`
* `y` decreases significantly with `x`
*
*
Solution
========
**Theory:** Linear regression models are typically estimated by ordinary least squares (OLS).
The Gauss-Markov theorem establishes certain optimality properties: Namely, if the errors
have expectation zero, constant variance (homoscedastic), no autocorrelation and the
regressors are exogenous and not linearly dependent, the OLS estimator is the best linear
unbiased estimator (BLUE).
**Application:** The estimated coefficients along with their significances are reported in the
summary of the fitted regression model, showing that `r bint` (at 5% level).
```{r regression, echo = FALSE, comment = NA}
summary(m)
```
**Interpretation:** Considering the visualization of the data along with the diagnostic plots suggests
that `r typeint`.
\
```{r visualizations, echo = FALSE, results = "hide", fig.height = 4.5, fig.width = 13.5, fig.path = "", fig.cap = ""}
par(mfrow = c(1, 3))
plot(y ~ x, data = d, main = "Data and regression fit")
abline(m)
plot(m, which = 1)
plot(m, which = 3)
```
**Code:** The analysis can be replicated in R using the following code.
```
## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
## diagnostic plots
plot(m)
```
Meta-information
================
exname: Linear regression
extype: cloze
exsolution: OLS|01001|`r fmt(ahat, 3)`|`r fmt(bhat, 3)`|`r mchoice2string(bsol)`|nil|nil
exclozetype: string|mchoice|num|num|schoice|essay|file
extol: 0.01