<>=
## data-generating process: linear vs. quadratic, homoscedastic vs. heteroscedastic
type <- sample(c("constant", "linear"), size = 1, prob = c(0.35, 0.65))
d <- data.frame(x = runif(100, -1, 1))
a <- 0
b <- if(type == "constant") 0 else sample(c(-1, 1), 1) * runif(1, 0.6, 0.9)
d$y <- a + b * d$x + rnorm(100, sd = 0.25)
write.csv(d, "linreg.csv", row.names = FALSE, quote = FALSE)
## model and interpretation
m <- lm(y ~ x, data = d)
ahat <- coef(m)[1]
bhat <- coef(m)[2]
bpvl <- summary(m)$coefficients[2, 4]
bsol <- c(bpvl >= 0.05, (bpvl < 0.05) & (bhat > 0), (bpvl < 0.05) & (bhat < 0))
bint <- c("\\\\texttt{x} and \\\\texttt{y} are not significantly correlated", "\\\\texttt{y} increases significantly with \\\\texttt{x}", "\\\\texttt{y} decreases significantly with \\\\texttt{x}")
bint <- bint[bsol]
@
\begin{question}
\textbf{Theory:} Consider a linear regression of \texttt{y} on \texttt{x}. It is usually estimated with
which estimation technique (three-letter abbreviation)?
##ANSWER1##
This estimator yields the best linear unbiased estimator (BLUE) under the assumptions
of the Gauss-Markov theorem. Which of the following properties are required for the
errors of the linear regression model under these assumptions?
##ANSWER2##
\textbf{Application:} Using the data provided in \url{linreg.csv} estimate a
linear regression of \texttt{y} on \texttt{x}. What are the estimated parameters?
Intercept: ##ANSWER3##
Slope: ##ANSWER4##
In terms of significance at 5\% level:
##ANSWER5##
\begin{answerlist}
\item
\item independent
\item zero expectation
\item normally distributed
\item identically distributed
\item homoscedastic
\item
\item
\item \texttt{x} and \texttt{y} are not significantly correlated
\item \texttt{y} increases significantly with \texttt{x}
\item \texttt{y} decreases significantly with \texttt{x}
\end{answerlist}
\end{question}
\begin{solution}
\textbf{Theory:} Linear regression models are typically estimated by ordinary least squares (OLS).
The Gauss-Markov theorem establishes certain optimality properties: Namely, if the errors
have expectation zero, constant variance (homoscedastic), no autocorrelation and the
regressors are exogenous and not linearly dependent, the OLS estimator is the best linear
unbiased estimator (BLUE).
\textbf{Application:} The estimated coefficients along with their significances are reported in the
summary of the fitted regression model, showing that \Sexpr{bint} (at 5\% level).
<>=
summary(m)
@
\textbf{Code:} The analysis can be replicated in R using the following code.
\begin{verbatim}
## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
\end{verbatim}
\end{solution}
\exname{Linear regression}
\extype{cloze}
\exsolution{OLS|01001|\Sexpr{fmt(ahat, 3)}|\Sexpr{fmt(bhat, 3)}|\Sexpr{mchoice2string(bsol)}}
\exclozetype{string|mchoice|num|num|schoice}
\extol{0.01}