```{r data generation, echo = FALSE, results = "hide"}
## DATA GENERATION
n <- sample(35:65, 1)
mx <- runif(1, 40, 60)
my <- runif(1, 200, 280)
sx <- runif(1, 9, 12)
sy <- runif(1, 44, 50)
r <- round(runif(1, 0.5, 0.9), 2)
x <- rnorm(n, mx, sd = sx)
y <- (r * x/sx + rnorm(n, my/sy - r * mx/sx, sqrt(1 - r^2))) * sy

mx <- round(mean(x))
my <- round(mean(y))
r <- round(cor(x, y), digits = 2)
varx <- round(var(x))
vary <- round(var(y))

b <- r * sqrt(vary/varx)
a <- my - b * mx
X <- round(runif(1, -10, 10) + mx)

## QUESTION/ANSWER GENERATION
sol <- round(a + b * X, 3)
```

Question
========

For `r n` firms the number of employees $X$ and the amount of
expenses for continuing education $Y$ (in EUR) were recorded. The
statistical summary of the data set is given by:

|          | Variable $X$ | Variable $Y$ |
|:--------:|:------------:|:------------:|
| Mean     | `r mx`       | `r my`       |
| Variance | `r varx`     | `r vary`     |

The correlation between $X$ and $Y$ is equal to `r r`.
Estimate the expected amount of money spent for continuing
education by a firm with `r X` employees using least squares
regression.

Solution
========

First, the regression line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
is determined. The regression coefficients are given by:

\begin{eqnarray*}
&& \hat \beta_1 = r \cdot \frac{s_y}{s_x} = `r r` \cdot \sqrt{\frac{`r vary`}{`r varx`}} = `r round(b,5)`, \\
&& \hat \beta_0 = \bar y - \hat \beta_1 \cdot \bar x = `r my` - `r round(b,5)` \cdot `r mx` = `r round(a,5)`.
\end{eqnarray*}

The estimated amount of money spent by a firm with `r X` employees
is then given by:

\begin{eqnarray*}
\hat y = `r round(a, 5)` + `r round(b, 5)` \cdot `r X` = `r sol`.
\end{eqnarray*}

Meta-information
================
extype: num
exsolution: `r fmt(sol, 3)`
exname: Prediction
extol: 0.01