{r data generation, echo = FALSE, results = "hide"} ## DATA GENERATION n <- sample(35:65,1) mx <- runif(1, 40, 60) my <- runif(1, 200, 280) sx <- runif(1, 9, 12) sy <- runif(1, 44, 50) r <- round(runif(1, 0.5, 0.9), 2) x <- rnorm(n, mx, sd = sx) y <- (r * x/sx + rnorm(n, my/sy - r * mx/sx, sqrt(1 - r^2))) * sy mx <- round(mean(x)) my <- round(mean(y)) r <- round(cor(x, y), digits = 2) varx <- round(var(x)) vary <- round(var(y)) b <- r * sqrt(vary/varx) a <- my - b * mx X <- round(runif(1, -10, 10) + mx) ## QUESTION/ANSWER GENERATION sol <- round(a + b * X, 3)  Question ======== For r n firms the number of employees $X$ and the amount of expenses for continuing education $Y$ (in EUR) were recorded. The statistical summary of the data set is given by: | | Variable $X$ | Variable $Y$ | |:--------:|:------------:|:------------:| | Mean | r mx | r my | | Variance | r varx | r vary | The correlation between $X$ and $Y$ is equal to r r. Estimate the expected amount of money spent for continuing education by a firm with r X employees using least squares regression. Solution ======== First, the regression line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ is determined. The regression coefficients are given by: \begin{eqnarray*} && \hat \beta_1 = r \cdot \frac{s_y}{s_x} = r r \cdot \sqrt{\frac{r vary}{r varx}} = r round(b,5), \\ && \hat \beta_0 = \bar y - \hat \beta_1 \cdot \bar x = r my - r round(b,5) \cdot r mx = r round(a,5). \end{eqnarray*} The estimated amount of money spent by a firm with r X employees is then given by: \begin{eqnarray*} \hat y = r round(a, 5) + r round(b, 5) \cdot r X = r sol. \end{eqnarray*} Meta-information ================ extype: num exsolution: r fmt(sol, 3) exname: Prediction extol: 0.01