# Learning Module 10: Simple Linear Regression
## Variation of the Dependent Variable (Sum of Squares Total)
$$
\text{Variation of } Y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \tag{1}
$$
Where:
* $Y_i$: an observation of a company’s ROA
* $\bar{Y}$: the mean ROA for the sample of size $n$
* $n$: sample size
## Variation of the Independent Variable
$$
\text{Variation of } X = \sum_{i=1}^{n} (X_i - \bar{X})^2 \tag{2}
$$
Where:
* $X_i$: an observation of the explanatory variable
* $\bar{X}$: sample mean of the independent variable
* $n$: number of observations
## Simple Linear Regression Model
$$
Y_i = b_0 + b_1 X_i + \varepsilon_i,\quad i = 1, \ldots, n \tag{3}
$$
Where:
* $Y_i$: dependent variable
* $X_i$: independent variable
* $b_0$: intercept
* $b_1$: slope coefficient
* $\varepsilon_i$: error term
## Sum of Squares Error (SSE)
$$
\text{Sum of squares error } = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
$$
$$
= \sum_{i=1}^{n} [ Y_i - ( \hat{b}_0 + \hat{b}_1 X_i )]^2 \tag{4}
$$
$$
= \sum_{i=1}^{n} e_i^2
$$
Where:
* $Y_i$: observed value of the dependent variable
* $\hat{Y}_i$: predicted value of the dependent variable
* $\hat{b}_0$: estimated intercept
* $\hat{b}_1$: estimated slope coefficient
* $e_i$: residual for the $i$th observation, $e_i = Y_i - \hat{Y}_i$
* $n$: number of observations
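As a quick numerical check of the three equivalent forms above, here is a minimal plain-Python sketch on a small made-up sample; the coefficients 2.2 and 0.6 are assumed to be the OLS estimates for this data (they follow from equations 5 and 6 below):

```python
# SSE: sum of squared residuals e_i = Y_i - Y_hat_i (equation 4)
X = [1, 2, 3, 4, 5]        # made-up independent variable
Y = [2, 4, 5, 4, 5]        # made-up dependent variable
b0_hat, b1_hat = 2.2, 0.6  # assumed OLS estimates for this sample

Y_hat = [b0_hat + b1_hat * x for x in X]         # predicted values
residuals = [y - yh for y, yh in zip(Y, Y_hat)]  # e_i
sse = sum(e ** 2 for e in residuals)
print(round(sse, 4))  # 2.4
```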
## Ordinary Least Squares Slope Estimator
$$
\hat{b}_1 = \frac{\text{Covariance of } Y \text{ and } X}{\text{Variance of } X} =
\frac{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{n - 1}}{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}
$$
Simplifying,
$$
\hat{b}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{5}
$$
Where:
* $\hat{b}_1$: estimated slope coefficient
* $Y_i$: dependent variable observation
* $X_i$: independent variable observation
* $\bar{Y}$: mean of the dependent variable $Y$
* $\bar{X}$: mean of the independent variable $X$
* $n$: number of observationsOrdinary Least Squares Intercept Estimator
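A minimal sketch of equation (5) in plain Python on a small made-up sample, confirming that the covariance-over-variance form agrees with the simplified form because the $n - 1$ factors cancel:

```python
# OLS slope estimator: covariance of Y and X over variance of X (equation 5)
X = [1, 2, 3, 4, 5]   # made-up independent variable
Y = [2, 4, 5, 4, 5]   # made-up dependent variable
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

cov_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(X, Y)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in X) / (n - 1)
b1_hat = cov_xy / var_x   # the (n - 1) factors cancel, giving the simplified form
print(b1_hat)  # 0.6
```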
## Ordinary Least Squares Intercept Estimator
$$
\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \tag{6}
$$
Where:
* $\hat{b}_0$: estimated intercept
* $\bar{Y}$: mean of the dependent variable $Y$
* $\bar{X}$: mean of the independent variable $X$
* $\hat{b}_1$: estimated slope
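Equation (6) says the fitted line passes through the point of means $(\bar{X}, \bar{Y})$. A short sketch on the same made-up sample:

```python
# OLS intercept estimator: b0_hat = y_bar - b1_hat * x_bar (equation 6)
X = [1, 2, 3, 4, 5]   # made-up data
Y = [2, 4, 5, 4, 5]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b1_hat = sum((y - y_bar) * (x - x_bar) for x, y in zip(X, Y)) / \
         sum((x - x_bar) ** 2 for x in X)
b0_hat = y_bar - b1_hat * x_bar
print(round(b0_hat, 4))  # 2.2
```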
## Sample Correlation
$$
r = \frac{\text{Covariance of } Y \text{ and } X}{(\text{Standard deviation of } Y)(\text{Standard deviation of } X)} \tag{7}
$$
Where:
* $r$: sample correlation
* $\text{Covariance of } Y \text{ and } X$: covariance between $X$ and $Y$
* $\text{Standard deviation of } X$: standard deviation of the independent variable
* $\text{Standard deviation of } Y$: standard deviation of the dependent variable
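Equation (7) can be sketched directly, again on the small made-up sample used above:

```python
import math

# Sample correlation: covariance scaled by the two standard deviations (equation 7)
X = [1, 2, 3, 4, 5]   # made-up data
Y = [2, 4, 5, 4, 5]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / (n - 1)
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in X) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in Y) / (n - 1))
r = cov_xy / (s_y * s_x)
print(round(r, 4))  # 0.7746
```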
## Homoskedasticity Assumption
$$
E(\varepsilon_i^2) = \sigma_\varepsilon^2,\quad i = 1, \ldots, n \tag{8}
$$
Where:
* $\varepsilon_i$: error term for the $i$th observation
* $\sigma_\varepsilon^2$: constant variance of the error term
* $E$: expectation operator
## Sum of Squares Regression (SSR)
$$
\sum_{i=1}^{n} \left( \hat{Y}_i - \bar{Y} \right)^2 \tag{9}
$$
Where:
* $\hat{Y}_i$: predicted value of the dependent variable
* $\bar{Y}$: mean of the dependent variable
* $n$: number of observations
## Coefficient of Determination $(R^2)$
$$
\text{Coefficient of determination} = \dfrac{\text{Sum of squares regression}}{\text{Sum of squares total}}
$$
$$
\text{Coefficient of determination} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \tag{10}
$$
Where:
* $R^2$ (coefficient of determination): the percentage of the variation of the dependent variable that is explained by the independent variable
* $\hat{Y}_i$: predicted value of the dependent variable
* $Y_i$: observed value of the dependent variable
* $\bar{Y}$: mean of the dependent variable
* $n$: number of observations
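The ratio can be checked on the same made-up sample; the fitted values come from the assumed OLS estimates 2.2 and 0.6 for this data:

```python
# Coefficient of determination: explained variation over total variation (equation 10)
X = [1, 2, 3, 4, 5]   # made-up data
Y = [2, 4, 5, 4, 5]
n = len(X)
y_bar = sum(Y) / n
Y_hat = [2.2 + 0.6 * x for x in X]            # fitted values from the OLS line

ssr = sum((yh - y_bar) ** 2 for yh in Y_hat)  # sum of squares regression
sst = sum((y - y_bar) ** 2 for y in Y)        # sum of squares total
r_squared = ssr / sst
print(round(r_squared, 4))  # 0.6
```

Note that 0.6 is the square of the sample correlation 0.7746 computed earlier, which is exactly the content of equation (11).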
## Relationship Between $r^2$ and $R^2$
$$
r^2 = \frac{\sum_{i=1}^{n} \left( \hat{Y}_i - \bar{Y} \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2} = R^2 \tag{11}
$$
Where:
* $r$: sample correlation coefficient
* $R^2$: coefficient of determination
## Mean Square Regression (MSR) with k Parameters
$$
\text{MSR} = \frac{\text{Sum of squares regression}}{k} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{1} \tag{12}
$$
Where:
* $\text{MSR}$: mean square regression
* $k$: number of independent variables ($k = 1$ in simple linear regression, which is why the denominator is 1)
## Mean Square Regression (MSR)
$$
\text{MSR} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \tag{13}
$$
Where:
* $\text{MSR}$: mean square regression
* $\hat{Y}_i$: predicted value of the dependent variable
* $\bar{Y}$: mean of the dependent variable
* $n$: number of observations
## Mean Square Error (MSE)
$$
\text{MSE} = \frac{\text{Sum of squares error}}{n - k - 1}
$$
$$
\text{MSE} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2} \tag{14}
$$
Where:
* $\text{MSE}$: mean square error
* $Y_i$: observed value of the dependent variable
* $\hat{Y}_i$: predicted value of the dependent variable
* $n$: number of observations
## F-Distributed Test Statistic (MSR/MSE)
$$
F = \frac{\frac{\text{Sum of squares regression}}{k}}{\frac{\text{Sum of squares error}}{n - k - 1}} = \frac{\text{MSR}}{\text{MSE}}
$$
$$
F = \frac{ \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{1}}{ \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} \tag{15}
$$
Where:
* $F$: F-statistic for testing overall regression significance
* $\hat{Y}_i$: predicted value of the dependent variable
* $Y_i$: observed value of the dependent variable
* $\bar{Y}$: mean of the dependent variable
* $n - 2$: error degrees of freedom
* $n$: number of observations
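A sketch of equation (15) on the same made-up sample, with $k = 1$ and fitted values from the assumed OLS estimates; the resulting F would be compared against the critical value of an F-distribution with 1 and $n - 2$ degrees of freedom:

```python
# F-statistic for overall significance: MSR / MSE (equation 15)
X = [1, 2, 3, 4, 5]   # made-up data
Y = [2, 4, 5, 4, 5]
n, k = len(X), 1
y_bar = sum(Y) / n
Y_hat = [2.2 + 0.6 * x for x in X]

msr = sum((yh - y_bar) ** 2 for yh in Y_hat) / k                   # 3.6 / 1
mse = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat)) / (n - k - 1)  # 2.4 / 3
f_stat = msr / mse
print(round(f_stat, 4))  # 4.5
```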
## t-Test Statistic for Slope Coefficient
$$
t = \frac{\hat{b}_1 - B_1}{s_{\hat{b}_1}} \tag{16}
$$
Where:
* $t$: t-statistic for hypothesis test of the slope
* $\hat{b}_1$: estimated slope coefficient
* $B_1$: hypothesized population slope
* $s_{\hat{b}_1}$: standard error of the slope coefficient
## Standard Error of the Slope Coefficient
$$
s_{\hat{b}_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \tag{17}
$$
Where:
* $s_{\hat{b}_1}$: standard error of the slope estimate
* $s_e$: standard error of the estimate
* $X_i$: an observation of the independent variable
* $\bar{X}$: mean of the independent variable
* $n$: number of observations
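Equations (16) and (17) combine naturally: compute the standard error of the slope, then test $H_0\!: B_1 = 0$. A sketch on the same made-up sample, with the OLS estimates 2.2 and 0.6 assumed for this data:

```python
import math

# Standard error of the slope (equation 17), then the t-statistic for
# H0: B1 = 0 (equation 16)
X = [1, 2, 3, 4, 5]   # made-up data
Y = [2, 4, 5, 4, 5]
n = len(X)
x_bar = sum(X) / n
b1_hat = 0.6                       # assumed OLS slope for this sample
Y_hat = [2.2 + b1_hat * x for x in X]

s_e = math.sqrt(sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat)) / (n - 2))
s_b1 = s_e / math.sqrt(sum((x - x_bar) ** 2 for x in X))
t_slope = (b1_hat - 0) / s_b1      # hypothesized slope B1 = 0
print(round(s_b1, 4), round(t_slope, 4))  # 0.2828 2.1213
```

The t-statistic would be compared against critical values of a t-distribution with $n - 2$ degrees of freedom.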
## t-Test Statistic for Correlation
$$
t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \tag{18}
$$
Where:
* $t$: t-statistic for testing correlation significance
* $r$: sample correlation coefficient
* $n$: number of observations
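A sketch of equation (18), using the correlation implied by the made-up sample above ($r = \sqrt{0.6}$, since $r^2 = R^2$):

```python
import math

# t-statistic for H0: population correlation = 0 (equation 18)
n = 5
r = math.sqrt(0.6)   # sample correlation for the made-up data used above
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t_corr, 4))  # 2.1213
```

The value matches the t-statistic on the slope coefficient, which always holds in simple linear regression.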
## Standard Error of the Intercept
$$
s_{\hat{b}_0} = s_e \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \tag{19}
$$
Where:
* $s_{\hat{b}_0}$: standard error of the intercept
* $s_e$: standard error of the estimate
* $\bar{X}$: mean of the independent variable
## t-Test Statistic for the Intercept
$$
t_{\text{intercept}} = \frac{\hat{b}_0 - B_0}{s_{\hat{b}_0}} = \frac{\hat{b}_0 - B_0}{s_e \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}} \tag{20}
$$
Where:
* $B_0$: hypothesized intercept value
* $\hat{b}_0$: estimated intercept
* $s_{\hat{b}_0}$: standard error of the intercept
## Hypothesis Tests of Slope When the Independent Variable Is an Indicator Variable
$$
\text{RET}_i = b_0 + b_1 \text{EARN}_i + \varepsilon_i \tag{21}
$$
Where:
* $\text{RET}_i$: monthly return for observation $i$
* $\text{EARN}_i$: indicator (dummy) variable taking the value 0 or 1
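A useful property to verify numerically: when the regressor is a 0/1 indicator, the OLS slope equals the difference between the two group means of the dependent variable, and the intercept equals the mean of the 0-group. A sketch with invented series (both made up purely for illustration):

```python
# With a 0/1 indicator regressor, the OLS slope is the difference in group means
EARN = [0, 0, 0, 1, 1, 1]   # hypothetical indicator variable
RET = [1, 2, 3, 4, 6, 8]    # hypothetical monthly returns, in percent
n = len(RET)
x_bar, y_bar = sum(EARN) / n, sum(RET) / n
b1_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(EARN, RET)) / \
         sum((x - x_bar) ** 2 for x in EARN)
mean_0 = sum(y for x, y in zip(EARN, RET) if x == 0) / EARN.count(0)
mean_1 = sum(y for x, y in zip(EARN, RET) if x == 1) / EARN.count(1)
print(round(b1_hat, 4), mean_1 - mean_0)  # 4.0 4.0
```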
## Standard Error of the Estimate
$$
\text{Standard error of the estimate } (s_e) = \sqrt{\text{MSE}} = \sqrt{ \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2} } \tag{22}
$$
Where:
* $s_e$: standard error of the estimate
* $s_e$ is a measure of the distance between the observed values of the dependent variable and those predicted from the estimated regression; the smaller $s_e$, the better the fit of the model
## Forecasted Value of the Dependent Variable
$$
\hat{Y}_f = \hat{b}_0 + \hat{b}_1 X_f \tag{23}
$$
Where:
* $\hat{Y}_f$: forecasted value of the dependent variable
* $X_f$: forecast value of the independent variable
## Standard Error of the Forecast
**estimated variance of the prediction error**
$$
s_f^2 = s_e^2 \left[ 1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{(n - 1)s_X^2} \right]
= s_e^2 \left[ 1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]
$$
Where:
* $s_f^2$: estimated variance of the prediction error
* $s_X^2$: sample variance of the independent variable
**Standard Error of the Forecast**
$$
s_f = s_e \sqrt{1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} \tag{24}
$$
Where:
* $s_f$: standard error of the forecast
* $s_e$: standard error of the estimate
* $X_f$: forecast value of the independent variable
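Equations (23) and (24) can be sketched together: form the point forecast, then its standard error, which grows as $X_f$ moves away from $\bar{X}$. Again the data and OLS estimates 2.2 and 0.6 are the made-up sample used throughout:

```python
import math

# Point forecast (equation 23) and standard error of the forecast (equation 24)
X = [1, 2, 3, 4, 5]   # made-up data
Y = [2, 4, 5, 4, 5]
n = len(X)
x_bar = sum(X) / n
b0_hat, b1_hat = 2.2, 0.6                     # assumed OLS estimates
Y_hat = [b0_hat + b1_hat * x for x in X]
s_e = math.sqrt(sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat)) / (n - 2))
ss_x = sum((x - x_bar) ** 2 for x in X)

X_f = 6                                       # forecast value of the regressor
Y_f = b0_hat + b1_hat * X_f                   # point forecast
s_f = s_e * math.sqrt(1 + 1 / n + (X_f - x_bar) ** 2 / ss_x)
print(round(Y_f, 4), round(s_f, 4))  # 5.8 1.2961
```

A prediction interval would then be $\hat{Y}_f \pm t_{\text{critical}} \, s_f$ with $n - 2$ degrees of freedom.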
## Log-Lin Model
$$
\ln Y_i = b_0 + b_1 X_i \tag{25}
$$
Where:
* $Y_i$: dependent variable (in logarithmic form)
* $X_i$: independent variable (not in logarithmic form)
* $b_0$: intercept
* $b_1$: slope coefficient
## Lin-Log Model
$$
Y_i = b_0 + b_1 \ln X_i \tag{26}
$$
Where:
* $Y_i$: dependent variable (not in logarithmic form)
* $X_i$: independent variable (in logarithmic form)
* $b_0$: intercept
* $b_1$: slope coefficient
## Log-Log Model
$$
\ln Y_i = b_0 + b_1 \ln X_i \tag{27}
$$
Where:
* $Y_i$: dependent variable (in logarithmic form)
* $X_i$: independent variable (in logarithmic form)
* $b_0$: intercept
* $b_1$: slope coefficient
* This model is useful for calculating elasticities because the slope coefficient is the relative change in the dependent variable for a relative change in the independent variable.
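The elasticity interpretation can be checked numerically: if hypothetical data are generated with $Y = X^2$, then $\ln Y = 2 \ln X$, and regressing $\ln Y$ on $\ln X$ should recover a slope (elasticity) of 2:

```python
import math

# Log-log model: the slope is an elasticity (equation 27)
X = [1, 2, 4, 8]                  # hypothetical data
Y = [x ** 2 for x in X]           # Y = X^2, so ln Y = 2 ln X exactly
lx = [math.log(x) for x in X]
ly = [math.log(y) for y in Y]
n = len(X)
lx_bar, ly_bar = sum(lx) / n, sum(ly) / n
b1_hat = sum((a - lx_bar) * (b - ly_bar) for a, b in zip(lx, ly)) / \
         sum((a - lx_bar) ** 2 for a in lx)
print(round(b1_hat, 4))  # 2.0
```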