In simple linear regression, the population regression line is defined by the equation $\mu_y = \alpha + \beta x$, where $\beta$ represents the true, unknown rate of change in the response variable for every one-unit increase in the explanatory variable.
When we take a random sample of size $n$, we calculate a sample regression line $\hat{y} = a + bx$, where $b$ is the sample slope used to estimate the population parameter $\beta$.
Because different random samples yield different values of $b$, the sample slope is a random variable with its own probability distribution, known as the sampling distribution of the sample slope.
Center: The mean of the sampling distribution of $b$ is equal to the population slope, $\mu_b = \beta$. This indicates that the sample slope is an unbiased estimator of the population slope.
Spread: The theoretical standard deviation of the sample slope is $\sigma_b = \dfrac{\sigma}{\sigma_x \sqrt{n}}$, where $\sigma$ is the standard deviation of the population residuals and $\sigma_x$ is the standard deviation of the $x$-values.
Interpretation: The spread of the distribution decreases as the sample size $n$ increases or as the variability in the $x$-values increases, leading to more precise estimates of the slope.
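The center and spread claims above can be checked with a small simulation. The sketch below is illustrative only: the population values `ALPHA`, `BETA`, and `SIGMA` and the fixed $x$-values are hypothetical, the residuals are assumed to be Normal, and $\sigma_x$ is computed with the population formula (dividing by $n$) because the $x$-values are held fixed from sample to sample.

```python
import math
import random
import statistics


def sample_slope(xs, ys):
    """Least-squares slope b = Sxy / Sxx for one sample."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(alpha, beta, sigma, xs, reps, seed=0):
    """Draw `reps` samples from mu_y = alpha + beta * x with Normal(0, sigma)
    residuals and return the sample slope from each one."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(sample_slope(xs, ys))
    return slopes


# Hypothetical population parameters, chosen only for illustration.
ALPHA, BETA, SIGMA = 10.0, 2.0, 3.0
XS = [float(x) for x in range(1, 21)]  # n = 20 fixed x-values

slopes = simulate_slopes(ALPHA, BETA, SIGMA, XS, reps=5000)

# sigma_b = sigma / (sigma_x * sqrt(n)), with sigma_x as a population SD.
theoretical_sd = SIGMA / (statistics.pstdev(XS) * math.sqrt(len(XS)))

print("mean of simulated slopes:", statistics.fmean(slopes))
print("SD of simulated slopes:  ", statistics.stdev(slopes))
print("theoretical sigma_b:     ", theoretical_sd)
```

With enough repetitions, the mean of the simulated slopes should land close to `BETA` and their standard deviation close to `theoretical_sd`. Rerunning with more $x$-values, or with more widely spread $x$-values, should shrink both the theoretical and simulated spread.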
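In practice, we rarely know the population standard deviation of residuals ($\sigma$), so we must estimate it using the standard error of the residuals ($s$), calculated as $s = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$.
The standard error of the slope ($SE_b$) is the estimated standard deviation of the sampling distribution and is calculated as $SE_b = \dfrac{s}{s_x \sqrt{n - 1}}$, where $s_x$ is the sample standard deviation of the $x$-values.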
This value describes how much the sample slope typically varies from the population slope in repeated random sampling.
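As a rough illustration of these two formulas, the following sketch computes $b$, $s$, and $SE_b$ by hand for a tiny made-up data set. The function name `slope_standard_error` and the data values are hypothetical and are not taken from any particular textbook or software package.

```python
import math
import statistics


def slope_standard_error(xs, ys):
    """Return (b, s, se_b) for the least-squares line fit to (xs, ys)."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)

    # Least-squares slope and intercept.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar

    # Standard error of the residuals: s = sqrt(SSE / (n - 2)).
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # Standard error of the slope: SE_b = s / (s_x * sqrt(n - 1)).
    s_x = statistics.stdev(xs)  # sample SD of x (divides by n - 1)
    se_b = s / (s_x * math.sqrt(n - 1))
    return b, s, se_b


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, s, se_b = slope_standard_error(xs, ys)
print(f"b = {b:.4f}, s = {s:.4f}, SE_b = {se_b:.4f}")
```

Because $s_x \sqrt{n - 1}$ equals $\sqrt{\sum (x_i - \bar{x})^2}$, the same standard error can also be written as $s / \sqrt{\sum (x_i - \bar{x})^2}$, which is the form many software packages report.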
When the conditions are met, the standardized statistic $t = \dfrac{b - \beta}{SE_b}$ follows a t-distribution.
The degrees of freedom for this distribution are $df = n - 2$.
We use $n - 2$ because we have estimated two parameters from the sample data to create the regression line: the intercept ($\alpha$) and the slope ($\beta$).
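A minimal sketch of the standardization step follows. The helper `slope_t_statistic` and its `beta_0` argument (a hypothesized value of the population slope, defaulting to 0) are assumptions made for illustration, as are the numbers in the example call.

```python
def slope_t_statistic(b, se_b, n, beta_0=0.0):
    """Return (t, df) for the standardized sample slope.

    t  = (b - beta_0) / SE_b
    df = n - 2, since the intercept and the slope are both estimated.
    """
    if n <= 2:
        raise ValueError("need n > 2 so that df = n - 2 is positive")
    if se_b <= 0:
        raise ValueError("standard error of the slope must be positive")
    t = (b - beta_0) / se_b
    df = n - 2
    return t, df


# Hypothetical values: sample slope 0.6, SE_b 0.3, sample size 5.
t, df = slope_t_statistic(b=0.6, se_b=0.3, n=5)
print(f"t = {t:.3f} with df = {df}")
```

The resulting `t` would then be compared against a t-distribution with `df` degrees of freedom, using a table or statistical software, to find a P-value or critical value.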
| Term | Symbol | Definition |
|---|---|---|
| Population Slope | $\beta$ | The true rate of change in the entire population. |
| Sample Slope | $b$ | The estimated rate of change calculated from a specific sample. |
| Standard Error of Slope | $SE_b$ | How much $b$ typically varies across different samples. |
| Standard Error of Residuals | $s$ | The typical distance between observed values and the regression line. |
Identify the Parameter: Always define $\beta$ in the context of the problem (e.g., 'the true slope of the relationship between height and weight').
Check the df: Remember that for regression, $df = n - 2$. Using $n - 1$ is a common mistake that leads to incorrect critical values.
Residual Plot Interpretation: If a residual plot shows a clear curve, the linearity condition is violated, and inference for the slope based on the t-distribution may not be valid.
Computer Output: On exams, you are often given a table of regression output. The sample slope $b$ is usually in the 'Coef' column next to the explanatory variable's name, and $SE_b$ is in the 'SE Coef' column of the same row.