Numerical Range: The correlation coefficient is a numerical value ranging from $-1$ to $+1$. The sign ($+$ or $-$) indicates the direction of the relationship, while the absolute value indicates the strength.
Strength Interpretation: A coefficient of $\pm 1$ represents a perfect linear relationship, while $0$ represents no linear relationship. Generally, absolute values near $0$ are considered weak, mid-range values moderate, and values approaching $1$ strong; the exact cut-offs vary by convention.
Coefficient of Determination ($r^2$): Squaring the correlation coefficient gives the proportion of variance in one variable that is predictable from the other. For instance, an $r$ of $0.5$ results in an $r^2$ of $0.25$, meaning $25\%$ of the variance is shared between the variables.
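As a rough illustration of how the coefficient and its square are obtained, the sketch below computes Pearson's $r$ directly from its definition and then squares it. The helper name `pearson_r` and the study-hours data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test score
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 60, 68, 70]

r = pearson_r(hours, scores)
r_squared = r ** 2  # proportion of shared variance
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Because $r^2$ is always smaller in magnitude than $r$ (unless $r$ is $0$ or $\pm 1$), a correlation that looks sizeable can still leave much of the variance unexplained.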
The Causality Gap: A high correlation coefficient does not imply that one variable causes the other to change. It only indicates that they vary together, which could be due to a variety of reasons other than direct influence.
The Third Variable Problem: Often, a correlation between two variables is actually produced by a third, unmeasured variable (an extraneous or confounding variable). For example, a correlation between ice cream sales and drowning incidents is explained by hot weather, which increases both.
| Feature | Correlation | Experimentation |
|---|---|---|
| Variables | Co-variables (measured) | IV (manipulated) and DV (measured) |
| Goal | Identify relationships | Establish cause and effect |
| Control | Low control over environment | High control over variables |
| Conclusion | Variables are related | Variable A causes Variable B |
Interpreting the Sign: Always distinguish between the direction and the strength. A correlation of $-0.8$ is stronger than a correlation of $+0.5$, even though the latter is a positive number.
Scatter Plot Analysis: When asked to describe a scatter plot, always mention the direction (positive/negative), the strength (how close the points are to a line), and whether the relationship appears linear.
Identifying Outliers: Be vigilant for individual data points that fall far from the general cluster. These outliers can disproportionately affect the correlation coefficient, making a relationship seem stronger or weaker than it truly is.
Non-Linear Relationships: Pearson's $r$ only measures linear relationships. If the data follows a curve (curvilinear), the correlation coefficient might be near zero even if a very strong relationship exists.
Directionality Confusion: Students often assume that if A and B are correlated, A must lead to B. In reality, B could lead to A, or they could both be influenced by an external factor.
Sample Size Sensitivity: Small samples can produce high correlation coefficients by chance. It is essential to consider the sample size and statistical significance before concluding that a relationship is meaningful.