Covariance vs Correlation
Covariance and correlation both describe how two variables move together, but they differ in scale, interpretation, and usefulness.
Covariance
Definition: Measures the direction of the linear relationship between two variables. Its magnitude is not standardized.
Formula (population):
\[ \text{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_X)(y_i - \mu_Y) \]
For a sample:
\[ \text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y}) \]
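The two formulas above differ only in the divisor. A minimal sketch in plain Python, using made-up data (the `hours` and `scores` values and the function names are illustrative assumptions, not from the text):

```python
def population_covariance(xs, ys):
    """Covariance with a 1/n divisor (data treated as the whole population)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def sample_covariance(xs, ys):
    """Covariance with a 1/(n-1) divisor (data treated as a sample)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: hours studied and exam scores for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

pop_cov = population_covariance(hours, scores)   # sum of products is 41, divided by 5
samp_cov = sample_covariance(hours, scores)      # same sum, divided by 4
print(pop_cov, samp_cov)
```

Both values are positive, so the variables tend to increase together. The sample version is slightly larger because it divides by n-1 instead of n.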
Interpretation:
- Positive → variables increase together
- Negative → one increases while the other decreases
- Zero → no linear relationship (a nonlinear relationship may still exist)
Limitation: Depends on the units of measurement, making it hard to interpret magnitude.
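To illustrate the unit dependence, the sketch below computes the sample covariance of the same hypothetical measurements twice, once with height in meters and once in centimeters. The data and names are assumptions made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance with a 1/(n-1) divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical heights and weights for four people.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 66.0, 74.0]

# Same heights expressed in centimeters.
heights_cm = [h * 100 for h in heights_m]

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)

# The relationship is identical, but the covariance is 100 times larger.
print(cov_m, cov_cm)
```

Changing the unit of one variable by a factor of 100 multiplies the covariance by 100, even though the underlying relationship has not changed.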
Correlation
Definition: Measures both the direction and strength of a linear relationship.
Formula (Pearson correlation):
\[ r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \]
Expanded form (the \(\frac{1}{n-1}\) factors in the covariance and the standard deviations cancel):
\[ r = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum (x_i - \bar{X})^2 \sum (y_i - \bar{Y})^2}} \]
Range: \(-1 \leq r \leq 1\)
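The expanded form translates directly into code. The following is a minimal sketch using only the standard library; the function name `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed from the expanded (sum-of-deviations) form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and exam scores for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

r = pearson_r(hours, scores)
print(round(r, 3))  # about 0.994 for this data
```

Note that this sketch does not guard against a zero denominator, which occurs when either variable is constant.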
Interpretation:
- +1 → perfect positive linear relationship
- -1 → perfect negative linear relationship
- 0 → no linear relationship
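A correlation of 0 rules out only a linear relationship. As a small sketch with made-up data, the points below follow y = x² exactly, yet their Pearson correlation is 0 because the positive and negative deviations cancel. The helper name `pearson_r` is an assumption for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: x is symmetric around 0 and y depends on x exactly.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

r_quadratic = pearson_r(xs, ys)
print(r_quadratic)  # 0.0: fully dependent, but not linearly related
```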
Advantage: Unit-free and easy to compare across datasets.
Relationship Between Covariance and Correlation
\[ \text{Correlation} = \frac{\text{Covariance}}{\text{Standard Deviation of X} \times \text{Standard Deviation of Y}} \]
Correlation is covariance rescaled by the two standard deviations, which removes the units and bounds the result to the range from -1 to 1.
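One way to see the normalization: standardize each variable to z-scores first, and the covariance of the standardized values equals the correlation. The sketch below checks this numerically on made-up data; all function names and values are illustrative assumptions.

```python
import math


def sample_std(values):
    """Sample standard deviation with a 1/(n-1) divisor."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


def sample_covariance(xs, ys):
    """Sample covariance with a 1/(n-1) divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the sample std."""
    m = sum(values) / len(values)
    s = sample_std(values)
    return [(v - m) / s for v in values]


# Hypothetical data: hours studied and exam scores for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

# Correlation as covariance divided by the product of standard deviations.
r_direct = sample_covariance(hours, scores) / (sample_std(hours) * sample_std(scores))

# Covariance of the z-scores, with no further division.
r_from_z = sample_covariance(standardize(hours), standardize(scores))

print(r_direct, r_from_z)  # the two agree up to floating-point rounding
```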
Example
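Covariance = 200 → Positive direction, but the magnitude is hard to interpret without knowing the units and spread of each variable
Correlation = 0.85 → Strong positive linear relationship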