Seven Meanings of Correlation
by Lee Corbin
19 July 1995

  1. The English Language Meaning.

    The term "correlation" is often used among educated people to describe a positively varying relationship when there is no intention of imputing causality. (It's often safer to say that A is correlated with B than to say that A is preceded by B, for example, because the latter might imply a causal relationship.) The Oxford American Dictionary defines it as "a systematic connection".

  2. The Mathematical Meaning.

    The mathematical definition of correlation most commonly used, the Pearson "product-moment correlation," is well described in several non-technical books, among them "The Bell Curve." An equivalent way of looking at correlation was brought to my attention by Norm Hardy. Suppose, for example, that we have only three data "points" when examining the relationship between the "height" and "weight" of three people: (h1,w1), (h2,w2), and (h3,w3). Ordinarily, you would visualize these as three points on a two-dimensional graph.
    But another way to visualize them, not so apparent, is to think of two three-dimensional vectors: (h1,h2,h3) and (w1,w2,w3). First subtract the average height from each height and the average weight from each weight, so that each vector records deviations from the mean. The correlation of "height" with "weight" is then the cosine of the angle between these two vectors, that is, their dot product divided by the product of their lengths. If the vectors point in almost the same direction, the correlation is very high (close to 1). If they are perpendicular, the correlation is zero. If they point in opposite directions, i.e., the heights go up as the weights go down, the correlation is negative (down to -1).
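
    A minimal Python sketch of this vector picture. The three heights and weights are made up for illustration, and the helper names are my own. It computes the correlation twice, once as the cosine of the angle between the mean-centered vectors and once from the usual covariance-over-standard-deviations formula, to show that the two agree.

```python
import math

def centered(v):
    """Subtract the mean, so the vector records deviations from average."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def cosine_correlation(xs, ys):
    """Correlation as the cosine of the angle between two centered vectors."""
    a, b = centered(xs), centered(ys)
    dot = sum(p * q for p, q in zip(a, b))
    length_a = math.sqrt(sum(p * p for p in a))
    length_b = math.sqrt(sum(q * q for q in b))
    return dot / (length_a * length_b)

def pearson(xs, ys):
    """Textbook product-moment formula: covariance over product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical data for three people: (h1, h2, h3) and (w1, w2, w3)
heights = [160, 170, 180]   # cm
weights = [55, 72, 77]      # kg

print(cosine_correlation(heights, weights))  # about 0.954
print(pearson(heights, weights))             # the same value
```

    Perpendicular centered vectors, such as those from (1, 2, 3) and (1, -2, 1), give a correlation of zero; reversed vectors, such as (1, 2, 3) and (3, 2, 1), give -1.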

  3. The Elliptical Meaning.

    As the usual textbook presentation attempts to convey, a plot of the data points forms an overall geometric shape. In the case of two variables, uncorrelated data forms a roughly circular or square cloud with no diagonal tilt. In other words, when one variable is high for a given data point, the other variable is as likely to be high as low.

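    But when the shape is "elliptical", with its long axis tilted along a diagonal, a positive or negative correlation is present. For a positive correlation this means a shortage of data points in which one variable is high and the other is low (and for a negative correlation, a shortage of points in which both are high or both are low).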

    I have found a simple formula that relates the shape of the ellipse to mathematical correlation. Start with the observation that an ellipse is a squashed circle. Let s be the "squeeze factor" necessary to obtain a given ellipse. A squeeze factor of 2, for instance, flattens a circle to half its original size in one direction. (A squeeze factor of s results in a major axis s times as long as the minor axis.) For correlation, the squeeze is applied across a diagonal, so that the long axis lies along the line y = x (along y = -x for a negative correlation, in which case the formula below takes a minus sign).

    Then the correlation of an elliptical distribution is
                        s^2 - 1
                   r =  -------
                        s^2 + 1
    
    so that, for example, elliptically shaped data with a squeeze factor of 2 has correlation 0.6 (two squared minus one, divided by two squared plus one: 3/5). Solving for s, a correlation of r implies a squeeze factor equal to the square root of (1+r)/(1-r); for r = 0.6 that is the square root of 4, or 2, as expected. This last formula also provides a quick visualization of the ellipse corresponding to a given correlation.
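
    A minimal sketch that checks the squeeze-factor formula numerically, assuming the ellipse is made by squeezing a circle across the diagonal y = x. For a deterministic check it uses evenly spaced points on the outline of a circle rather than a random filled cloud; any circularly symmetric cloud gives the same answer after the same squeeze. Solving r = (s^2 - 1)/(s^2 + 1) for s gives the inverse s = sqrt((1 + r)/(1 - r)), which `squeeze_from_r` computes. All function names are illustrative.

```python
import math

def pearson(xs, ys):
    """Product-moment correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def squeezed_circle(s, n=360):
    """Evenly spaced points on a unit circle, squeezed by factor s across the
    diagonal y = x, so the long axis of the resulting ellipse lies along y = x."""
    xs, ys = [], []
    for k in range(n):
        t = 2 * math.pi * k / n
        u = math.cos(t)        # position along the diagonal y = x (unchanged)
        v = math.sin(t) / s    # position across the diagonal (shrunk by s)
        # rotate the (u, v) frame back by 45 degrees into ordinary (x, y)
        xs.append((u - v) / math.sqrt(2))
        ys.append((u + v) / math.sqrt(2))
    return xs, ys

def r_from_squeeze(s):
    """Correlation of an ellipse with major/minor axis ratio s."""
    return (s * s - 1) / (s * s + 1)

def squeeze_from_r(r):
    """Axis ratio of the ellipse corresponding to correlation r (0 <= r < 1)."""
    return math.sqrt((1 + r) / (1 - r))

xs, ys = squeezed_circle(2)
print(round(pearson(xs, ys), 6))      # 0.6
print(r_from_squeeze(2))              # 0.6
print(round(squeeze_from_r(0.6), 6))  # 2.0
```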

  4. Newsweek Phrase.

    In the October 24, 1994 issue of Newsweek, in an article on "The Bell Curve", the following sentence was used to elucidate the meaning of correlation: "A correlation of .4 would tell you that 40 percent of the variation in one thing is matched by variation in another, while 60 percent of it is not." This provides the right flavor, I suppose, because one can imagine two things varying, sometimes together, sometimes not, and that the correlation between them provides some measure of this tendency.

    But "variation" is technically misleading here (see Pythagorean Correlation below). Perhaps they should have written "A correlation of .4 would tell you that 16 percent of the variation in one thing is matched by variation in another, while 84 percent of it is not". (16 percent is .4 squared.) Yet this striking phrase did motivate me to conduct some experiments and investigations about what correlations arise when you deliberately allow two variables to vary sometimes together and sometimes not:
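
    A small worked check of the 16-percent reading, as a Python sketch under an assumed setup (nothing here comes from Newsweek or the book): x and e are hypothetical patterns that average zero, have equal variance, and are uncorrelated, and y is built from them so that its correlation with x is .4. Fitting a least-squares line of y on x removes 16 percent of y's variance and leaves 84 percent in the residuals.

```python
import math

def mean(v):
    return sum(v) / len(v)

def variance(v):
    """Average squared deviation from the mean."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical patterns: both average zero, equal variance, zero covariance
x = [1, -1, 1, -1]
e = [1, 1, -1, -1]

r = 0.4
y = [r * a + math.sqrt(1 - r * r) * b for a, b in zip(x, e)]

corr = covariance(x, y) / math.sqrt(variance(x) * variance(y))

# Least-squares line predicting y from x; the residuals are the part of y
# that is not "matched by variation" in x
slope = covariance(x, y) / variance(x)
intercept = mean(y) - slope * mean(x)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

explained = 1 - variance(residuals) / variance(y)
print(round(corr, 3), round(explained, 3))  # 0.4 0.16
```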

  5. Variation Together and Not.

    Suppose we plot some two-variable data on a standard graph, and let the abscissa of a point denote, as usual, the value of one variable and its ordinate the value of the other. People's heights and weights are a customary example.

    What correlation obtains if precisely half the points lie on the line x=y, and exactly half are at random? In other words, what correlation do we obtain if half the data is perfectly correlated ( r=1 ) and half is perfectly uncorrelated ( r=0 )? You will be pleased to know that the resultant correlation is .5.

    Likewise, if p percent of the data lies perfectly aligned on the line y=x (spread evenly from (0,0) to (1,1)), and the other (100-p) percent is scattered at random within the unit square, then, to my enormous relief, the correlation again turns out to be p/100 (exactly so as the number of points grows; any finite random sample comes out close to it). So, for example, a correlation coefficient of .3 could represent data in which three-tenths of the data is perfectly correlated, with no correlation whatever among the other seven-tenths of the data items.
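
    A minimal simulation of this experiment in Python, assuming the points on the line have x spread uniformly from 0 to 1, so that both kinds of points share the same uniform distribution of x and of y. Under that assumption the reason for the result is short (writing p as a fraction): each variable has variance 1/12, the on-line points contribute covariance p times 1/12, and the scattered points contribute none, so r = p. The seed, sample size, and function names are arbitrary choices of mine.

```python
import math
import random

def pearson(xs, ys):
    """Product-moment correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def mixed_sample(p, n, rng):
    """n points: exactly round(p * n) of them on the line y = x, the rest
    independent and uniform in the unit square. Every x is uniform on [0, 1]."""
    on_line = round(p * n)
    xs, ys = [], []
    for i in range(n):
        x = rng.random()
        y = x if i < on_line else rng.random()
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(1995)  # fixed seed so the run is repeatable
for p in (0.3, 0.5, 0.9):
    xs, ys = mixed_sample(p, 50_000, rng)
    print(p, round(pearson(xs, ys), 3))  # each value lands close to p
```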

    (Some other distributions that occurred to me resulted in some very interesting correlations of p squared, and square root of p, but they were not quite as natural as the foregoing.)

  6. Pythagorean Correlation.

    Appendix 1 of "The Bell Curve" has a mysterious paragraph that reads: "Whatever the correlation coefficient of a pair of variables is, squaring it yields another notable number. Squaring .50, for example, give .25. The significance of the squared correlation is that it tells how much the variation in weight would decrease if we could make everyone the same height, or vice versa. If all the boys in the class were the same height, the variation in their weights would decline by 25 percent. Perhaps, if you have been compelled to be around social scientists, you have heard the phrase 'explains the variance,' as in, for example, 'Education explains 20 percent of the variance in income.' That figure comes from the squared correlation."

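    Once again, "variance" sounds mysterious, but here it simply has its statistical meaning: the average squared deviation from the mean, i.e., the square of the standard deviation. Being a squared measure, it is additive: the variance of a sum of uncorrelated parts is the sum of their variances.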
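
    A tiny Python illustration of that additivity, using two hypothetical, uncorrelated patterns chosen by hand: their variances add exactly, while their standard deviations do not.

```python
import math

def variance(v):
    """Average squared deviation from the mean."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical components that are uncorrelated with each other
a = [1, -1, 1, -1]   # variance 1, standard deviation 1
b = [2, 2, -2, -2]   # variance 4, standard deviation 2
total = [p + q for p, q in zip(a, b)]

print(covariance(a, b))                            # 0.0: uncorrelated
print(variance(total), variance(a) + variance(b))  # 5.0 5.0: variances add
print(math.sqrt(variance(total)))                  # about 2.236, not 1 + 2 = 3
```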

    The following may provide a better example. As you know, (.6)^2 + (.8)^2 = 1, which is just the well-known (3,4,5) Pythagorean triangle scaled down by a factor of five. Visualize a right triangle with base .6, height .8, and hypotenuse 1. Let's say for concreteness that the hypotenuse represents some statistical measure that we are trying to learn about, e.g., intelligence; and suppose that the horizontal leg represents Nurture and the vertical leg represents Nature (a supposition doubtless agreeable in spirit to the authors of "The Bell Curve"). Also assume that these factors are the ONLY causal sources of intelligence, and that they are uncorrelated with each other, which is what the right angle between the legs represents. Then we might say that "Nurture is responsible for 36% of intelligence and Nature for the remaining 64%", meaning 36% and 64% of its variance.

    Moreover, you may visualize the leg of length .6 as being projected onto the hypotenuse. By similar triangles, the length of this projection is .6 times .6, or .36. Likewise, the projection of the leg of length .8 has length .8 times .8, or .64, and the two projections together make up the whole hypotenuse. In terms of Pythagoras, then, we may conclude: A CORRELATION IS THE LENGTH OF A COMPONENT VECTOR!

    (A three-factor example: 2^2 + 2^2 + 1^2 = 3^2, or "two-thirds squared plus two-thirds squared plus one-third squared equals one". Suppose that the correlation between lung cancer and smoking is one-third. Then you may conclude that one-ninth of the variation is explained by "smoking", and that eight-ninths of the variation remains "unexplained" (in this three-factor picture, two other uncorrelated factors, each correlated two-thirds with lung cancer, would account for four-ninths apiece). It might even be permissible in this case to say that "lung cancer is eleven percent due to smoking and eighty-nine percent due to unknown (individual) factors". Or, that "lung cancer is 11% smoking", much as people say that intelligence is sixty-four percent heredity, i.e., .8 correlated with the genes.)
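
    A minimal Python sketch of the Pythagorean picture, assuming hypothetical factors that are exactly uncorrelated with one another and have equal variance (three rows of a 4-by-4 Hadamard matrix stand in for Nurture and Nature, or for the three factors of the smoking example). An outcome built as a weighted sum of such factors, with squared weights summing to one as in both examples, has a correlation with each factor equal to that factor's weight, and the squared correlations add to one.

```python
import math

def pearson(xs, ys):
    """Product-moment correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical factors: mutually uncorrelated, equal-variance patterns,
# standing in for the perpendicular legs of the triangle
FACTORS = [
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def outcome(weights):
    """Weighted sum of the first len(weights) factors."""
    return [sum(w * f[i] for w, f in zip(weights, FACTORS)) for i in range(4)]

for weights in [(0.6, 0.8), (2 / 3, 2 / 3, 1 / 3)]:
    y = outcome(weights)
    rs = [pearson(f, y) for f in FACTORS[: len(weights)]]
    shares = [r * r for r in rs]  # fraction of the outcome's variance per factor
    print([round(r, 4) for r in rs], [round(s, 4) for s in shares], round(sum(shares), 4))
```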

  7. Slope with Normalized Coordinates.

    On page 563 of "The Bell Curve" another way to obtain the same correlation coefficient is explained. You first "normalize" the data. That is, each value (say someone's height or weight) is translated into what in statistics is called a "z score". This only means that it is expressed as a number of standard deviations above or below the mean. (Example: if your IQ is 140, this gives you a z-score of 2.5, since (140 - 100)/16 = 2.5 on a scale with a mean of 100 and a standard deviation of 16 points; some IQ tests use 15 instead.)

    After all the data has been normalized (I consider "data" to be a mass noun, thank you), the two-variable data may be plotted in the usual way. Now by a mathematical miracle, the slope of the "best fit" (least-squares) line is never greater than one in magnitude! Moreover, the slope itself equals the product-moment correlation (i.e., what we usually mean when speaking technically).

    This is very cute mathematically, but it isn't intuitively helpful to me, because no unique "best fit" line is visually apparent when examining a plot. (And if one were, to someone with enough experience, I suppose, then the opposite line, the one that best predicts the first variable from the second and has the reciprocal slope, would have to be equally apparent.)
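
    A minimal Python sketch of the normalized-slope method, on made-up height and weight data: it converts each variable to z-scores, fits least-squares lines in both directions, and compares the slopes with the product-moment correlation. The last line reproduces the IQ z-score arithmetic, assuming a mean of 100 and a standard deviation of 16. Function names are illustrative.

```python
import math

def standardize(v):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / sd for a in v]

def least_squares_slope(xs, ys):
    """Slope of the line that best predicts ys from xs (least squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pearson(xs, ys):
    """Product-moment correlation as the average product of z-scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

heights = [150, 160, 165, 170, 180, 185]  # hypothetical data, cm
weights = [50, 58, 63, 61, 77, 80]        # hypothetical data, kg
zh, zw = standardize(heights), standardize(weights)

r = pearson(heights, weights)
slope = least_squares_slope(zh, zw)    # predict weight z-score from height z-score
reverse = least_squares_slope(zw, zh)  # predict height z-score from weight z-score
print(r, slope, reverse)  # all three agree
print(1 / reverse)        # slope of the reverse line drawn on the same axes
print((140 - 100) / 16)   # 2.5, the IQ z-score example
```

    Both fitted slopes equal r, but drawn on the same axes the second line (height predicted from weight) has slope 1/r, which is why two different "best fit" lines compete for the eye.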