CORRELATION ANALYSIS


So far we have studied problems involving only one variable; such distributions are called univariate distributions. In the preceding chapters we discussed averages such as the mean, median, mode and quartiles, and measures of dispersion such as the quartile deviation, mean deviation and standard deviation, for univariate distributions. Often, however, we need to study the relationship between two or more variables: for example, price and demand, supply and production, the income and expenditure of a family, or the imports and exports of a commodity.


Correlation analysis deals with the association between two or more variables, and the degree of relationship between them is called correlation. When only two variables, say X and Y, are involved, it is called simple correlation. When more than two variables are involved, it is called multiple correlation. For example, in an agricultural experiment the yield obtained depends on many factors, such as the quality of seed, irrigation facilities, fertility of the soil, and the manure and pesticide applied.


In partial correlation, only two variables are studied after eliminating the effect of the other variables; for example, the correlation between crop yield and rainfall after eliminating the effect of temperature on both. In simple correlation we deal with two variables, often regarded as a cause variable (X) and an effect variable (Y). If the two variables vary together, in the same or in opposite directions, they are said to be correlated. If Y consistently increases as X increases, X and Y are positively correlated. If Y decreases as X increases, and increases as X decreases, they are negatively correlated; for example, as price increases, demand decreases. If the change in one variable is proportional to the change in the other, the two variables are said to be perfectly correlated. Thus, whether correlation is positive (direct) or negative (inverse) depends on the direction in which the variables change.


The study of correlation is of great importance in practical problems, especially in business, for the following reasons.

1. In business, medical science and socio-economic problems, many pairs of variables show some kind of relationship. Correlation analysis measures the degree of such a relationship by means of a coefficient.

2. Once correlation has been measured, a business executive can estimate the likely value of a dependent variable for a given value of the independent variable. This is done with the help of regression analysis, which is discussed in the next chapter.

3. Correlation analysis helps business people and economists study how a change in one variable is accompanied by changes in another, and so guides them in anticipating the effect of such a change.

There are various methods of studying the relationship between two variables. Depending on the nature of the data, four methods are generally used:

1. Scatter Diagram
2. Correlation Graph
3. Correlation Table
4. Coefficient of Correlation

SCATTER DIAGRAM

The scatter diagram is one of the simplest ways of representing a bivariate distribution diagrammatically, and one of the simplest tools for ascertaining the correlation between two variables. Suppose we are given n pairs of values (x1, y1), (x2, y2), ..., (xn, yn) of the two variables X and Y. These are plotted as points on graph paper and the nature of the resulting diagram is studied. A straight line that appears to fit the points best can then be drawn by inspection; some points will lie on the line and the others near it. While drawing the line, care should be taken that the number of points above the line is approximately the same as the number below it.

The graph obtained by plotting the pairs of values of X and Y as points is called a Scatter Diagram. By studying it, the following conclusions can be drawn about the correlation.
If all the plotted points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, as shown in Fig. 1, the correlation is perfect and positive.


If the points are clustered around a line rising from the lower left-hand corner to the upper right-hand corner, there is positive correlation. This is shown in Fig. 2.



If all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner, the correlation is perfect and negative. Fig. 3 depicts this type of correlation.


If the points tend to fall along a direction from the upper left-hand corner to the lower right-hand corner, there is negative correlation. This is shown in Fig. 4.


If the points are scattered over the graph paper so that no definite direction can be seen, there is no correlation.



A scatter diagram is an important first step in analyzing correlation. When the amount of data is limited, it is easy to draw by hand, and it shows the joint behaviour of the two variables at a glance. However, it shows only the direction of the correlation; it does not give a numerical measure of its degree.

COEFFICIENT OF CORRELATION

The first three methods listed above indicate the direction of correlation, if it exists, but we also need an exact numerical measure of its degree or extent. Such a measure should be independent of the units of the original data, so that relationships measured in different units can be compared. For this we calculate the coefficient of correlation. Its sign shows whether the variables are related positively or negatively, and its magnitude shows the degree of the relationship.

Definition

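The coefficient of correlation, denoted by r and named after Karl Pearson, is defined as

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

where n is the number of pairs of observations and \bar{x}, \bar{y} are the means of X and Y.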
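The definition can be checked numerically. The sketch below is a minimal Python illustration, not part of the original text, and the function name `pearson_r` is our own. It divides the sum of products of deviations from the means by the square root of the product of the sums of squared deviations, and applies this to the data of Exercise 1 of this section.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for n pairs (x, y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each observation from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when one series is constant")
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


# Data of Exercise 1: time in weeks (x) and output in thousand units (y)
x = [7, 5, 4, 11, 10, 12, 14, 9]
y = [14, 8, 8, 19, 16, 19, 20, 16]
print(round(pearson_r(x, y), 4))  # 0.9635
```

Working with deviations keeps the intermediate sums small; the raw-sum form of the formula gives the same value.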

The value of r always lies between –1 and +1.
  1. If 0 < r < 1, the correlation is positive.
  2. If r = 1, the correlation is perfect positive.
  3. If –1 < r < 0, the correlation is negative.
  4. If r = –1, the correlation is perfect negative.
If the two variables are independent, r = 0, but the converse is not true: r = 0 only indicates the absence of a linear relationship between them.
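These properties can be illustrated with a few small series. The following Python sketch is illustrative only: it restates Pearson's formula in a helper named `pearson_r` (our own name), and shows that an exact linear relation gives r = +1 or r = –1, that converting the heights of Exercise 11 from centimetres to inches leaves r unchanged, and that y = x² over values symmetric about zero gives r = 0 although y is completely determined by x.

```python
from math import isclose, sqrt


def pearson_r(xs, ys):
    # Pearson's definition: sum of products of deviations from the means,
    # divided by the square root of the product of the sums of squared deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v + 2 for v in x]))   # 1.0: y rises exactly in proportion to x
print(pearson_r(x, [10 - 2 * v for v in x]))  # -1.0: y falls exactly in proportion to x

# Changing the unit of measurement (cm to inches) leaves r unchanged.
heights_cm = [159, 163, 165, 162, 158, 160, 165, 167, 168, 170]
weights_kg = [51, 57, 58, 50, 49, 54, 55, 56, 58, 57]
heights_in = [h / 2.54 for h in heights_cm]
print(isclose(pearson_r(heights_cm, weights_kg), pearson_r(heights_in, weights_kg)))  # True

# y = x^2 is completely determined by x, yet r = 0 because the relation is not linear.
s = [-2, -1, 0, 1, 2]
print(pearson_r(s, [v * v for v in s]))  # 0.0
```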

EXERCISE:

1. The following data represent the time in weeks (X) and the output in thousand units (Y). Find the coefficient of correlation.
x:    7    5    4   11   10   12   14    9
y:   14    8    8   19   16   19   20   16
[ Answer: 0.9635 ]
2. Find the coefficient of correlation for the following data:
x:   14    8   10   11    9   13    5
y:   14    9   11   13   11   12    4
[ Answer: 0.9231 ]
3. Find the coefficient of correlation for the following data representing cost in Rs. (X) and sales in Rs. (Y) of a product for a period of eight years.
x:   84   80   92   85   95   90   83   87
y:  115  104  122  116  125  120  112  120
[ Answer: 0.9358 ]
4. Calculate the coefficient of correlation between marks in Economics (X) and marks in Accountancy (Y) of a group of 10 students.
x:   53   47   42   60   63   52   57   55   61   48
y:   72   61   62   85   80   65   79   75   84   73
[ Answer: 0.8831 ]
5. Calculate the coefficient of correlation between X and Y.
x:    5    8   10   12   15   18   21   24   25    6
y:   25   21   20   18   16   15   14   12   11   24
[ Answer: -0.9828 ]
6. The distribution of marks in Advertising (x) and marks in Business Planning (y) for a group of ten students is given below. Calculate the product moment coefficient of correlation.
x:   25   20   17   16   20   14   23   21   15   12
y:   24   17   22   18   20   18   24   20   16   14
[ Answer: 0.8168 ]
7. The following data give the experience (x) in years of eight machine operators and their performance ratings (y). Calculate the coefficient of correlation.
x:   16   13   17    4    3   11    7   14
y:   88   87   89   72   70   82   78   84
[ Answer: 0.9803 ]
8. Find Pearson’s coefficient of correlation for the following data:
x:  140  138  126  132  135  131  137  142
y:  122  140  118  119  132  125  145  150
[ Answer: 0.7043 ]
9. Find the coefficient of correlation for the following data:
x:   53   59   72   43   93   35   55   80
y:   35   49   63   36   75   28   38   71
[ Answer: 0.9676 ]
10. Calculate the coefficient of correlation from the following data:
x:   20   22   18   17   10   25    7   15
y:   15   17   16   10    5   19    4    8
[ Answer: 0.9553 ]
11. Calculate the coefficient of correlation for the following data of heights in cms. (x) and weights in kgs. (y) of a group of 10 students.
x:  159  163  165  162  158  160  165  167  168  170
y:   51   57   58   50   49   54   55   56   58   57
[ Answer: 0.7940 ]
12. Below are the heights in cms. (x) and weights in kgs. (y) of a group of children. Find the coefficient of correlation.
x:  130  128  132  135  140  142  137  139
y:   31   30   36   32   41   40   35   34
[ Answer: 0.8047 ]
13. Calculate the product moment coefficient of correlation.
x:  212  214  205  220  225  214  218
y:  500  515  577  530  522  516  525
[ Answer: -0.4693 ]
14. Find Pearson’s coefficient of correlation from the following data.
x:   10    2    5    7    9    4    8
y:    8    4    4    8    5    3    7
[ Answer: 0.7352 ]
15. Find the coefficient of correlation between the marks in Mathematics (x) and Physics (y) from the following data.
x:   40   37   90   85   67   75   80   52   80
y:   50   40   80   85   75   80   85   65   85
[ Answer: 0.9484 ]



RANK CORRELATION


For some characteristics, such as smartness, beauty or talent, numerical measurement is not possible, but individuals can be ranked in order according to judgement. If two judges rank a given group of individuals and we wish to find how far they agree with each other, the technique of rank correlation can be used. Even when actual measurements are available, we may be interested only in the ranks, that is, in the relative positions of individuals in the group; here too rank correlation is used.

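The formula for Spearman’s coefficient of rank correlation is

$$ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

where d is the difference between the two ranks given to the same individual and n is the number of individuals ranked.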


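If two or more observations have the same value, each of them is given a common rank equal to the average of the ranks they would otherwise occupy; for example, two values tied for the 3rd and 4th places both receive rank 3.5. In this case a correction factor is added to Σd² when calculating the rank correlation coefficient:

$$ \text{C.F.} = \frac{m(m^2 - 1)}{12} $$

where m is the number of times a value is repeated.

This correction factor must be added for every group of repeated values, in both series. After Σd² has been corrected in this way, the rank correlation coefficient is calculated exactly as before.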
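The ranking rule and the correction factor can be expressed in a few lines of code. This is a minimal Python sketch under our own assumptions: the function names are hypothetical, and ranks are assigned in ascending order (rank 1 to the smallest value). Ranking in descending order instead gives the same coefficient, provided both series are ranked the same way. The example uses the data of Exercise 4, which has ties in both series.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the smallest value; tied values share the mean of the
    ranks they would otherwise occupy (two values in 3rd and 4th place both get 3.5)."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # Sorted positions i..j (0-based) are ranks i+1..j+1; take their mean.
        rank_of[ordered[i]] = (i + j + 2) / 2
        i = j + 1
    return [rank_of[v] for v in values]


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m repeated values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # Add one correction term for each group of ties, in either series.
    sum_d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Data of Exercise 4: exports (x) and local sales (y)
x = [12, 15, 13, 20, 15, 14, 19, 13, 21, 18]
y = [25, 21, 15, 18, 20, 17, 20, 16, 20, 22]
print(round(spearman_rho(x, y), 4))  # 0.1333
```

Here Σd² = 140 and the correction factors add 1 for the two pairs of ties in x and 2 for the triple tie in y, giving 1 - 6(143)/990.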

EXERCISE:
1. Calculate the coefficient of rank correlation for the following data giving working capital in lakhs of Rs. (x) and profit in thousands of Rs. (y) of 10 companies for the year 2003.
x:   15   32   25   30   35   20   19   22   27   31
y:   50   70   65   72   90   58   53   57   68   74
[ Answer: 0.9515 ]
2. Calculate Spearman’s rank correlation coefficient for the following data.
x:  105  112  107  115  160  152  148  132
y:  120  127  135  123  140  142  138  110
[ Answer: 0.6429 ]
3. Quotations of index numbers of security prices of debentures of a certain joint stock company and of prices of preference shares for the years 1995 – 2002 are given below. Use the method of rank correlation to determine the relationship between debenture and share prices.
Year:          1995  1996  1997  1998  1999  2000  2001  2002
Debenture:     97.8  99.2  98.8  98.3  98.4  96.7  97.6  97.1
Share price:   78.9  85.8  81.2  83.8  84.2  80.1  80.6  77.6
[ Answer: 0.8095 ]
4. Find Spearman’s coefficient of rank correlation for the following data representing the exports (x) and local sales (y), both expressed in lakhs of Rs., of fashion garments for 10 years.
x:   12   15   13   20   15   14   19   13   21   18
y:   25   21   15   18   20   17   20   16   20   22
[ Answer: 0.1333 ]
5. Calculate the rank correlation coefficient between age of husband (x) and age of wife (y), both expressed in years, from the following data.
x:   60   30   37   30   42   37   55   45
y:   50   25   33   27   40   33   50   42
[ Answer: 0.9643 ]
6. Calculate the rank correlation coefficient for the following data showing respectively the marks in Economics (x) and marks in English (y).
x:   56   37   65   60   54   51   40   70
y:   50   42   55   48   51   53   38   47
[ Answer: 0.3810 ]
7. Find Spearman’s coefficient of rank correlation for the following data.
x:   33   37   42   23   21   15   13   30   39
y:   17   27   32   12   13   11    9   25   30
[ Answer: 0.9667 ]
8. Find the rank correlation coefficient for the following data representing the marks in the terminal examination (x) and the marks in the final examination (y) for a group of 10 students.
x:   52   33   47   65   43   33   54   66   75   70
y:   65   59   72   72   82   60   57   58   72   90
[ Answer: 0.2303 ]
9. Find the rank correlation coefficient for the following data.
x:   84   89   72   75   90   62   62   78
y:   65   75   58   65   75   54   51   57
[ Answer: 0.8810 ]
10. Calculate Spearman’s rank correlation coefficient for the following data.
x:  101  113   83  109  101   97   83   95   90  117
y:   53   59   52   57   59   50   54   58   59   61
[ Answer: 0.5212 ]
11. Find the rank correlation coefficient for the following data.
x:   64   72   70   85   64   90   60   85   89   54
y:   47   43   29   47   25   52   47   50   51   20
[ Answer: 0.7697 ]
12. The marks obtained by 10 students are as follows. Calculate the coefficient of rank correlation.
x:   90   88   90   76   88   62   98   90   70   76
y:   61   58   64   73   73   78   58   82   58   67
[ Answer: -0.2182 ]
13. The ranks of 10 students in three subjects A, B and C are given below. Find the rank correlation coefficient for each of the three possible pairs and comment on the result.
Student No.:     1   2   3   4   5   6   7   8   9  10
Rank in A:       1   3   4   2   5  10   8   6   7   9
Rank in B:       3   5   1   2   6  10   4   9   7   8
Rank in C:       2   3   5   1   4   9   6   7   8  10
[ Answer: Coefficient of rank correlation between A & B = 0.7333
          Coefficient of rank correlation between B & C = 0.7576
          Coefficient of rank correlation between A & C = 0.9273
          Hence, there is maximum correlation between subjects A and C. ]
14. Three judges gave the following ranks to eight participants in a personality contest. Calculate the coefficient of rank correlation for each of the three possible pairs and decide which pair of judges has the most common approach.
Candidate No.:       1   2   3   4   5   6   7   8
Rank by Judge A:     7   6   5   8   3   1   2   4
Rank by Judge B:     6   8   4   7   1   2   4   5
Rank by Judge C:     4   5   6   7   3   1   2   8
[ Answer: Coefficient of rank correlation between A & B = 0.7976
          Coefficient of rank correlation between B & C = 0.5833
          Coefficient of rank correlation between A & C = 0.6667
          Hence, there is maximum correlation between Judges A and B. ]
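15. Three judges X, Y and Z in a painting competition ranked the contestants as follows. Calculate the coefficient of rank correlation for each of the three possible pairs and decide which pair of judges has the most common approach.
Contestant No.:      1   2   3   4   5   6   7   8
Rank by Judge X:     1   2   3   5   4   6   7   8
Rank by Judge Y:     2   4   1   3   8   5   6   7
Rank by Judge Z:     1   3   2   5   4   8   7   6
[ Answer: Coefficient of rank correlation between X & Y = 0.6190
          Coefficient of rank correlation between Y & Z = 0.5952
          Coefficient of rank correlation between X & Z = 0.8810
          Hence, there is maximum correlation between Judges X and Z. ]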



COEFFICIENT OF CORRELATION FOR A BI-VARIATE FREQUENCY DISTRIBUTION

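The coefficient of correlation for bivariate frequency data is calculated using the following formula:

$$ r = \frac{N\sum fuv - \left(\sum fu\right)\left(\sum fv\right)}{\sqrt{\left[N\sum fu^2 - \left(\sum fu\right)^2\right]\left[N\sum fv^2 - \left(\sum fv\right)^2\right]}} $$

where the sums run over all cells of the table, f is the cell frequency, N = Σf is the total frequency, and u = (x − A)/h and v = (y − B)/k are the step deviations of the class mid-points x and y from assumed means A and B, with h and k the class widths.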
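For a bivariate table the sums are formed cell by cell. The Python sketch below is illustrative only: the function `grouped_r` is a hypothetical name, and empty cells marked "-" are entered as 0. Because r is unaffected by a change of origin and scale, any step-deviation coding of the class mid-points gives the same result; the example codes the classes of Exercise 1 of this section.

```python
from math import sqrt


def grouped_r(freq, u_codes, v_codes):
    """Correlation coefficient for a bivariate frequency table.

    freq[i][j] is the frequency in row class i and column class j;
    u_codes[i] and v_codes[j] are the step deviations of the row and
    column class mid-points.
    """
    N = sfu = sfv = sfu2 = sfv2 = sfuv = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            u, v = u_codes[i], v_codes[j]
            N += f
            sfu += f * u
            sfv += f * v
            sfu2 += f * u * u
            sfv2 += f * v * v
            sfuv += f * u * v
    num = N * sfuv - sfu * sfv
    den = sqrt((N * sfu2 - sfu ** 2) * (N * sfv2 - sfv ** 2))
    return num / den


# Exercise 1: age (rows 10-12 ... 16-18) against height (columns 144-148 ... 156-160)
table = [
    [7, 2, 0, 0],
    [3, 5, 3, 3],
    [0, 3, 8, 6],
    [0, 0, 5, 5],
]
u = [-1, 0, 1, 2]  # age mid-points 11, 13, 15, 17 coded as (x - 13)/2
v = [-1, 0, 1, 2]  # height mid-points 146, 150, 154, 158 coded as (y - 150)/4
print(round(grouped_r(table, u, v), 4))  # 0.6974
```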

EXERCISE:
1. The following table gives the bivariate frequency distribution of 50 students according to age in years and height in cms. Calculate Pearson’s coefficient of correlation.
                            Height in Centimetres
Age in Years    144 – 148   148 – 152   152 – 156   156 – 160   Total
10 – 12             7           2           -           -           9
12 – 14             3           5           3           3          14
14 – 16             -           3           8           6          17
16 – 18             -           -           5           5          10
Total              10          10          16          14          50
[ Answer: 0.6974 ]
2. Calculate the coefficient of correlation for the following data on the service in years and the salary in Rs. of 50 employees of a firm.
                              Salary in Rupees
Service in Years  1000-1500   1500-2000   2000-2500   2500-3000   3000-3500
0 – 5                 5           2           -           -           -
5 – 10                3           3           6           -           -
10 – 15               -           3           4           -           -
15 – 20               -           -           4           5           6
20 – 25               -           -           -           3           6
[ Answer: 0.8542 ]
3. The following table gives the heights in cms. and weights in kgs. of a group of 25 boys. Calculate the product moment coefficient of correlation.
                            Height in Centimetres
Weight in Kgs   150 – 155   155 – 160   160 – 165   165 – 170   170 – 175
50 – 54             2           2           -           -           -
54 – 58             1           1           2           -           -
58 – 62             -           -           3           4           -
62 – 66             -           -           1           3           3
66 – 70             -           -           -           1           2
[ Answer: 0.8547 ]
4. The following table gives the food expenditure and family income of a few families. Calculate the coefficient of correlation.
                      Food Expenditure in percentage
Family Income in Rs.  10 – 15   15 – 20   20 – 25   25 – 30   30 – 35
1500 – 2000              3         3         -         -         -
2000 – 2500              2         2         3         -         -
2500 – 3000              -         2         2         3         -
3000 – 3500              -         -         3         2         2
3500 – 4000              -         -         -         2         1
[ Answer: 0.7897 ]
5. The following data represent the sales in lakhs of Rs. and profits in thousands of Rs. of sixty-six companies. Find the coefficient of correlation.
                             Sales in lakhs of Rs.
Profits in thousands of Rs.  50 – 60   60 – 70   70 – 80   80 – 90   90 – 100
50 – 55                         1         3         1         -         -
55 – 60                         4         7         2         5         -
60 – 65                         3         5         4        10         6
65 – 70                         -         1         3         7         2
70 – 75                         -         -         2         -         -
[ Answer: 0.4071 ]
6. Calculate Karl Pearson’s coefficient of correlation for the following distribution.
                  Marks in History
Marks in Civics   0 – 10    10 – 20   20 – 30   30 – 40   40 – 50
0 – 10               2         1         -         -         -
10 – 20              4         3         2         -         -
20 – 30              3         2         2         3         1
30 – 40              -         1         1         2         1
40 – 50              -         -         -         2         -
[ Answer: 0.6133 ]
7. Calculate the product moment coefficient of correlation.
               Savings in Rs.
Income in Rs.  0 – 400      400 – 800    800 – 1200   1200 – 1600  1600 – 2000
2500 – 3000       5             3            -             -            -
3000 – 3500       3             4            4             -            -
3500 – 4000       -             3            5             3            1
4000 – 4500       -             -            3             2            2
4500 – 5000       -             -            1             1            1
[ Answer: 0.7462 ]
8. A firm administers a test to sales trainees before they go into the field. The management of the firm is interested in determining the relationship between the test scores and the sales made by the trainees at the end of one year in the field. The following data were collected for 50 sales personnel who had been in the field for one year. Calculate the coefficient of correlation.
              Sales in thousands of Rs.
Test Score    10 – 12   12 – 14   14 – 16   16 – 18
60 – 70          2         3         -         -
70 – 80          3         4         2         -
80 – 90          -         7        12         2
90 – 100         -         -         8         7
[ Answer: 0.7267 ]