OMTEX CLASSES: CORRELATION ANALYSIS

### CORRELATION ANALYSIS

So far we have studied problems relating to one variable only; such data form univariate distributions. In the preceding chapters we discussed measures of averages (mean, median, mode, quartiles, etc.) and measures of dispersion (Quartile Deviation, Mean Deviation, Standard Deviation, etc.) for a univariate distribution. More often, however, we are required to study the relationship between two or more variables: for example, price and demand, supply and production, the income and expenditure of a given family, or the import and export of a commodity.

Correlation analysis deals with the association between two or more variables. The degree of relationship between the variables is called correlation. If only two variables, say X and Y, are involved, it is called simple correlation (referred to in this chapter as linear correlation). If more than two variables are involved, it is called multiple correlation. For example, in an agricultural experiment the yield obtained depends on many factors such as the quality of seed, irrigation facilities, fertility of the soil, and the manure and pesticide applied.

In partial correlation, only two variables are studied after eliminating the effect of the other variables, e.g. the correlation between yield of crops and rainfall after eliminating the effect of temperature on both yield and rainfall.

In linear correlation we deal with two variables, known as the cause variable (X) and the effect variable (Y). If the two variables vary together, in the same direction or in opposite directions, they are said to be correlated. If Y increases consistently as X increases, we say X and Y are positively correlated. Some variables are negatively correlated: as X increases Y decreases, and as X decreases Y increases. For example, as price increases, demand decreases. If the change in one variable is exactly proportional to the change in the other, the two variables are said to be perfectly correlated. Whether correlation is positive (direct) or negative (inverse) therefore depends on the direction of change of the variables.

The study of correlation analysis is of great importance in practical problems, especially in business, for the following reasons.

1. In business, medical science or socio-economic problems, most pairs of variables show some kind of relationship. With the help of correlation analysis, we can measure this relationship by finding its coefficient.

2. Once the measure of correlation is obtained, a business executive can estimate the likely value of a dependent variable for a particular known value of the independent variable. This can be achieved with the help of regression analysis, which we will discuss in the next chapter.

3. Correlation analysis helps businessmen and economists to analyse the effects of a change in one variable, and gives them guidance about the effect of that change on the other variable.

There are various methods for measuring the degree of relationship between the variables. Generally there are four methods, depending upon the nature of the data. They are:

1. Scatter Diagram
2. Correlation Graph
3. Correlation Table
4. Coefficient of Correlation

### SCATTER DIAGRAM

It is one of the simplest ways of representing a bivariate distribution diagrammatically, and it provides one of the simplest tools for ascertaining the correlation between two variables. Suppose we are given n pairs of values $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ of the two variables X and Y. After plotting the given set of values as points on graph paper, we can study the nature of the diagram. A straight line that seems to be the best fit for the given set of points can then be drawn by inspection. Some points will lie on the line and the others will be near it. While drawing the line, care has to be taken that the number of points above the line and the number below it are approximately the same.

The pairs of values of X and Y are represented by points plotted on graph paper. The graph so obtained is called a Scatter Diagram. By studying the diagram, the following conclusions can be drawn about the correlation.

1. If all the plotted points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, as shown in Fig. 1, then the correlation is perfect positive.

2. If the points cluster around a line and ascend from the lower left-hand corner to the upper right-hand corner, then there is positive correlation. It is shown in Fig. 2.

3. If all the points lie on a straight line running from the upper left-hand corner to the lower right-hand corner, the correlation is said to be perfect negative. Fig. 3 depicts this type of correlation.

4. If the points tend to fall along a direction from the upper left-hand corner to the lower right-hand corner, then there is negative correlation. This is shown in Fig. 4.

5. If the points are scattered over the graph paper such that no definite conclusion can be drawn about their direction, then there is absence of correlation.

A scatter diagram is an important step in analysing correlation. When the amount of data is limited, a scatter diagram is easy to make manually. It portrays the joint behaviour of the two variables, but it gives only the direction of correlation. It does not give us a numerical measure of the degree of correlation.
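As a rough illustration of the plotting step, the sketch below draws a text-only scatter diagram in Python. The function name `ascii_scatter` and the grid size are assumptions made for this example; the data are those of Exercise 1 on the coefficient of correlation (time in weeks and output in thousand units).

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatter diagram: one '*' per grid cell that holds a point."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range when all values of a variable are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 of the grid is the top of the diagram, so flip the y axis.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


weeks = [7, 5, 4, 11, 10, 12, 14, 9]
output = [14, 8, 8, 19, 16, 19, 20, 16]
diagram = ascii_scatter(weeks, output)
print(diagram)
```

The points rise from the lower left-hand corner to the upper right-hand corner, which suggests positive correlation, but the diagram alone says nothing about its degree.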

### COEFFICIENT OF CORRELATION

The three methods mentioned above give us the direction of correlation, if it exists. But we also require an exact numerical measure of the degree or extent of correlation. It is useful to have a numerical measure that is independent of the units of the original data, so that variables measured in different units can be compared. For this we calculate the coefficient of correlation. Its value always lies between –1 and +1. The sign of the correlation coefficient indicates whether the variables are related positively or negatively, and its magnitude indicates the degree of relationship.

Definition

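The coefficient of correlation, denoted by r and named after Karl Pearson, is defined as

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \; \sum (y - \bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the means of the observed values of X and Y.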

The value of r always lies between –1 and +1.

1. If 0 < r < 1, the correlation is positive.
2. If r = 1, the correlation is perfect positive.
3. If –1 < r < 0, the correlation is negative.
4. If r = –1, the correlation is perfect negative.

If there is no correlation between the two variables, then r = 0. The converse is not true: r = 0 only rules out a linear relationship, and the variables may still be related in a non-linear way.
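The calculation can be sketched in Python as follows. This is a minimal example using the standard product-moment definition (sum of products of deviations divided by the square root of the product of the sums of squared deviations); the function name `pearson_r` is chosen for this example. It is applied to the data of Exercise 1 below, and to an illustrative set of points on the curve y = x², where the variables are clearly related and yet r is zero.

```python
import math


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one of the series is constant")
    return sxy / math.sqrt(sxx * syy)


# Exercise 1: time in weeks (x) and output in thousand units (y).
weeks = [7, 5, 4, 11, 10, 12, 14, 9]
output = [14, 8, 8, 19, 16, 19, 20, 16]
r_exercise_1 = pearson_r(weeks, output)
print(round(r_exercise_1, 4))  # about 0.9635, a strong positive correlation

# Illustrative data (not from the exercises): y depends exactly on x,
# but the relationship is not linear, so r comes out as zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_parabola = pearson_r(xs, ys)
print(r_parabola)
```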

EXERCISE:

1. The following data represent the time in weeks (X) and the output in thousand units (Y). Find the coefficient of correlation.

    | x | 7 | 5 | 4 | 11 | 10 | 12 | 14 | 9 |
    |---|---|---|---|---|---|---|---|---|
    | y | 14 | 8 | 8 | 19 | 16 | 19 | 20 | 16 |

2. Find the coefficient of correlation for the following data:

    | x | 14 | 8 | 10 | 11 | 9 | 13 | 5 |
    |---|---|---|---|---|---|---|---|
    | y | 14 | 9 | 11 | 13 | 11 | 12 | 4 |

3. Find the coefficient of correlation for the following data representing cost in Rs. (X) and sales in Rs. (Y) of a product for a period of eight years.

    | x | 84 | 80 | 92 | 85 | 95 | 90 | 83 | 87 |
    |---|---|---|---|---|---|---|---|---|
    | y | 115 | 104 | 122 | 116 | 125 | 120 | 112 | 120 |

4. Calculate the coefficient of correlation between marks in Economics (X) and marks in Accountancy (Y) of a group of 10 students.

    | x | 53 | 47 | 42 | 60 | 63 | 52 | 57 | 55 | 61 | 48 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 72 | 61 | 62 | 85 | 80 | 65 | 79 | 75 | 84 | 73 |

5. Calculate the coefficient of correlation between X and Y.

    | x | 5 | 8 | 10 | 12 | 15 | 18 | 21 | 24 | 25 | 6 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 25 | 21 | 20 | 18 | 16 | 15 | 14 | 12 | 11 | 24 |

6. The distribution of marks in Advertising (x) and marks in Business Planning (y) for a group of ten students is given below. Calculate the product moment coefficient of correlation.

    | x | 25 | 20 | 17 | 16 | 20 | 14 | 23 | 21 | 15 | 12 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 24 | 17 | 22 | 18 | 20 | 18 | 24 | 20 | 16 | 14 |

7. The following data give the experience (x) in years of eight machine operators and their performance ratings (y). Calculate the coefficient of correlation.

    | x | 16 | 13 | 17 | 4 | 3 | 11 | 7 | 14 |
    |---|---|---|---|---|---|---|---|---|
    | y | 88 | 87 | 89 | 72 | 70 | 82 | 78 | 84 |

8. Find Pearson’s coefficient of correlation for the following data:

    | x | 140 | 138 | 126 | 132 | 135 | 131 | 137 | 142 |
    |---|---|---|---|---|---|---|---|---|
    | y | 122 | 140 | 118 | 119 | 132 | 125 | 145 | 150 |

9. Find the coefficient of correlation for the following data:

    | x | 53 | 59 | 72 | 43 | 93 | 35 | 55 | 80 |
    |---|---|---|---|---|---|---|---|---|
    | y | 35 | 49 | 63 | 36 | 75 | 28 | 38 | 71 |

10. Calculate the coefficient of correlation from the following data:

    | x | 20 | 22 | 18 | 17 | 10 | 25 | 7 | 15 |
    |---|---|---|---|---|---|---|---|---|
    | y | 15 | 17 | 16 | 10 | 5 | 19 | 4 | 8 |

11. Calculate the coefficient of correlation for the following data of heights in cm (x) and weights in kg (y) of a group of 10 students.

    | x | 159 | 163 | 165 | 162 | 158 | 160 | 165 | 167 | 168 | 170 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 51 | 57 | 58 | 50 | 49 | 54 | 55 | 56 | 58 | 57 |

12. Below are the heights in cm (x) and weights in kg (y) of a group of children. Find the coefficient of correlation.

    | x | 130 | 128 | 132 | 135 | 140 | 142 | 137 | 139 |
    |---|---|---|---|---|---|---|---|---|
    | y | 31 | 30 | 36 | 32 | 41 | 40 | 35 | 34 |

13. Calculate the product moment coefficient of correlation.

    | x | 212 | 214 | 205 | 220 | 225 | 214 | 218 |
    |---|---|---|---|---|---|---|---|
    | y | 500 | 515 | 577 | 530 | 522 | 516 | 525 |

14. Find Pearson’s coefficient of correlation from the following data.

    | x | 10 | 2 | 5 | 7 | 9 | 4 | 8 |
    |---|---|---|---|---|---|---|---|
    | y | 8 | 4 | 4 | 8 | 5 | 3 | 7 |

15. Find the coefficient of correlation between the marks in Mathematics (x) and Physics (y) from the following data.

    | x | 40 | 37 | 90 | 85 | 67 | 75 | 80 | 52 | 80 |
    |---|---|---|---|---|---|---|---|---|---|
    | y | 50 | 40 | 80 | 85 | 75 | 80 | 85 | 65 | 85 |

### RANK CORRELATION

For certain characteristics, e.g. smartness, beauty or talent, it is not possible to obtain numerical measurements, but we can rank the individuals in order according to our own judgement. If two persons rank a given group of individuals and we have to find how far the two judges agree with each other, the technique of rank correlation can be used. In some cases, although actual measurements are available, we may still be interested only in ranks, that is, the relative position of an individual in the group. Here also rank correlation is used.

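The formula for Spearman’s coefficient of Rank Correlation is

$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

where d is the difference between the two ranks of an individual and n is the number of pairs of observations.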
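A minimal Python sketch of this calculation, for the case where the ranks are already given and no rank is shared, is shown below. It uses the usual form R = 1 − 6Σd² / (n(n² − 1)), with d the difference between the two ranks of an individual; the function name `spearman_from_ranks` is chosen for this example. The data are the ranks of Exercise 13 below, and the printed values agree with the answers quoted there.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's R = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for two rank lists."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two equally long rank lists with at least two items")
    # d is the difference between the two ranks given to the same individual.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Exercise 13: ranks of 10 students in subjects A, B and C.
rank_a = [1, 3, 4, 2, 5, 10, 8, 6, 7, 9]
rank_b = [3, 5, 1, 2, 6, 10, 4, 9, 7, 8]
rank_c = [2, 3, 5, 1, 4, 9, 6, 7, 8, 10]

r_ab = spearman_from_ranks(rank_a, rank_b)
r_bc = spearman_from_ranks(rank_b, rank_c)
r_ac = spearman_from_ranks(rank_a, rank_c)
print(round(r_ab, 4), round(r_bc, 4), round(r_ac, 4))  # 0.7333 0.7576 0.9273
```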

If two or more observations have the same value, each of the repeated values is given a common rank, equal to the average of the ranks they would otherwise occupy. In this case a correction factor is added to $\sum d^2$ while calculating the rank correlation coefficient. For a value repeated m times, the correction factor is

$$ \text{C.F.} = \frac{m(m^2 - 1)}{12} $$

This correction factor must be added for every repeated value in the data. Once the corrected $\sum d^2$ has been obtained, the calculation of the coefficient of Rank Correlation remains the same.
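The treatment of repeated values can be sketched in Python as follows. This is an illustration under stated assumptions: rank 1 is given to the largest value, tied values share the average of the positions they occupy, and the correction added for a value repeated m times is taken as (m³ − m)/12, the usual textbook form. The function names are chosen for this example. The data are those of Exercise 5 below (ages of husband and wife), where both series contain repeated values.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share their mean position."""
    positions = {}
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value that occurs m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    """Spearman's R with the correction factor added to sum(d^2) for ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least two pairs")
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # One correction term for every repeated value, in either series.
    sum_d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Exercise 5: age of husband (x) and age of wife (y), in years.
husband = [60, 30, 37, 30, 42, 37, 55, 45]
wife = [50, 25, 33, 27, 40, 33, 50, 42]
r_ages = spearman_with_ties(husband, wife)
print(round(r_ages, 4))  # about 0.9643
```

Here Σd² = 1 before correction, and each of the four repeated pairs (30 and 37 among the husbands, 33 and 50 among the wives) adds (2³ − 2)/12 = 0.5, giving a corrected Σd² of 3.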

EXERCISE:
1. Calculate the coefficient of rank correlation for the following data giving working capital in lakhs of Rs. (x) and profit in thousands of Rs. (y) of 10 companies for the year 2003.

    | x | 15 | 32 | 25 | 30 | 35 | 20 | 19 | 22 | 27 | 31 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 50 | 70 | 65 | 72 | 90 | 58 | 53 | 57 | 68 | 74 |

2. Calculate Spearman’s rank correlation coefficient for the following data.

    | x | 105 | 112 | 107 | 115 | 160 | 152 | 148 | 132 |
    |---|---|---|---|---|---|---|---|---|
    | y | 120 | 127 | 135 | 123 | 140 | 142 | 138 | 110 |

3. Quotations of index numbers of security prices of debentures of a certain joint stock company and of prices of preference shares for the years 1995 – 2002 are given below. Use the method of rank correlation to determine the relationship between debenture and share prices.

    | Year | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 |
    |---|---|---|---|---|---|---|---|---|
    | Debenture | 97.8 | 99.2 | 98.8 | 98.3 | 98.4 | 96.7 | 97.6 | 97.1 |
    | Share Price | 78.9 | 85.8 | 81.2 | 83.8 | 84.2 | 80.1 | 80.6 | 77.6 |

4. Find Spearman’s coefficient of correlation for the following data representing the exports (x) and local sales (y), both expressed in lakhs of Rs., of fashion garments for 10 years.

    | x | 12 | 15 | 13 | 20 | 15 | 14 | 19 | 13 | 21 | 18 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 25 | 21 | 15 | 18 | 20 | 17 | 20 | 16 | 20 | 22 |

5. Calculate the rank correlation coefficient between age of husband (x) and age of wife (y), both expressed in years, from the following data.

    | x | 60 | 30 | 37 | 30 | 42 | 37 | 55 | 45 |
    |---|---|---|---|---|---|---|---|---|
    | y | 50 | 25 | 33 | 27 | 40 | 33 | 50 | 42 |

6. Calculate the rank correlation coefficient for the following data showing the marks in Economics (x) and the marks in English (y).

    | x | 56 | 37 | 65 | 60 | 54 | 51 | 40 | 70 |
    |---|---|---|---|---|---|---|---|---|
    | y | 50 | 42 | 55 | 48 | 51 | 53 | 38 | 47 |

7. Find Spearman’s coefficient of correlation for the following data.

    | x | 33 | 37 | 42 | 23 | 21 | 15 | 13 | 30 | 39 |
    |---|---|---|---|---|---|---|---|---|---|
    | y | 17 | 27 | 32 | 12 | 13 | 11 | 9 | 25 | 30 |

8. Find the rank correlation coefficient for the following data representing the marks in the terminal examination (x) and the marks in the final examination (y) for a group of 10 students.

    | x | 52 | 33 | 47 | 65 | 43 | 33 | 54 | 66 | 75 | 70 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 65 | 59 | 72 | 72 | 82 | 60 | 57 | 58 | 72 | 90 |

9. Find the rank correlation coefficient.

    | x | 84 | 89 | 72 | 75 | 90 | 62 | 62 | 78 |
    |---|---|---|---|---|---|---|---|---|
    | y | 65 | 75 | 58 | 65 | 75 | 54 | 51 | 57 |

10. Calculate Spearman’s rank correlation coefficient for the following data.

    | x | 101 | 113 | 83 | 109 | 101 | 97 | 83 | 95 | 90 | 117 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 53 | 59 | 52 | 57 | 59 | 50 | 54 | 58 | 59 | 61 |

11. Find the rank correlation coefficient for the following data.

    | x | 64 | 72 | 70 | 85 | 64 | 90 | 60 | 85 | 89 | 54 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 47 | 43 | 29 | 47 | 25 | 52 | 47 | 50 | 51 | 20 |

12. The marks obtained by 10 students are as follows. Calculate the coefficient of rank correlation.

    | x | 90 | 88 | 90 | 76 | 88 | 62 | 98 | 90 | 70 | 76 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | y | 61 | 58 | 64 | 73 | 73 | 78 | 58 | 82 | 58 | 67 |

13. The ranks of 10 students in three subjects A, B and C are given below. Find the rank correlation coefficient for each of the three possible pairs and comment on the result.

    | Student No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | Rank in A | 1 | 3 | 4 | 2 | 5 | 10 | 8 | 6 | 7 | 9 |
    | Rank in B | 3 | 5 | 1 | 2 | 6 | 10 | 4 | 9 | 7 | 8 |
    | Rank in C | 2 | 3 | 5 | 1 | 4 | 9 | 6 | 7 | 8 | 10 |

    [Answer: Coefficient of rank correlation between A and B = 0.7333; between B and C = 0.7576; between A and C = 0.9273. Hence, there is maximum correlation between subjects A and C.]

14. Three judges gave the following ranks to eight participants in a personality contest. Calculate the coefficient of rank correlation for each of the three possible pairs and decide which pair of judges has the most common approach.

    | Candidate No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
    |---|---|---|---|---|---|---|---|---|
    | Rank by Judge A | 7 | 6 | 5 | 8 | 3 | 1 | 2 | 4 |
    | Rank by Judge B | 6 | 8 | 4 | 7 | 1 | 2 | 4 | 5 |
    | Rank by Judge C | 4 | 5 | 6 | 7 | 3 | 1 | 2 | 8 |

    [Answer: Coefficient of rank correlation between A and B = 0.7976; between B and C = 0.5833; between A and C = 0.6667. Hence, there is maximum correlation between Judges A and B.]

15. Three Judges X, Y and Z in a painting competition ranked the contestants as follows. Calculate the coefficient of rank correlation for each of the three possible pairs and decide which pair of judges has the most common approach.

    | Contestant No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
    |---|---|---|---|---|---|---|---|---|
    | Rank by Judge X | 1 | 2 | 3 | 5 | 4 | 6 | 7 | 8 |
    | Rank by Judge Y | 2 | 4 | 1 | 3 | 8 | 5 | 6 | 7 |
    | Rank by Judge Z | 1 | 3 | 2 | 5 | 4 | 8 | 7 | 6 |

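### COEFFICIENT OF CORRELATION FOR A BI-VARIATE FREQUENCY DISTRIBUTION

The coefficient of correlation for bivariate frequency data is calculated using the following formula:

$$ r = \frac{N \sum fxy - \left(\sum fx\right)\left(\sum fy\right)}{\sqrt{\left[N \sum fx^2 - \left(\sum fx\right)^2\right]\left[N \sum fy^2 - \left(\sum fy\right)^2\right]}} $$

where x and y are the mid-values of the classes (or step deviations taken from them), f is the corresponding frequency and N is the total frequency.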
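A short Python sketch of this calculation is given below. It assumes the common textbook approach of representing each class by its mid-value and computing r = [NΣfxy − (Σfx)(Σfy)] / √{[NΣfx² − (Σfx)²][NΣfy² − (Σfy)²]}; the function name `bivariate_r` is chosen for this example. The table is that of Exercise 1 below (age in years against height in cm for 50 students), with empty cells entered as 0.

```python
import math


def bivariate_r(x_mids, y_mids, freq):
    """Correlation from a two-way frequency table.

    freq[i][j] is the number of observations whose x class has mid-value
    x_mids[i] and whose y class has mid-value y_mids[j].
    """
    n = sum(sum(row) for row in freq)
    row_totals = [sum(row) for row in freq]
    col_totals = [sum(col) for col in zip(*freq)]
    sum_fx = sum(x * f for x, f in zip(x_mids, row_totals))
    sum_fx2 = sum(x * x * f for x, f in zip(x_mids, row_totals))
    sum_fy = sum(y * f for y, f in zip(y_mids, col_totals))
    sum_fy2 = sum(y * y * f for y, f in zip(y_mids, col_totals))
    sum_fxy = sum(
        x * y * f
        for x, row in zip(x_mids, freq)
        for y, f in zip(y_mids, row)
    )
    numerator = n * sum_fxy - sum_fx * sum_fy
    denominator = math.sqrt(
        (n * sum_fx2 - sum_fx ** 2) * (n * sum_fy2 - sum_fy ** 2)
    )
    return numerator / denominator


# Exercise 1: rows are age classes 10-12, 12-14, 14-16, 16-18 (years);
# columns are height classes 144-148, 148-152, 152-156, 156-160 (cm).
age_mids = [11, 13, 15, 17]
height_mids = [146, 150, 154, 158]
table = [
    [7, 2, 0, 0],
    [3, 5, 3, 3],
    [0, 3, 8, 6],
    [0, 0, 5, 5],
]
r_age_height = bivariate_r(age_mids, height_mids, table)
print(round(r_age_height, 2))  # about 0.70
```

Because r does not change when the mid-values are shifted or rescaled, replacing them by step deviations such as 0, 1, 2, 3 gives the same result with smaller numbers.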

EXERCISE:
1. The following table gives the bivariate frequency distribution of 50 students according to age in years and height in cm. Calculate Pearson’s coefficient of correlation.

    | Age in years \ Height in cm | 144 – 148 | 148 – 152 | 152 – 156 | 156 – 160 | Total |
    |---|---|---|---|---|---|
    | 10 – 12 | 7 | 2 | - | - | 9 |
    | 12 – 14 | 3 | 5 | 3 | 3 | 14 |
    | 14 – 16 | - | 3 | 8 | 6 | 17 |
    | 16 – 18 | - | - | 5 | 5 | 10 |
    | Total | 10 | 10 | 16 | 14 | 50 |

2. Calculate the coefficient of correlation for the following data expressing the service in years and the salary in Rs. of 50 employees of a firm.

    | Service in years \ Salary in Rs. | 1000 – 1500 | 1500 – 2000 | 2000 – 2500 | 2500 – 3000 | 3000 – 3500 |
    |---|---|---|---|---|---|
    | 0 – 5 | 5 | 2 | - | - | - |
    | 5 – 10 | 3 | 3 | 6 | - | - |
    | 10 – 15 | - | 3 | 4 | - | - |
    | 15 – 20 | - | - | 4 | 5 | 6 |
    | 20 – 25 | - | - | - | 3 | 6 |

3. The following table represents the heights in cm and weights in kg of a group of 25 boys. Calculate the product moment coefficient of correlation.

    | Weight in kg \ Height in cm | 150 – 155 | 155 – 160 | 160 – 165 | 165 – 170 | 170 – 175 |
    |---|---|---|---|---|---|
    | 50 – 54 | 2 | 2 | - | - | - |
    | 54 – 58 | 1 | 1 | 2 | - | - |
    | 58 – 62 | - | - | 3 | 4 | - |
    | 62 – 66 | - | - | 1 | 3 | 3 |
    | 66 – 70 | - | - | - | 1 | 2 |

4. The following table represents the food expenditure and family income of a few families. Calculate the coefficient of correlation.

    | Family income in Rs. \ Food expenditure in percentage | 10 – 15 | 15 – 20 | 20 – 25 | 25 – 30 | 30 – 35 |
    |---|---|---|---|---|---|
    | 1500 – 2000 | 3 | 3 | - | - | - |
    | 2000 – 2500 | 2 | 2 | 3 | - | - |
    | 2500 – 3000 | - | 2 | 2 | 3 | - |
    | 3000 – 3500 | - | - | 3 | 2 | 2 |
    | 3500 – 4000 | - | - | - | 2 | 1 |