Chapter 1: INTRODUCTION TO STATISTICS



The word ‘statistics’ seems to have been derived from the Latin word ‘Status’, the Italian word ‘Statista’, the German word ‘Statistik’ or the French word ‘Statistique’, each of which means a political state.  It is not a new discipline, but is as old as human society.  In earlier times, the term statistics was applied to a branch of statecraft, and meant the facts and figures needed by the state in its day-to-day life.  Statistics was regarded as a by-product of the administrative activities of the State.  Now statistics is usually not studied for its own sake (as a separate branch), but is employed as a tool in solving or analyzing the problems of the State.
In the present age, statistics is regarded as one of the most important tools for taking decisions.  All the branches of science make use of statistics.  Statistics helps in forming suitable policies; as such it is being used in all fields.  In research work, it has its own status as a tool of research.  Thus in every situation there is a demand for statistics.  Sampling techniques further reduce the cost of statistics, because by studying a part of the population, the characteristics of the whole population can be estimated.  Thus the increasing demand for statistics and its decreasing cost have led to its growth.
Planning and control are the twin-babies of management.  Whenever we think of a plan we have to think of statistics.  Planning cannot be devised without statistics.  In this technically advanced and competitive world, a producer has to make a number of decisions such as what to produce, where to produce, how to produce, where to sell, at what price to sell etc.  Such decisions depend upon sound forecasting and forecasting cannot be made without statistics.  Prof. Marshall observed that “statistics are the straw out of which I, like any other economist, have had to make bricks”.  Statistics helps in formulating suitable policies and as such its need is increasingly felt in all the fields.  A businessman needs information on daily demand of the products, seasonal changes in demand, prices of competitive products etc.  All these problems are resolved in the light of factual information and hence the need for statistics.
“By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.”    -  Horace Secrist
Numerical data alone constitute statistics.  Students can be classified as very good, good, average, poor, etc. on the basis of their performance in tests, but these are qualitative expressions and are not statistics.  In particular, qualitative characteristics – honesty, beauty, intelligence, etc. – which cannot be measured numerically are not statistics.  If they are expressed by giving certain scores (marks) as numerical standards, then they can be called statistics.  Another example is a beauty competition; if ranks are assigned to the contestants, then the ranks, as a quantitative measure of beauty, can be regarded as statistics.
The numerical data pertaining to any field of enquiry can be obtained either by enumeration (by actual counting) or by estimation.  If the field of enquiry is not large, enumeration (actual counting) can be conducted.  If the field of enquiry is wide and large, enumeration is out of the question; and in such cases, data can be estimated.  For instance, in the MBA class there are 60 students; this is a case of enumeration.  (We count the number of students).  At the same time we may say that 1,00,000 people attended the Independence Day Celebration;  it is a case of estimation (approximation).
A reasonable standard of accuracy is needed in both enumeration and estimation.  For instance, if the weights of students are being measured, fractions of a kilogram (say 1/10th or 1/20th) can be ignored; when measuring the distance from Chennai to Kanyakumari, fractions of a kilometer can easily be ignored.  No hard and fast rule can be laid down for all cases.  Hence mathematical accuracy cannot be attained in statistical studies.
“Statistics is the science of estimates and probabilities”
This definition is narrow, as other methods like enumeration, classification, analysis, etc. have been ignored.  Therefore, this definition narrows down the scope of the science of statistics.
“Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data”.
This definition is clear and concise.  The data are collected to study a particular problem.  The mass of collected data may be converted into diagrams, graphs, etc.  According to this definition, there are four stages.
a.       Collection of data:  The first step of an investigation is the collection of data.  Careful collection is needed, because further analysis is based on this.  There are different methods of collection of data (Census, sampling, primary, secondary etc.) and they must be reliable.  If the collected data are faulty, results will also be faulty.  Therefore, the investigator must take special care in collection.  
b.       Presentation of data:   The collected data are generally in an unintelligible form and need to be classified and tabulated before they can be analyzed.  For example, suppose the investigator is interested in knowing the average income of 1000 families of a village.  The mass of data collected would be difficult to understand and analyze.  Therefore, the collected data are to be presented in tabular, diagrammatic or graphic form.  Data presented in a systematic order facilitate further analysis.
c.       Analysis of data:  After the presentation of data, the next step is to analyse the presented data.  Analysis includes condensation, summarization, conclusion, etc. through measures of central tendency, dispersion, skewness, kurtosis, correlation, regression, etc.
d.       Interpretation of data:   Figures do not speak for themselves.  The duty of the statistician is not complete with mere collection and analysis of data; valid conclusions must be drawn on the basis of the analysis.  A high degree of skill and experience is necessary for the interpretation.  Correct interpretation leads to valid conclusions.
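The four stages above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the ten family incomes are hypothetical figures invented for the example (they are not data from any survey), and the class width of 2000 rupees and the helper name income_class are arbitrary choices.

```python
import statistics
from collections import Counter

# a. Collection: hypothetical monthly incomes (in rupees) of 10 families.
#    These figures are invented purely for illustration.
incomes = [4200, 5100, 3900, 7600, 5100, 6100, 4800, 5100, 9800, 4300]


def income_class(value, width=2000):
    """Return the label of the income class (of the given width) containing value."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# b. Presentation: condense the raw figures into a frequency table.
table = Counter(income_class(x) for x in incomes)
for label in sorted(table, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>11}  {table[label]}")

# c. Analysis: a measure of central tendency and a measure of dispersion.
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
spread = statistics.stdev(incomes)
print("mean:", mean_income, "median:", median_income, "std dev:", round(spread, 1))

# d. Interpretation: the figures do not speak for themselves.
if mean_income > median_income:
    print("The mean exceeds the median, so a few high incomes pull the average up.")
```

The interpretation step here is deliberately simple; in practice it calls for judgment about the problem being studied, not just a comparison of two numbers.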
Without an adequate understanding of the statistical methods, the investigator in the social sciences may be like the blind man groping in a dark room for a black cat that is not there.  The methods of statistics are useful in an ever-widening range of human activities in any field of thought in which numerical data may be had.
The real purpose of statistical methods is to make sense out of facts and figures, to probe the unknown and to cast light upon the situation.
Broadly speaking, one may say that the statistical methods can be fruitfully applied to any problem of decision making where numerical data are available or can be made available.  Therefore, in business, industry and economics, the statistical techniques are applicable to problems such as the study of trends in population, production of agricultural industries, prices, internal and external trade, gross national product, taxation laws and rates; preparation of budgets; computation of consumer price indices from time to time to revise the wage structures; preparation of price policies for new products; scheduling of projects and then exercising control over the operations till completion; resource allocation for any job; carrying out inquiries to know the potential markets; stock control; quality control; and maintenance and replacement of equipment.
In research and technology, the statistical techniques are used to develop optimum designs of experiments that can be applied to obtain the relevant information with highest precision at minimum cost.  In social sciences, statistics helps in studying the distribution of wealth, intelligence etc.  It is also used in studying the changes in standards of living, food habits and attitudes of people.
Functions of Statistics:
In various fields discussed above and many others, the science of statistics is used to perform the following functions:
  1. Statistics helps in developing sound methods of collecting data so that the data collected can be used to draw the valid inference regarding the desired objectives.
  2. It presents the information in numerical form.
  3. It helps in simplifying the complex data by way of classification / tabulation / graphical representation.
  4. The tabular / graphical representation of data and other complex statistics help in comparison.
  5. Statistics can be used to study the relationship between two or more factors.  Such a relationship can then be used to estimate one factor when the others are known.
  6. The data regarding a characteristic for a series of past periods can be used to forecast its value for a future period.
  7. The powerful function of forecasting supports planning: it facilitates the formulation of policies and helps in planning their implementation.
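Functions 5 and 6 above (studying a relationship and using past periods to forecast a future one) can be illustrated with a straight line fitted by least squares.  This is a minimal sketch, not the only forecasting method: the production figures are hypothetical, the function name fit_line is invented for the example, and a straight-line trend is itself an assumption that real data may not satisfy.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # must be non-zero
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical production (in thousand units) for five past periods.
periods = [1, 2, 3, 4, 5]
production = [12, 15, 15, 19, 20]

slope, intercept = fit_line(periods, production)

# Forecast for the next (sixth) period, assuming the trend continues.
forecast = intercept + slope * 6
print(f"trend: {intercept:.1f} + {slope:.1f} * period; forecast for period 6: {forecast:.1f}")
```

The same function estimates one factor from another (for example, sales from advertising expenditure) if the periods are replaced by values of the second factor.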
Limitations of Statistics:
Statistics is a very powerful science to study quantitative data.  Qualitative data cannot be studied with the help of Statistics except when they are made quantitative by defining suitable variates.
More often than not, Statistics is used to draw conclusions regarding a group of units rather than a single unit.  In case of individual units, the inference drawn is always with an element of chance or on an average.
Sometimes due to bias involved in the collection of data the inference drawn is a biased one.
The potential danger involved in the use of statistics is its misuse.  It is easy to misuse it for supporting or contradicting any proposition or conjecture.  For instance, a statement like “During the last month, six street accidents were recorded in the middle of the road compared to twenty-one accidents recorded on the sides of the road in busy streets of Mumbai” may lead one to a conclusion like “It is safer to walk in the middle of the road”.
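The fallacy in the accident example is that raw counts are compared without asking how many people walk in each place.  The sketch below makes this explicit.  The accident counts (6 and 21) come from the statement above, but the pedestrian numbers are purely hypothetical, chosen only to show how the conclusion can reverse once rates are used instead of counts.

```python
# Accident counts quoted in the statement above.
accidents = {"middle of the road": 6, "sides of the road": 21}

# HYPOTHETICAL exposure: number of pedestrians using each part of the road
# during the same month.  These are assumptions, not recorded figures.
pedestrians = {"middle of the road": 200, "sides of the road": 70_000}

# Accidents per 1000 pedestrians is a fairer basis for comparison.
rate_per_1000 = {
    place: 1000 * accidents[place] / pedestrians[place] for place in accidents
}

for place, rate in rate_per_1000.items():
    print(f"{place}: {accidents[place]} accidents, {rate:.1f} per 1000 pedestrians")
```

Under these assumed figures the middle of the road has fewer accidents in total but a far higher accident rate, so the conclusion drawn from the raw counts does not follow.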
SOURCES OF DATA
An application of statistics involves data and therefore the foremost question that arises is from where to get the data or what the sources of data are.
We have seen that collection of the data is always the first step in any statistical enquiry.  Before starting any enquiry, the following concepts must be clearly defined.
POPULATION:
Any finite or infinite aggregation of all possible objects under study, not necessarily animate, is called a POPULATION.  In a statistical study we may have a population of students at the University, employees of a company, misprints in a book, units produced by a factory, cheques cleared in a month etc.
SAMPLE:
Any finite set of objects selected from a population is called a SAMPLE.  The objects included in the sample should be representative of the items in the population so that by studying the sample values in detail, an idea about the characteristics of the objects in the population can be obtained.
VARIATE:
A characteristic from the population which can be expressed numerically and which varies from object to object is called a VARIATE.  For example, the wages of persons or the heights of students can be measured quantitatively and so these are variates.
ATTRIBUTE:
Certain characteristics cannot be expressed quantitatively but can be described qualitatively, for example, beauty, intelligence, skill, talent, etc.  These are called ATTRIBUTES.
PARAMETER:
A statistical measure like mean or standard deviation, which is calculated for all the objects included in the population, is called a PARAMETER.  It is usually denoted by Greek letters like μ for mean, σ for standard deviation etc.
STATISTIC:
A statistical measure calculated for all units in the sample is called a STATISTIC.  It is denoted by letters of the English alphabet.  For example, the sample mean is denoted by x̄ and the sample standard deviation is denoted by S.
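The distinction between a parameter and a statistic can be seen by computing both on the same data.  The following is a minimal sketch under stated assumptions: the population of 10,000 heights is simulated (not measured), the centre of 165 cm and spread of 8 cm are arbitrary choices, and the variable names mu, sigma, x_bar and s simply mirror the usual symbols.

```python
import random
import statistics

random.seed(0)  # fixed seed so that repeated runs give the same figures

# A simulated POPULATION of 10,000 heights in centimetres (hypothetical data).
population = [random.gauss(165, 8) for _ in range(10_000)]

# PARAMETERS: calculated from every object in the population.
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# A SAMPLE of 100 objects drawn at random, without replacement.
sample = random.sample(population, 100)

# STATISTICS: calculated from the sample only.
x_bar = statistics.fmean(sample)  # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The statistics will generally be close to the parameters but not equal to them; drawing a different sample gives slightly different values of x̄ and S, while μ and σ stay fixed.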
The following points must be decided before collection of data begins:
  1. The purpose or the objective of the collection must be precisely defined.  The type of data to be included, the characteristics to be considered, the sources from where the data is to be obtained and the steps to be followed to collect the data – every step should be worked out in advance.
  2. The scope of the enquiry with respect to the time and the places to be covered should be decided first.  There are different types of enquiries like official or non-official, regular or ad-hoc, direct or indirect etc.  The proper type which suits the purpose and the scope should be decided.
  3. The measurement of values of a variable is done in a particular unit which is called the Statistical unit.  For example, for incomes of employees, the unit is the rupee; for heights of persons, it is the centimeter, etc.  Along with the unit, the degree of accuracy also should be decided.
After considering the above mentioned points, the type of data, whether primary or secondary, is to be decided.
METHODS OF COLLECTING PRIMARY DATA
The primary data is the information collected by an enumerator or investigator for the purpose of the enquiry for the first time.  The following are the methods by which the primary data can be collected.
a.  DIRECT PERSONAL INVESTIGATION:   Here the investigator meets the informants personally and collects the information by asking questions.  The questions should be simple, short and should be so formed as to get brief and unambiguous answers.  The enumerator must be trained, specially hired for the job.  His observation should be keen and he should be well acquainted with the local conditions.  He should possess sufficient knowledge of tastes and preferences of the informants.  The investigator should be polite and courteous yet he should be firm, determined to get answers tactfully from the respondents.
This type of investigation, though very costly and time-consuming, is the best method available as far as accuracy is concerned.  If the scope of the enquiry is very wide, this method cannot be used.  Also, care has to be taken to avoid personal bias entering the answers of the respondents; otherwise it will affect the validity of the data collected.
b.   INDIRECT ORAL INVESTIGATION:    If the persons directly concerned with the investigation are not willing to supply the necessary information, then it is obtained by questioning witnesses who are supposed to know the situation, or to have knowledge about the persons concerned or the problem involved.
This method is adopted by Inquiry Committees or Commissions.  It is applicable in those situations where indirect informants can give more reliable and accurate information than the persons involved.  This method can be successful only when the witnesses are honest and are not hostile towards the persons concerned.  They should be able to express themselves precisely, accurately, without exaggerating the situation.  The investigators should be able to judge whether the information provided by the witness is correct and without bias.
c.     QUESTIONNAIRES AND SCHEDULES:   In this method, a list of questions is prepared and sent by post to various informants.  Usually, a sample of informants is selected from the concerned population.  Sometimes the schedules are filled in by enumerators who question the people and write down the necessary information.  If the questionnaire is sent by mail, then a forwarding letter, explaining the objective of the survey and requesting co-operation, should accompany the form.  The advantage of this method is that the respondents can answer the questions at their convenience and would not hesitate to give confidential information asked for in the questionnaire.  This method has wide coverage and is quick and inexpensive, but the response is usually not very good.  If possible, there should be some incentive like a small prize, lucky number draw, concession at some shops etc. to get a better response.  Every questionnaire must be accompanied by an addressed and stamped envelope.
If a schedule is to be filled in by an enumerator, he should be a trained, qualified person.  The enumerator should be a person of unquestioned integrity.  He must be patient and tactful with the respondents.  He must explain the purpose of the investigation and also the questions in detail.  While writing the answers, he has to take care that personal bias does not affect the investigation.  The reports of the enumerators should be periodically checked by the supervisors.  Now, let us see how a good questionnaire should be prepared.


REQUISITES OF A GOOD QUESTIONNAIRE OR A SCHEDULE:
  1. The number of questions should be as few as possible but, at the same time, the questions should cover all the essential topics on which information is required.
  2. The questions should be short, simple and unambiguous.  Clarity is essential in forming the questions.
  3. The questions should be drafted in such a way that the answers to them are of objective type and brief in nature; for example, the answers printed should be ‘yes’ or ‘no’ or multiple-choice answers of the type ‘single, married, widowed, divorced’.
  4. It is possible that some questions cannot be answered accurately by the respondents.  So the degree of accuracy for a statistical unit should be mentioned with the question itself.  For example, the age is expected in completed years, or the monthly income is to be expressed in hundreds of rupees etc.
  5. Questions which are unduly inquisitive or which are likely to offend the respondents should not be included in a questionnaire.  Questions regarding personal habits, behaviour with family members or income should be tactfully asked.  Leading questions providing a hint to the possible answer should be avoided.
  6. The questions should be so worded that personal bias of an investigator is not reflected.
  7. The arrangement of the questions should be carefully planned.  Proper space for answers must be kept and there should be a logical flow from one question to the next.
  8. The questionnaire should be neatly printed on high quality paper, creating a good impression on the respondents.
  9. If possible, the questionnaire should be tried on a small sample before applying it to a large group so that some revision or amendment of the questionnaire can be made, if necessary.
EDITING OF THE PRIMARY DATA
The collected data should be edited before it is processed further.  While editing the data, the following points must be remembered.
  1. The data should be consistent.  That is, the answers obtained should not contradict one another.
  2. The answers should be complete and uniform in all respects.  If some important questions are left unanswered, then the respondent should be contacted again to complete the questionnaire.
  3. The answers should be checked for accuracy.  Inaccuracy due to mathematical errors is to be corrected.
  4. The data must be checked for homogeneity of answers.  For example, if one respondent has mentioned the gross pay and if the other has mentioned net pay after tax deduction, then these cannot be compared.
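Some of the editing checks above (consistency, completeness and homogeneity) lend themselves to a simple automated pass over the returned questionnaires.  The sketch below is only an illustration: the field names (age, marital_status, years_married, pay, pay_basis), the four records and the function name edit_checks are all hypothetical, and a real enquiry would define its own rules.

```python
# Hypothetical questionnaire returns; every value here is invented.
records = [
    {"id": 1, "age": 34, "marital_status": "married", "years_married": 10,
     "pay": 30000, "pay_basis": "gross"},
    {"id": 2, "age": 19, "marital_status": "single", "years_married": 3,
     "pay": 12000, "pay_basis": "gross"},   # contradicts itself
    {"id": 3, "age": None, "marital_status": "married", "years_married": 5,
     "pay": 25000, "pay_basis": "gross"},   # age left unanswered
    {"id": 4, "age": 45, "marital_status": "widowed", "years_married": 20,
     "pay": 28000, "pay_basis": "net"},     # net pay, others report gross
]

REQUIRED = ("age", "marital_status", "pay", "pay_basis")


def edit_checks(record, expected_basis="gross"):
    """Return a list of problems found in one questionnaire record."""
    problems = []

    # Completeness: every required question must be answered.
    missing = [field for field in REQUIRED if record.get(field) is None]
    if missing:
        problems.append("incomplete: " + ", ".join(missing))

    # Consistency: answers must not contradict one another.
    if record.get("marital_status") == "single" and (record.get("years_married") or 0) > 0:
        problems.append("inconsistent: single but years_married > 0")

    # Homogeneity: pay must be reported on the same basis by everyone.
    if record.get("pay_basis") not in (None, expected_basis):
        problems.append("not homogeneous: pay reported as " + record["pay_basis"])

    return problems


report = {r["id"]: edit_checks(r) for r in records}
for record_id, problems in report.items():
    print(record_id, problems if problems else "ok")
```

Records flagged as incomplete would be sent back to the respondent, as point 2 suggests; checking accuracy (point 3) usually needs the original source and is not attempted here.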
SECONDARY DATA
The data compiled through various published or unpublished sources is known as Secondary Data.  The following are the main sources of the secondary data.
a)      Various Central or State Government publications supply reliable data on many social and economic activities.  For example, Census reports, Pay Commission reports, monthly or annual publications like Bulletin on Index of Industrial Production, Retail Price Bulletin, Estimates of national product etc.
b)     Various international institutions publish the reports on matters of international importance.  Organizations like W.H.O., I.M.F., U.N.O., I.B.R.D., regularly publish official reports.
c)      Semi-official publications of corporations like municipal corporations, Life Insurance Corporation of India, etc.
d)     Publications of private bodies like Chambers of Commerce, Institute of Chartered Accountants, Institute of Bankers provide secondary data on various issues.
e)      Periodicals like Economic Weekly, Commerce, Economic Times supply reliable information.
f)      Various universities and research organizations collect data in different fields which can be used as Secondary data.
g)     Some reference books also supply information over a long period.
h)     There are also sources like records of government departments, trade union offices, railways, state transport offices which can be used as secondary data. 
The secondary data should be carefully checked before using it in any investigation.  The data should be suitable and adequate for the investigation.  The information should be checked for the reliability and accuracy of data.  The integrity of the investigators or enumerators should be ascertained.  The secondary data should never be accepted at its face value without checking.
We have seen different methods of data collection.  If the data is collected for all units of the population, it is called Census and if it is collected only for a sample then it is called a Sample Survey.