Statistics
Statistics is the branch of mathematics that deals with collection, classification, tabulation, presentation, analysis and interpretation of data. It helps convert raw data into meaningful information using tables, charts, averages, dispersion, correlation and regression. This chapter covers collection and tabulation of statistical data, classification of data, frequency distribution, cumulative frequency distribution, graphical representation using bar charts, pie charts, histograms and frequency polygons, measures of central tendency, variance, standard deviation, correlation and regression.
What is Statistics?
Statistics is the science of data. It begins with collecting data and continues through arranging, classifying, presenting and analysing it. Statistical tools help us understand patterns, compare groups, measure variation and make decisions based on evidence.
In competitive examinations, Statistics questions commonly test frequency tables, cumulative frequency, graphical interpretation, mean, median, mode, variance, standard deviation, correlation and simple regression.
| Area | What It Covers | Exam Focus |
|---|---|---|
| Collection of Data | Primary and secondary data. | Source and reliability of data. |
| Classification and Tabulation | Organising data into groups, rows and columns. | Frequency tables and grouped data. |
| Graphical Representation | Bar chart, pie chart, histogram, frequency polygon. | Chart selection and interpretation. |
| Central Tendency | Mean, median and mode. | Finding representative value. |
| Dispersion | Variance and standard deviation. | Measuring spread or consistency. |
| Correlation and Regression | Relationship and prediction between variables. | Direction, strength and linear prediction. |
“Statistics converts data into information and information into decision-making support.”
Key points
- Data must be collected from reliable sources.
- Raw data should be classified before analysis.
- Frequency distribution shows how often values occur.
- Cumulative frequency shows running totals.
- Graphs make data easier to interpret.
- Mean, median and mode measure central tendency.
- Variance and standard deviation measure spread.
- Correlation measures relationship between variables.
- Regression helps estimate one variable from another.
Core Formula Bank
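For quick reference, the formulas used in the later sections of this chapter are collected here. Nothing new is introduced; the symbols are those of the tables below, with \(x\) a data value, \(f\) its frequency and \(n\) the number of values.

\[ \bar{x}=\frac{\sum x}{n} \qquad \bar{x}=\frac{\sum fx}{\sum f} \]
\[ \sigma^2=\frac{\sum (x-\bar{x})^2}{n} \qquad \sigma=\sqrt{\frac{\sum (x-\bar{x})^2}{n}} \]
\[ \text{Central angle}=\frac{\text{value}}{\text{total}}\times 360^\circ \qquad y=a+bx \]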
Tip: Mean gives the balance point, median gives the middle value, and mode gives the most frequent value.
Data Analysis Workflow
Instead of memorising Statistics as separate formulas, it is useful to understand the complete workflow from raw data to interpretation.
| Step | Action | Purpose |
|---|---|---|
| 1 | Collect data | Get observations from primary or secondary source. |
| 2 | Classify data | Group similar items together. |
| 3 | Tabulate data | Arrange data in rows and columns. |
| 4 | Represent graphically | Use charts to make patterns visible. |
| 5 | Compute measures | Find mean, median, mode, variance and standard deviation. |
| 6 | Interpret results | Draw useful conclusions from the data. |
Collection and Tabulation of Statistical Data
The first step in Statistics is the collection of data. After collection, data is arranged in a systematic form using tables. This process is called tabulation.
| Concept | Meaning | Example |
|---|---|---|
| Primary Data | Data collected directly by the investigator. | Survey of students in a class. |
| Secondary Data | Data collected from already published or existing sources. | Government reports, books, websites, records. |
| Raw Data | Data in original unorganised form. | Marks: 45, 72, 63, 45, 80, 72. |
| Array | Data arranged in ascending or descending order. | 45, 45, 63, 72, 72, 80. |
| Tabulation | Arrangement of data in rows and columns. | Marks and number of students table. |
Classification of Data
Classification means arranging data into different groups or classes according to common characteristics. It reduces complexity and makes data suitable for analysis.
| Type of Classification | Meaning | Example |
|---|---|---|
| Chronological Classification | Data arranged according to time. | Sales by year: 2022, 2023, 2024. |
| Geographical Classification | Data arranged according to place or region. | Population by state. |
| Qualitative Classification | Data arranged according to attributes or qualities. | Gender, literacy, employment status. |
| Quantitative Classification | Data arranged according to numerical values. | Marks, income, height, age. |
| Discrete Data | Data that takes countable values. | Number of children, number of books. |
| Continuous Data | Data that can take any value within an interval. | Height, weight, time, temperature. |
Frequency Distribution
A frequency distribution shows how many times each value or class interval occurs in a dataset. It is useful when raw data is large or repetitive.
| Marks | Tally | Frequency |
|---|---|---|
| 10 | \|\| | 2 |
| 20 | \|\|\| | 3 |
| 30 | \|\|\|\| | 4 |
| 40 | \|\| | 2 |
| 50 | \| | 1 |
In this table, frequency means the number of times each mark appears.
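The counting behind such a table can be sketched in a few lines of Python. The `marks` list below is a hypothetical raw dataset, chosen only so that its counts match the table above.

```python
from collections import Counter

# Hypothetical raw marks, chosen to reproduce the frequencies in the table above.
marks = [10, 10, 20, 20, 20, 30, 30, 30, 30, 40, 40, 50]

# Counter maps each distinct mark to the number of times it occurs.
frequency = Counter(marks)

# Print the frequency table in ascending order of marks.
for value in sorted(frequency):
    print(value, frequency[value])
```

The sum of all frequencies equals the number of observations, which is a quick check that no value was missed while tallying.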
Grouped Frequency Distribution and Cumulative Frequency
When data is large, it is grouped into class intervals. The cumulative frequency gives the running total of frequencies.
| Class Interval | Frequency \(f\) | Less Than Cumulative Frequency |
|---|---|---|
| 0 - 10 | 3 | 3 |
| 10 - 20 | 5 | 8 |
| 20 - 30 | 9 | 17 |
| 30 - 40 | 7 | 24 |
| 40 - 50 | 6 | 30 |
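As a small illustration, the less-than cumulative frequency column can be computed as a running total. The variable names are illustrative; the numbers are those of the table above.

```python
from itertools import accumulate

class_intervals = ["0 - 10", "10 - 20", "20 - 30", "30 - 40", "40 - 50"]
frequencies = [3, 5, 9, 7, 6]

# accumulate() yields the running total of the frequencies.
cumulative = list(accumulate(frequencies))

for interval, f, cf in zip(class_intervals, frequencies, cumulative):
    print(interval, f, cf)
```

The last cumulative frequency always equals the total number of observations, here 30.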
Graphical Representation of Data
Graphs and charts present data visually. They make comparisons, trends, proportions and distributions easier to understand.
| Graph / Chart | Best Used For | Example Use |
|---|---|---|
| Bar Chart | Comparison of different categories. | Sales of different products. |
| Pie Chart | Showing share of parts in a whole. | Budget allocation by department. |
| Histogram | Grouped continuous frequency data. | Marks grouped into class intervals. |
| Frequency Polygon | Trend across class intervals using midpoints. | Comparing frequency distributions. |
| Ogive | Cumulative frequency data. | Finding median graphically. |
| Line Graph | Change over time. | Monthly revenue or temperature trend. |
Histogram, Frequency Polygon and Pie Chart: Examples
The following examples show how different types of graphical representation are interpreted.
| Graph Type | Data Needed | How It Is Constructed |
|---|---|---|
| Histogram | Class intervals and frequencies. | Draw adjoining rectangles where base is class interval and height is frequency. |
| Frequency Polygon | Class midpoints and frequencies. | Plot class midpoint against frequency and join the points by straight lines. |
| Pie Chart | Category values and total value. | Convert each category into central angle using \(\frac{\text{value}}{\text{total}}\times 360^\circ\). |
| Bar Chart | Categories and values. | Draw equal-width bars with height proportional to value. |
Measures of Central Tendency
Measures of central tendency give a single representative value for a dataset. The three most common measures are mean, median and mode.
| Measure | Meaning | Formula / Method | Best Used When |
|---|---|---|---|
| Mean | Arithmetic average. | \(\bar{x}=\frac{\sum x}{n}\) | All values are important and no extreme values dominate. |
| Median | Middle value after arranging data. | Arrange data first and find the middle value; with an even number of values, take the average of the two middle values. | Data has extreme values. |
| Mode | Most frequently occurring value. | Value with highest frequency. | Most common item is required. |
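A minimal sketch of the three measures, using Python's standard `statistics` module. The datasets are the ones used in the Solved Examples of this chapter; the variable names are illustrative.

```python
import statistics

mean_data = [5, 7, 8, 10, 15]
median_data = [4, 9, 2, 7, 5]      # median() sorts the data internally
mode_data = [3, 5, 7, 5, 9, 5, 3]

mean_value = statistics.mean(mean_data)        # sum of values / number of values
median_value = statistics.median(median_data)  # middle value of the sorted data
mode_value = statistics.mode(mode_data)        # most frequent value

print(mean_value, median_value, mode_value)
```

When working by hand, the sorting step that `median()` performs internally must be done explicitly before the middle value is picked.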
Variance and Standard Deviation
Measures of dispersion show how spread out the data values are. Variance and standard deviation are important measures of spread. A lower standard deviation means values are more consistent.
| Measure | Formula | Meaning |
|---|---|---|
| Deviation from Mean | \(x-\bar{x}\) | Distance of each value from the mean. |
| Variance | \(\sigma^2=\frac{\sum (x-\bar{x})^2}{n}\) | Average of squared deviations from mean. |
| Standard Deviation | \(\sigma=\sqrt{\frac{\sum (x-\bar{x})^2}{n}}\) | Square root of variance. |
| Comparison | Lower \(\sigma\) means more consistency. | Used to compare variation between datasets. |
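The formulas above divide by \(n\), which corresponds to the population versions `pvariance` and `pstdev` in Python's `statistics` module. The sketch below follows the table step by step for the data \(2,4,6\) and then compares the result with the library functions.

```python
import statistics
from math import sqrt

data = [2, 4, 6]
n = len(data)

mean = sum(data) / n                        # 4.0
deviations = [x - mean for x in data]       # x - mean for each value
squared = [d ** 2 for d in deviations]      # squared deviations

variance = sum(squared) / n                 # average squared deviation (divide by n)
std_dev = sqrt(variance)                    # square root of the variance

# The library's population functions use the same divide-by-n definition.
library_variance = statistics.pvariance(data)
library_std_dev = statistics.pstdev(data)

print(variance, std_dev)
```

Note that `statistics.variance` and `statistics.stdev` divide by \(n-1\) instead, so they would give a different answer from the formula used in this chapter.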
Correlation and Regression
Correlation measures the direction and strength of relationship between two variables. Regression helps estimate or predict the value of one variable based on another variable.
| Concept | Meaning | Example |
|---|---|---|
| Positive Correlation | Both variables move in the same direction. | Study hours and marks. |
| Negative Correlation | One variable increases while the other decreases. | Price and demand. |
| Zero Correlation | No clear linear relationship. | Shoe size and exam marks. |
| Correlation Coefficient | A number between \(-1\) and \(+1\). | \(r=1\) perfect positive, \(r=-1\) perfect negative. |
| Regression | Prediction of one variable using another variable. | Predicting sales from advertising expenditure. |
| Regression Line | Line of best fit. | \(y=a+bx\) |
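The chapter does not state formulas for \(r\) or for the constants \(a\) and \(b\), so the sketch below assumes the standard Pearson correlation coefficient and the least-squares line of \(y\) on \(x\). The function names and the `hours`/`marks` data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed definition, not from the text)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def regression_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b x (assumed method)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    return a, b

# Hypothetical data: marks rise by exactly 10 for each extra study hour.
hours = [1, 2, 3, 4, 5]
marks = [40, 50, 60, 70, 80]

r = pearson_r(hours, marks)
a, b = regression_line(hours, marks)
predicted = a + b * 6           # estimated marks for 6 study hours
print(r, a, b, predicted)
```

Since this data lies exactly on a rising straight line, \(r\) comes out as a perfect positive correlation and the fitted line reproduces the data; real data would give a value of \(r\) strictly between \(-1\) and \(+1\).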
Step-by-Step Solving Method
| Step | Action | Example Focus |
|---|---|---|
| Step 1 | Identify the type of data. | Raw, discrete, continuous, grouped or ungrouped. |
| Step 2 | Organise the data. | Arrange, classify, tabulate or form frequency distribution. |
| Step 3 | Select the required tool. | Mean, median, mode, variance, standard deviation, chart, correlation. |
| Step 4 | Apply the formula carefully. | Use \(\sum x\), \(\sum f\), \(\sum fx\), deviations or cumulative frequency. |
| Step 5 | Interpret the result. | Average, spread, most common value, relationship or prediction. |
Solved Examples
| Question | Method | Answer |
|---|---|---|
| Find the mean of \(5, 7, 8, 10, 15\). | \[ \bar{x}=\frac{\sum x}{n} \] \[ \bar{x}=\frac{5+7+8+10+15}{5}=\frac{45}{5}=9 \] | 9 |
| Find the median of \(4, 9, 2, 7, 5\). | Arrange in ascending order: \[ 2,4,5,7,9 \] Middle value is 5. | 5 |
| Find the mode of \(3, 5, 7, 5, 9, 5, 3\). | Value 5 occurs most frequently. | 5 |
| Find the mean for the frequency table: \(x: 10,20,30\), \(f: 2,3,5\). | \[ \sum fx=(10)(2)+(20)(3)+(30)(5)=20+60+150=230 \] \[ \sum f=2+3+5=10 \] \[ \bar{x}=\frac{230}{10}=23 \] | 23 |
| Find variance and standard deviation of \(2,4,6\). | Mean: \[ \bar{x}=\frac{2+4+6}{3}=4 \] Deviations: \(-2,0,2\). Squared deviations: \(4,0,4\). \[ \sigma^2=\frac{4+0+4}{3}=\frac{8}{3} \] \[ \sigma=\sqrt{\frac{8}{3}} \] | Variance \(=\frac{8}{3}\), Standard deviation \(=\sqrt{\frac{8}{3}}\) |
| In a pie chart, a category has value 25 out of total 100. Find its central angle. | \[ \text{Central angle}=\frac{25}{100}\times 360^\circ=90^\circ \] | \(90^\circ\) |
| If study hours increase and marks also increase, what type of correlation is shown? | Both variables move in the same direction. | Positive correlation |
| If price increases and demand decreases, what type of correlation is shown? | One variable increases while the other decreases. | Negative correlation |
Note: For mean, median and mode questions, first check whether data is raw, arranged, frequency-based or grouped.
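For frequency-based data, the mean \(\bar{x}=\frac{\sum fx}{\sum f}\) can be checked with a short sketch. The values are those of the frequency-table example above; the variable names are illustrative.

```python
values = [10, 20, 30]
frequencies = [2, 3, 5]

# Multiply each value by its frequency, then divide by the total frequency.
sum_fx = sum(f * x for x, f in zip(values, frequencies))
sum_f = sum(frequencies)
mean = sum_fx / sum_f

print(sum_fx, sum_f, mean)
```

Dividing by the number of distinct values (3) instead of the total frequency (10) is the mistake this calculation guards against.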
Common Traps and Shortcuts
Common Traps
- Finding median without arranging the data.
- Confusing frequency with data value.
- Using bar chart instead of histogram for continuous grouped data.
- Forgetting that cumulative frequency is a running total.
- Using wrong total while calculating pie chart angle.
- Confusing variance and standard deviation.
- Assuming correlation always means causation.
- Ignoring extreme values while interpreting mean.
- Comparing standard deviations without checking units or scale.
Useful Shortcuts
- Mean = sum of values divided by number of values.
- For frequency data, use \(\bar{x}=\frac{\sum fx}{\sum f}\).
- Median needs arranged data.
- Mode is the most frequent value.
- Histogram bars touch each other.
- Pie chart total angle is always \(360^\circ\).
- Lower standard deviation means more consistency.
- Positive correlation means both variables move together.
- Negative correlation means variables move in opposite directions.
Practice
A) Multiple Choice Questions
- The mean of \(2,4,6,8\) is:
  (a) 4 (b) 5 (c) 6 (d) 8
- The middle value of arranged data is called:
  (a) Mean (b) Median (c) Mode (d) Range
- The value occurring most frequently is called:
  (a) Mean (b) Median (c) Mode (d) Variance
- Pie chart total angle is:
  (a) 90° (b) 180° (c) 270° (d) 360°
- If two variables move in opposite directions, the correlation is:
  (a) Positive (b) Negative (c) Zero (d) Perfect positive
B) Solve the Higher-Order Problems
- Find the mean of \(10, 20, 30, 40, 50\). (Hint: Use \(\bar{x}=\frac{\sum x}{n}\).)
- Find the median of \(12, 7, 15, 10, 9\). (Hint: Arrange first.)
- Find the mode of \(4, 6, 8, 6, 10, 6, 8\). (Hint: Find the most frequent value.)
- In a pie chart, a category has value 40 out of total 200. Find its central angle. (Hint: Use \(\frac{\text{value}}{\text{total}}\times 360^\circ\).)
- Find the variance of \(3,5,7\). (Hint: First find mean, then squared deviations.)
C) Match the Concept with the Correct Meaning
| Concept | Correct Meaning / Rule |
|---|---|
| Primary Data | Data collected directly by the investigator |
| Frequency | Number of times a value occurs |
| Cumulative Frequency | Running total of frequencies |
| Histogram | Graph for grouped continuous frequency data |
| Mean | Arithmetic average |
| Median | Middle value after arranging data |
| Mode | Most frequently occurring value |
| Regression | Prediction of one variable using another variable |
Statistics Reminder
Statistics begins with collection and organisation of data. Data is classified, tabulated and represented graphically using bar charts, pie charts, histograms and frequency polygons. Mean, median and mode describe central tendency, while variance and standard deviation describe spread. Correlation studies relationship between variables and regression helps in prediction.
Task: Create seven Statistics questions, one each from frequency distribution, graphical representation, mean, median, standard deviation, correlation and regression.
Suggested Answers
Multiple Choice
- 5: \[ \bar{x}=\frac{2+4+6+8}{4}=\frac{20}{4}=5 \]
- Median: The middle value of arranged data is called the median.
- Mode: The value occurring most frequently is called the mode.
- 360°: The total angle in a pie chart is \(360^\circ\).
- Negative: If two variables move in opposite directions, the correlation is negative.
Higher-Order Problems
- \[ \bar{x}=\frac{10+20+30+40+50}{5}=\frac{150}{5}=30 \] Answer = 30.
- Arrange the data: \[ 7,9,10,12,15 \] Middle value is 10. Answer = 10.
- In \(4,6,8,6,10,6,8\), the value 6 occurs most often. Answer = 6.
- \[ \text{Central angle}=\frac{40}{200}\times 360^\circ=72^\circ \] Answer = \(72^\circ\).
- Data: \(3,5,7\) \[ \bar{x}=\frac{3+5+7}{3}=5 \] Deviations: \(-2,0,2\). Squared deviations: \(4,0,4\). \[ \sigma^2=\frac{4+0+4}{3}=\frac{8}{3} \] Answer = \(\frac{8}{3}\).
Concept Matching
- Primary Data → Data collected directly by the investigator
- Frequency → Number of times a value occurs
- Cumulative Frequency → Running total of frequencies
- Histogram → Graph for grouped continuous frequency data
- Mean → Arithmetic average
- Median → Middle value after arranging data
- Mode → Most frequently occurring value
- Regression → Prediction of one variable using another variable
Clue Explanation
Statistics questions require identifying the type of data and the required measure. Use tables for frequency and cumulative frequency, charts for visual representation, mean/median/mode for central tendency, variance and standard deviation for spread, and correlation/regression for relationship and prediction.
Exam tips
- Arrange data before finding median.
- Use \(\sum fx\) for frequency-based mean.
- Histogram bars touch each other.
- Pie chart total is always \(360^\circ\).
- Mode is the most frequent value.
- Lower standard deviation means more consistency.
- Positive correlation means same direction movement.
- Negative correlation means opposite direction movement.
- Regression is used for prediction.