A frequency distribution shows how many times each value occurs in a dataset. It is represented either as a graph or as a table.
It can be summarized as:
Variables in a dataset --> frequency of each value's occurrence in the dataset --> depicted in graphical or tabular form --> helps in decision making.
Let's take this sample dataset as an example:
Flowers: Rose, Lily, Sunflower, Rose, Lily, Sunflower, Rose, Lily, Lily, Rose
Frequency:
Flower | Frequency |
Rose | 4 |
Lily | 4 |
Sunflower | 2 |
From the above frequency table, we can also draw a pie chart, where each flower's slice is proportional to its frequency.
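As a minimal sketch, the frequencies for the flower dataset can be tallied with Python's standard `collections.Counter`. The `shares` dictionary is an illustrative addition: it gives the fraction of the whole that each flower would occupy in a pie chart.

```python
from collections import Counter

# Sample dataset from the example above
flowers = ["Rose", "Lily", "Sunflower", "Rose", "Lily",
           "Sunflower", "Rose", "Lily", "Lily", "Rose"]

# Counter tallies how many times each value occurs
frequency = Counter(flowers)

# Each flower's share of the total, i.e. the size of its pie-chart slice
total = sum(frequency.values())
shares = {flower: count / total for flower, count in frequency.items()}

for flower, count in frequency.items():
    print(f"{flower}: {count} ({shares[flower]:.0%})")
```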
Now let's move on to the Cumulative Frequency Distribution.
Flower | Frequency | Cumulative Frequency |
Rose | 4 | 4 |
Lily | 4 | 8 |
Sunflower | 2 | 10 |
To calculate the cumulative frequency, start with the first frequency, add the second to it, then add the third to that running total, and keep adding until you reach the last value.
The last cumulative frequency equals the total number of flowers in the dataset, here 10.
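The running total described above can be sketched with `itertools.accumulate` from the Python standard library. The frequencies are the ones from the table; the variable names are only illustrative.

```python
from itertools import accumulate

# Frequencies from the table above, in the same order
flowers = ["Rose", "Lily", "Sunflower"]
frequencies = [4, 4, 2]

# accumulate yields the running sum: 4, 4+4, 4+4+2
cumulative = list(accumulate(frequencies))

for flower, freq, cum in zip(flowers, frequencies, cumulative):
    print(f"{flower} | {freq} | {cum} |")
```

The last entry of `cumulative` is the total number of flowers, which is a quick way to check the table.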
From frequency data like this, we can also draw a Bar Graph or a Histogram.
The Bar Graph for the above example can be represented as below:
Here the values are discrete.
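As a rough illustration of what the bar graph encodes, here is a small text-only sketch in plain Python. The helper `text_bar_chart` is a hypothetical name introduced for this example; a plotting library would normally be used for a real chart.

```python
# Frequencies from the flower example
frequency = {"Rose": 4, "Lily": 4, "Sunflower": 2}

def text_bar_chart(freq):
    """Return one line per category, with one '#' per occurrence."""
    width = max(len(label) for label in freq)  # pad labels to equal width
    return [f"{label.ljust(width)} | {'#' * count} {count}"
            for label, count in freq.items()]

chart_lines = text_bar_chart(frequency)
print("\n".join(chart_lines))
```

Each category gets its own separate bar, which is what makes a bar graph suitable for discrete values.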
Now for the Histogram. To make a histogram, the data should be continuous.
Example: Age: {10, 12, 14, 18, 24, 26, 30, 35, 36, 37, 40, 41, 42, 43, 50, 51}
In a histogram we also make bins, which group the data into intervals. Suppose we choose a bin width of 10; each bin then covers an interval of 10 values (10-20, 20-30, and so on).
We then put the bins on the X-axis and the frequency on the Y-axis.
It can be represented as:
Between 10 and 20 we have 4 values, so the bar for that bin rises to a frequency of 4.
The 20-30 bin has 3 values, so its bar rises to 3.
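The binning step can be sketched in plain Python as follows. Two assumptions are made here and marked in the comments: the out-of-order 36 in the age list is treated as a typo for 26, and a value on a bin boundary is counted in the lower bin (so 30 falls in 20-30), which is the convention that reproduces the counts of 4 and 3 given in the text. Many libraries, NumPy's `histogram` for example, use left-closed bins instead, which gives different counts at the boundaries.

```python
import math

# Ages from the example; ASSUMPTION: the out-of-order 36 is a typo for 26
ages = [10, 12, 14, 18, 24, 26, 30, 35, 36, 37, 40, 41, 42, 43, 50, 51]

bin_width = 10
start = 10  # lower edge of the first bin

def bin_index(value):
    # ASSUMPTION: right-closed bins (10, 20], (20, 30], ...;
    # the start value itself is placed in the first bin
    return max(0, math.ceil((value - start) / bin_width) - 1)

n_bins = bin_index(max(ages)) + 1
counts = [0] * n_bins
for age in ages:
    counts[bin_index(age)] += 1

for i, count in enumerate(counts):
    low = start + i * bin_width
    print(f"{low}-{low + bin_width}: {count}")
```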
There is also the concept of a PDF in statistics.
PDF (Probability Density Function)
A PDF can be thought of as a smoothed version of the histogram. The curve in the above diagram that runs across the entire graph is the probability density function.
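One common way to obtain such a smooth curve from data is kernel density estimation with a Gaussian kernel; the text does not say which method produced its curve, so treat this as an illustrative sketch only. The bandwidth of 5 is an arbitrary assumption, and the age list assumes the out-of-order 36 is a typo for 26.

```python
import math

# Ages from the example; ASSUMPTION: the out-of-order 36 is a typo for 26
ages = [10, 12, 14, 18, 24, 26, 30, 35, 36, 37, 40, 41, 42, 43, 50, 51]

bandwidth = 5.0  # ASSUMPTION: chosen by hand; controls how smooth the curve is

def density(x):
    """Average of Gaussian bumps centred on each data point."""
    norm = 1.0 / (len(ages) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - a) / bandwidth) ** 2) for a in ages)

# Evaluate the curve on a grid from -30 to 120 and approximate its area
step = 0.5
grid = [i * step for i in range(-60, 241)]
area = sum(density(x) for x in grid) * step

print(f"density at 30: {density(30):.4f}")
print(f"area under curve: {area:.3f}")
```

The area under the curve comes out close to 1, which is the defining property of a probability density: unlike histogram bar heights, its values are densities rather than counts.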
Remember
A Bar Graph is used for Discrete Data.
A Histogram is used for Continuous Data.