Descriptive statistics is the branch of statistics that summarizes and describes the characteristics of a dataset. It involves calculating summary measures, such as the mean, median, mode, quantiles (most commonly quartiles), range, interquartile range, standard deviation, and variance, and using visualizations, such as histograms and scatter plots, to understand the distribution and patterns in the data. Descriptive statistics describes only the data at hand; it does not make inferences or predictions about the population from which a sample was drawn.
Some common measures used in descriptive statistics include:
Measures of Central Tendency:
Mean: The mean, or average, is the sum of all the values in the dataset divided by the number of values. The mean is often used to describe a dataset's central tendency or typical value.
Median: The median is the middle value in a dataset when the values are ordered from smallest to largest; when the number of values is even, it is the average of the two middle values. The median is often preferred to the mean when there are extreme values or when the distribution is skewed, because it is much less sensitive to them.
Mode: The mode is the value that occurs most frequently in the dataset.
Quantile: A quantile is a value below which a given proportion of the observations in a dataset fall. Common examples are quartiles (which divide the data into 4 parts), deciles (10 parts), and percentiles (100 parts).
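The measures above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not a recommended workflow: the list `values` is made-up data, and the choice of `method="inclusive"` for the quartiles is an assumption, since different quantile conventions give slightly different cut points on small datasets.

```python
import statistics

# Hypothetical data, invented for illustration; 40 is a deliberate extreme value.
values = [2, 3, 3, 5, 7, 8, 40]

mean = statistics.mean(values)        # sum of the values divided by their count
median = statistics.median(values)    # middle value of the sorted data
modes = statistics.multimode(values)  # list of every value tied for most frequent

# Quartiles: the three cut points that split the sorted data into four parts.
# "inclusive" treats the data as the full population (an assumed convention).
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"Q1={q1} Q2={q2} Q3={q3}")
```

In this example the single extreme value pulls the mean well above the median, and the second quartile coincides with the median.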
Measures of Dispersion (Spread):
Range: The range is the difference between the largest and smallest values in the dataset. The range is often used to describe the spread or variability of the data.
Interquartile range (IQR): The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1), that is, Q3 minus Q1; it spans the middle 50% of the observations. The IQR is a more robust measure of variation than the range because extreme values have little effect on it.
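Standard deviation: The standard deviation measures the spread or dispersion of the data around the mean. It is the square root of the variance: sum the squared differences between each value and the mean, divide by the number of values, and take the square root. (When the data are a sample used to estimate the variability of a larger population, the sum is conventionally divided by one less than the number of values instead.) Because it is expressed in the same units as the data, the standard deviation is often used to describe variability.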
Variance: The variance is the average of the squared differences between each observation and the mean of the data. It is the square of the standard deviation, so it is expressed in the squared units of the data.
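The dispersion measures above can be sketched the same way. This is an illustrative example under stated assumptions: `values` is made-up data, `iqr` is a hypothetical helper defined here (not a library function), and the quartiles use the `"inclusive"` convention. The example also drops the largest value to show how differently the range and the IQR react to a single extreme observation.

```python
import statistics

# Hypothetical data, invented for illustration; 40 is a deliberate extreme value.
values = [2, 3, 3, 5, 7, 8, 40]

def iqr(data):
    """Q3 minus Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

data_range = max(values) - min(values)  # largest value minus smallest value
data_iqr = iqr(values)

# Population formulas divide by n; sample formulas divide by n - 1.
pop_var = statistics.pvariance(values)
pop_sd = statistics.pstdev(values)      # square root of pop_var
samp_var = statistics.variance(values)
samp_sd = statistics.stdev(values)      # square root of samp_var

# Remove the extreme value and recompute the two spread measures.
trimmed = values[:-1]
trimmed_range = max(trimmed) - min(trimmed)
trimmed_iqr = iqr(trimmed)

print(f"range: {data_range} -> {trimmed_range}")
print(f"IQR:   {data_iqr} -> {trimmed_iqr}")
print(f"population sd={pop_sd:.2f}, sample sd={samp_sd:.2f}")
```

Removing the one extreme value shrinks the range from 38 to 6, while the IQR only moves from 4.5 to 3.5, which is what "robust" means in practice here.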
Together, these measures summarize the center and spread of a dataset. Calculating them gives researchers a concise picture of the available data before any further analysis.