Standard deviation is a measure of how spread out a dataset is. It is calculated by finding the difference between each data point in the dataset and the mean (average) of the dataset, squaring those differences, finding the average of the squared differences (also known as the variance), and then taking the square root of the variance.

Sounds complicated? Let's walk through it with a simple example.

To calculate the standard deviation manually, you can follow these steps:

- Calculate the mean of the dataset.
- Calculate the differences between each value in the dataset and the mean.
- Square the differences.
- Calculate the sum of the squared differences.
- Divide the sum of the squared differences by the number of values in the dataset. This is the variance.
- Take the square root of the variance to get the standard deviation.

For example, let's say you have the following dataset: 1, 3, 5, 7

- The mean of the dataset is (1+3+5+7)/4 = 16/4 = 4.
- The differences between each value and the mean are -3, -1, 1, 3.
- The squared differences are 9, 1, 1, 9.
- The sum of the squared differences is 20.
- The variance is 20/4 = 5.
- The standard deviation is the square root of the variance, which is √5 ≈ 2.236.
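The same steps can be sketched in a few lines of Python. This is only an illustration of the manual calculation for the dataset 1, 3, 5, 7; the variable names are our own and are not part of any library.

```python
import math

data = [1, 3, 5, 7]

# Step 1: mean of the dataset
mean = sum(data) / len(data)

# Step 2: difference between each value and the mean
differences = [x - mean for x in data]

# Step 3: square the differences
squared_differences = [d ** 2 for d in differences]

# Step 4: sum of the squared differences
sum_of_squares = sum(squared_differences)

# Step 5: divide by the number of values (population variance)
variance = sum_of_squares / len(data)

# Step 6: square root of the variance (population standard deviation)
std_dev = math.sqrt(variance)

print(f"mean={mean}, variance={variance}, std_dev={std_dev:.3f}")
```

Each intermediate variable corresponds to one bullet in the list above, so the values can be compared against the worked example line by line.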

## Sample vs Population Standard Deviation

### 1. Population Standard Deviation:

The population standard deviation is used when you have access to the entire population, and you want to calculate the dispersion or spread of the population. The calculation you saw above was for the population.

The formula for the standard deviation of a population is:

$$\sigma = \sqrt{\frac{\sum(x_i - \mu)^2}{N}}$$

In this formula, σ represents the population standard deviation, \(x_i\) represents each individual value in the population, μ represents the population mean, Σ indicates that the terms are summed over all values, and N represents the number of values in the population.

### 2. Population Variance:

Variance is the square of standard deviation.

The formula for the variance of a population is:

$$\sigma^2 = {\frac{\sum(x_i - \mu)^2}{N}}$$

In this formula, \(\sigma^2\) represents the population variance, \(x_i\) represents each individual value in the population, μ represents the population mean, Σ indicates that the terms are summed over all values, and N represents the number of values in the population.
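As a quick check of the population formulas, here is a minimal sketch using Python's standard `statistics` module, whose `pstdev` and `pvariance` functions divide by N. The dataset is the 1, 3, 5, 7 example from earlier, treated here as the whole population.

```python
import statistics

population = [1, 3, 5, 7]

# Population variance: sum of squared deviations divided by N
sigma_squared = statistics.pvariance(population)

# Population standard deviation: square root of the population variance
sigma = statistics.pstdev(population)

print(f"population variance = {sigma_squared}")
print(f"population standard deviation = {sigma:.3f}")
```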

### 3. Sample Standard Deviation:

The sample standard deviation is usually used when you don't have access to the entire population and want to estimate the population standard deviation based on a sample. The estimate is more reliable when the sample is representative of the population and the sample size is large enough.

The formula for the sample standard deviation is:

$$s = \sqrt{\frac{\sum(x_i - \overline{x})^2}{n-1}}$$

In this formula, s represents the sample standard deviation, \(x_i\) represents each individual value in the sample, \(\overline{x}\) represents the sample mean, Σ indicates that the terms are summed over all values, and n represents the number of values in the sample.

### 4. Sample Variance:

Variance is the square of standard deviation.

The formula for the sample variance is:

$$s^2 = {\frac{\sum(x_i - \overline{x})^2}{n-1}}$$

In this formula, \(s^2\) represents the sample variance, \(x_i\) represents each individual value in the sample, \(\overline{x}\) represents the sample mean, Σ indicates that the terms are summed over all values, and n represents the number of values in the sample.
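For comparison, the sample formulas can be sketched with the `stdev` and `variance` functions from Python's standard `statistics` module, which divide by n-1. We assume here that the values 1, 3, 5, 7 are a sample drawn from a larger population rather than the whole population.

```python
import statistics

sample = [1, 3, 5, 7]

# Sample variance: sum of squared deviations divided by n - 1
s_squared = statistics.variance(sample)

# Sample standard deviation: square root of the sample variance
s = statistics.stdev(sample)

print(f"sample variance = {s_squared:.3f}")
print(f"sample standard deviation = {s:.3f}")
```

The sum of squared differences is still 20, but it is now divided by 3 instead of 4, so both results come out larger than the population values for the same numbers.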

## Sample vs Population Standard Deviation

The main difference between the standard deviation of a sample and the standard deviation of a population is the denominator of the formula. In the formula for the standard deviation of a sample, the denominator is n-1, while in the formula for the standard deviation of a population, the denominator is N. **The sample standard deviation is in fact an estimate of the population standard deviation based on a limited sample.** This estimate is subject to error. Because the deviations are measured from the sample mean rather than the true population mean, dividing by n tends to underestimate the spread of the population. To correct for that, (n-1) is used in the denominator instead of n, which makes the result slightly larger. This adjustment is known as Bessel's correction. This is a simplified explanation of why (n-1) is used; more rigorous explanations are available, but we will not go through them here.
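The effect of the denominator can be checked on a toy case. The sketch below assumes the population is 1, 3, 5, 7 and lists every possible sample of size 2 drawn with replacement (16 in total). It then averages the variance computed with n in the denominator and with n-1 in the denominator. The setup is our own illustration, not a general proof.

```python
import itertools
import statistics

population = [1, 3, 5, 7]
true_variance = statistics.pvariance(population)

# Every ordered sample of size 2, drawn with replacement
samples = list(itertools.product(population, repeat=2))

# Average of the variance estimates when dividing by n
avg_divide_by_n = statistics.fmean(
    [statistics.pvariance(s) for s in samples]
)

# Average of the variance estimates when dividing by n - 1
avg_divide_by_n_minus_1 = statistics.fmean(
    [statistics.variance(s) for s in samples]
)

print(f"true population variance:   {true_variance}")
print(f"average when dividing by n: {avg_divide_by_n}")
print(f"average when dividing by n-1: {avg_divide_by_n_minus_1}")
```

In this toy case, dividing by n gives estimates that average only half of the true variance, while dividing by n-1 gives estimates that average exactly the true variance.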

| | Sample Standard Deviation | Population Standard Deviation |
|---|---|---|
| Definition | The square root of the sample variance, estimating the variability within a sample. | The square root of the variance, measuring the variability within an entire population. |
| Formula | $$s = \sqrt{\frac{\sum(x_i - \overline{x})^2}{n-1}}$$ | $$\sigma = \sqrt{\frac{\sum(x_i - \mu)^2}{N}}$$ |
| When to use | When data is a random sample from a larger population. | When data includes every member of the population. |
| Degree of freedom | n-1 | N |
| Typically yields | Greater value (due to use of n-1 in the denominator). | Smaller value (due to use of N in the denominator). |

## Calculating Variance and Standard Deviation Using Excel

To calculate the standard deviation in Excel, you can use the "STDEV.S" function for a sample or the "STDEV.P" function for a population. For example:

=STDEV.S(A1:A5) for sample standard deviation, and

=STDEV.P(A1:A5) for population standard deviation

To calculate the variance in Excel, you can use the "VAR.S" function for a sample or the "VAR.P" function for a population. For example:

=VAR.S(A1:A5) for sample variance, and

=VAR.P(A1:A5) for population variance

## Conclusion

In conclusion, calculating standard deviation and variance for a sample or a population is an important tool for understanding the dispersion or spread of a dataset. It is useful for identifying patterns and trends in the data and comparing groups. Using the appropriate formula and functions, you can easily calculate standard deviation and variance in Excel or manually. Understanding the differences between sample and population standard deviation and variance is also important, as it allows you to choose the correct formula and make more accurate estimates and predictions.