The Hypergeometric Distribution is a probability distribution used to model the number of successes in a sample drawn from a finite population without replacement.
Properties of the Hypergeometric Distribution:
The Hypergeometric Distribution is defined by three parameters: the population size (N), the number of successes in the population (K), and the sample size (n). It has several important properties, including:
- Discrete values: The Hypergeometric Distribution is a discrete probability distribution, meaning that it can take on only specific values rather than a continuous range of values.
- Two possible outcomes: Each item in the population is classified as either a success or a failure. Unlike in the binomial distribution, the probability of success is not constant from draw to draw, because drawn items are not replaced.
- Bounded: The number of successes in the sample cannot exceed the sample size or the number of successes in the population, so the upper bound is the minimum of n and K. It also cannot be smaller than n - (N - K), because the sample can contain at most N - K failures. The possible values of the Hypergeometric Distribution therefore range from max(0, n - (N - K)) to min(n, K).
- Uniform sampling: The Hypergeometric Distribution assumes that all elements in the population are equally likely to be chosen for the sample.
- Dependent trials: Unlike the Binomial distribution, where the trials are independent and the outcome of one trial does not affect the others, here each draw changes the composition of the remaining population, so the outcome of one draw affects the probability of success on the next.
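The dependence between draws can be made concrete with a short sketch. It assumes a hypothetical helper, `success_probabilities`, that tracks how the chance of a success changes as items leave the population; the name and the card-deck numbers are illustrative only.

```python
from fractions import Fraction

def success_probabilities(N, K, outcomes):
    """Return P(success) just before each draw, given earlier outcomes.

    N: population size, K: successes in the population,
    outcomes: booleans for the draws made so far (True = success).
    """
    probs = []
    remaining_total, remaining_successes = N, K
    for was_success in outcomes:
        # Probability of a success on this draw, before it is made.
        probs.append(Fraction(remaining_successes, remaining_total))
        # Without replacement: the drawn item leaves the population.
        remaining_total -= 1
        if was_success:
            remaining_successes -= 1
    return probs

# Deck of 52 cards with 4 aces; draws are: ace, non-ace, ace.
print(success_probabilities(52, 4, [True, False, True]))
# The three probabilities are 4/52 = 1/13, 3/51 = 1/17 and 3/50.
```

With replacement (the binomial setting), every entry would stay at 4/52.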
Probability Mass Function (PMF) - Hypergeometric Distribution
The probability mass function (PMF) of the Hypergeometric Distribution gives the probability of a given number of successes occurring in a sample of a given size drawn from a finite population. The formula for the PMF is as follows:
$$\Large{f(x) = \frac{{K\choose x} \cdot {(N-K)\choose (n-x)}}{{N\choose n}}}$$
Where:
f(x) is the probability mass function
\({K\choose x}\) is the binomial coefficient, which represents the number of ways to choose x successes from K possibilities
\({(N-K)\choose (n-x)}\) is the binomial coefficient, which represents the number of ways to choose n-x failures from N-K possibilities
\({N\choose n}\) is the binomial coefficient, which represents the total number of possible samples of size n from a population of size N
x is the number of successes in the sample
n is the sample size
N is the population size
K is the number of successes in the population
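The PMF translates almost directly into code. The following is a minimal sketch using Python's standard-library `math.comb`; the function name `hypergeom_pmf` and the small example numbers are assumptions made for illustration.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) for a sample of size n drawn without replacement
    from a population of size N that contains K successes."""
    lowest = max(0, n - (N - K))   # fewest successes the sample can hold
    highest = min(n, K)            # most successes the sample can hold
    if x < lowest or x > highest:
        return 0.0                 # outside the support
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Population of 10 items with 4 successes, sample of 3:
# C(4,2) * C(6,1) / C(10,3) = 6 * 6 / 120
print(hypergeom_pmf(2, N=10, K=4, n=3))  # 0.3
```

Summing `hypergeom_pmf` over every possible x gives 1, which is a quick sanity check on any implementation.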
Mean and Variance of the Hypergeometric Distribution
The mean and variance of the Hypergeometric Distribution can be calculated using the following formulas:
Mean:
$$\Large{\mu = n \frac{K}{N}}$$
Where:
\(\mu\) is the mean
n is the sample size
K is the number of successes in the population
N is the population size
Variance:
$$\Large{\sigma^2 = n \cdot \frac{K}{N} \cdot \frac{N-K}{N} \cdot \frac{N-n}{N-1}}$$
Where:
\(\sigma^2\) is the variance
n is the sample size
K is the number of successes in the population
N is the population size
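One way to check the mean and variance is to compute them directly from the PMF and compare with the closed forms. The sketch below does that; the closed-form variance it uses includes the finite population correction factor (N - n)/(N - 1), and all function names are hypothetical.

```python
from math import comb

def pmf(x, N, K, n):
    """Hypergeometric P(X = x); x is assumed to lie in the support."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def moments_from_pmf(N, K, n):
    """Mean and variance obtained by summing over the support."""
    support = range(max(0, n - (N - K)), min(n, K) + 1)
    mean = sum(x * pmf(x, N, K, n) for x in support)
    variance = sum((x - mean) ** 2 * pmf(x, N, K, n) for x in support)
    return mean, variance

def moments_closed_form(N, K, n):
    """Mean n*K/N and variance n * (K/N) * ((N-K)/N) * ((N-n)/(N-1))."""
    mean = n * K / N
    variance = n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1))
    return mean, variance

# Batch of 100 items with 10 successes, sample of 20.
print(moments_from_pmf(100, 10, 20))
print(moments_closed_form(100, 10, 20))
```

The two printed pairs should agree up to floating-point rounding.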
Uses of the Hypergeometric Distribution
There are several common uses for Hypergeometric Distribution in statistics and probability:
- Modelling the number of successes in a sample drawn from a finite population: The Hypergeometric Distribution can be used to model the number of successes in a sample drawn from a finite population without replacement.
- Sampling without replacement: The Hypergeometric Distribution can be used to model the probability of certain events occurring when sampling without replacement. For example, it could be used to model the probability of drawing a certain number of aces from a deck of cards.
- Quality control/Acceptance Sampling: The Hypergeometric Distribution can be used in quality control to determine the probability of a certain number of defective items being present in a sample drawn from a finite batch of products.
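The card example can be worked through in a few lines. This sketch assumes a standard 52-card deck with 4 aces and a 5-card hand; the hand size and the helper name `ace_count_pmf` are choices made for illustration.

```python
from math import comb

def ace_count_pmf(x, hand_size=5, deck_size=52, aces=4):
    """P(exactly x aces in a hand drawn without replacement)."""
    non_aces = deck_size - aces
    if x < 0 or x > min(hand_size, aces) or hand_size - x > non_aces:
        return 0.0
    return comb(aces, x) * comb(non_aces, hand_size - x) / comb(deck_size, hand_size)

# Distribution of the number of aces in a 5-card hand.
for x in range(5):
    print(x, round(ace_count_pmf(x), 6))

# Right-tail probability: at least one ace in the hand.
p_at_least_one = 1 - ace_count_pmf(0)
print(round(p_at_least_one, 4))  # 0.3412
```

The same function covers the acceptance-sampling case by reading "aces" as defective items and "deck" as the batch.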
Excel Functions for the Hypergeometric Distribution
HYPGEOM.DIST:
The HYPGEOM.DIST function in Microsoft Excel calculates either the probability mass function (PMF) or the cumulative distribution function (CDF) of the Hypergeometric Distribution for a given number of successes, sample size, number of successes in the population and population size. The function has the following syntax:
HYPGEOM.DIST(x, number_sample, successes_population, number_population, cumulative)
Where:
x: the number of successes in the sample for which you want to calculate the probability.
number_sample: the sample size.
successes_population: the number of successes in the population.
number_population: the population size.
cumulative: a logical value that specifies whether you want to calculate the PMF (FALSE), P(X = x), or the CDF (TRUE), P(X ≤ x), of the Hypergeometric Distribution.
The function returns the probability mass or cumulative probability of the given number of successes under the Hypergeometric Distribution with the specified parameters.
Example:
Suppose you have a batch of 100 products, and you know that 10% of them (10 products) are defective. You want to determine the probability of selecting a sample of 20 products from the batch and finding at most 2 defective products.
To solve this problem, you can use the HYPGEOM.DIST function with the following arguments:
HYPGEOM.DIST(2, 20, 10, 100, TRUE) = 0.68122
With cumulative set to TRUE, this returns the cumulative probability P(X ≤ 2), that is, the probability of finding at most 2 defective products in a sample of 20 products drawn from a batch of 100 products with 10% defective products. The probability of finding at least 2 defective products is a right-tail probability instead, and can be computed as 1 - HYPGEOM.DIST(1, 20, 10, 100, TRUE), which is approximately 0.637.
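As a cross-check outside Excel, the cumulative value can be reproduced with the Python standard library. This is a minimal sketch: the helper `hypgeom_dist` is hypothetical and only mirrors the argument order of HYPGEOM.DIST. With cumulative set to True it sums the PMF from the smallest possible count up to x, which gives P(X ≤ x).

```python
from math import comb

def hypgeom_dist(x, number_sample, successes_population, number_population, cumulative):
    """Hypothetical stand-in for Excel's HYPGEOM.DIST.

    Returns P(X = x) when cumulative is False and P(X <= x) when it is True.
    """
    n, K, N = number_sample, successes_population, number_population
    lowest, highest = max(0, n - (N - K)), min(n, K)

    def pmf(k):
        if k < lowest or k > highest:
            return 0.0
        return comb(K, k) * comb(N - K, n - k) / comb(N, n)

    if not cumulative:
        return pmf(x)
    # Left tail: add up the PMF for every count from the lower bound to x.
    return sum(pmf(k) for k in range(lowest, min(x, highest) + 1))

# At most 2 defectives in a sample of 20 from a batch of 100 with 10 defectives.
print(round(hypgeom_dist(2, 20, 10, 100, True), 5))  # 0.68122

# At least 2 defectives is the complement of "at most 1".
print(round(1 - hypgeom_dist(1, 20, 10, 100, True), 3))  # 0.637
```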
Conclusion:
The Hypergeometric Distribution is useful for modelling the number of successes in a sample drawn from a finite population without replacement. It has numerous applications in statistical hypothesis testing, sampling without replacement, and quality control.