A contingency table is a tool used to summarize and analyze the relationship between two categorical variables. It is a type of cross-tabulation that displays the frequencies or counts of the combinations of categories for the two variables.
To create a contingency table, the following steps are typically followed:
- Identify the two categorical variables to be analyzed.
- Collect and summarize the data. Count the number of observations in each combination of categories for the two variables.
- Organize the data in a table with the categories of the first variable listed along the rows and the categories of the second variable listed along the columns.
- Enter the counts or frequencies in the cells of the table.
The example below is a contingency table showing the relationship between gender (male or female) and political party (Democrat or Republican):
| | Democrat | Republican | Total |
|---|---|---|---|
| Male | 20 | 30 | 50 |
| Female | 30 | 40 | 70 |
| Total | 50 | 70 | 120 |
In this example, the contingency table shows that there are 50 males in the sample, with 20 identifying as Democrats and 30 identifying as Republicans. There are 70 females in the sample, with 30 identifying as Democrats and 40 identifying as Republicans.
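As a minimal sketch of the construction steps, the snippet below rebuilds the example table from raw observations using only the Python standard library. The `observations` list is an assumption made for illustration: it is reconstructed from the counts in the table, whereas a real data set would supply one (gender, party) pair per respondent. The variable names are likewise illustrative.

```python
from collections import Counter

# Raw observations as (gender, party) pairs, rebuilt from the counts in the
# example table; real data would supply one pair per respondent.
observations = (
    [("Male", "Democrat")] * 20
    + [("Male", "Republican")] * 30
    + [("Female", "Democrat")] * 30
    + [("Female", "Republican")] * 40
)

rows = ["Male", "Female"]          # categories of the first variable
cols = ["Democrat", "Republican"]  # categories of the second variable

# Count how often each combination of categories occurs.
counts = Counter(observations)
table = [[counts[(r, c)] for c in cols] for r in rows]

# Marginal totals: one per row, one per column, plus the grand total.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

print(table)        # [[20, 30], [30, 40]]
print(row_totals)   # [50, 70]
print(col_totals)   # [50, 70]
print(grand_total)  # 120
```

A `Counter` returns 0 for combinations that never occur, so empty cells are filled in without special handling.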
Contingency tables can be used to perform a Chi-square test to determine whether there is a significant association between the two variables. To do this, the following steps are typically followed:
- Calculate the expected frequencies for each combination of categories using the formula:

  $$\LARGE{E_{ij} = \frac{R_i C_j}{n}} $$

  Where:
  - \(E_{ij}\) is the expected frequency for the combination of categories i and j
  - \(R_i\) is the row total for category i
  - \(C_j\) is the column total for category j
  - \(n\) is the total sample size
- Calculate the Chi-square statistic using the formula:

  $$\LARGE{\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}} $$

  Where:
  - \(O_{ij}\) is the observed frequency for the combination of categories i and j
  - \(E_{ij}\) is the expected frequency for the combination of categories i and j
  - \(r\) is the number of rows
  - \(c\) is the number of columns

  The symbols \(r\) and \(c\) are used here so that they are not confused with \(n\), the total sample size in the expected-frequency formula.
- Determine the critical value of the Chi-square statistic based on the significance level (alpha) of the test and the degrees of freedom. The degrees of freedom are calculated as \((r - 1) \times (c - 1)\), not counting the Total row and column.
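- Compare the calculated Chi-square statistic to the critical value to determine whether to reject or fail to reject the null hypothesis, which states that the two variables are independent. If the calculated Chi-square statistic exceeds the critical value, the null hypothesis is rejected in favor of the alternative hypothesis that the variables are associated. Otherwise, the data do not provide sufficient evidence of an association.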
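The steps above can be sketched in plain Python and applied to the gender and political party example. This is an illustrative sketch, not a reference implementation: the function name `chi_square_test` is hypothetical, no continuity correction is applied, and the critical value 3.841 (the standard table value for alpha = 0.05 with 1 degree of freedom) is passed in by hand rather than computed.

```python
def chi_square_test(observed, critical_value):
    """Chi-square test of independence on a table of observed counts.

    `observed` is a list of rows and must not include the Total row/column.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total sample size

    # Expected frequency: E_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi-square statistic: sum of (O_ij - E_ij)^2 / E_ij over all cells
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1) * (columns - 1)
    dof = (len(observed) - 1) * (len(observed[0]) - 1)

    # Reject the null hypothesis of independence if the statistic
    # exceeds the critical value.
    reject = statistic > critical_value
    return expected, statistic, dof, reject


observed = [[20, 30], [30, 40]]  # Male and Female rows from the example
expected, statistic, dof, reject = chi_square_test(observed, critical_value=3.841)

print(round(statistic, 4))  # 0.098
print(dof)                  # 1
print(reject)               # False
```

For the example data, the expected count of male Democrats is 50 × 50 / 120 ≈ 20.83, close to the observed 20. The statistic of about 0.098 is far below 3.841, so the null hypothesis of independence is not rejected at the 0.05 level.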
Contingency tables are therefore useful both for summarizing the relationship between two categorical variables and for supplying the counts needed for a Chi-square test, which can help determine whether there is a significant association between the two variables.
In addition to analyzing the relationship between two categorical variables, contingency tables can also be used to analyze the relationship between a categorical variable and a continuous variable. In this case, the continuous variable is typically grouped into intervals or categories, and a contingency table is created to summarize the frequencies or counts for each combination of categories.
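The grouping step can be sketched as follows. Everything in this snippet is hypothetical and chosen only for illustration: the `records` data are invented, and the age boundaries in `edges` are one arbitrary way to form intervals. In practice the choice of intervals depends on the data and the question being asked.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical (age, party) records, invented purely for illustration.
records = [
    (19, "Democrat"), (23, "Republican"), (31, "Democrat"), (38, "Democrat"),
    (44, "Republican"), (52, "Republican"), (58, "Democrat"), (63, "Republican"),
    (67, "Republican"), (71, "Democrat"),
]

# Assumed interval boundaries: under 30, 30-49, 50-64, 65 and over.
edges = [30, 50, 65]
labels = ["<30", "30-49", "50-64", "65+"]
parties = ["Democrat", "Republican"]


def age_group(age):
    """Map a continuous age to the label of the interval that contains it."""
    return labels[bisect_right(edges, age)]


# Cross-tabulate the grouped ages against party.
counts = Counter((age_group(age), party) for age, party in records)
table = {label: [counts[(label, p)] for p in parties] for label in labels}

for label, row in table.items():
    print(label, row)
```

Each row of `table` holds the Democrat and Republican counts for one age interval, so the result can be analyzed in the same way as any other contingency table.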
Overall, contingency tables are a useful tool for analyzing the relationship between two categorical variables and can provide valuable insights into the patterns and trends in the data.