Types of Data
Nominal: Categorical data without an inherent order.
Ordinal: Categorical data with a defined order, where the gaps between categories are not necessarily equal.
Interval: Numerical data with equal intervals but no true zero.
Ratio: Numerical data with equal intervals and a true zero.
Measures of Central Tendency
Provide a central value for the data set.
Mean (Average): \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
Median: Middle value when the data is ordered; for an even number of values, the average of the two middle values.
Mode: Most frequently occurring value.
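As an illustration of the three measures above, here is a minimal sketch using Python's standard `statistics` module. The data set is made up for the example.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of values divided by their count: 30 / 6
median = statistics.median(data)  # even count, so the average of the two middle values 3 and 5
mode = statistics.mode(data)      # most frequent value; 3 occurs twice

print(mean, median, mode)
```

Note that the mean (5) is pulled above the median (4) by the single large value 10.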
Measures of Dispersion
Indicate the spread or variability of a data set.
Range: Difference between the highest and lowest values.
Variance: Average of the squared differences from the mean.
\(\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}\) for a population of size \(N\) with mean \(\mu\),
\(s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}\) for a sample of size \(n\) with mean \(\bar{x}\).
Standard Deviation: Square root of the variance. \(\sigma\) for population, \(s\) for sample.
Interquartile Range (IQR): Difference between the 75th percentile (Q3) and the 25th percentile (Q1).
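The dispersion measures above can be sketched with the standard `statistics` module. The data set is made up, and the quartiles depend on the convention used: this sketch relies on the default "exclusive" method of `statistics.quantiles`, so other tools may report slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)              # highest minus lowest value

population_variance = statistics.pvariance(data)  # divides by n
sample_variance = statistics.variance(data)       # divides by n - 1

population_std = statistics.pstdev(data)          # square root of the population variance
sample_std = statistics.stdev(data)               # square root of the sample variance

# Quartiles: cut points that split the ordered data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(value_range, population_variance, sample_variance, iqr)
```

The sample variance is always larger than the population variance computed from the same values, because it divides the same sum of squares by n - 1 instead of n.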
Skewness and Kurtosis
Skewness: Measure of the asymmetry of a distribution. Positive values indicate a longer right tail, negative values a longer left tail.
Kurtosis: Measure of the 'tailedness' of a distribution. Higher values indicate heavier tails.
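No formulas are given above, so the following sketch assumes one common convention: the moment-based (population) definitions, where skewness is the third central moment divided by the second raised to the power 3/2, and kurtosis is the fourth central moment divided by the square of the second. The helper names and the data set are hypothetical; statistical packages often apply sample-size corrections and may report slightly different numbers.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean)^k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness (assumed convention): m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based kurtosis (assumed convention): m4 / m2^2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

skew = skewness(data)
kurt = kurtosis(data)
excess_kurt = kurt - 3  # excess kurtosis: relative to a normal distribution, whose kurtosis is 3

print(skew, kurt, excess_kurt)
```

For this data the skewness is positive, reflecting the longer right tail created by the value 9.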
Graphical Representations
Visualize data to identify patterns, trends, and outliers.
- Bar Chart: Represents categorical data with rectangular bars.
- Histogram: Represents the distribution of numerical data by counting the values that fall into consecutive bins.
- Box Plot: Visual representation of the five-number summary (Minimum, Q1, Median, Q3, Maximum).
- Scatter Plot: Shows the relationship between two quantitative variables.
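As a small illustration of what a box plot encodes, the five-number summary can be computed directly. This is a sketch with a made-up data set; it uses the default "exclusive" quartile method of `statistics.quantiles`, and plotting tools may use a different quartile convention.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [1, 3, 4, 6, 7, 8, 10]

# The three quartile cut points: Q1, the median, and Q3.
q1, median, q3 = statistics.quantiles(data, n=4)

five_number_summary = {
    "minimum": min(data),
    "q1": q1,
    "median": median,
    "q3": q3,
    "maximum": max(data),
}

print(five_number_summary)
```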
Z-Scores
Measure of how many standard deviations a value lies from the mean.
\(z = \frac{x - \mu}{\sigma}\) where \(x\) is a value from the population, \(\mu\) is the mean of the population, and \(\sigma\) is the standard deviation of the population.
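The z-score formula can be sketched as follows. The function name `z_score` and the data set are hypothetical; the data is treated as a complete population, so the population standard deviation is used.

```python
import statistics


def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std


# Hypothetical population, chosen only for illustration.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(population)     # population mean: 5
sigma = statistics.pstdev(population)  # population standard deviation: 2.0

z_high = z_score(9, mean, sigma)  # 9 is two standard deviations above the mean
z_low = z_score(2, mean, sigma)   # 2 is one and a half standard deviations below the mean

print(z_high, z_low)
```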
Correlation
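Pearson's correlation coefficient \(r\) measures the strength and direction of a linear relationship between two variables. It ranges from -1 (perfect negative) through 0 (no linear relationship) to 1 (perfect positive).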
\(r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}\)
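The formula above translates almost term by term into code. This is a minimal sketch; the function name `pearson_r` and the paired data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r, computed directly from the formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of paired deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))
```

In Python 3.10 and later, `statistics.correlation(x, y)` should give the same value, which can serve as a cross-check.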