What Is Statistics: Descriptive, Inferential, and How to Interpret Data

What Statistics Actually Is

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. At its heart, it is the discipline of reasoning under uncertainty — of drawing conclusions and making decisions when the information available is incomplete, imprecise, or subject to chance variation. This is virtually every real-world situation: we cannot survey every voter in a country to predict an election, we cannot give a new drug to every patient to know if it works, we cannot measure every product coming off a factory line to check for defects. Statistics gives us the tools to reason from samples to populations, from experiments to general truths, with quantified confidence in our conclusions.

The field is traditionally divided into two major branches. Descriptive statistics concerns summarizing and describing data that has already been collected — computing averages, measuring spread, identifying patterns, and creating visualizations. Inferential statistics uses sample data to draw conclusions (inferences) about a larger population from which the sample was drawn, typically with statements about the uncertainty of those conclusions. Both branches are essential: description without inference is often incomplete (we know what happened in our sample but cannot generalize), while inference without careful description risks missing important features of the data that can invalidate statistical models.

The scope of statistics extends far beyond traditional academic contexts. Statistics underlies medicine (clinical trials, epidemiology), public policy (economic forecasting, survey methodology), business (market research, quality control, financial risk modeling), sports (player evaluation, game strategy), criminal justice (forensic evidence interpretation, recidivism modeling), and the social sciences. The rise of large datasets and machine learning has created a new hybrid discipline — data science — that blends classical statistics with computational approaches, but the foundational ideas of statistical thinking remain as relevant as ever.

Descriptive Statistics: Summarizing What You Have

Descriptive statistics condenses large collections of data into summaries that capture essential features. The most basic summaries are measures of central tendency — ways of identifying a "typical" or "central" value in a dataset. The mean (arithmetic average) is the sum of all values divided by the number of values; it is sensitive to outliers (extreme values that pull it toward them). The median is the middle value when data is sorted in order; it is resistant to outliers, making it a better summary of central tendency when the distribution is skewed (as with income data, where a few billionaires dramatically raise the mean without affecting most people's experience). The mode is the most frequently occurring value, useful for categorical data or for identifying the most common outcome.
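The contrast between the mean and the median can be seen in a small sketch using Python's standard statistics module. The income figures are invented for illustration, with one extreme value added deliberately.

```python
import statistics

# Hypothetical annual incomes (in thousands); the last value is a deliberate outlier.
incomes = [32, 35, 35, 41, 48, 52, 2500]

mean_income = statistics.mean(incomes)      # pulled far upward by the outlier
median_income = statistics.median(incomes)  # middle value of the sorted data: 41
mode_income = statistics.mode(incomes)      # most frequent value: 35

print(f"mean={mean_income:.1f}, median={median_income}, mode={mode_income}")
```

Here the mean lands near 392 even though six of the seven values are below 55, while the median stays at 41.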

Measures of spread describe how varied or dispersed the data is around its center. The range (maximum minus minimum) is simple but sensitive to outliers. The variance is the average of squared deviations from the mean; squaring ensures that positive and negative deviations don't cancel each other out and emphasizes larger deviations. (When the variance is estimated from a sample rather than computed for a whole population, the sum of squared deviations is conventionally divided by n − 1 instead of n.) The standard deviation is the square root of the variance, bringing it back to the same units as the original data. The interquartile range (IQR), the difference between the 75th and 25th percentiles, describes the spread of the middle 50 percent of the data and is robust to outliers. A dataset with a small standard deviation has values clustered close to the mean; a large standard deviation indicates high variability.
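As a rough illustration, these measures can be computed with the standard statistics module. The data values are made up, and the "inclusive" quantile method is only one of several conventions for locating percentiles.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

data_range = max(data) - min(data)
population_variance = statistics.pvariance(data)  # divides by n
sample_variance = statistics.variance(data)       # divides by n - 1
population_sd = statistics.pstdev(data)           # square root of the population variance

# Quartiles by linear interpolation; other percentile conventions give slightly different cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
```

The sample variance is slightly larger than the population variance because it divides the same sum of squared deviations by a smaller number.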

Visualizing distributions is an essential complement to numeric summaries. Histograms show the frequency of values within equal-width intervals, revealing the shape of a distribution — whether it is symmetric, skewed left, skewed right, or bimodal. Box plots display the median, quartiles, and outliers compactly and facilitate comparison between groups. Scatter plots reveal relationships between two variables. The shape of a distribution matters because it affects which statistical methods are appropriate: many classical statistical tests assume normally distributed data (the bell curve), and applying them to highly skewed data can give misleading results.

Probability Distributions: The Models Behind Statistics

Statistics is built on probability theory, the mathematical framework for reasoning about random events. A probability distribution describes the range of possible values a variable can take and the probability (or probability density) associated with each value. The normal (Gaussian) distribution is the most important in classical statistics, characterized by its symmetric bell shape and described entirely by two parameters: its mean (center) and standard deviation (spread). The Central Limit Theorem, one of the most powerful results in probability, states that the sampling distribution of the mean of a sufficiently large sample of independent observations will be approximately normal, whatever the shape of the original distribution, provided that distribution has a finite variance. This result is why normal-based statistical tests for means often work well even when the underlying data is not normal, as long as the sample is reasonably large; heavily skewed or heavy-tailed data may need larger samples before the approximation is good.
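A small simulation can make the Central Limit Theorem concrete. The sketch below draws repeated samples from a strongly right-skewed exponential distribution and looks at how the sample means behave; the sample size, number of repetitions, and seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Mean of n independent draws from an exponential distribution with mean 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(5000)]

center = statistics.fmean(means)   # expected to be close to the population mean, 1.0
spread = statistics.stdev(means)   # expected to be close to 1 / sqrt(n)

# Share of sample means within 1.96 standard errors of the true mean;
# for a normal distribution this share would be about 0.95.
standard_error = 1.0 / n ** 0.5
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in means) / len(means)
```

Even though individual exponential draws are far from bell-shaped, the means of samples of 50 cluster around 1.0 in a roughly normal pattern.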

Other important distributions serve different data types. The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success (like the number of heads in 100 coin flips). The Poisson distribution models the number of events that occur independently at a constant average rate within a fixed interval of time or space (like the number of calls a call center receives per minute). The chi-squared distribution arises in tests of association between categorical variables. The t-distribution, similar to the normal but with heavier tails, is used when estimating means from small samples where the true population standard deviation is unknown, a situation common in scientific research.
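The binomial and Poisson probabilities can be written directly from their standard formulas. This is a minimal sketch; the call rate of 3 per minute is an invented figure.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """Probability of exactly k events when events arrive at the given average rate per interval."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

p_exactly_50_heads = binomial_pmf(50, 100, 0.5)  # about 0.08, even though 50 is the most likely count
p_no_calls = poisson_pmf(0, 3.0)                 # hypothetical call center averaging 3 calls per minute
```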

Bayes' theorem — a fundamental result in probability — describes how to update the probability of a hypothesis given new evidence. The Bayesian statistical framework uses prior knowledge (expressed as a probability distribution over possible parameter values) and updates it using observed data (through the likelihood function) to produce a posterior distribution representing updated beliefs. This approach contrasts with classical (frequentist) statistics, which estimates parameters without incorporating prior information and interprets probability as long-run frequency rather than degree of belief. The Bayesian-frequentist debate is one of the most interesting philosophical divisions in statistics, with active practitioners on both sides and increasing use of Bayesian methods in machine learning, medical research, and policy analysis.
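Bayes' theorem is easiest to see with a two-hypothesis example. The sketch below uses a hypothetical diagnostic test; the prevalence, sensitivity, and false-positive rate are invented numbers chosen only to show the update.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Probability of the hypothesis after seeing the evidence, by Bayes' theorem."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

# Assumed figures: 1% of people have the condition, the test detects it 95% of the time,
# and it gives a false positive for 5% of people without the condition.
after_one_positive = posterior(0.01, 0.95, 0.05)

# Yesterday's posterior becomes today's prior when a second, independent test is also positive.
after_two_positives = posterior(after_one_positive, 0.95, 0.05)
```

Under these assumptions a single positive result raises the probability from 1 percent to only about 16 percent, because false positives among the many healthy people outnumber true positives; a second independent positive raises it to about 78 percent.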

Inferential Statistics: Drawing Conclusions from Samples

Inferential statistics allows us to generalize from a sample to a population with quantified uncertainty. The fundamental logic is this: we observe a sample, compute statistics from it, and use probability theory to make statements about the population parameters those statistics estimate. Two key concepts structure most inferential statistics: estimation and hypothesis testing. Estimation produces point estimates (single best guesses) and interval estimates (ranges of plausible values) for unknown population parameters. Hypothesis testing evaluates evidence for or against specific claims about the population.

A confidence interval provides a range of plausible values for the true population parameter, produced by a procedure with a specified level of confidence, typically 95 percent. A 95 percent confidence interval for a population mean, for example, means that if the procedure were repeated many times with different random samples, approximately 95 percent of the calculated intervals would contain the true population mean. Critically, under the frequentist interpretation a 95 percent CI does not mean "there is a 95 percent probability that the true value is in this interval": once the interval is calculated, the true value either is or is not in it. The probability statement applies to the procedure, not to any particular interval. This distinction is subtle but important for correct interpretation.
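The "repeated procedure" reading of a confidence interval can be checked by simulation. The sketch below assumes, for simplicity, normally distributed data with a known standard deviation, so that a z-based interval is exact; the population values and seed are arbitrary.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(1)  # fixed seed for reproducibility
TRUE_MEAN, TRUE_SD, N = 10.0, 2.0, 40
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95 percent interval

def interval():
    """Draw one sample and return its 95 percent confidence interval for the mean."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    center = statistics.fmean(sample)
    half_width = z * TRUE_SD / N ** 0.5  # simplifying assumption: the population sd is known
    return center - half_width, center + half_width

trials = 2000
hits = sum(low <= TRUE_MEAN <= high for low, high in (interval() for _ in range(trials)))
coverage = hits / trials  # fraction of intervals that contain the true mean
```

Each individual interval either contains 10.0 or it does not; it is the long-run fraction of intervals that comes out close to 0.95.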

Hypothesis testing asks: is the effect or difference I observed in my sample plausible if the null hypothesis (typically, "no effect" or "no difference") were true? The p-value quantifies this: it is the probability of observing data at least as extreme as what was observed, assuming the null hypothesis is true. A small p-value (typically below 0.05) indicates that the data would be unusual if the null hypothesis were true, leading to rejection of the null hypothesis in favor of the alternative. A large p-value indicates insufficient evidence to reject the null; it does not show that the null hypothesis is true. The choice of 0.05 as a threshold is an arbitrary convention, not a hard boundary between "significant" and "insignificant": p-values of 0.051 and 0.049 carry essentially the same evidentiary weight.
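For a concrete case, a p-value can be computed exactly for a coin-flip experiment. The sketch below assumes a null hypothesis of a fair coin and counts every outcome at least as far from the expected count as the one observed; the observed count of 60 heads is an invented example.

```python
import math

def two_sided_binomial_p(heads, flips):
    """Exact two-sided p-value for a fair-coin null: P(result at least this far from flips / 2)."""
    distance = abs(heads - flips / 2)
    extreme_outcomes = sum(
        math.comb(flips, k) for k in range(flips + 1) if abs(k - flips / 2) >= distance
    )
    return extreme_outcomes / 2 ** flips

p_value = two_sided_binomial_p(60, 100)  # hypothetical result: 60 heads in 100 flips
```

Here the p-value is about 0.057, just above the conventional threshold, which illustrates how little separates a "significant" result from a "non-significant" one.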

Common Misinterpretations and Statistical Pitfalls

Statistical results are widely misunderstood, even by scientists and journalists. The p-value is perhaps the most misinterpreted concept in science. A p-value of 0.03 does NOT mean "there is a 3 percent probability that the null hypothesis is true" or "there is a 97 percent probability that the alternative hypothesis is true." It means only that, if the null hypothesis were true, there would be a 3 percent chance of observing data at least this extreme. This distinction matters enormously: when there is no real effect, a p-value below 0.05 still occurs about 5 percent of the time, which, given the thousands of hypothesis tests conducted in science each year, virtually guarantees that many published "significant" results reflect chance rather than true effects. This is a major contributor to the replication crisis in many scientific fields.
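The 5 percent false-positive rate can be seen by simulating many studies in which the null hypothesis is true by construction. This is a sketch under simplifying assumptions: both groups come from the same normal distribution with a known standard deviation of 1, so a two-sample z-test applies.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(2)  # fixed seed for reproducibility
std_normal = NormalDist()

def null_study_p_value(n=30):
    """Compare two groups drawn from the same population and return the two-sided p-value."""
    group_a = [rng.gauss(0, 1) for _ in range(n)]
    group_b = [rng.gauss(0, 1) for _ in range(n)]
    z = (fmean(group_a) - fmean(group_b)) / (2 / n) ** 0.5  # known sd of 1 in each group
    return 2 * (1 - std_normal.cdf(abs(z)))

p_values = [null_study_p_value() for _ in range(2000)]
false_positive_rate = sum(p < 0.05 for p in p_values) / len(p_values)
```

Roughly one simulated study in twenty reports p below 0.05 even though no effect exists in any of them.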

Statistical significance and practical significance are entirely different things. A study with a very large sample size can detect a statistically significant effect that is so small as to be practically meaningless. A drug that reduces blood pressure by 0.1 mmHg with p = 0.001 is statistically significant but clinically irrelevant. Effect size measures — Cohen's d, correlation coefficients, odds ratios — quantify the magnitude of an effect, separate from its statistical significance. Good statistical reporting always includes both. Conversely, a study with a small sample might fail to detect a real and important effect (a Type II error or false negative), not because the effect doesn't exist, but because the study lacked sufficient statistical power to detect it.
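The gap between statistical and practical significance can be worked through with summary numbers. The figures below are hypothetical (a 0.1 mmHg difference, a standard deviation of 10 mmHg, and 250,000 patients per group), and the test is a simple two-sample z-test.

```python
from statistics import NormalDist

def cohens_d(mean_difference, pooled_sd):
    """Standardized effect size: the difference in means expressed in standard deviations."""
    return mean_difference / pooled_sd

def two_sample_z_p_value(mean_difference, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming equal group sizes and a common sd."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

effect_size = cohens_d(0.1, 10.0)                    # 0.01 standard deviations
p_huge_study = two_sample_z_p_value(0.1, 10.0, 250_000)
p_small_study = two_sample_z_p_value(0.1, 10.0, 100)
```

The effect size is the same tiny 0.01 in both cases; only the sample size decides whether the p-value falls below 0.001 or sits far above 0.05.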

Correlation does not imply causation — perhaps the most famous warning in statistics. Two variables can be highly correlated because one causes the other, because a third variable causes both (a confounding variable), or simply by chance (especially with small samples or data mining across many variables). Ice cream sales and drowning deaths are correlated not because ice cream causes drowning, but because both increase in hot summer weather. Establishing causation requires either a randomized controlled experiment (which controls for confounding by random assignment) or sophisticated observational study designs that attempt to account for confounding. Regression analysis, while powerful for modeling relationships between variables, cannot by itself distinguish correlation from causation.
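The ice cream example can be simulated to show how a confounder produces correlation without causation. In this sketch, temperature drives both variables and neither affects the other; all coefficients and noise levels are invented. Removing the part of each variable that is explained by temperature makes the remaining correlation nearly vanish.

```python
import random
from statistics import fmean

rng = random.Random(3)  # fixed seed for reproducibility

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """What is left of y after subtracting its least-squares straight-line fit on x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Synthetic data: both outcomes depend on temperature plus independent noise.
temperature = [rng.uniform(5, 35) for _ in range(1000)]
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

raw_correlation = pearson(ice_cream_sales, drownings)
adjusted_correlation = pearson(
    residuals(ice_cream_sales, temperature), residuals(drownings, temperature)
)
```

The raw correlation is strong, while the correlation after adjusting for temperature is close to zero, which is what one would expect when the association is entirely due to the confounder.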

Regression Analysis: Modeling Relationships

Regression analysis is one of the most widely used statistical tools, modeling the relationship between a dependent variable (outcome) and one or more independent variables (predictors). Simple linear regression models the relationship as a straight line: y = β₀ + β₁x + ε, where β₀ is the intercept, β₁ is the slope, and ε is a random error term. The goal is to estimate β₀ and β₁ using data in a way that minimizes the sum of squared differences between observed and predicted values (ordinary least squares, OLS). The slope β₁ represents the expected change in y for a one-unit increase in x, and the intercept β₀ is the expected value of y when x is zero.
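For a single predictor, the least-squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. A minimal sketch with made-up data points:

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (intercept, slope)."""
    mean_x, mean_y = fmean(x), fmean(y)
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return intercept, slope

# Illustrative data lying close to the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
predicted_at_6 = intercept + slope * 6
```

For these points the fitted slope is 1.99 and the intercept is 0.05, so each one-unit increase in x is associated with an increase of about 2 in the predicted y.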

Multiple regression extends this to include multiple predictors: y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε. Each coefficient now represents the expected change in y for a one-unit increase in its predictor, holding the other predictors constant. This allows researchers to estimate the relationship between an outcome and one predictor while controlling for the effects of other measured predictors, a critical capability for observational studies where confounding is pervasive. For example, estimating the effect of education on income while controlling for age, work experience, and family background requires multiple regression. The R² statistic measures the proportion of the variance in the outcome that is explained by the model; for a least-squares fit with an intercept it ranges from 0 (no explanatory power) to 1 (perfect fit).
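Multiple regression can be sketched in plain Python by solving the normal equations (X'X)b = X'y. The data below is synthetic: income is generated from assumed coefficients (10, 3, and 1) plus small fixed perturbations, so the fit should recover values close to those. In practice a numerical library would normally be used instead of hand-written elimination.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1.0 is the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(rows, y, coefs):
    """Proportion of the variance in y explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic rows of (years of education, years of experience).
rows = [(12, 1), (12, 10), (16, 2), (16, 12), (18, 5), (20, 3), (14, 8), (20, 15)]
noise = [0.4, -0.4, -0.3, 0.3, 0.2, -0.2, 0.4, -0.4]
income = [10 + 3 * edu + 1 * exp + e for (edu, exp), e in zip(rows, noise)]

b0, b_education, b_experience = fit_ols(rows, income)
fit_quality = r_squared(rows, income, [b0, b_education, b_experience])
```

The coefficient on education is read as the expected change in income per additional year of education among people with the same experience, which is the sense in which the model "controls for" experience.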

Logistic regression handles binary outcomes (yes/no, success/failure) by modeling the log-odds of the outcome as a linear function of predictors. It produces odds ratios that quantify how much a one-unit increase in a predictor multiplies the odds of the outcome. Logistic regression underlies many medical risk models, credit scoring algorithms, and spam filters. More advanced regression methods — including Poisson regression for count data, Cox proportional hazards models for time-to-event data, and mixed-effects models for hierarchical or clustered data — extend the regression framework to a wide variety of data types and research designs. These methods share the fundamental logic of quantifying relationships between variables while accounting for uncertainty and controlling for confounding.
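The link between log-odds, probabilities, and odds ratios can be checked numerically. The intercept and slope below are hypothetical fitted values, not estimates from any real dataset.

```python
import math

def predicted_probability(intercept, slope, x):
    """Convert a linear log-odds prediction into a probability with the logistic function."""
    log_odds = intercept + slope * x
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

intercept, slope = -3.0, 0.7      # assumed coefficients for illustration
odds_ratio = math.exp(slope)      # factor by which the odds multiply per one-unit increase in x

p_at_2 = predicted_probability(intercept, slope, 2)
p_at_3 = predicted_probability(intercept, slope, 3)
observed_ratio = odds(p_at_3) / odds(p_at_2)  # equals odds_ratio at any starting x
```

The odds double (a ratio of about 2.01) for each unit of x, but the change in probability depends on where x starts, which is why odds ratios and probability differences should not be confused.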

Statistics in Practice: From Data to Decision

Good statistical practice begins long before data is collected. Sample size determination — calculating how many observations are needed to detect a meaningful effect with adequate statistical power — should precede any study. Random sampling, when feasible, is essential for the validity of inferential conclusions. Pre-registration of study hypotheses and analysis plans — publicly committing to specific hypotheses and methods before seeing the data — prevents the p-hacking (selective reporting of analyses that happen to produce significant results) that corrupts much scientific literature. These practices are increasingly required by leading journals and funding agencies.
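A common back-of-the-envelope sample size calculation for comparing two group means uses the normal approximation n = 2((z for alpha/2 + z for power) × sd / effect)² per group. The sketch below applies it; the effect sizes and standard deviation are invented, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

n_medium_effect = n_per_group(effect=5, sd=10)  # difference of half a standard deviation
n_small_effect = n_per_group(effect=2, sd=10)   # difference of a fifth of a standard deviation
```

Under these assumptions, detecting the half-standard-deviation difference needs about 63 participants per group, while the smaller difference needs about 393, since the required sample grows with the inverse square of the effect.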

Data cleaning and exploratory data analysis are essential steps before formal statistical modeling. Real datasets contain missing values, measurement errors, outliers, and coding inconsistencies that can dramatically distort statistical results if not addressed. Visualizing distributions, checking for outliers, examining relationships between variables, and verifying that data meets the assumptions of planned statistical tests are all critical preliminary steps. The adage "garbage in, garbage out" applies with full force to statistical analysis: the most sophisticated method applied to poor-quality data produces unreliable results.

Communicating statistical results to non-specialist audiences requires translating technical concepts into understandable language without sacrificing accuracy. Confidence intervals should be explained as ranges of plausible values rather than ranges with a fixed probability of containing the true value. Relative risks and odds ratios should be accompanied by absolute risk differences — "the drug reduced the risk of heart attack by 25 percent" is misleading without knowing whether the baseline risk was 4 percent (reduced to 3 percent, a 1 percentage point absolute difference) or 40 percent (reduced to 30 percent, a 10 percentage point difference). The best statistical communication is transparent about uncertainty, honest about limitations, and focused on the practical implications of findings rather than their statistical significance alone.
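The arithmetic behind the relative-versus-absolute risk example is short enough to write out. The function name is made up for this sketch; the baseline risks and the 25 percent relative reduction are the figures used in the paragraph above.

```python
def absolute_risk_reduction(baseline_risk, relative_reduction):
    """Absolute drop in risk implied by a relative reduction from a given baseline."""
    treated_risk = baseline_risk * (1 - relative_reduction)
    return baseline_risk - treated_risk

low_baseline = absolute_risk_reduction(0.04, 0.25)   # 4 percent falls to 3 percent
high_baseline = absolute_risk_reduction(0.40, 0.25)  # 40 percent falls to 30 percent

print(f"{low_baseline:.1%} vs {high_baseline:.1%} absolute reduction")
```

The same "25 percent reduction" headline corresponds to a 1 percentage point drop in one case and a 10 percentage point drop in the other.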