What Is Statistical Significance? Definition, Examples & Best Practices
Statistical significance is used to judge whether a survey result, experiment result or observed relationship would be unlikely to arise from random variation alone if a null hypothesis were true. In market research, it is often used to compare groups, such as whether younger customers rate a product higher than older customers, or whether purchase intent differs between two concepts.
The result depends on the significance test used, the sample size, the estimated effect, the variation in the data and the chosen significance level. A p-value at or below the threshold, often 0.05, is commonly called statistically significant.
Statistical significance is not the same as practical importance. A tiny difference can become statistically significant in a very large sample, while a commercially meaningful difference may fail to reach significance in a small sample. Researchers should interpret statistical significance alongside effect size, confidence intervals, sample quality and the decision being made.
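The contrast between sample size and practical importance can be sketched in a few lines of Python. This is an illustration only: both scenarios are hypothetical, the helper name p_value_for_difference is invented for this example, and the calculation assumes two independent groups of equal size and a two-sided test on proportions.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_difference(p1, p2, n_per_group):
    """Two-sided p-value for a difference in proportions, equal group sizes."""
    # With equal group sizes the pooled proportion is the simple average.
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical scenario 1: a 1-point gap measured on 50,000 respondents per group.
tiny_gap_large_sample = p_value_for_difference(0.50, 0.51, 50_000)

# Hypothetical scenario 2: a 10-point gap measured on 100 respondents per group.
big_gap_small_sample = p_value_for_difference(0.50, 0.60, 100)

print(f"1-point gap, n = 50,000 per group: p = {tiny_gap_large_sample:.4f}")
print(f"10-point gap, n = 100 per group:   p = {big_gap_small_sample:.4f}")
```

Under these assumptions the 1-point gap falls below 0.05 while the 10-point gap does not, even though the second difference is ten times larger.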
Why Statistical Significance Became Controversial
The foundations of statistical significance come from early twentieth-century hypothesis testing. Fisher (1925) popularised the use of p-values as evidence against a null hypothesis. Neyman and Pearson (1933) developed a formal decision framework using Type I error, Type II error and pre-set significance levels. These ideas shaped how survey researchers, biostatisticians, social scientists and product analysts test differences.
The standard decision rule is: if the p-value is less than or equal to alpha, reject the null hypothesis. Alpha is the significance level. At the 0.05 level of significance, the researcher accepts a 5% risk of rejecting the null hypothesis when it is true, under the assumptions of the test.
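One way to see what the 5% risk means is to simulate many surveys in which the null hypothesis really is true and count how often the decision rule rejects it. The sketch below is a rough illustration, not a prescribed method: the common rate of 45%, the group size, the number of simulated surveys and the random seed are all assumptions chosen for the example.

```python
import random
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
TRUE_RATE = 0.45    # assumed: both groups share this rate, so the null is true
N_PER_GROUP = 500   # assumed group size
N_SURVEYS = 2000    # assumed number of simulated surveys

rng = random.Random(42)  # fixed seed so the run is repeatable


def simulated_count(rate, n):
    """Number of 'yes' answers among n simulated respondents."""
    return sum(rng.random() < rate for _ in range(n))


def p_value(x1, x2, n):
    """Two-sided two-proportion z-test p-value for equal group sizes."""
    pooled = (x1 + x2) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    z = abs(x1 - x2) / n / se
    return 2 * (1 - NormalDist().cdf(z))


# Count how often the rule "reject if p <= alpha" fires when nothing differs.
false_positives = sum(
    p_value(
        simulated_count(TRUE_RATE, N_PER_GROUP),
        simulated_count(TRUE_RATE, N_PER_GROUP),
        N_PER_GROUP,
    ) <= ALPHA
    for _ in range(N_SURVEYS)
)
false_positive_rate = false_positives / N_SURVEYS
print(f"Share of null surveys flagged as significant: {false_positive_rate:.3f}")
```

The share of false positives should sit near alpha, with some run-to-run variation because the number of simulated surveys is finite.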
Worked example: a brand tracker finds that 48% of 1,000 customers recognise Brand A, while 43% of 1,000 recognise Brand B. A two-sided two-proportion z-test gives a p-value of about 0.025. Since 0.025 is below 0.05, the difference is statistically significant at the 5% level.
NIST defines a p-value as the probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. The American Statistical Association’s 2016 statement warned that a p-value does not measure the probability that the hypothesis is true, nor the size or importance of an effect. Amrhein, Greenland and McShane (2019) reported in Nature that more than 800 signatories supported a call to stop treating statistical significance as a bright-line threshold.
The strength of significance testing is discipline. It reduces the chance of overreacting to noise. Its weakness is false certainty. A significant result can still be biased, trivial or non-replicable.
How to Test Statistical Significance
A statistical significance test starts with a null hypothesis. In a survey comparison, the null might be: there is no difference in purchase intent between Concept A and Concept B. The alternative hypothesis is that a difference exists.
The researcher then chooses a significance level before analysing the data. A common level is 0.05. Stricter studies may use 0.01. The test then calculates a test statistic and p-value. The exact formula depends on the data type: a t-test for comparing means, a z-test for proportions, a chi-square test for categorical association, or a correlation significance test for relationships between numeric variables.
For two survey proportions, the test statistic is often:
z = (p1 - p2) / SE, where p1 and p2 are the two sample proportions and SE is the standard error of their difference.
Worked example: 420 out of 1,000 respondents prefer Concept A, so p1 = 0.42. 370 out of 1,000 prefer Concept B, so p2 = 0.37. The observed difference is 0.05, or 5 percentage points. A two-sided two-proportion test produces a p-value of about 0.022. At a 0.05 significance level, the difference is statistically significant.
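The Concept A versus Concept B calculation can be reproduced with the Python standard library. This is a minimal sketch under stated assumptions: the two samples are independent, the test is two-sided, and the standard error uses the pooled proportion, which is a common choice under the null hypothesis but not the only one. The function name two_proportion_z_test is invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion: the estimate of the common rate if the null is true.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts from the worked example: 420 of 1,000 versus 370 of 1,000.
z, p_value = two_proportion_z_test(420, 1000, 370, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
print("Significant at 0.05" if p_value <= 0.05 else "Not significant at 0.05")
```

The result should land close to the p-value quoted in the worked example. Small differences can appear if a tool uses an unpooled standard error or a continuity correction.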
Confidence intervals provide another view. If the 95% confidence interval for the difference excludes zero, the result usually aligns with significance at the 0.05 level for a two-sided test. If the interval includes zero, the result is usually not statistically significant.
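The interval for the same Concept A and Concept B figures can be sketched as follows. This assumes a simple normal-approximation (Wald) interval with an unpooled standard error. Other interval methods exist and may give slightly different bounds. The function name diff_confidence_interval is invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: each group keeps its own variance estimate.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value for a two-sided interval (about 1.96 at the 95% level).
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


low, high = diff_confidence_interval(420, 1000, 370, 1000)
excludes_zero = low > 0 or high < 0
print(f"95% CI for the difference: {low:.3f} to {high:.3f}")
print(f"Interval excludes zero: {excludes_zero}")
```

Here the interval sits entirely above zero, which matches the significant result from the two-sided test at the 0.05 level.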
Real-World Example
A UK ecommerce retailer tests two checkout messages before Black Friday. Message A says “Free returns for 30 days.” Message B says “Try at home, return free.” The team surveys 2,400 recent customers, randomly assigning 1,200 to each message.
Purchase intent is 46% for Message A and 51% for Message B. The 5-point difference gives a p-value of 0.014 using a two-proportion significance test, so the result is statistically significant at the 0.05 level. The 95% confidence interval for the uplift is roughly 1.0 to 9.0 percentage points.
The team chooses Message B for the campaign, but it does not treat the survey as final proof. It runs an A/B test on the website before full rollout because a statistically significant survey result reflects stated intent, not actual conversion.
Sources Cited
Amrhein, V., Greenland, S. and McShane, B. (2019). “Scientists Rise Up Against Statistical Significance.” Nature.
Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd.
National Institute of Standards and Technology (accessed 2026). “Critical Values and p Values.” NIST/SEMATECH e-Handbook of Statistical Methods.
National Institute of Standards and Technology (accessed 2026). “Confidence Interval Approach.” NIST/SEMATECH e-Handbook of Statistical Methods.
Neyman, J. and Pearson, E. S. (1933). “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society of London.
Ou, F. S., Michiels, S., Shyr, Y., Adjei, A. A. and Oberg, A. L. (2020). “Guidelines for Statistical Reporting in Medical Journals.” Journal of Thoracic Oncology.
Wasserstein, R. L. and Lazar, N. A. (2016). “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician.
Frequently Asked Questions
Statistical significance means an observed result would be unlikely under a specified null hypothesis, based on the chosen test and significance level. It is commonly assessed using a p-value. A statistically significant result does not automatically mean the finding is important, causal, unbiased or useful.
A significance level, often called alpha, is the threshold used to decide whether a p-value is small enough to reject the null hypothesis. The 0.05 level of significance is common. It means the test allows a 5% Type I error rate under the test assumptions if the null hypothesis is true.
A p-value is the probability of observing a result at least as extreme as the one found, assuming the null hypothesis is true. A smaller p-value means the observed result is less compatible with the null hypothesis. It does not tell you the probability that the null hypothesis itself is true.
A p-value is usually called significant when it is less than or equal to the chosen significance level. If alpha is 0.05 and the p-value is 0.03, the result is statistically significant at the 5% level. If the p-value is 0.08, it is not significant at that threshold.
To calculate statistical significance, choose the correct test for the data, define the null hypothesis, set the significance level and calculate the test statistic and p-value. Survey proportions often use a z-test. Mean scores often use a t-test. Categorical relationships often use a chi-square test.
Statistical significance and statistical relevance are not the same. Statistical significance asks whether a result is unlikely under a null hypothesis. Statistical relevance or practical relevance asks whether the result is large enough to matter. A one-point change can be significant in a huge sample but commercially unimportant.
Confidence intervals and statistical significance are closely linked. For a two-sided test at the 0.05 level, a 95% confidence interval that excludes the null value usually indicates statistical significance. For a difference, the null value is often zero. For a ratio, the null value is often one.
Correlation significance tests whether an observed correlation is unlikely under the null hypothesis of no correlation in the population. A statistically significant correlation does not prove causation. Researchers should still check sample size, outliers, non-linear patterns, measurement quality and whether the relationship is meaningful in the real decision context.
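A correlation significance check can be sketched in the same spirit. The data below are hypothetical, the helper names are invented for this example, and the p-value uses the Fisher z-transform as a normal approximation rather than the exact t-distribution test that statistical packages usually report, so treat the output as approximate.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher z)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: satisfaction score (1-10) and monthly spend for 12 customers.
satisfaction = [3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10]
monthly_spend = [20, 35, 25, 40, 38, 55, 50, 62, 58, 75, 70, 90]

r = pearson_r(satisfaction, monthly_spend)
p_value = correlation_p_value(r, len(satisfaction))
print(f"r = {r:.2f}, approximate p = {p_value:.2g}")
```

A small p-value here only says the correlation is unlikely under the null hypothesis of no correlation. It says nothing about whether satisfaction drives spend.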