
P-Value Calculator — Z / T / Chi-Square / F Tests

Drop in a test statistic and pick the test type (Z, T, Chi-square, or F). Get the exact p-value plus significance verdicts at α = 0.05 and α = 0.01. One- or two-tailed for Z and T; right-tailed only for Chi-square and F. NIST-style numerical methods.

  • Instant result
  • Private — nothing saved
  • Works on any device
  • AI insight included
Reviewed by CalcBold Editorial · Sources: NIST/SEMATECH e-Handbook §7 (Product and Process Comparisons), ASA Statement on P-Values (2016), and Numerical Recipes.

P-Value Calculator

  • Test type: Z for large samples or known σ; T for small samples with unknown σ; Chi-square for categorical data / goodness-of-fit; F for variance comparison / ANOVA.
  • Test statistic: your computed z, t, χ², or F. Chi-square and F must be non-negative.
  • Degrees of freedom: required for T (df), Chi-square (df), and F (df1 = between-groups). Ignored for Z.
  • df2: F-test denominator degrees of freedom (within-groups). Used only when the test type is F.
  • Tails: two-tailed is the conservative default. One-tailed is appropriate only when you have a directional hypothesis specified BEFORE seeing the data.
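The field rules above can be sketched as a small validation map. The names and helper below are illustrative, not the calculator's actual internals:

```python
# Sketch: which inputs each test type needs, per the field notes above.
# REQUIRED and missing_inputs are hypothetical names, not the site's code.
REQUIRED = {
    "Z": {"statistic"},
    "T": {"statistic", "df"},
    "chi-square": {"statistic", "df"},
    "F": {"statistic", "df1", "df2"},
}

def missing_inputs(test_type, provided):
    """Return the set of required fields the user has not supplied yet."""
    return REQUIRED[test_type] - set(provided)

print(missing_inputs("F", {"statistic", "df1"}))  # {'df2'}
print(missing_inputs("Z", {"statistic"}))         # set()
```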


Frequently Asked Questions

The most common questions we get about this calculator — each answer is kept short so you can scan.

  • What is a p-value?
    The probability of observing a test statistic at least as extreme as the one you computed, ASSUMING the null hypothesis is true. Small p = the observed data is unlikely under H₀ → reject H₀. Large p = the observed data is consistent with H₀ → fail to reject. P-value is NOT 'probability that H₀ is true' — a common misinterpretation flagged by the ASA Statement on P-Values (2016).
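For a z statistic, this definition reduces to the standard normal survival function. A minimal sketch using only the Python standard library (`math.erfc`):

```python
import math

# Sketch: one- and two-tailed p-values for a z statistic. P(Z >= |z|) is the
# standard normal survival function, computed here via the stdlib erfc.
def z_p_value(z, tails=2):
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # P(Z >= |z|) under H0
    return 2 * p_one if tails == 2 else p_one

print(z_p_value(1.96))           # ~0.0500: borderline at alpha = 0.05
print(z_p_value(1.96, tails=1))  # ~0.0250
```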
  • What significance level should I use?
    Convention is α = 0.05 (5% false-positive rate) but this is just a convention. Stricter standards (α = 0.01) for medical/safety-critical decisions. Looser standards (α = 0.10) for exploratory work. The right α depends on the cost of false positives vs false negatives in your domain. The calculator shows results at both 0.05 and 0.01 for context.
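The dual-alpha report described above amounts to two threshold checks. A sketch of that logic (illustrative, not the calculator's source):

```python
# Sketch: significance verdicts at both conventional alpha levels.
def verdicts(p):
    return {alpha: "significant" if p < alpha else "not significant"
            for alpha in (0.05, 0.01)}

print(verdicts(0.03))  # significant at 0.05, not significant at 0.01
```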
  • When do I use a Z-test vs T-test?
    Z-test: large samples (n ≥ 30) OR small samples with known population σ. T-test: small samples (n < 30) AND unknown σ. In practice, the T-test is more conservative for small samples; for n ≥ 50 the two converge. If you're using sample SD as an estimate of σ, you should be using a T-test, regardless of sample size.
  • What's the difference between one-tailed and two-tailed?
    Two-tailed: tests whether the observed statistic is significantly DIFFERENT (in either direction) from H₀. One-tailed: tests whether it's significantly LARGER (or SMALLER) than H₀ — directional. Two-tailed p ≈ 2× one-tailed p. Use one-tailed only with a pre-registered directional hypothesis; otherwise default to two-tailed (more conservative).
  • What's a chi-square test for?
    Categorical data. Two common uses: (1) Goodness-of-fit — do observed counts match expected? (rolling a die 60 times, expecting 10 of each face) (2) Independence — are two categorical variables related? (gender vs voting preference). The test statistic = Σ(observed − expected)² / expected. Always right-tailed (large values = bad fit / dependence).
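The die example above, as a short sketch (the observed counts are made up for illustration):

```python
# Sketch: chi-square goodness-of-fit statistic for the die example.
# Rolling a fair die 60 times, each face is expected 10 times.
observed = [8, 12, 9, 11, 10, 10]  # hypothetical counts from 60 rolls
expected = [10] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # 6 categories -> 5 degrees of freedom

print(chi2, df)  # chi2 = 1.0, df = 5: far below the 0.05 cutoff of ~11.07
```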
  • What's an F-test for?
    Three main uses: (1) ANOVA — comparing means across 3+ groups: F = between-group variance / within-group variance. (2) Comparing the variances of two distributions. (3) Testing the overall significance of a regression model. F is non-negative and always right-tailed; small F = no group differences; large F = significant group differences. Requires both df1 (numerator) and df2 (denominator).
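The ANOVA ratio can be computed directly from raw group data. A sketch with three made-up groups:

```python
# Sketch: one-way ANOVA F statistic for three illustrative groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # hypothetical data
k = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df1, df2 = k - 1, n - k  # numerator / denominator degrees of freedom
F = (ss_between / df1) / (ss_within / df2)
print(F, df1, df2)       # F = 3.0 with df1 = 2, df2 = 6
```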
  • Is p < 0.05 'statistically significant'?
    By the conventional definition, yes — but the ASA Statement on P-Values (2016) explicitly warns against treating p < 0.05 as a 'discovery' or p ≥ 0.05 as 'no effect'. Real significance comes from the p-value combined with effect size, confidence intervals, prior plausibility, and replication. P-values of 0.049 and 0.051 are essentially identical as evidence.
  • What is p-hacking and how do I avoid it?
    P-hacking = manipulating data analysis (selectively reporting tests, transforming variables, removing outliers, adjusting sample size) to get p < 0.05. To avoid: (1) pre-register your analysis plan before seeing the data, (2) report ALL tests run, not just significant ones, (3) report effect sizes + confidence intervals alongside p-values, (4) replicate findings in independent samples before concluding.
  • How accurate is this calculator's CDF math?
    Normal CDF: Abramowitz & Stegun 26.2.17 series, accurate to ~7.5e-8. T-CDF and F-CDF: regularized incomplete beta function via continued fractions (Numerical Recipes betacf), accurate to ~1e-10. Chi-square CDF: regularized incomplete gamma function via series + continued fractions, accurate to ~1e-10. Sufficient for any practical hypothesis test.
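The T-CDF route described above can be sketched as follows. This is a from-scratch reimplementation of the standard Numerical Recipes continued-fraction scheme (Lentz's method for the regularized incomplete beta), not the calculator's actual source:

```python
import math

# Sketch: two-tailed Student's t p-value via the regularized incomplete
# beta function, following the Numerical Recipes betacf/betai scheme.

def betacf(a, b, x, max_iter=200, eps=3e-12, fpmin=1e-300):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    if abs(d) < fpmin:
        d = fpmin
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even then odd step of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            if abs(d) < fpmin:
                d = fpmin
            c = 1.0 + aa / c
            if abs(c) < fpmin:
                c = fpmin
            d = 1.0 / d
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Prefactor computed in log space via lgamma to avoid overflow.
    ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(ln_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * betacf(a, b, x) / a
    return 1.0 - bt * betacf(b, a, 1.0 - x) / b

def t_p_two_tailed(t, df):
    """Two-tailed p-value for Student's t with df degrees of freedom."""
    return betai(0.5 * df, 0.5, df / (df + t * t))

print(t_p_two_tailed(2.0, 10))  # ~0.0734
```

The chi-square and F tails follow the same pattern with the incomplete gamma and incomplete beta functions respectively.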
  • What are the assumptions of these tests?
    Z-test: known σ OR n ≥ 30. T-test: roughly normal distribution + independent observations. Chi-square: expected counts ≥ 5 per cell (rule of thumb). F-test: normally-distributed residuals + equal variances. If assumptions are badly violated (heavy outliers, strong skew), the p-value's interpretation is unreliable; consider non-parametric alternatives (Wilcoxon, Mann-Whitney, Kruskal-Wallis).
  • Can I use this for non-parametric tests?
    Some — Mann-Whitney U test statistic can be converted to a z-score; Kruskal-Wallis converts to chi-square. For Wilcoxon signed-rank, the W statistic has its own table — this calculator doesn't support it directly. For Bayesian alternatives (Bayes factors, posterior probabilities), use dedicated tools like R's `BayesFactor` package.
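The Mann-Whitney conversion mentioned above is the standard normal approximation, valid roughly when both groups have 10+ observations. A sketch with illustrative numbers:

```python
import math

# Sketch: normal approximation converting a Mann-Whitney U statistic
# to a z-score (the U value and group sizes below are made up).
def mann_whitney_z(u, n1, n2):
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean_u) / sd_u

z = mann_whitney_z(23, 10, 10)             # U = 23 for two groups of 10
p_two = math.erfc(abs(z) / math.sqrt(2))   # two-tailed p from the z-score
print(z, p_two)
```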
  • What's the relationship between p-value and confidence interval?
    Direct. A 95% confidence interval that excludes the null hypothesis value corresponds to p < 0.05. A 99% CI that excludes null corresponds to p < 0.01. CIs are generally more informative than p-values because they show the effect size AND uncertainty in one number. When publishing, report both.
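For a z-based test the duality is easy to see numerically. A sketch with made-up numbers: the 95% interval excludes the null value exactly when the two-tailed p falls below 0.05.

```python
import math

# Sketch: CI / p-value duality for a z-based test (illustrative numbers).
estimate, se, null = 2.0, 1.0, 0.0

z = (estimate - null) / se
p_two = math.erfc(abs(z) / math.sqrt(2))           # two-tailed p
ci_95 = (estimate - 1.96 * se, estimate + 1.96 * se)

excludes_null = not (ci_95[0] <= null <= ci_95[1])
print(round(p_two, 4), ci_95, excludes_null)       # p ~0.0455, excludes null
```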