Fallacies: p-hacking
What is the P-Hacking Fallacy?
The P-Hacking Fallacy arises from our tendency to:
- Misinterpret p-values: We attribute too much meaning to low p-values (e.g., below 0.05), treating them as evidence of a large or important effect when they only tell us the observed data would be unlikely if the null hypothesis were true.
- Fail to account for multiple testing: We neglect the fact that running many tests inflates the chance of obtaining statistically significant results even when no real effect exists, as the sketch below illustrates.
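To see why multiple testing matters, here is a minimal Python sketch, assuming independent tests and true null hypotheses throughout, of how the probability of at least one false positive (the family-wise error rate) grows with the number of tests:

```python
# Family-wise error rate under m independent tests, all null hypotheses true.
alpha = 0.05  # per-test significance level

for m in (1, 5, 20, 100):
    # P(at least one false positive) = 1 - P(no false positives)
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:3d} tests -> P(at least one false positive) = {fwer:.3f}")

# 1 test -> 0.050; 5 tests -> 0.226; 20 tests -> 0.642; 100 tests -> 0.994
```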
Examples:
- Fishing expeditions: A researcher runs 100 independent tests on a dataset, and five return p-values below 0.05. The researcher claims these five tests demonstrate real effects, ignoring that about five false positives (100 × 0.05 = 5) are expected by chance alone; see the simulation after this list.
- Hypothesis generation: An analyst generates multiple hypotheses through exploratory data analysis and then runs statistical tests on the same data to confirm them. Without correction for multiple testing, the analyst may claim to have discovered statistically significant effects that are merely artifacts of chance.
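The fishing-expedition example is easy to reproduce. A minimal simulation, using hypothetical pure-noise data so that every null hypothesis is true by construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, n_per_group = 100, 30

false_positives = 0
for _ in range(n_tests):
    # Both groups come from the SAME distribution: no real effect exists.
    a = rng.normal(0.0, 1.0, size=n_per_group)
    b = rng.normal(0.0, 1.0, size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} tests were 'significant' at p < 0.05")
# Around 5 "discoveries" appear even though nothing is there (100 x 0.05 = 5).
```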
Why do we fall prey to this fallacy?
We succumb to the P-Hacking Fallacy due to:
- Lack of understanding of p-values: We often misconstrue a p-value as the probability that the null hypothesis is true, or as a measure of effect size, rather than what it is: the probability of observing data at least as extreme as ours if the null hypothesis were true. The sketch after this list shows how p-values shrink with sample size while the effect size stays the same.
- Failure to account for research design: We neglect how design choices, such as testing many outcomes, subgroups, or model specifications, inflate the likelihood of obtaining statistically significant results by chance.
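A quick way to see that a p-value is not an effect size: hold a small true effect fixed and grow the sample. In this sketch (assumed values, two-sample t-test), the p-value shrinks toward zero as n grows, while Cohen's d, the standardized effect size, settles near the small true value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.1  # a small, fixed mean difference in standard-deviation units

for n in (20, 200, 2000, 20000):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(true_effect, 1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    # Cohen's d: standardized mean difference; noisy at small n, ~0.1 at large n
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (b.mean() - a.mean()) / pooled_sd
    print(f"n={n:6d}  p={p:.4g}  Cohen's d={d:+.3f}")
```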
Consequences:
The P-Hacking Fallacy can lead to:
- False positives and publication bias: Research papers may report statistically significant findings that are actually due to chance, contributing to a literature filled with unreliable results.
- Inadequate research design: Underpowered or poorly designed studies that happen to reach significance tend to overstate effect sizes, leading to an overestimation of the reliability of the results.
How to avoid this fallacy?
To avoid the P-Hacking Fallacy:
- Use proper multiple testing corrections: Apply methods like the Bonferroni correction or the Benjamini-Hochberg procedure to adjust p-values or significance thresholds for the number of tests performed (see the sketch after this list).
- Pre-register studies and hypotheses: Define research questions, outcomes, and analysis methods in advance, so that exploratory, data-driven findings cannot be presented as confirmatory tests.
- Consider effect sizes and confidence intervals: Instead of relying solely on p-values, focus on estimating the size of effects and their uncertainty.
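As a sketch of the first recommendation, the statsmodels library exposes both corrections through a single function, applied here to simulated uniform p-values (which is how p-values are distributed when every null hypothesis is true):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(7)
pvals = rng.uniform(size=100)  # stand-in for 100 tests with no real effects

for method in ("bonferroni", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method}: {reject.sum()} of {len(pvals)} rejected after correction")

# Uncorrected, about 5 of these p-values fall below 0.05 by chance alone;
# after either correction, typically none survive.
```

Bonferroni controls the family-wise error rate and is conservative; Benjamini-Hochberg controls the false discovery rate instead and is usually preferred when many tests are run and some true effects are expected.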
Real-world applications:
The P-Hacking Fallacy has implications for:
- Scientific research: Inflated claims of significance can lead to a distorted view of the scientific evidence, affecting decision-making in fields like medicine, psychology, and climate science.
- Decision-making: Misinterpreting p-values can result in overconfidence in predictions or recommendations, leading to poor choices.