Paired within-subject A/B experiment — April 7, 2026
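Statistically significant: p = 0.015 (Wilcoxon signed-rank, one-sided).
Why this experiment is trustworthy: in the paired within-subject design, every task-agent pair is scored under both control and variant, so differences between tasks and between agents drop out of the comparison.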
Does CAM-PULSE's knowledge base (3,044 mined coding methodologies from 329 real repos) measurably improve AI agent code quality?
Figure: per-pair outcomes. Each block is one pair: green = variant won, red = control won, gray = tie (absolute score difference < 0.01).
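The win/loss/tie coloring can be expressed as a small rule. This is a minimal sketch, assuming each pair has a numeric control and variant score and that "tie" means an absolute difference below 0.01. The name `classify_pair` and its `tie_threshold` parameter are hypothetical, not part of CAM-PULSE.

```python
def classify_pair(control: float, variant: float, tie_threshold: float = 0.01) -> str:
    """Label one pair the way the figure colors it (assumed rule)."""
    diff = variant - control  # positive means the variant scored higher
    if abs(diff) < tie_threshold:
        return "tie"  # gray
    return "variant" if diff > 0 else "control"  # green / red


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for c, v in [(0.70, 0.85), (0.60, 0.55), (0.80, 0.805)]:
        print(c, v, classify_pair(c, v))
```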
All 5 agents show a positive mean difference (variant minus control), which suggests the KB effect holds across agents rather than being driven by one of them.
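A sketch of the per-agent check, assuming rows shaped like the per-pair table: (agent, control score, variant score). The helper names `mean_diff_by_agent` and `all_agents_positive` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_diff_by_agent(rows):
    """Return {agent: mean(variant - control)} over that agent's pairs.

    `rows` is assumed to be an iterable of (agent, control, variant) tuples,
    mirroring the Agent/Control/Variant columns of the per-pair table.
    """
    diffs = defaultdict(list)
    for agent, control, variant in rows:
        diffs[agent].append(variant - control)
    return {agent: mean(ds) for agent, ds in diffs.items()}


def all_agents_positive(rows) -> bool:
    # True only if every agent's mean paired difference is above zero.
    return all(d > 0 for d in mean_diff_by_agent(rows).values())
```

A single agent with a negative mean difference is enough to make `all_agents_positive` return `False`.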
Every pair, every score.
| # | Task | Agent | Control | Variant | Diff | C | V | Result |
|---|---|---|---|---|---|---|---|---|
All tests are one-sided (H1: variant > control). They are not independent, since all use the same pairs, but they rest on different assumptions and agree in direction.
| Test | Statistic | p-value | Interpretation |
|---|---|---|---|
| Paired t-test | t = 2.248 | p = 0.017 | Significant at α=0.05 |
| Wilcoxon signed-rank | W = 122 | p = 0.015 | Non-parametric confirmation |
| McNemar (binary) | 6 vs 1 discordant | p = 0.063 | Not significant at α=0.05; 6 of the 7 discordant pairs favor the variant |
| Bootstrap 95% CI | [+0.023, +0.270] | excludes 0 | Mean difference is positive at the 95% level |
| Cohen's dz | 0.45 | n/a | Small-to-medium effect (benchmarks: 0.2 small, 0.5 medium) |
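The statistics in the table can be recomputed from the per-pair differences d_i = variant_i − control_i. Below is a minimal standard-library sketch, assuming `diffs` holds those differences and, for McNemar, that `b` counts pairs where only the variant passed and `c` pairs where only the control passed. The function names are hypothetical. In practice `scipy.stats.ttest_rel`, `scipy.stats.wilcoxon` and `scipy.stats.binomtest` (each with `alternative="greater"`) cover the same tests and may differ slightly in tie or continuity handling.

```python
import math
import random
import statistics


def paired_t_and_dz(diffs):
    """Paired t statistic and Cohen's dz for paired differences."""
    n = len(diffs)
    m = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    dz = m / sd                   # Cohen's dz: mean diff / SD of diffs
    t = m / (sd / math.sqrt(n))   # equals dz * sqrt(n); df = n - 1
    return t, dz


def wilcoxon_w_plus(diffs, zero_tol=0.0):
    """W+ (sum of ranks of positive diffs) and a one-sided normal-approx p.

    Drops |d| <= zero_tol, uses average ranks for tied |d|, and applies no
    tie or continuity correction, so p is only an approximation.
    """
    d = [x for x in diffs if abs(x) > zero_tol]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return w_plus, 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


def mcnemar_exact_one_sided(b, c):
    """Exact one-sided McNemar: P(X >= b) with X ~ Binomial(b + c, 0.5)."""
    n = b + c
    return sum(math.comb(n, k) for k in range(b, n + 1)) / 2 ** n


def bootstrap_mean_ci(diffs, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # 6 vs 1 discordant pairs, as in the McNemar row.
    print(mcnemar_exact_one_sided(6, 1))  # 0.0625
```

`mcnemar_exact_one_sided(6, 1)` is 8/128 = 0.0625, which rounds to the table's 0.063. Because t = dz·√n, the reported t = 2.248 and dz = 0.45 are consistent with about 25 pairs, assuming dz is the mean divided by the SD of the differences.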