CAM-PULSE Knowledge Proof

Paired within-subject A/B experiment — April 7, 2026

Statistically significant: p = 0.015 (Wilcoxon signed-rank)
Variant success (with KB): 92.3%
Control success (no KB): 73.1%
Cohen's dz: 0.45 (medium)
Discordant wins (variant : control): 6 : 1

Experiment Design

Why this experiment is trustworthy: the paired within-subject design controls for the main confounds (agent, task difficulty, and sample imbalance) by holding them fixed within each pair.

The Question

Does CAM-PULSE's knowledge base (3,044 mined coding methodologies from 329 real repos) measurably improve AI agent code quality?

Protocol

  • 26 coding tasks on the graphify repository (326 tests, 0.6s feedback loop)
  • Each task run twice on the same agent: once with KB (variant), once without (control)
  • 5 agents (Codex, Claude, Gemini, Grok, local Ollama) assigned round-robin
  • Blind: neither agent nor verifier knows which arm it's in
  • Arm order randomized per pair to prevent order effects
  • Workspace reset to clean state between every run
  • 7-check verifier + pytest + 6-dimensional SWE quality metric
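The assignment steps in the protocol can be sketched as follows. This is a minimal illustration, not the experiment's actual harness: the function name `build_schedule`, the agent labels, and the fixed seed are assumptions introduced here.

```python
import random

# Agent labels are assumed from the protocol list; the real identifiers may differ.
AGENTS = ["codex", "claude", "gemini", "grok", "ollama"]

def build_schedule(n_tasks, agents, seed=0):
    """Return one entry per task: the assigned agent and the randomized arm order."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = []
    for task_id in range(n_tasks):
        agent = agents[task_id % len(agents)]  # round-robin assignment
        arms = ["control", "variant"]
        rng.shuffle(arms)  # randomize which arm runs first in this pair
        schedule.append({"task": task_id, "agent": agent, "order": tuple(arms)})
    return schedule

schedule = build_schedule(26, AGENTS)
```

With 26 tasks and 5 agents, round-robin gives the first agent 6 tasks and the others 5 each, so per-agent sample sizes are small and nearly balanced.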

What This Design Eliminates

  • Agent confounding: Same agent for both arms — if codex is better than local, it helps both arms equally
  • Task difficulty: Same task for both arms — a hard task is hard for both
  • Sample imbalance: Exactly 1 control + 1 variant per pair
  • Selection bias: Tasks were curated from a prior trial (≥50% historical success rate), and every curated task is run in both arms
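Why pairing removes agent and task effects can be seen in a toy additive model. The model below is purely an assumption for illustration (real scores need not be additive, and run-to-run noise does not cancel); the names `score` and `paired_diff` are hypothetical.

```python
import math

def score(agent_skill, task_ease, kb_effect, with_kb):
    """Hypothetical additive score: agent + task contributions, plus KB effect in the variant arm."""
    return agent_skill + task_ease + (kb_effect if with_kb else 0.0)

def paired_diff(agent_skill, task_ease, kb_effect):
    """Variant minus control for the same agent on the same task."""
    variant = score(agent_skill, task_ease, kb_effect, with_kb=True)
    control = score(agent_skill, task_ease, kb_effect, with_kb=False)
    return variant - control

# A strong agent on an easy task and a weak agent on a hard task
# yield the same paired difference under this model.
strong_easy = paired_diff(agent_skill=0.5, task_ease=0.25, kb_effect=0.125)
weak_hard = paired_diff(agent_skill=0.125, task_ease=0.0, kb_effect=0.125)
assert math.isclose(strong_easy, weak_hard)
```

Under this assumption the agent and task terms appear in both arms and subtract out, leaving only the KB term in each pair's difference.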

Pair-Level Results

Each block is one pair. Green = variant won. Red = control won. Gray = tie (<0.01 diff).

Variant: 92.3% (24/26)
Control: 73.1% (19/26)

Per-Agent Breakdown

All 5 agents show a positive mean difference, which suggests the KB effect is not specific to any one agent.

Raw Pair Data

Every pair, every score.

# Task Agent Control Variant Diff C V Result

Statistical Tests

All tests are one-sided (H1: variant > control). Several tests run on the same paired data point in the same direction.

Test                  Statistic           p-value     Interpretation
Paired t-test         t = 2.248           p = 0.017   Significant at α = 0.05
Wilcoxon signed-rank  W = 122             p = 0.015   Non-parametric confirmation
McNemar (binary)      6 vs 1 discordant   p = 0.063   Marginal; 6:1 ratio is striking
Bootstrap 95% CI      [+0.023, +0.270]    excludes 0  Effect is reliably positive
Cohen's dz            0.45                -           Medium effect size
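The McNemar figure can be reproduced by hand from the 6 vs 1 discordant counts. The sketch below assumes the exact (binomial) one-sided form of the test, which matches the reported p = 0.063; the source does not state which variant was used, and the function name is hypothetical.

```python
from math import comb

def mcnemar_exact_one_sided(variant_wins, control_wins):
    """Exact one-sided McNemar p-value.

    Under H0 each discordant pair favors either arm with probability 0.5,
    so the p-value is P(X >= variant_wins) for X ~ Binomial(n, 0.5).
    """
    n = variant_wins + control_wins
    tail = sum(comb(n, k) for k in range(variant_wins, n + 1))
    return tail / 2 ** n

p = mcnemar_exact_one_sided(6, 1)
print(round(p, 4))  # 0.0625
```

Only the 7 discordant pairs enter this test, which is why it has less power than the t-test and Wilcoxon test, both of which use the continuous score differences from all 26 pairs.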