Composition, model routing, agent orchestration — the working layer.
The stack is the strategy made tangible. Most operators argue about tools when the real lever is composition: which model gets which job, how agents pass work to each other, what passes through a verifier and what does not. Stack research is how we measure that composition and turn the results into rules anyone can copy.
Does a verifier-with-tool-access pattern lift irreversible-publish quality from 4.94 to 4.97 at acceptable cost? Replication invited.
Reports under this program
2 entries
Claude 4.7 wins voice-fit by a 23% margin on first draft. Gemini 3 Pro wins on factual density. GPT-5.1 wins on structural variety. The honest verdict: model choice depends on what you're optimising, and most creators are optimising the wrong thing.
Verifier-loop wins on irreversible work (publish, send, charge) by 38% on quality at ~3× cost. Pipeline wins on multi-format expansion (one essay → eight outputs) by 22% on quality at 1.4× cost. Pure swarm loses to a single Sonnet on every metric except ego.
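The verifier-loop pattern above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method used in the report. The generator, verifier and publish step are plain callables with hypothetical names (`generate`, `verify`, `publish`). A real verifier with tool access might fetch sources or run checks, but here it is stubbed. The irreversible step runs only after the verifier approves. If approval never comes within `max_rounds`, the loop returns without publishing so a human can take over. Counting model calls shows where the extra cost comes from: every round is at least two calls, compared with one for a single pass. The sketch does not reproduce the report's cost figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


def verifier_loop(
    task: str,
    generate: Callable[[str, str], str],  # hypothetical: (task, feedback) -> draft
    verify: Callable[[str], Verdict],     # hypothetical: draft -> verdict (may use tools)
    max_rounds: int = 3,
) -> tuple[Optional[str], int]:
    """Return (approved draft or None, model calls spent as a crude cost proxy)."""
    feedback = ""
    calls = 0
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        calls += 1
        verdict = verify(draft)
        calls += 1
        if verdict.passed:
            return draft, calls
        feedback = verdict.feedback  # feed the critique into the next draft
    return None, calls


def publish_if_verified(task, generate, verify, publish, max_rounds=3):
    """Run the irreversible step only on a verified draft."""
    draft, calls = verifier_loop(task, generate, verify, max_rounds)
    if draft is None:
        return False, calls  # never publish unverified work; escalate instead
    publish(draft)
    return True, calls


# Stubbed usage: the first draft fails verification, the revision passes.
def fake_generate(task, feedback):
    return task if not feedback else f"{task} [fixed: {feedback}]"


def fake_verify(draft):
    return Verdict(True) if "fixed" in draft else Verdict(False, "add a source link")


published = []
ok, calls = publish_if_verified(
    "Post about stack research", fake_generate, fake_verify, published.append
)
print(ok, calls, published)
# → True 4 ['Post about stack research [fixed: add a source link]']
```

The design choice that matters is the gate. The single-model or pipeline path would call `publish` directly, while the verifier loop puts that call behind an approval it can refuse.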
Use this in your stack
The Factory takes the findings under this program and turns them into a personalised stack — Persona Stack, cohort, vault config, what to read next.