Research program

Generative AI Stack

Composition, model routing, agent orchestration — the working layer.

Thesis

The stack is the strategy made tangible. Most operators argue about tools when the real lever is composition: which model gets which job, how agents pass work to each other, what passes through a verifier and what does not. Stack research is how we measure that composition and turn it into rules anyone can copy.
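As a rough illustration of what "composition as rules" could look like in code, the sketch below routes each job kind to a model and forces a verifier pass on irreversible work. Everything named here is an assumption made for illustration: the job kinds, the placeholder model labels ("model-a", "model-b"), the `route` and `run` helpers, and the retry limit are hypothetical and do not come from the reports.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of irreversible job kinds (assumption, for illustration only).
IRREVERSIBLE = {"publish", "send", "charge"}


@dataclass(frozen=True)
class Route:
    model: str    # placeholder model label, not a real model identifier
    verify: bool  # whether output must pass a verifier before it is returned


# Hypothetical routing table: which model gets which job.
ROUTES = {
    "draft": Route(model="model-a", verify=False),
    "expand": Route(model="model-b", verify=False),
    "publish": Route(model="model-a", verify=True),
}

DEFAULT_ROUTE = Route(model="model-a", verify=False)


def route(job_kind: str) -> Route:
    """Look up a route, forcing verification for irreversible job kinds."""
    base = ROUTES.get(job_kind, DEFAULT_ROUTE)
    if job_kind in IRREVERSIBLE:
        return Route(model=base.model, verify=True)
    return base


def run(
    job_kind: str,
    payload: str,
    call_model: Callable[[str, str], str],
    verifier: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Call the routed model; retry until the verifier accepts, if one is required."""
    r = route(job_kind)
    for _ in range(max_attempts):
        output = call_model(r.model, payload)
        if not r.verify or verifier(output):
            return output
    raise RuntimeError(f"verifier rejected all {max_attempts} attempts for {job_kind!r}")
```

Here `call_model` and `verifier` are injected callables, so the routing rule can be exercised with stubs before any real model is wired in. The design choice worth noting is that the irreversible check overrides the table: a job kind missing from `ROUTES` still gets verified if it is in `IRREVERSIBLE`.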

Open question (replication invited)

Does a verifier-with-tool-access pattern lift irreversible-publish quality from 4.94 to 4.97 at acceptable cost?

Reports under this program

2 entries

22 Apr 2026 · 11 min

Claude 4.7 vs Gemini 3 Pro vs GPT-5.1 on long-form writing — 50 tasks, measured

Claude 4.7 wins voice-fit by a 23% margin on first draft. Gemini 3 Pro wins on factual density. GPT-5.1 wins on structural variety. The honest verdict: model choice depends on what you're optimising — and most creators are optimising the wrong thing.

22 Apr 2026 · 12 min

Three agent orchestration patterns we ran for four weeks — costs, outcomes, and which one earned its keep

Verifier-loop wins on irreversible work (publish, send, charge) by 38% on quality at ~3× cost. Pipeline wins on multi-format expansion (one essay → eight outputs) by 22% on quality at 1.4× cost. Pure swarm loses to a single Sonnet on every metric except ego.

Use this in your stack

Open the Factory and route this research into your operating system.

The Factory takes the findings under this program and turns them into a personalised stack — Persona Stack, cohort, vault config, what to read next.
