Connor Holly


Gemini Swarm

Agent Orchestration

gemini · research · cost-optimization

A 3-stage parallel research pipeline that uses cheap, fast models for discovery and reserves expensive models for synthesis — cutting research costs by 50-80% while maintaining quality on the final output.

The Pattern

Stage 1: CONTEXT           Stage 2: DISCOVERY            Stage 3: ANALYSIS
(gather raw material)      (parallel cheap-model calls)  (single expensive-model call)

Search results ----+
File contents -----+---->  [Cheap Model Call 1] --+
API responses -----+       [Cheap Model Call 2] --+----> [Expensive Model] --> Final Output
Directory listings-+       [Cheap Model Call 3] --+
                           ...                    |
                           [Cheap Model Call N] --+

Stage 1 (Context): Gather raw material programmatically. Search results, file contents, API responses, database queries. No AI calls here — just data collection.
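A minimal sketch of the hand-off from stage 1 to stage 2: once raw material is collected, it has to be split into chunks for the parallel calls. The `chunk_text` helper and the 4,000-character default are illustrative choices, not part of the pattern itself.

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split raw material into roughly equal chunks on paragraph boundaries."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)  # chunk is full; start a new one
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps each chunk self-contained, which matters because each stage 2 call sees only its own chunk.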

Stage 2 (Discovery): Split the raw material into chunks and run N parallel calls to a cheap, fast model. Each call extracts insights, identifies patterns, or answers a specific sub-question from its chunk. Run 10-50 calls in parallel. Individual failures don't kill the pipeline — aggregate whatever succeeds.
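The fan-out can be sketched with a thread pool; `extract_insights` below is a hypothetical placeholder for a real cheap-model API call, and its return shape is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_insights(chunk: str) -> dict:
    # Placeholder: a real implementation would call a cheap, fast model
    # and return its structured JSON answer for this chunk.
    return {"chunk_len": len(chunk), "insights": []}

def run_discovery(chunks: list[str], max_workers: int = 20) -> list[dict]:
    """One cheap-model call per chunk; drop failures instead of raising."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(extract_insights, c) for c in chunks]
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception:
                pass  # individual failures don't kill the pipeline
    return results
```

The `try`/`except` around each future is what makes the stage tolerant: a failed call costs one chunk's insights, not the run.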

Stage 3 (Analysis): Feed all stage 2 discoveries into a single call to a stronger model. This model synthesizes findings, resolves contradictions between chunks, and produces the final structured output.
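Stage 3 reduces to assembling one prompt from the discovery array. A sketch, with the actual model call left as a placeholder (the prompt wording is an assumption, not a prescribed template):

```python
import json

def build_synthesis_prompt(discoveries: list[dict]) -> str:
    """Pack all stage 2 JSON discoveries into a single synthesis prompt."""
    return (
        "Synthesize these findings, resolve contradictions between chunks, "
        "and produce a final structured report:\n"
        + json.dumps(discoveries, indent=2)
    )
    # A real pipeline would then send this to the expensive model, e.g.
    # strong_model.generate(prompt) -- placeholder, not a real API.
```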

Key Decisions

Cheap models for extraction, expensive models for synthesis. Extraction (pulling facts from text, identifying relevant sections, summarizing passages) doesn't require frontier-model intelligence. Synthesis (resolving contradictions, drawing conclusions, producing coherent narratives) does. Match model capability to task difficulty.

Parallel over sequential. Stage 2 calls are independent — they process different chunks of the same corpus. Running them in parallel reduces wall-clock time from O(N) to O(1) while keeping cost identical.

Structured JSON between stages. Each stage outputs structured data (not prose) so the next stage can reliably parse it. Stage 2 outputs JSON objects with typed fields. Stage 3 receives an array of these objects. This makes the pipeline composable and debuggable.
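One possible shape for that hand-off, for concreteness. The field names (`chunk_id`, `topic`, `claims`, `confidence`) are illustrative, not prescribed by the pattern:

```python
import json

# What one stage 2 call might emit: typed fields, no prose.
stage2_output = json.dumps({
    "chunk_id": 7,
    "topic": "pricing",
    "claims": ["Competitor X dropped API prices in March"],
    "confidence": 0.8,
})

# Stage 3 receives an array of these objects and can parse them reliably.
discoveries = [json.loads(stage2_output)]
```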

Graceful degradation. If 3 out of 20 stage 2 calls fail, you still have 85% of the insights. Set a minimum success threshold (e.g., 60%) and proceed if met. Log failures for investigation but don't block the pipeline.
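The threshold check is a one-liner; the 60% default below mirrors the example figure above.

```python
def meets_threshold(succeeded: int, attempted: int, minimum: float = 0.6) -> bool:
    """Proceed to stage 3 only if enough stage 2 calls came back."""
    return attempted > 0 and succeeded / attempted >= minimum

# 17 of 20 succeeded -> 85% of the insights survive, pipeline proceeds.
```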

When to Use It

Research tasks: competitive analysis, codebase exploration, document review, literature surveys. Any time you're answering a question that requires reading and synthesizing a large volume of material. The cost structure makes it viable to run frequently — daily competitive scans, weekly documentation reviews — where a pure frontier-model approach would be prohibitively expensive.

Typical cost: $0.01-0.05 for stages 1-2 (20 cheap-model calls), $0.05-0.30 for stage 3. Total: under $0.50 for comprehensive research that would cost $5-15 with expensive models throughout.
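A back-of-envelope check of those figures, using assumed per-call prices (not actual vendor pricing):

```python
cheap_per_call = 0.002   # assumed cost of one cheap-model extraction call
expensive_call = 0.20    # assumed cost of the single synthesis call

total = 20 * cheap_per_call + expensive_call
# 20 * 0.002 + 0.20 = 0.24 -- comfortably under the $0.50 figure above
```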