How We Matched Gemini Deep Research Max at 1/100th the Cost Using the Lovelace YottaGraph


Can an agent powered by a lightweight LLM and the Lovelace YottaGraph (and nothing else) provide deep research-grade reports significantly cheaper and faster than a flagship Deep Research model?
By Jonathan Macoskey and James Sharpnack. Macoskey is Head of Machine Learning and Product at Lovelace; Sharpnack is a Senior Staff Machine Learning Engineer.
TL;DR. We built an investment-banking research agent on top of our flagship context engine, the Lovelace YottaGraph, and Gemini 3.1 Flash Lite, with no grounding beyond the YottaGraph itself. Across 12 investment-banking topics judged on a six-dimension, 1–10 rubric, it scored a mean of 9.67 versus 9.87 for Gemini Deep Research Max (3.1 Pro), at roughly six cents per report instead of seven dollars and in under five minutes instead of seventeen.
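As a quick sanity check on the headline, here is what the TL;DR figures imply. The constants below just restate the rounded per-report numbers above; because the Flash Lite time is quoted as “under five minutes,” the speedup it produces is a lower bound, not an exact figure.

```python
# Per-report figures quoted in the TL;DR (rounded).
DR_COST_USD, YG_COST_USD = 7.00, 0.06
DR_MINUTES, YG_MINUTES = 17, 5  # YG time is an upper bound ("under five minutes")
DR_SCORE, YG_SCORE = 9.87, 9.67

cost_ratio = DR_COST_USD / YG_COST_USD  # how many YG reports fit in one DR report's budget
speedup_floor = DR_MINUTES / YG_MINUTES  # lower bound, since YG finished in < 5 minutes
score_gap = DR_SCORE - YG_SCORE  # rubric points on the 1-10 scale

print(f"~{cost_ratio:.0f}x cheaper, >= {speedup_floor:.1f}x faster, {score_gap:.2f} points lower")
# prints: ~117x cheaper, >= 3.4x faster, 0.20 points lower
```

At about 117x, the cost ratio is slightly better than the title's 1/100th; over all 12 topics that works out to roughly $84 versus $0.72.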
The Question We Started With
Give a frontier deep-research agent a hard, fact-heavy prompt such as “Should Marvell Technology be valued as a standalone AI infrastructure compounder or as a strategic acquisition candidate for Broadcom?” and it will think for fifteen minutes, burn two million tokens, and hand back a polished 5,000-word memo with inline citations to roughly 100 sources scraped from the internet. It’s impressive. It’s also expensive and largely limited to Google Search. The industry is betting on massive data centers and enormous power infrastructure to serve foundation models that can run jobs like this at scale.
We take a different approach. We built the Lovelace YottaGraph, a massive context engine holding more than 60 million entities and billions of facts about them, drawn from over a dozen datasets, to give agents a thorough, real-time view of what is happening around the world. Our hypothesis is that report quality scales with the quality, volume, and organization of the context, not with the size of the model.
We therefore asked a simple question: Can an agent powered by a lightweight LLM hooked up to Lovelace’s YottaGraph (and nothing else) provide deep research-grade reports significantly cheaper and faster than a flagship Deep Research model?
The answer is yes.
This post documents the benchmark, the architecture, and the results.
How We Did It
All prompts and methods described below can be found at github.com/lovelace-ai/blog.
The Benchmarks
We asked 12 investment-banking research questions across three topic archetypes:
- Strategic alternatives (four topics): target standalone case vs. strategic buyer case. Marvell/Broadcom, CVS/Cigna, Kroger/Albertsons, Hasbro/Mattel.
- Competitive financial profile (four topics): two-company comparative equity research on a market or theme. NVIDIA/AMD, Coca-Cola/PepsiCo, Visa/Mastercard, Exxon/Chevron.
- Single-company investment memo (four topics): one-company diligence. Costco, Caterpillar, JPMorgan, UnitedHealth.
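Written down as data, the benchmark is a three-by-four grid. The structure and key names below are our own illustration, not the format used in the repository; they make explicit that the single-company memos take one entity while the other two archetypes take a pair.

```python
# The 12 benchmark topics from the list above, grouped by archetype.
# Key names are illustrative, not the repository's own labels.
BENCHMARK = {
    "strategic_alternatives": [
        ("Marvell", "Broadcom"), ("CVS", "Cigna"),
        ("Kroger", "Albertsons"), ("Hasbro", "Mattel"),
    ],
    "competitive_financial_profile": [
        ("NVIDIA", "AMD"), ("Coca-Cola", "PepsiCo"),
        ("Visa", "Mastercard"), ("Exxon", "Chevron"),
    ],
    "single_company_memo": [
        ("Costco",), ("Caterpillar",), ("JPMorgan",), ("UnitedHealth",),
    ],
}


def iter_topics():
    """Yield one (archetype, companies) pair per benchmark question."""
    for archetype, entries in BENCHMARK.items():
        for companies in entries:
            yield archetype, companies


if __name__ == "__main__":
    for archetype, companies in iter_topics():
        print(f"{archetype}: {'/'.join(companies)}")
```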
We instructed agents to follow a fixed six-section outline specific to each archetype, along with a set of “core properties” to include where relevant. Broadly, the outlines covered a situational overview, basic financials of the companies in question, head-to-head comparisons or deal feasibility, valuation, and risk analysis with mitigation strategies. We also wrote tailored judge instructions for each archetype, built around six dimensions:
- Financial grounding: Does the report cite specific, sourced financial figures?
- Standalone specificity: Does the report make concrete, evidence-based cases backed by specific cited figures?
- Acquisition specificity: Does the report make concrete, evidence-based cases for a company’s financial profile and how it compares to the other company (if applicable)?
- Feasibility assessment: Does the report assess the durability and sustainability of each company’s financial position?
- Analytical coherence: Does the evidence presented actually support the conclusions reached?
- Citation coverage: Are factual claims backed by inline citations?
The judge was a third model, Gemini 3.0 Flash Preview, which scored every report 1–10 on each of these dimensions. A report passed if its mean score was at least 9 and no dimension scored below 8. Every report was judged independently: the judge kept no state from one report to the next.
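The pass rule is a mean threshold combined with a per-dimension floor. A minimal sketch, assuming each report’s judge output has already been parsed into one integer score per dimension; the dimension keys and the `passes` helper are our own illustrative names, not taken from the repository.

```python
from statistics import mean

# One key per judged dimension (our labels for the six rubric dimensions).
DIMENSIONS = (
    "financial_grounding",
    "standalone_specificity",
    "acquisition_specificity",
    "feasibility_assessment",
    "analytical_coherence",
    "citation_coverage",
)


def passes(scores: dict, mean_min: float = 9.0, floor: int = 8) -> bool:
    """A report passes iff its mean score is >= 9 and no dimension is below 8."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    values = [scores[d] for d in DIMENSIONS]
    if not all(1 <= v <= 10 for v in values):
        raise ValueError("every score must be on the 1-10 scale")
    return mean(values) >= mean_min and min(values) >= floor


if __name__ == "__main__":
    example = dict.fromkeys(DIMENSIONS, 10)
    example["analytical_coherence"] = 7  # mean is 9.5, but one dimension < 8
    print(passes(example))  # False
```

The floor is what keeps one weak dimension from being averaged away: in the example, strong scores elsewhere lift the mean to 9.5, yet the report still fails.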
Reports were scored partly on whether the right cited facts appeared in the right sections. Citation density alone could not earn points; the cited facts had to be the right ones for the dimension.
The Architecture
Gemini Deep Research Max (deep-research-max-preview-04-2026), which at the time of writing uses the latest Gemini 3 Pro model, was executed via the Interactions API. Each report came from a single request, and any sub-agents or workflow generation inside Deep Research were controlled entirely by Gemini. From our perspective, it is “prompt in, text out,” although it is well known that under the hood Deep Research kicks off a complex, iterative agentic workflow. Deep Research had full access to the internet and any other resources at its disposal.
The Lovelace YottaGraph (YG) agent, running on gemini-3.1-flash-lite-preview, also received a single input request. Unlike Deep Research, it had no internet access and could not pull data from anywhere but the YottaGraph. Under the hood, the YG agent followed a much simpler, linear five-step process that interleaves three narrow LLM calls with deterministic retrieval.
The design stems from a single observation: a “research agent” is really three distinct jobs masquerading as one. Deciding what evidence to fetch (planning), turning raw facts into a thesis-aligned dossier (curation), and writing a report from that dossier (synthesis) are different cognitive tasks, so we gave each job to a separate Flash Lite call with a narrow input/output contract.
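The post gives the shape of the pipeline (five linear steps, three narrow LLM calls for planning, curation, and synthesis, with deterministic work in between) but not its exact step order or I/O formats, so the following is only a sketch under stated assumptions. We assume the two deterministic steps are graph retrieval after planning and dossier rendering after curation, and we model the LLM and the YottaGraph as plain Python callables. Every name here (`plan`, `retrieve`, `curate`, `render_dossier`, `synthesize`, the line-based output formats, and the stubs `fake_llm` and `GRAPH`) is hypothetical; none of it is Lovelace’s code or the YottaGraph API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fact:
    entity: str
    text: str
    source: str  # citation handle carried through to the report


@dataclass
class Dossier:
    thesis: str
    facts: list[Fact]


# Hypothetical stand-ins: an LLM call maps a prompt to a completion, and a
# graph lookup deterministically maps an entity name to its stored facts.
LLM = Callable[[str], str]
GraphLookup = Callable[[str], list[Fact]]


def plan(llm: LLM, question: str) -> list[str]:
    """Step 1 (LLM, planning): choose which entities to fetch evidence for."""
    out = llm(f"PLAN: list one entity per line for: {question}")
    return [line.strip() for line in out.splitlines() if line.strip()]


def retrieve(lookup: GraphLookup, entities: list[str]) -> list[Fact]:
    """Step 2 (deterministic): pull every stored fact for every planned entity."""
    return [fact for entity in entities for fact in lookup(entity)]


def curate(llm: LLM, question: str, facts: list[Fact]) -> Dossier:
    """Step 3 (LLM, curation): state a thesis and keep only the facts that matter."""
    listing = "\n".join(f"{i}: {f.text}" for i, f in enumerate(facts))
    out = llm(f"CURATE: thesis on line 1, kept fact ids below.\n{question}\n{listing}")
    lines = [line.strip() for line in out.splitlines() if line.strip()]
    thesis = lines[0] if lines else ""
    # Ignore ids that are malformed or out of range rather than failing.
    keep = [facts[int(i)] for i in lines[1:] if i.isdecimal() and int(i) < len(facts)]
    return Dossier(thesis=thesis, facts=keep)


def render_dossier(dossier: Dossier) -> str:
    """Step 4 (deterministic, assumed): number facts so the writer can cite [n]."""
    lines = [f"THESIS: {dossier.thesis}"]
    lines += [f"[{n}] {f.text} ({f.source})" for n, f in enumerate(dossier.facts, 1)]
    return "\n".join(lines)


def synthesize(llm: LLM, question: str, dossier_text: str) -> str:
    """Step 5 (LLM, synthesis): write the report from the dossier alone."""
    return llm(f"WRITE: use only the cited dossier.\n{question}\n{dossier_text}")


def run_pipeline(llm: LLM, lookup: GraphLookup, question: str) -> str:
    """Run the five steps in a fixed, linear order: no loops, no tool routing."""
    facts = retrieve(lookup, plan(llm, question))
    dossier = curate(llm, question, facts)
    return synthesize(llm, question, render_dossier(dossier))


# Stubs for a dry run; placeholder facts, not real YottaGraph data.
GRAPH = {
    "Marvell": [Fact("Marvell", "fact A", "src1"), Fact("Marvell", "fact B", "src2")],
    "Broadcom": [Fact("Broadcom", "fact C", "src3")],
}


def fake_llm(prompt: str) -> str:
    if prompt.startswith("PLAN"):
        return "Marvell\nBroadcom"
    if prompt.startswith("CURATE"):
        return "Standalone case is stronger\n0\n2"
    return prompt.split("\n", 2)[2]  # WRITE: echo the dossier back


if __name__ == "__main__":
    print(run_pipeline(fake_llm, lambda e: GRAPH.get(e, []), "Marvell vs. Broadcom?"))
```

Each stage sees only its own narrow input (a question, a fact list, or a rendered dossier), so each call can be prompted and tested in isolation. Swapping `fake_llm` for a real Flash Lite client and `GRAPH.get` for a real graph query would leave the control flow unchanged.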