Doug Turnbull
A search relevance expert cited for his work on LLM evaluation, emphasizing careful pairwise evaluation that checks both comparison directions.
Key Highlights
- Doug Turnbull is repeatedly cited as a practical expert on search relevance, retrieval systems, and LLM-era evaluation.
- He emphasizes that LLM pairwise relevance evaluation should check both comparison directions to avoid misleading conclusions.
- His work helps teams choose among search engines, vector databases, and hybrid retrieval architectures based on real trade-offs.
- He connects classical IR concepts like BM25 with modern AI product concerns such as RAG performance and score calibration.
- His commentary on tests, late interaction models, and minimalist retrieval approaches gives AI PMs concrete frameworks for product decisions.
Overview
Doug Turnbull is a search relevance expert whose work shows up repeatedly in discussions about retrieval quality, ranking systems, and practical evaluation methods for AI-powered search. For AI Product Managers, he is especially relevant because his writing bridges classic information retrieval ideas—like BM25, hybrid search, and engine selection—with modern LLM-era concerns such as pairwise evaluation and retrieval design for RAG systems.

Across the newsletter mentions, Turnbull appears as a pragmatic voice focused on how search systems actually perform in production. His perspectives matter to AI PMs because they help teams move beyond hype: choosing the right retrieval stack, understanding scoring trade-offs, using late interaction models thoughtfully, and evaluating search relevance carefully by checking comparisons in both directions.
Key Developments
- 2026-02-03 — Mentioned for "Check twice, cut once with LLM search relevance eval", emphasizing that LLM pairwise evaluation should test both directions rather than assuming a single comparison is reliable.
- 2026-03-07 — Featured for "Can BM25 be a probability?", exploring BM25 through odds, probabilities, and a Bayesian framing with implications for calibrated hybrid search.
- 2026-03-11 — Cited in "The tests are the code now", arguing that in AI-assisted coding workflows, tests increasingly become the core artifact for preserving software quality.
- 2026-03-21 — Highlighted for "How to actually choose a retrieval engine", comparing trade-offs among search engines and vector databases such as Elasticsearch, OpenSearch, Solr, Vespa, Pinecone, Turbopuffer, and Weaviate.
- 2026-03-24 — Referenced in "Why tiny late interaction models win", discussing the growing importance of late interaction retrieval and a LightOn demonstration involving Antoine Chaffin and a compact 150M model.
- 2026-04-07 — Mentioned in "Is grep all you need for RAG?", arguing that a RAG-like search system can be built with enough engineering effort using simpler primitives, while cautioning that this path is difficult and rarely the easy answer.
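The "Can BM25 be a probability?" item above can be sketched by reading a BM25 score as a rough log-odds quantity and squashing it through a logistic function. This is an illustrative assumption, not Turnbull's derivation: the function names and the calibration constants `a` and `bias` are hypothetical, and in practice would be fit on labeled relevance data.

```python
import math

# Standard BM25 contribution of one query term to one document.
def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * tf_norm

# Logistic calibration (an assumption for illustration): treat
# a * score + bias as the log-odds of relevance.
def score_to_probability(score, a=1.0, bias=0.0):
    return 1.0 / (1.0 + math.exp(-(a * score + bias)))
```

A calibration like this matters for hybrid search: once lexical scores live on a probability scale, they can be combined with semantic similarity scores on comparable terms.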
Relevance to AI PMs
1. Improves retrieval and RAG decision-making
Turnbull’s work helps AI PMs evaluate when to use traditional search engines, vector databases, or hybrid architectures. This is useful when defining product requirements for retrieval quality, latency, cost, and maintainability.
2. Provides better evaluation discipline for LLM features
His emphasis on pairwise evaluation and checking both directions is highly practical for PMs designing offline evals for search, ranking, recommendation, or answer quality. It reduces the risk of overtrusting noisy LLM judgments.
3. Connects classical IR concepts to modern AI systems
By discussing BM25, late interaction, hybrid search, and engine trade-offs, Turnbull gives PMs a framework for talking with engineers about why retrieval quality changes and how to prioritize experiments that materially improve user outcomes.
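The "check both directions" discipline from point 2 can be sketched in a few lines. The `judge` callable and its positional answer labels are hypothetical stand-ins for an LLM call, not an API from Turnbull's post:

```python
# Sketch of a bidirectional LLM pairwise relevance check.
# `judge(query, first, second)` stands in for any LLM call that
# answers "first" or "second"; these names are illustrative.

def consistent_preference(judge, query, doc_a, doc_b):
    """Ask the judge both ways; trust the verdict only if it agrees."""
    forward = judge(query, first=doc_a, second=doc_b)
    backward = judge(query, first=doc_b, second=doc_a)

    # Map positional answers back to the underlying documents.
    forward_winner = doc_a if forward == "first" else doc_b
    backward_winner = doc_b if backward == "first" else doc_a

    if forward_winner == backward_winner:
        return forward_winner   # stable preference
    return None                 # position bias detected: discard or re-sample
```

A judge that always prefers whichever document is listed first will disagree with itself when the order is swapped, so its verdicts are filtered out instead of polluting the eval.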
Related
- RAG — Turnbull’s views on retrieval design and even minimalist approaches like grep-based systems connect directly to how RAG products are built.
- grep — Referenced as a provocative example of how far simple tooling can go with enough engineering effort in retrieval pipelines.
- LightOn — Appears in the discussion of tiny late interaction models, illustrating efficient retrieval approaches.
- Antoine Chaffin — Connected through the LightOn late interaction demonstration referenced alongside Turnbull’s commentary.
- Elasticsearch, OpenSearch, Solr, Vespa — Core search engine options discussed in retrieval engine selection.
- Pinecone, Turbopuffer, Weaviate — Vector database and retrieval infrastructure options considered in build-vs-buy and architecture decisions.
- tests-as-the-code — Linked to Turnbull’s argument that tests become the key artifact in AI-assisted software development.
- BM25 — A central ranking concept in his writing, especially for understanding lexical retrieval and hybrid scoring.
- hybrid-search — Closely connected to his work on combining lexical and semantic retrieval signals.
- llm-search-relevance-eval and llm-pairwise-evaluation — Directly tied to his emphasis on careful, bidirectional evaluation of LLM-based search judgments.
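The minimalist grep-style retrieval mentioned above can be sketched as literal term matching over raw text with a hit-count ranking. This is a toy illustration of the idea under those assumptions, not Turnbull's implementation:

```python
import re

# Toy "grep as retrieval" sketch: rank documents by how many times
# the query terms literally appear. The function name and hit-count
# scoring are assumptions for illustration.

def grep_retrieve(query, docs, top_k=3):
    terms = [re.escape(t) for t in query.lower().split()]
    scored = []
    for doc in docs:
        text = doc.lower()
        hits = sum(len(re.findall(term, text)) for term in terms)
        if hits:
            scored.append((hits, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```

The sketch also shows why the approach is "not for the faint of heart": synonyms, typos, and phrasing differences all score zero, so the engineering effort goes into query expansion and corpus preparation rather than the matching itself.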
Newsletter Mentions (6)
#11 📝 Doug Turnbull Is grep all you need for RAG? - Doug argues that with enough engineering effort you can build a RAG-style search system using only grep, but cautions that this approach is difficult and not for the faint of heart.
#12 📝 Doug Turnbull Why tiny late interaction models win - Discusses the recent prominence of late interaction models, highlighting a LightOn demonstration (with developer Antoine Chaffin) using a 150M model and the implications for retrieval and interaction approaches.
#5 📝 Doug Turnbull How to actually choose a retrieval engine - Explains how teams should choose a retrieval engine by comparing vector databases and search engines, and considering trade-offs between options like Elasticsearch, OpenSearch, Solr, Vespa, Pinecone, Turbopuffer, and Weaviate. Emphasizes practical selection criteria beyond hype.
#15 📝 Doug Turnbull The tests are the code now - Argues that with AI-assisted coding, tests become the most important artifact for maintaining code quality. The newsletter cites him to support a broader point about software quality.
#20 📝 Doug Turnbull Can BM25 be a probability? - Explores the relationship between BM25 scores framed as odds versus probabilities and introduces a Bayesian view of BM25. Discusses implications for calibrating hybrid search systems when combining lexical and probabilistic signals. (GenAI PM Daily, March 07, 2026)
📝 Doug Turnbull Check twice, cut once with LLM search relevance eval - Highlights the importance of checking both directions in LLM pairwise evaluation of search relevance. (GenAI PM Daily, February 03, 2026)
Related
- RAG — A common pattern for grounding model responses in retrieved documents. The newsletter contrasts LlamaIndex's newer agentic document processing approach against RAG.
- Elasticsearch — Referenced in the context of hybrid search and kNN query behavior in practice.
- BM25 — Classic lexical retrieval scoring function, referenced in the context of probabilistic framing and hybrid search calibration.