Philipp Schmid
AI developer advocate and educator known for tutorials around Gemini and open-source AI tooling. He is referenced here for a guide to the Gemini Interactions API.
Key Highlights
- Philipp Schmid is a key educator for practical adoption of Gemini APIs, agent workflows, and open-source AI tooling.
- He is especially relevant for AI PMs tracking the evolution of the Gemini Interactions API, including step-based inputs and context management.
- His posts often convert raw platform updates into actionable implementation guidance for product and engineering teams.
- Recent mentions connect him to Deep Research, Gemini Embedding 2, multimodal file search, and Gemma 4 performance improvements.
Overview
Philipp Schmid is an AI developer advocate and educator best known for practical, implementation-focused tutorials covering Google’s Gemini ecosystem and open-source AI tooling. In this knowledge base, he appears primarily as a translator of fast-moving platform changes into builder-friendly guides, especially around the Gemini Interactions API, Deep Research workflows, embeddings, file search, and Gemma model capabilities.
For AI Product Managers, Schmid matters because he consistently surfaces the “how” behind newly released AI features: what changed, how to test it quickly, and which developer patterns unlock real product value. His posts and guides are especially useful when a PM needs to evaluate new model capabilities, understand agent workflow primitives, or brief engineering teams on emerging Gemini platform features without reading raw API docs end-to-end.
Key Developments
- 2026-04-22: Shared a hands-on guide to Google AI Studio’s Gemini Deep Research Agent, including setup, prompting tactics, advanced web-scraping workflows, and sample code.
- 2026-04-23: Announced Gemini Embedding 2 general availability, highlighting a single multimodal embedding model for text, images, video, audio, and PDFs with flexible output dimensions.
- 2026-04-25: Introduced collaborative planning in Gemini API Deep Research via a `collaborative_planning` flag, enabling iterative outline creation and refinement.
- 2026-04-30: Published a developer getting-started guide for building and running Deep Research workflows with the Gemini API.
- 2026-05-05: Highlighted Interactions API error-message improvements that make debugging easier by naming exact fields, bad values, supported enums, expected vs. actual formats, and nested field paths.
- 2026-05-06: Launched Multi-Token Prediction for Gemma 4, reporting roughly 3x inference speed improvements without quality loss for supported variants under Apache 2.0.
- 2026-05-06: Outlined four subagent coordination patterns—tool calls, spawns, pools, and teams—for structuring multi-agent workflows.
- 2026-05-07: Shared that the Gemini API File Search tool supports multimodal PDF and image retrieval using gemini-embedding-2, handling chunking, embedding, indexing, and grounding in one flow.
- 2026-05-08: Updated developers on a major Gemini Interactions API change: replacing rigid `user`/`model` roles with discrete step types such as `user_input`, `thought`, `function_call`, `tool_call`, and `model_output`.
- 2026-05-12: Shared Google’s Gemini API interactions quickstart guide to help builders rapidly set up and test the new interaction model.
- 2026-05-13: Published a guide to new Interactions API `thought` steps and encrypted signatures, explaining stateful vs. stateless modes, model switching, and context management for agent development.
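To make the 2026-05-08 change concrete, here is a minimal sketch of how a step-based history carries more structure than a two-role history. The step type names come from the entry above. Everything else (the `Step` class, the `legacy_role` helper, and the `get_weather` tool) is a hypothetical illustration, not the Interactions API's actual schema or SDK.

```python
from dataclasses import dataclass
from typing import Any

# Step type names as listed in the 2026-05-08 entry; the real API may define more.
KNOWN_STEP_TYPES = {"user_input", "thought", "function_call", "tool_call", "model_output"}


@dataclass(frozen=True)
class Step:
    """Hypothetical container for one step in an interaction history."""
    type: str
    content: Any

    def __post_init__(self) -> None:
        # Reject step types that are not in the known set.
        if self.type not in KNOWN_STEP_TYPES:
            raise ValueError(f"unknown step type: {self.type!r}")


def legacy_role(step: Step) -> str:
    """Collapse a step into the older two-role view (illustrative mapping only)."""
    return "user" if step.type == "user_input" else "model"


# A made-up history: one user turn followed by three distinct model-side steps.
history = [
    Step("user_input", "What is the weather in Berlin?"),
    Step("thought", "I should look this up with a tool."),
    Step("function_call", {"name": "get_weather", "args": {"city": "Berlin"}}),
    Step("model_output", "It is sunny in Berlin."),
]

roles = [legacy_role(step) for step in history]
print(roles)  # ['user', 'model', 'model', 'model']
```

In this sketch, four distinct step types collapse into only two roles, which suggests why a product that needs to display, log, or filter reasoning and tool activity separately may benefit from the step-based structure.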
Relevance to AI PMs
1. Fast translation of API changes into product implications: Schmid’s tutorials often explain what a new Gemini feature means in operational terms, which is useful for PMs deciding whether to adopt capabilities like Interactions API steps, Deep Research planning, or multimodal retrieval.
2. Practical input for agent and workflow design: His coverage of subagent coordination, Deep Research, and Interactions API mechanics helps PMs define product requirements for agent memory, orchestration, tool use, and context handling.
3. Useful signal for evaluating platform readiness: Posts on error handling, quickstarts, embeddings, and file search give PMs a practical read on developer experience, integration complexity, and whether a feature is mature enough for prototyping or production.
Related
- Gemini Interactions API / interactions-api / gemini-api: The strongest connection; Schmid is repeatedly referenced for guides, quickstarts, and change explanations around the API.
- Google / Google DeepMind / Google AI Studio / AI Studio: Much of his work centers on explaining Google’s AI product releases and developer tooling.
- Gemma / Gemma 4 / functiongemma: Connected through his updates on open-model performance, inference improvements, and implementation details.
- Gemini Embedding 2 / File Search / Deep Research: He frequently covers the practical adoption path for multimodal retrieval and research workflows.
- AI agents / agent skills / subagent coordination patterns / evals / llm-as-judge: His educational material is relevant to teams designing agentic systems and evaluating model-driven workflows.
- Sebastian Raschka, Simon Willison, Addy Osmani, Jeff Dean, Demis Hassabis, Sundar Pichai: Adjacent voices in the broader AI tooling, research, and platform ecosystem that overlaps with Schmid’s audience and subject matter.
Newsletter Mentions (46)
#7 𝕏 Philipp Schmid published a guide for Gemini Interactions API’s new `thought` steps and encrypted signatures, detailing stateful vs. stateless modes, seamless model switching, and effortless context management to supercharge agent development. Also covered by: @Sundar Pichai
#20 𝕏 Philipp Schmid shares Google’s Gemini API interactions quickstart guide, helping PM builders quickly set up and test the new Gemini AI model. #21 𝕏 Lenny Rachitsky shares eight actionable insights from Eric Ries—spanning financial gravity, CEO retention post-IPO, public-benefit corp structures like AnthropicAI, mission protection, and principled decision-making exemplified by Cloudflare.
“#4 𝕏 Philipp Schmid updated the Gemini Interactions API to replace rigid `user`/`model` roles with discrete “steps” (user_input, thought, function_call, tool_call, model_output, etc.), consolidated response_format controls, and added a toggle in the docs.”
The item describes API changes and new interaction structure for Gemini.
#4 𝕏 Philipp Schmid : The Gemini API File Search tool now offers true multimodal PDF and image retrieval using `gemini-embedding-2`, handling chunking, embedding, indexing and grounding in one call. #15 𝕏 Philipp Schmid shows Gemma 4 pushing the Pareto frontier on Code Arena, with Gemma-4-31b at #13 and Gemma-4-26b-a4b at #17 among open models you can run on a MacBook Pro.
#13 𝕏 Philipp Schmid launched Multi-Token Prediction for Gemma 4, tripling inference speed with zero quality loss—now available E2B/E4B under Apache 2.0. #14 𝕏 Philipp Schmid outlines four subagent coordination patterns—tool calls, spawns, pools, and teams—to structure multi-agent workflows.
#8 𝕏 Logan Kilpatrick rolled out extensive error message improvements for the Interactions API, making its feedback far more human- and agent-readable. #9 𝕏 Philipp Schmid launched a QoL upgrade for the Interactions API: errors now name the exact field and bad value, list supported enum options, show expected vs. actual formats, and pinpoint field paths like `input[0].name`. #10 𝕏 Guillermo Rauch launched npx deepspec, an open-source agent orchestrator that leverages thousands of parallel coding agents in Vercel Sandbox to uncover critical security vulnerabilities in minutes. #11 ▶️ AI Agents run my business and life Greg Isenberg Andrew Wilkinson demonstrates running an autonomous SaaS business using Claude-based OpenClaw agents orchestrated in Harbor to handle support ticket triage (including auto-fixing P0 issues and merging PRs) and marketing campaigns via Post Hog and Meta/Reddit ads.
#10 𝕏 Philipp Schmid published a developer getting-started guide on building and running Deep Research workflows with the Gemini API, covering API setup, workflow construction, and executing deep research queries. #11 𝕏 Cursor launched the Cursor SDK, letting PM Builders spin up agents with the same runtime, harness, and models that power Cursor.
#6 𝕏 Philipp Schmid launched collaborative planning in the Gemini API’s Deep Research, letting you use a `collaborative_planning` flag to request and iterate on a draft research outline (e.g., “add a section on power efficiency”).
#5 𝕏 Philipp Schmid announced Gemini Embedding 2 now GA: a single embedding model unifying text, images, video, audio and PDFs in one 8,192-token, 100+ language space with native audio embeddings and flexible 768/1,536/3,072-dim outputs.
#10 𝕏 Philipp Schmid shared a hands-on guide to Google AI Studio’s Gemini Deep Research Agent, detailing setup steps, prompt engineering tactics, and advanced web-scraping workflows complete with sample code.
Related
An AI coding assistant with agentic and fast modes for development workflows. The newsletter notes a new Fast mode for Claude Opus 4.7 in Cursor.
Developer and writer known for his AI tooling commentary and the `llm` project. He is credited here with the 0.32a2 release note.
Google’s frontier AI research organization. The newsletter references it for launching interactive experiments in Google AI Studio.
A developer platform referenced for environment secret handling in preview and production settings. Relevant for AI PMs concerned with secure deployment workflows.
The company behind Gemini, referenced through a Gemini API quickstart guide. It is relevant for model access and developer onboarding.
Google’s AI model/product family, mentioned as one of the LLMs that names brands in category queries. In this newsletter it appears in the context of AI visibility and brand discovery.
AI researcher and educator known for practical machine learning content. In this newsletter he is credited with sharing a from-scratch Gemma 4 notebook on GitHub.
A protocol for connecting AI models and agents to external tools and context. In the newsletter it appears as a building block for multi-agent systems.
Google’s environment for building and experimenting with Gemini-powered apps and prototypes. It appears here as the venue for interactive UI experiments and an intelligent mouse pointer prototype.
Google’s research organization, mentioned for partnering with CGIAR on an AI crop-breeding model.
Co-founder and CEO of Google DeepMind. He is mentioned here in relation to new funding for Isomorphic Labs and a Gemini-powered UI prototype.
Google Research/AI leader known for technical announcements around model deployment and infrastructure. Here, he is cited for announcing Gemini-powered translations in Google Search.
CEO of Google and Alphabet. He is cited here as the announcer of Gemini Intelligence at Android Show I/O.
Google’s developer API for Gemini, mentioned via an interactions quickstart guide. It is relevant for PM builders who need to prototype and test model capabilities quickly.
An LLM application framework mentioned in the context of autonomous web-browsing agents and integrations.
Autonomous or semi-autonomous systems that can plan and execute tasks using tools and models. The newsletter frames several product launches and startup strategies around agent-first workflows.
A Gemma model referenced alongside Multi-Token Prediction, with variants E2B/E4B. Important for PMs interested in open models and inference optimization.
Google’s API for building agentic interactions with Gemini, including stateful and stateless modes. The newsletter highlights new `thought` steps, encrypted signatures, and context management features.
A Gemini model variant used here to power agentic workflow examples and multi-agent systems. It is relevant to AI PMs as an example of frontier model capability enabling more complex automated workflows.
Google’s app-building environment for experimenting with model-powered workflows and UI editing. PMs may use it for rapid prototyping and vibe coding.
A Gemini model variant that was noted as moving out of preview status.
A W3C-backed browser extension that exposes website functionality to MCP-capable agents. It lets developers register site functions as structured tools in the browser.
Google’s search product, mentioned here in the context of translation improvements powered by Gemini LLMs. The newsletter frames this as an example of AI being embedded into core search infrastructure.
An embedding model powering multimodal file search in the Gemini API. Relevant for PMs designing retrieval, citation, and metadata-aware workflows.
A Google AI product or feature mentioned as part of the Google AI Pro bundle. The newsletter gives no deeper detail, but it is notable as a bundled AI offering.
A workflow/mode for using AI systems to search the web, synthesize information, and produce detailed reports. The newsletter frames it as a practical capability for research-heavy PM work.
Google’s mapping product used as a grounding source in AI Studio. It is mentioned as part of building location-aware, citation-backed apps.
A Google AI text-to-speech model with native multi-speaker dialogue support across many languages. It is positioned as part of the Gemini product family.
Google’s video generation model with updates to portrait mode, visual consistency, and higher-resolution upscaling.
Google AI Edge Gallery is a Google tool for showcasing and running on-device AI experiences at the edge, including offline use cases.
A framework for defining, managing, and retiring capabilities that AI agents can use. The newsletter frames it as an operational way to keep agent behavior current and useful.
An open-source command-line tool for dynamic discovery of Model Context Protocol servers. It is described as reducing MCP token usage and improving AI agent tool interactions.
An API whose error messages were improved to be more human- and agent-readable. The newsletter highlights more precise field-level feedback and validation details.