Gemini Interactions API
Google’s API for building agentic interactions with Gemini, including stateful and stateless modes. The newsletter highlights new `thought` steps, encrypted signatures, and context management features.
Key Highlights
- Gemini Interactions API evolved from a multimodal beta into a more structured agent platform with discrete interaction steps and stronger context handling.
- Recent updates introduced `thought` steps and encrypted signatures, signaling better support for controllable and auditable agent workflows.
- The API supports multimodal function calling, enabling agents to process and return both text and real images.
- Newsletter coverage emphasized practical developer momentum through guides, TypeScript frameworks, CLI skills, and long-video understanding demos.
Overview
Gemini Interactions API is Google’s API for building agentic applications on top of Gemini models, with support for both stateful and stateless interaction patterns. Across newsletter coverage, it is positioned as an increasingly capable developer interface for orchestrating multimodal inputs, tool and function calling, long-context processing, and structured interaction flows. It has also been referred to as the Gemini API, Deep Research API, and Interactions API in different contexts.
For AI Product Managers, the API matters because it reflects a shift from simple prompt-response usage toward production-grade agent systems. Recent updates emphasized discrete interaction steps such as `user_input`, `thought`, `function_call`, `tool_call`, and `model_output`, along with encrypted signatures and improved context management. Together, these changes suggest a platform designed for more controllable, auditable, and modular agent experiences, which matters for PMs evaluating reliability, UX, cost controls, and extensibility across multimodal use cases.
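To make the idea of discrete interaction steps concrete, here is a minimal Python sketch. It is an illustration only and does not call the real API: the step type names come from the newsletter coverage, while the `Step` class, its `content` field, the helper functions, and the `get_weather` call string are hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Step type names are taken from the newsletter coverage. Everything else
# here (Step, content, the helpers) is a hypothetical model, not the API schema.
STEP_TYPES = ("user_input", "thought", "function_call", "tool_call", "model_output")


@dataclass(frozen=True)
class Step:
    type: str
    content: str

    def __post_init__(self):
        # Reject step types that are not in the known list.
        if self.type not in STEP_TYPES:
            raise ValueError(f"unknown step type: {self.type}")


def visible_text(steps):
    """Return only what an end user would read: the model_output steps."""
    return [s.content for s in steps if s.type == "model_output"]


def audit_trail(steps):
    """Return the ordered step types, e.g. for a trace or review screen."""
    return [s.type for s in steps]


# A made-up interaction; get_weather is a placeholder function name.
steps = [
    Step("user_input", "What is the weather in Kyoto?"),
    Step("thought", "I should call the weather function."),
    Step("function_call", "get_weather(city='Kyoto')"),
    Step("model_output", "It is sunny in Kyoto."),
]

print(visible_text(steps))
print(audit_trail(steps))
```

The point for product design is that a typed step list lets a UI show only final outputs while a trace or approval view reads the same list in full, something a flat `user`/`model` transcript makes harder.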
Key Developments
- 2026-01-07: Gemini Interactions API was highlighted in beta as supporting multimodal inputs including images, PDFs, CSVs, and custom data via Deep Research.
- 2026-02-14: Philipp Schmid announced support for multimodal function calling, enabling agents to see, process, and return real images alongside text with Gemini 3 capabilities.
- 2026-02-17: Philipp Schmid shared a minimal TypeScript agent framework for the Gemini Interactions API, split into agents-core for core agent loops, streaming events, and tool calling, and agent for higher-level tools, hooks, sessions, skills, and subagents.
- 2026-03-05: A new Gemini Interactions API skill for advanced agentic apps was launched, installable globally through the Vercel or Context7AI CLIs.
- 2026-03-10: The API was showcased as able to process minutes to hours of YouTube video content in seconds with a single API call, signaling strong long-video understanding and multimodal summarization potential.
- 2026-03-17: Philipp Schmid published a guide pairing nano-banana-2 with the Gemini Interactions API for use cases including text-to-image generation, Web Search grounding, Image Search, and reference-based creative workflows.
- 2026-05-08: The interaction model was updated to replace rigid `user`/`model` roles with discrete steps such as `user_input`, `thought`, `function_call`, `tool_call`, and `model_output`; response format controls were also consolidated.
- 2026-05-13: Philipp Schmid published a guide covering new `thought` steps and encrypted signatures, plus stateful vs. stateless modes, seamless model switching, and context management improvements for agent development.
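The stateful vs. stateless distinction mentioned in the last two entries can be sketched with an in-memory stand-in. This is a conceptual model under stated assumptions, not the real service or SDK: `FakeServer`, its methods, the `previous_id` parameter, and the `int-N` identifiers are all hypothetical.

```python
import itertools


class FakeServer:
    """In-memory stand-in for an interaction service (hypothetical, not the real API)."""

    def __init__(self):
        self._store = {}                 # interaction id -> stored history
        self._ids = itertools.count(1)   # simple id generator

    def stateful(self, new_input, previous_id=None):
        # Stateful mode: the server keeps the history, so the client sends
        # only the new input plus a reference to the previous interaction.
        history = list(self._store.get(previous_id, []))
        history.append(new_input)
        new_id = f"int-{next(self._ids)}"
        self._store[new_id] = history
        return new_id, len(history)      # id, and number of turns in context

    def stateless(self, full_history):
        # Stateless mode: nothing is stored, so the client must resend
        # the whole history on every call.
        return len(full_history)


server = FakeServer()

# Stateful: two small requests, linked by id.
first_id, turns_after_first = server.stateful("Plan a trip to Kyoto")
second_id, turns_after_second = server.stateful("Make it cheaper", previous_id=first_id)

# Stateless: one request carrying both turns.
turns_stateless = server.stateless(["Plan a trip to Kyoto", "Make it cheaper"])

print(turns_after_second, turns_stateless)
```

In both modes the model ends up with the same two turns of context. What differs is who stores the history and how much each request carries, which is the trade-off behind decisions on session persistence, compliance, and cost.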
Relevance to AI PMs
- Designing production agent UX: The shift from simple roles to explicit interaction steps gives PMs clearer building blocks for workflows like planning, tool use, approvals, and final outputs. This is useful when designing traceable agent experiences or evaluating how much control a team has over orchestration.
- Prioritizing multimodal product features: Support for images, PDFs, CSVs, custom data, multimodal function calling, and long-video processing makes the API relevant for roadmap decisions in research assistants, media analysis, enterprise knowledge tools, and creative applications.
- Managing reliability, governance, and scale: Stateful/stateless modes, encrypted signatures, and context management features matter when PMs need to balance memory, compliance, debugging, and cost. These capabilities can shape decisions around session persistence, auditability, and enterprise readiness.
Related
- Google / Gemini: Google is the platform owner, and Gemini is the model family powering the API’s agentic and multimodal capabilities.
- Deep Research: The API has been associated with Deep Research workflows, especially for ingesting custom data and supporting richer multimodal inputs.
- Philipp Schmid / Phil Schmid / phil-schmid: A key public educator and builder around the API, frequently publishing guides, frameworks, and implementation examples.
- Logan Kilpatrick: Related through broader Gemini API ecosystem coverage, including practical guidance such as cost-control levers.
- agents-core / agent: The two parts of Philipp Schmid's minimal TypeScript agent framework for the Gemini Interactions API, covering the core loop, streaming events, tool calling, built-in tools, hooks, sessions, skills, and subagents.
- nano-banana-2: Demonstrated in combination with the API for multimodal and creative workflows such as grounded image generation.
- gemini-3-deep-think: Relevant as part of the broader Gemini model ecosystem and the trend toward more advanced reasoning-oriented agent behavior.
Newsletter Mentions (8)
“#7 𝕏 Philipp Schmid published a guide for Gemini Interactions API’s new `thought` steps and encrypted signatures, detailing stateful vs. stateless modes, seamless model switching, and effortless context management to supercharge agent development.”
Also covered by: @Sundar Pichai
“Philipp Schmid updated the Gemini Interactions API to replace rigid `user`/`model` roles with discrete “steps” (user_input, thought, function_call, tool_call, model_output, etc.), consolidated response_format controls, and added a toggle in the docs.”
The newsletter highlights a structural change to Gemini’s interaction model for developers.
“#7 𝕏 Philipp Schmid wrote a developer guide for Nano Banana 2 with the Gemini Interactions API, walking through four use cases: text-to-image photorealistic Kyoto travel poster generation, Web Search grounding with real landmark facts, Image Search for accurate photos, and referen...”
Today's top 25 insights for PM Builders, ranked by relevance from Blogs, X, YouTube, and LinkedIn. #8 𝕏 Logan Kilpatrick explains that the Gemini API offers two cost-control levers: global billing account caps to cap overall spend and user-set spend caps to limit individual usage, detailing how each works to manage billing.
“#10 𝕏 Philipp Schmid shows how the Gemini Interactions API can process minutes to hours of YouTube video content in seconds with a single API call, highlighting a major leap in video understanding.”
The newsletter presents the API as a breakthrough for processing and understanding long video content. It is framed as a practical capability for builders working with multimodal data.
“Philipp Schmid launched a new Gemini Interactions API skill for building advanced agentic apps with Gemini models, installable globally via the Vercel or Context7AI CLIs.”
“Philipp Schmid built a minimal TypeScript agent framework for the Gemini Interactions API, split into agents-core (~500 LOC for a clean loop, streaming events and tool calling) and agent (built-in tools, hooks, sessions, skills & subagents).”
“Philipp Schmid announced the Gemini Interactions API now supports multimodal function calling, letting agents natively see, process, and return real images (not just text) with Gemini 3’s image processing and mixed text/image outputs.”
Also covered by: @Jeff Dean
“Phil Schmid @_philschmid shared that Gemini Interactions API (beta) now supports multimodal inputs like images, PDFs, CSVs, and custom data via Deep Research.”
Listed in the newsletter's AI Tools & Applications section under the label "Deep Research API".
Related
AI developer advocate and educator known for tutorials around Gemini and open-source AI tooling. He is referenced here for a guide to the Gemini Interactions API.
A product lead associated here with Gemini API and AI Studio announcements. Known for shipping developer-facing AI product features.
The company behind Gemini, referenced through a Gemini API quickstart guide. It is relevant for model access and developer onboarding.
Google’s AI model/product family, mentioned as one of the LLMs that names brands in category queries. In this newsletter it appears in the context of AI visibility and brand discovery.
A state-of-the-art image generation and editing model from Google DeepMind. It is described as Google’s best image model yet and is powered by Gemini-based world understanding plus live web and weather context.
AI product and developer advocate who shares predictions on generative AI trends. Relevant for AI PMs tracking market direction and product strategy.
A workflow/mode for using AI systems to search the web, synthesize information, and produce detailed reports. The newsletter frames it as a practical capability for research-heavy PM work.