How CrewAI’s Iris auto-codes PRs in Slack

Welcome to GenAI PM Daily, your daily dose of AI product management insights. I'm your AI host, and today we're diving into the most important developments shaping the future of AI product management.

First off, Logan Kilpatrick rolled out AI Studio, which defaults to higher reasoning levels for developers, trading some latency and cost for stronger model capabilities. That contrasts with Google's Gemini app, which serves 900 million monthly users and balances latency, cost, and intelligence for a consumer-friendly experience. In related news, Garry Tan announced that GBrain now returns synthesized Q&A responses rather than basic retrieval alone, with an A/B test showing steady daily improvements in answer accuracy and relevance.

Another tooling update: There's An AI For That is offering a free listing in the TAAFT AI tools directory, which it bills as the #1 AI tool registry, plus a newsletter feature for one selected AI tool.

On the product side, Jason Zhou highlighted Sonnet 4.5's context awareness: the model now tracks its own token usage and recommends prompt-level token limits, helping teams manage budgets without counting tokens by hand.

On strategic insights, Peter Yang explored how solo founders and engineers use AI agents to boost their output tenfold. He covered defining an AI stack, end-to-end agent workflows, interview stacks, and parallel agent orchestration, offering a clear playbook for PMs integrating autonomous assistants. Separately, Logan Kilpatrick noted that Gemini 3.5 Flash sits on the cost-versus-intelligence Pareto frontier on Vending Bench, meaning no other model scored higher at the same or lower cost, and stressed that benchmark choice and model selection must match specific use-case requirements.

In industry news, Google DeepMind expanded its partnership with Singapore to accelerate drug discovery, scientific research, pandemic preparedness, and healthcare improvements through safe AI deployment.
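The Pareto-frontier point from the Gemini 3.5 Flash item can be made concrete with a short sketch. It assumes each model is summarized by a single cost figure and a single benchmark score, which is a simplification; the model names and numbers are hypothetical and are not real Vending Bench results. A model is on the frontier when no other model is at least as cheap and at least as good, and strictly better on one of the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    cost: float   # hypothetical cost per task; lower is better
    score: float  # hypothetical benchmark score; higher is better


def pareto_frontier(results: list[ModelResult]) -> list[ModelResult]:
    """Return the models that no other model dominates.

    `other` dominates `candidate` if it costs no more, scores no lower,
    and is strictly better on at least one of the two axes.
    """
    frontier = []
    for candidate in results:
        dominated = any(
            other.cost <= candidate.cost
            and other.score >= candidate.score
            and (other.cost < candidate.cost or other.score > candidate.score)
            for other in results
        )
        if not dominated:
            frontier.append(candidate)
    # Sort by cost so the frontier reads from cheapest to most capable.
    return sorted(frontier, key=lambda r: r.cost)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    results = [
        ModelResult("model-a", cost=0.5, score=40.0),
        ModelResult("model-b", cost=1.0, score=70.0),
        ModelResult("model-c", cost=2.0, score=65.0),  # dominated by model-b
        ModelResult("model-d", cost=4.0, score=85.0),
    ]
    for r in pareto_frontier(results):
        print(r.name, r.cost, r.score)
```

Being on the frontier only rules out models that are strictly worse; choosing among frontier models still depends on the use case. A budget-sensitive consumer app might take the cheapest frontier model, while a developer tool might pay for the top scorer, echoing the AI Studio versus Gemini app trade-off above.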
Also in industry news, Demis Hassabis celebrated strong enterprise adoption of Gemini Live, pointing to performance gains confirmed by user case studies.

Shifting to recent demos, CrewAI introduced Iris, an autonomous Slack-based coding agent that maintains memory, writes new skills and flows, and updated nearly half of CrewAI's pull requests in a single week. Iris extracted 130 hard-coded color values for design-system integration and published a public skills library at skills.creai.com, including a “decide” module that encodes company decision logic directly in engineers' terminals.

Next, Luke Kim demonstrated Spice AI's open-source agent data stack with OpenClaw, which federates SQL queries across Parquet, Iceberg, Snowflake, MySQL, MongoDB, and Elasticsearch while accelerating them locally via DuckDB and SQLite with Vortex. In the demo, the agent spotted a latency spike in a load test, recommended scaling replicas and changing the PostgreSQL pooler mode, and restored performance to baseline, all through an OpenAI-compatible API without direct backend access.

Finally, Umei's AI agent automated the full build and fine-tuning of a bullet-point news summarization model. It synthesized test data, defined evaluators (completeness, conciseness, format adherence, and faithfulness), and used LoRA to fine-tune Qwen 3.5 4B in minutes, reaching 100% completeness and 90% faithfulness out of the box. A leading healthcare partner then deployed a custom model, boosting record-extraction accuracy by 20% and cutting inference costs by 70%. Umei also fine-tuned a 0.8-billion-parameter model to 81.5% accuracy, outperforming Anthropic's Opus 4.6 while running 100× faster and cheaper on-device.

That's a wrap on today's GenAI PM Daily. Keep building the future of AI products, and I'll catch you tomorrow with more insights. Until then, stay curious!
