GenAI PM
Person · 44 mentions · Updated Feb 2, 2026

Simon Willison

Developer and writer known for hands-on AI and tooling tutorials. Here he provides a Docker-based walkthrough for running OpenClaw locally.

Key Highlights

  • Simon Willison is a key interpreter of real-world AI tooling, model behavior, and agent workflows for builders.
  • His “lethal trifecta” framework gives AI PMs a practical way to assess agent security risk.
  • His research into LLM provider APIs helps teams think clearly about abstraction layers, interoperability, and streaming behavior.
  • His hands-on reviews of local AI products like Google AI Edge Gallery surface concrete UX and platform trade-offs.
  • He also contributes practical tutorials, including a Docker-based walkthrough for running OpenClaw locally.

Overview

Simon Willison is a developer, writer, and prolific experimenter whose work has become a reliable signal for how AI tools actually behave in practice. He is especially known for hands-on tutorials, detailed implementation notes, security framing, and rapid analysis of new model capabilities, APIs, and agent workflows. In this context, he is also noted for providing a Docker-based walkthrough for running OpenClaw locally.

For AI Product Managers, Simon matters because he consistently translates fast-moving AI developments into concrete product lessons: what works, what breaks, what is changing in model APIs, and where the real risks are. His writing often sits at the intersection of developer tooling, model evaluation, agent design, prompt injection, local inference, and AI-assisted software engineering—making his work highly actionable for teams building AI features rather than just tracking headlines.

Key Developments

  • 2026-04-11: Simon observed that OpenAI's voice mode appears to run on an older, weaker model, creating surprising capability differences depending on access point. The takeaway is that product experience can vary significantly across surfaces even within the same vendor ecosystem.
  • 2026-04-10: Lenny Rachitsky highlighted Simon Willison's "lethal trifecta": AI agents become especially dangerous when they combine access to private data, intake of untrusted content, and the ability to exfiltrate information. Simon's framing provides a practical security model for agent design.
  • 2026-04-07: Simon reviewed Google AI Edge Gallery, Google's iPhone app for running local Gemma models, noting strong on-device performance, image Q&A, short audio transcription, and tool-like "skills" interactions, while also pointing out limitations such as ephemeral conversations and missing logs.
  • 2026-04-06: Simon's coverage of Google AI Edge Gallery emphasized fast local inference with Gemma 4-class models and surfaced an important product insight: useful local AI experiences still need durable conversation history and better logging to support repeated use.
  • 2026-04-05: Simon published research-llm-apis, a repository investigating different LLM providers' HTTP APIs to support a major update to the abstraction layer in his LLM Python library. This work reflects his focus on provider interoperability, streaming behavior, and implementation-level API differences.
  • 2026-04-04: Simon wrote about Vulnerability Research Is Cooked, amplifying Thomas Ptacek's argument that frontier LLMs and coding agents will radically accelerate vulnerability research through bug-class pattern matching and exploitability analysis.
  • 2026-04-04: In broader discussion of AI coding, Simon's views were cited on the inflection point for coding agents and on the emerging dark factory pattern, where teams increasingly rely on agents to generate and manipulate code that humans may barely inspect directly.
  • 2026-04-03: Simon detailed agentic engineering patterns, describing how tools like Claude Code and GPT-5.4 can be used for red/green TDD, thin templates, and reusable public GitHub project scaffolds to improve engineering output and reliability.

Relevance to AI PMs

1. Use his work to evaluate real product behavior, not vendor marketing. Simon frequently documents edge cases, API inconsistencies, capability gaps, and UX limitations. AI PMs can use these observations to pressure-test roadmap assumptions before committing to a model, agent pattern, or platform dependency.

2. Apply his security framing when designing agents. The "lethal trifecta" is a useful tactical checklist for reviewing agentic products. If your feature combines sensitive data access, untrusted inputs, and outbound actions, you should redesign permissions, isolation, or tool access before launch.
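A checklist like this can be expressed literally in code for design reviews. The sketch below is a minimal illustration of the trifecta's all-three-legs logic; the class and flag names are ours, not part of Willison's framework.

```python
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    """Capability flags for a proposed agent feature (illustrative model)."""
    reads_private_data: bool         # e.g. email, documents, CRM records
    ingests_untrusted_content: bool  # e.g. web pages, inbound messages
    can_exfiltrate: bool             # e.g. outbound HTTP, email, file upload

def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """The risk pattern exists only when all three legs are present,
    so removing any one of them breaks the trifecta."""
    return (caps.reads_private_data
            and caps.ingests_untrusted_content
            and caps.can_exfiltrate)

# A browsing agent with inbox access and outbound actions is flagged:
risky = AgentCapabilities(True, True, True)
print(has_lethal_trifecta(risky))  # True

# Dropping outbound actions removes one leg and clears the check:
sandboxed = AgentCapabilities(True, True, False)
print(has_lethal_trifecta(sandboxed))  # False
```

The useful property for PMs is the conjunction: you do not have to eliminate every risk, only ensure at least one leg is absent for any given agent configuration.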

3. Learn from his implementation-first approach to tooling and interoperability. His work on LLM APIs, local inference, and agentic coding patterns helps PMs make better decisions about abstraction layers, logging, observability, test workflows, and whether to build for one provider or many.
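To make the abstraction-layer point concrete: the core design problem his API research informs is giving callers one interface whether a provider streams partial chunks or returns a complete response. The sketch below is not from his LLM library; it is a generic illustration with a made-up `EchoProvider` standing in for a real backend.

```python
from typing import Iterator, Protocol

class Provider(Protocol):
    """Minimal provider interface. Real provider APIs differ in auth,
    chunk framing, and error shapes, which an abstraction layer must map."""
    def complete(self, prompt: str) -> str: ...
    def stream(self, prompt: str) -> Iterator[str]: ...

class EchoProvider:
    """Stand-in provider used only to show the normalization seam."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()

    def stream(self, prompt: str) -> Iterator[str]:
        # Streaming APIs deliver partial chunks; callers should end up
        # with the same final text whether or not they streamed.
        for word in prompt.upper().split():
            yield word + " "

def run(provider: Provider, prompt: str, streaming: bool) -> str:
    """One call site, two transport modes: the abstraction's job is to
    make this equivalence hold across real providers."""
    if streaming:
        return "".join(provider.stream(prompt)).strip()
    return provider.complete(prompt)

p = EchoProvider()
assert run(p, "hello world", streaming=False) == run(p, "hello world", streaming=True)
```

Checking that streaming and non-streaming paths converge on identical output is exactly the kind of provider-by-provider verification that captured request/response fixtures support.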

Related

  • OpenClaw: Simon is noted here for a Docker-based walkthrough for running OpenClaw locally, connecting him to practical self-hosted AI tooling.
  • Lenny Rachitsky: Frequently amplifies Simon's ideas, especially around the lethal trifecta and AI coding inflection points.
  • AI agents / coding agents / agentic engineering: Central themes in Simon's recent writing, especially around safe deployment and developer workflow transformation.
  • research-llm-apis / LLM Python library: Simon's provider-level API research connects directly to product architecture decisions around model abstraction and integration.
  • Thomas Ptacek / vulnerability-research: Simon's commentary links frontier models and coding agents to changing security research dynamics.
  • Google AI Edge Gallery / Gemma 4 / Gemma 3 / Google / Google DeepMind: His hands-on review of local model experiences highlights trade-offs in on-device AI products.
  • Claude Code / OpenAI / Codex / Claude / Gemini / Anthropic: Simon's work often compares the practical strengths and weaknesses of major model and coding-agent ecosystems.
  • Prompt injection / dark factory / redgreen-tdd / automated-tests / manual-testing: These are recurring concepts in Simon's analysis of how AI changes software development, safety, and quality assurance.

Newsletter Mentions (44)

2026-04-11
Simon observes that OpenAI's voice mode appears to run on an older, weaker model, leading to surprising differences in capability depending on access point.

#11 📝 Simon Willison Voice mode is weaker - Simon observes that OpenAI's voice mode appears to run on an older, weaker model, leading to surprising differences in capability depending on access point. This reflection was inspired by an Andrej Karpathy tweet about how different domains and reward functions drive divergent model improvements.

2026-04-10
Lenny Rachitsky spotlights Simon Willison’s “lethal trifecta”: AI agents with private data access, untrusted content intake, and exfiltration capability pose a massive security risk that only dropping one of these legs can solve.

#10 𝕏 Lenny Rachitsky spotlights Simon Willison’s “lethal trifecta”: AI agents with private data access, untrusted content intake, and exfiltration capability pose a massive security risk that only dropping one of these legs can solve.

2026-04-07
#2 📝 Simon Willison Google AI Edge Gallery - Google's official iPhone app for running Gemma 4 models (E2B, E4B and some Gemma 3 family) works very well locally, with the E2B model a 2.54GB download; it supports image Q&A, short audio transcription and an interactive "skills" demo but conversations are ephemeral and the app lacks permanent logs.

2026-04-06
Google AI Edge Gallery - Google's official app for running Gemma 4 models on iPhone provides fast, useful local inference (notably the E2B model) plus image question answering, short audio transcription, and an interesting 'skills' demo showing tool-calling via HTML widgets.

Google Launches AI Edge Gallery App for iPhone #1 📝 Simon Willison Google AI Edge Gallery - Google's official app for running Gemma 4 models on iPhone provides fast, useful local inference (notably the E2B model) plus image question answering, short audio transcription, and an interesting 'skills' demo showing tool-calling via HTML widgets. The app works well but conversations are ephemeral and it lacks permanent logs.

2026-04-05
Simon Willison research-llm-apis 2026-04-04 - New repository capturing research into various LLM providers' HTTP APIs to inform a major change to the LLM Python library's abstraction layer, including scripts and captured outputs for streaming and non-streaming modes.

#9 📝 Simon Willison research-llm-apis 2026-04-04 - New repository capturing research into various LLM providers' HTTP APIs to inform a major change to the LLM Python library's abstraction layer, including scripts and captured outputs for streaming and non-streaming modes.

2026-04-04
#7 📝 Simon Willison Vulnerability Research Is Cooked - Thomas Ptacek argues that frontier LLMs and coding agents will rapidly and dramatically change vulnerability research, automating exploit discovery by pattern matching known bug classes and searching for reachability/exploitability.

#7 📝 Simon Willison Vulnerability Research Is Cooked - Thomas Ptacek argues that frontier LLMs and coding agents will rapidly and dramatically change vulnerability research, automating exploit discovery by pattern matching known bug classes and searching for reachability/exploitability. #8 𝕏 Lenny Rachitsky shares Simon Willison’s insight that November 2025 was the inflection point for AI coding, unleashing autonomous coding agents benchmarked by Pelican Benchmark and driving red/green TDD with “thin templates.” #15 𝕏 Lenny Rachitsky highlights @simonw’s “dark factory” pattern as the next leap in AI software engineering—teams no longer write or even look at their own code.

2026-04-03
Simon Willison details agentic engineering patterns—using coding agents like Claude Code and GPT-5.4 for red/green TDD, thin project templates, and public GitHub hoarding—to boost software productivity and reliability.

▶️ Why AI came for coders first, automation timelines, and how we’re inside the AI inflection (Lenny's Podcast). Simon Willison details agentic engineering patterns—using coding agents like Claude Code and GPT-5.4 for red/green TDD, thin project templates, and public GitHub hoarding—to boost software productivity and reliability. GPT-5.1 and Claude Opus 4.5 released in November 2025 advanced coding agents from “mostly working” to “almost always following instructions,” enabling engineers to churn out up to 10,000 lines of code per day. Invoking the prompt “red/green TDD” directs agents to write tests first, run them to confirm failure, implement the code, then rerun tests to confirm success. Willison’s GitHub repositories include simonw/tools with 193 HTML/JavaScript client-side utilities and simonw/ressearch with 75 AI-driven research projects to hoard reusable code experiments.

Related

Claude Code (tool)

Anthropic's coding-focused agentic tool for building and automating software workflows. In this newsletter it is discussed as being integrated with Vercel AI Gateway and as a Chrome extension for browser automation.

Anthropic (company)

Anthropic is mentioned as a comparison point in the AI chess game and as the focus of a successful enterprise coding strategy. For PMs, it is framed as a company benefiting from sharp product focus.

OpenAI (company)

AI research and product company behind GPT models, including GPT-5.2 as referenced here. Relevant to AI PMs as a benchmark-setting model company.

Claude (tool)

Anthropic's general-purpose AI assistant and model family. It appears here as a comparison point for strategy work and in discussions around browser automation and coding.

Lenny Rachitsky (person)

The author and host cited for reporting on AI agents replacing most SDR work. Relevant to AI PMs for go-to-market automation and sales workflow shifts.

Philipp Schmid (person)

AI engineer and educator known for sharing practical model and agent-building insights. Here he predicts that 2026 will be the year of Agent Harnesses.

OpenClaw (tool)

An open-source digital assistant built on Claude Code that can manage emails, transcribe audio, negotiate purchases, and automate tasks via skills and hooks.

Andrej Karpathy (person)

AI researcher and commentator frequently cited on autonomous driving and frontier model progress. In this newsletter, he is credited with showcasing a 100% autonomous Tesla FSD drive.

Codex (tool)

An AI agent framework mentioned alongside Claude Code and OpenCode in a browser automation workflow. It is relevant to AI PMs as part of the growing ecosystem of code agents and orchestration tools.

Logan Kilpatrick (person)

A Google AI product leader mentioned announcing a billing rollout for Gemini API and AI Studio. Relevant to AI PMs for platform updates and developer experience changes.

Google DeepMind (company)

Google DeepMind is presenting the Interactions API beta, positioned as a unified interface for Gemini models and agents. For AI PMs, it signals continued investment in agent infrastructure and product surfaces for 2026.

Sebastian Raschka (person)

An AI researcher mentioned for sharing transformer residual connection improvements. Relevant to AI PMs because model architecture advances affect capability and training stability.

Google (company)

Technology company behind Gemini and related AI initiatives. Mentioned here through Jeff Dean's comments on personalized learning.

Gemini (tool)

Google's AI model family referenced as a tool for personalized education. Useful to AI PMs as an example of applied model use in learning products.

Qwen (tool)

Qwen is showcasing Qwen-Image-2512 and its fast high-resolution image generation. In AI PM terms, it signals model-product speed and quality improvements in multimodal experiences.

Hugging Face (company)

Open-source AI platform for models, datasets, and demos. The newsletter references it as the place where three models trended.

Demis Hassabis (person)

CEO and cofounder associated with Google DeepMind and AI research. Here he is referenced teasing a robotics collaboration involving Gemini Robotics.

Jeff Dean (person)

Google leader and AI researcher cited for discussing personalized learning with AI models. Relevant to education product use cases and model applications.

Anthropic Labs (company)

Anthropic Labs is mentioned as the organization where Henry Shi works with the founders. It appears as part of the credibility framing for the sponsored AI PM certification.

Sundar Pichai (person)

CEO of Google, cited here for announcing the Universal Commerce Protocol and sharing updates on Walmart and Wing drone delivery expansion. Relevant to AI PMs as a public signal of platform strategy and ecosystem orchestration.

Opus 4.6 (tool)

Anthropic’s latest Opus-class model release with a 1 million-token context window. It is positioned for long-context planning, coding, and agentic task execution.

agentic coding (concept)

A software-building pattern where AI agents generate, modify, and ship code with increasing autonomy. For PMs, it changes the economics of product development and accelerates prototyping.

AI agents (concept)

Autonomous or semi-autonomous systems used here in sales and coding workflows. The newsletter highlights their role in replacing human SDR tasks and orchestrating complex tasks.

GPT-5.4 (tool)

A newer OpenAI model release with improved natural dialogue, longer context, and stronger tool use. It is discussed as a model now available in Cursor and chatprd.

Apple (company)

Consumer technology company that builds iPhone, Mac, and Apple Intelligence features. In this newsletter it is referenced as partnering with Google for future Apple Intelligence capabilities.

OpenRouter (tool)

A model-routing platform used to call multiple LLMs through a common interface. Here it is used to run four models in parallel for comparison and generation tasks.

OpenAI Codex (tool)

OpenAI's code-focused assistant used for debugging and diagnosing AI-generated builds.

Sonnet-4.6 (tool)

A Claude model version referenced for more intelligent outputs with higher token usage. It is discussed alongside Opus 4.6 and effort settings for economical runs.

coding agents (concept)

AI agents that help write, analyze, and operate on codebases. The newsletter frames them as useful for documentation, maintainability, and terminal-based workflows.

Qwen3.5 (tool)

A Qwen model release with day-0 support for multimodal integration. The newsletter highlights its immediate compatibility with MLX-VLM for visual-language workflows.

Gemini 3.1 Flash-Lite (tool)

A streamlined, high-speed multimodal model optimized for low-latency text and vision tasks. AI PMs would care about its performance-cost tradeoffs, on-device suitability, and throughput gains.

WebMCP (tool)

A W3C-backed browser extension that exposes website functionality to MCP-capable agents. It lets developers register site functions as structured tools in the browser.

Google AI Edge Gallery (tool)

Google AI Edge Gallery is a Google tool for showcasing and running on-device AI experiences at the edge, including offline use cases.

lethal trifecta (concept)

A security risk pattern where AI agents have private data access, ingest untrusted content, and can exfiltrate data. For AI PMs, it is a key framework for designing safe agent features.

LLM (concept)

Large language models used in production systems, benchmarking, and agentic workflows. The newsletter emphasizes their failure modes, evaluation, and infrastructure sensitivity.

red/green TDD (concept)

A test-driven development pattern adapted for coding agents. It emphasizes an iterative failure/success loop that can make agentic coding more reliable.

Qwen3.5-Plus (tool)

A proprietary hosted Qwen3.5 model option mentioned alongside the open-weight release. For PMs, it represents the managed deployment path versus self-hosting.

agentic engineering (concept)

The practice of building software systems where agents plan and execute tasks with autonomy. The newsletter uses it in the context of anti-patterns and agent behavior management.

Agentic Engineering Patterns (concept)

A collection of patterns for building and operating agentic systems. The newsletter highlights it as a reference hub for practical coding-agent workflows like red/green TDD.

Gemma 3 (tool)

A model family from Google used as the base for TranslateGemma. It matters to PMs as an example of reusing a foundation model for a specialized, deployable product.

Qwen3.5-397B-A17B (tool)

An open-weight multimodal model in Alibaba's Qwen3.5 series, aimed at agentic and vision-capable use cases. It is relevant to PMs evaluating model capabilities, openness, and deployment options.

cognitive debt (concept)

A product and engineering concept describing the hidden cost of AI-accelerated development when teams lose shared understanding of the system. It reframes debt from code maintenance to team cognition and system comprehension.

Soohoon Choi (person)

A quoted individual in a commentary about code quality incentives in AI systems. The newsletter uses him as the source of a viewpoint on maintainable code.

LLM Python library (tool)

A Python library for working with LLM providers through an abstraction layer. The newsletter notes that API research is informing a major change to its provider abstraction.

prompt injection (concept)

Attack technique where malicious prompts manipulate AI systems or agents. Here it is connected to a GitHub issue triage workflow exploit.

research-llm-apis (tool)

A repository for researching LLM providers' HTTP APIs. It supports abstraction-layer decisions for developers building against multiple model providers.

GPT-5.3-Codex-Spark (tool)

A Codex-powered model release from OpenAI aimed at developers and product teams. The newsletter emphasizes its availability as a research preview and its high token throughput.
