Jason Zhou
An AI practitioner cited for observing model behavior around tool calls and context budgeting. The newsletter credits him with the observation that Sonnet 4.5 appears to track its token budget after tool calls.
Key Highlights
- Jason Zhou is cited for practical discoveries about agent workflows, coding automation, and model behavior in production-like settings.
- The newsletter credits him with the Sonnet 4.5 insight that the model appears to track token budget after tool calls.
- He surfaced implementation patterns around Codex and Hermes `/goal` loops for stateful autonomous coding.
- His work is especially relevant to AI PMs designing tool use, context management, and operational agent products.
- He also highlighted tactics for improving output quality and reducing cost, including better tool examples and token-compression workflows.
Overview
Jason Zhou is an AI practitioner and builder frequently cited for practical discoveries about agent behavior, coding workflows, tool use, and context management. Across newsletter mentions, he appears less as a theorist and more as an operator who ships experiments, uncovers hidden product behaviors, and shares implementation patterns for agentic development stacks such as Codex, Claude Code, and Hermes-style autonomous loops.
For AI Product Managers, Zhou matters because his work sits at the intersection of model capability and product execution. His observations often translate directly into PM-relevant decisions: how to structure tool calls, how to manage context budgets, how to design stateful agent loops, and how to turn LLM systems into useful internal automation for engineering and operations. The newsletter specifically credits him with the Sonnet 4.5 insight that the model appears to track token budget after tool calls, a notable clue for designing more reliable long-context workflows.
Key Developments
- 2026-02-23: Highlighted Anthropic’s advanced tool-calling capabilities, emphasizing programmatic invocation, dynamic filtering, built-in search, and practical real-world usage patterns.
- 2026-02-23: Shared that giving LLMs concrete examples for complex tools with optional fields and dependencies can materially improve structured JSON output accuracy in Anthropic benchmarks.
- 2026-03-06: Argued that the December 2025 wave of progress in memory environments, verification loops, and atomic tooling made always-on, long-running autonomous agents viable.
- 2026-03-30: Highlighted RTK (Rust Token Killer), an open-source utility for reducing Claude Code token usage by stripping noise, collapsing repetition, and removing low-value text artifacts.
- 2026-04-01: Launched direct Codex integration inside Claude Code, enabling code review workflows through a simple plugin installation and setup flow.
- 2026-04-23: Built a Crewlet agent that detected referral farming by tracing temp-email signup spikes back to a single referral source and then automating remediation steps.
- 2026-04-24: Demonstrated an AI agent that could read a support ticket and autonomously submit a pull request in about 10 minutes, automating a customer crediting workflow.
- 2026-04-30: Revealed lesser-known Claude Code setup tricks, including discovery and use of the hidden `.claude/rules/` directory for configuration and workflow control.
- 2026-05-04: Unveiled Codex’s `/goal` command, describing a stateful loop that sets goals, tests progress, self-corrects, and repeats until completion or budget exhaustion.
- 2026-05-10: Extended `/goal` support into Codex and Hermes agents for one-step autonomous coding, recommending interview mode, explicit stop conditions, and a goal-buddy pattern for managing state and goal files.
- 2026-05-24: Discovered that Sonnet 4.5 appears context-aware in a way that lets it automatically track remaining token budget after each tool call, implying that prompting with a token limit may be sufficient in some workflows.
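The stateful loop described in the 2026-05-04 entry (set a goal, test progress, self-correct, repeat until completion or budget exhaustion) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual Codex or Hermes implementation: `GoalState`, `run_goal_loop`, `toy_act`, and `toy_check` are hypothetical names, and the budget is modeled as a simple attempt count rather than tokens.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GoalState:
    """Hypothetical state carried between iterations of the loop."""
    goal: str
    budget: int                      # assumed: maximum number of attempts
    attempts: int = 0
    history: List[str] = field(default_factory=list)
    done: bool = False


def run_goal_loop(
    state: GoalState,
    act: Callable[[GoalState], str],
    check: Callable[[str], bool],
) -> GoalState:
    """Repeat act -> check until the check passes or the budget is spent."""
    while not state.done and state.attempts < state.budget:
        state.attempts += 1
        result = act(state)            # e.g. ask an agent to change code
        state.history.append(result)   # record kept so later attempts can self-correct
        state.done = check(result)     # explicit stop condition, e.g. tests pass
    return state


# Toy stand-ins for an agent and a test suite (illustrative only).
def toy_act(state: GoalState) -> str:
    return f"draft-{state.attempts}"


def toy_check(result: str) -> bool:
    return result == "draft-3"


final = run_goal_loop(GoalState(goal="make tests pass", budget=5), toy_act, toy_check)
print(final.done, final.attempts)  # True 3
```

Keeping the stop condition (`check`) and the budget as separate inputs mirrors the advice about explicit stop conditions: the loop ends either because the goal is verifiably met or because the budget ran out, and the caller can tell which from `done`.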
Relevance to AI PMs
1. Designing more reliable agent workflows: Zhou’s work on `/goal`, stateful loops, and goal-buddy patterns gives PMs concrete ways to scope autonomous coding and task execution. This is useful when defining product requirements for agent UX, stopping rules, retry logic, and human handoff.
2. Improving tool-use quality and structured outputs: His observations around Anthropic tool calling and example-driven tool usage are directly applicable when PMs are deciding how to design schemas, provide demonstrations, and evaluate JSON accuracy for production agents.
3. Managing context and cost in product design: His RTK mention and Sonnet 4.5 context-budget insight help PMs think tactically about context compression, token budgeting, and tool-call overhead. These are critical for making agent products both reliable and economically viable.
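To make the context-compression idea in point 3 concrete, here is a small Python sketch of the kind of cleanup attributed to RTK: dropping blank lines and progress bars and collapsing repeated lines before tool output reaches the model. It is not RTK's implementation (RTK is a separate Rust tool whose internals are not described here). The `compress_output` function and the progress-bar pattern are assumptions made for illustration.

```python
import re

# Assumed heuristic: a bracketed bar followed by a percentage, e.g. "[####    ] 40%".
PROGRESS_BAR = re.compile(r"\[[#=\-> ]+\]\s*\d{1,3}%")


def compress_output(text: str) -> str:
    """Drop blank lines and progress bars, and collapse consecutive repeats."""
    kept = []
    previous = None
    repeat_count = 0
    for line in text.splitlines():
        stripped = line.rstrip()
        if not stripped.strip() or PROGRESS_BAR.search(stripped):
            continue                      # noise: blank line or progress bar
        if stripped == previous:
            repeat_count += 1             # same as the last kept line
            continue
        if repeat_count:
            kept.append(f"(repeated {repeat_count} more time(s))")
            repeat_count = 0
        kept.append(stripped)
        previous = stripped
    if repeat_count:
        kept.append(f"(repeated {repeat_count} more time(s))")
    return "\n".join(kept)


sample = "\n".join([
    "Installing deps",
    "[####      ] 40%",
    "[##########] 100%",
    "",
    "warning: unused variable",
    "warning: unused variable",
    "warning: unused variable",
    "done",
])
compressed = compress_output(sample)
print(compressed)
print(len(sample), "->", len(compressed), "characters")
```

Character count is used here only as a rough proxy. Actual token savings depend on the tokenizer and on how noisy the original output is.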
Related
- Codex / OpenAI Codex: Strongly connected through Zhou’s work on direct integration in Claude Code and the `/goal` command for iterative autonomous coding.
- Claude Code / Claude / Anthropic: Central to several of his discoveries, including setup tricks, token reduction tactics, tool-calling commentary, and context-budget observations around Sonnet 4.5.
- RTK: Linked through his highlighting of token optimization techniques for Claude Code workflows.
- Memory environments, verification loops, atomic tooling: Concepts he tied to the rise of persistent, long-running autonomous agents.
- Crewlet, support-ticket, multi-agent-system, Hermes agents, goal-buddy: Examples of his practical work building applied agent systems that connect model capabilities to operational automation.
- Tool-calling, autonomous-agents, tree-search, UX exploration: Broader themes adjacent to his contributions, especially for PMs evaluating agent architecture and execution patterns.
Newsletter Mentions (17)
“Jason Zhou discovered Sonnet 4.5 is context-aware, automatically tracking its token budget after each tool call—suggesting you could simply prompt the model with a token limit.”
“#1 𝕏 Jason Zhou launched `/goal` support in CodeX and Hermes agents for one-step autonomous coding, advising use of interview mode, clear stop conditions, and a goal-buddy to manage state and goal files.”
GenAI PM Daily May 10, 2026 | Today's top 11 insights for PM Builders, ranked by relevance from X, Blogs, and LinkedIn.
PromptLayer’s multi-step agent evaluation framework
#1 𝕏 Jason Zhou launched `/goal` support in CodeX and Hermes agents for one-step autonomous coding, advising use of interview mode, clear stop conditions, and a goal-buddy to manage state and goal files.
#2 📝 PromptLayer Blog What Is Agent Evaluation? A Practical Guide for AI Teams - Agent evaluation tests whether an AI agent reliably completes tasks across real inputs, edge cases, and new versions by scoring not just final outputs but multi-step behavior via black-box, trajectory, and component-level evaluations, using metrics like task completion rate, tool selection accuracy, unsupported-claim rate, latency/cost per step, and regression pass rate. PromptLayer offers tracing with span-level context, reusable datasets, batch evaluations, backtesting, regression testing, automated evaluation triggers on new prompt versions, and flexible pipelines including code execution, human input, conversation simulation, regex checks, and LLM assertions.
#3 in Udi Menkes built his new product’s entire data flow in a single interactive HTML file—complete with diagrams, in-page navigation, and color-coded complexity—letting his team understand it in minutes instead of hours.
#4 𝕏 Garry Tan suggests diagramming your AI agent codebases and architecture in plain ASCII, then relentlessly questioning each component to clarify design and accelerate product development.
#5 𝕏 Boris Cherny says Claude Code’s switch to a native installer means npm-only stats undercount its real usage. On Thursday it hit its second-highest signup day ever with 15× growth since Jan 1—now you can ask Claude to debug your SQL.
#6 𝕏 Boris Cherny is enhancing Claude Code’s UX for snappier performance and adding debug logs so users can self-serve hang diagnostics.
#7 𝕏 Harrison Chase calls LangSmith an org-wide platform for building AI agents that speeds up cross-functional collaboration and tightens feedback loops.
#8 𝕏 Santiago showcases a step-by-step guide for constructing Python-powered multi-agent systems from scratch, leveraging MCP and A2A patterns to incrementally add complexity and enable collaborative AI agents.
#9 𝕏 Garry Tan spends $2K/mo on Openclaw AI tokens to turbocharge product development and startup insights. He’s “tokenmaxxing” now with a goal to make these capabilities affordable for everyone in 18 months.
#10 𝕏 Harrison Chase argues that treating AI agents as systems to measure and iteratively improve isn’t just a technical challenge—it demands intentional human collaboration and team processes.
#11 in Peter Yang warns that unedited AI-generated markdown can compound small errors over time—what starts as 5% “slop” quickly balloons into an overwhelming pile of confusing, unverified content.
“OpenAI Codex unveils /goal stateful loop command #1 𝕏 Jason Zhou unveils Codex’s new /goal command, introducing a stateful Ralph-loop that iteratively sets goals, tests, self-corrects, and repeats until the mission is complete or the budget runs out.”
GenAI PM Daily May 04, 2026 | Today's top 12 insights for PM Builders, ranked by relevance from X, YouTube, and LinkedIn.
OpenAI Codex unveils /goal stateful loop command
#1 𝕏 Jason Zhou unveils Codex’s new /goal command, introducing a stateful Ralph-loop that iteratively sets goals, tests, self-corrects, and repeats until the mission is complete or the budget runs out.
#2 ▶️ Everything You Need to Know About Context Engineering in 40 Minutes | Ravi Mehta (Peter Yang) - Use 3-layer context engineering (functional spec, Figma wireframe, JSON data enriched via Claude and a custom Cloud Code MCP server) to generate a high-fidelity music genre detail page prototype in Reforge Build that can be instantly re-themed by swapping the data.json file.
“#18 𝕏 Jason Zhou reveals three lesser-known Claude Code setup tricks—beyond the usual `npm install`—including the discovery and use of the hidden `.claude/rules/` directory.”
#18 𝕏 Jason Zhou reveals three lesser-known Claude Code setup tricks—beyond the usual `npm install`—including the discovery and use of the hidden `.claude/rules/` directory. #19 𝕏 Harrison Chase predicts that by 2026 closed-model costs will be prohibitively high and he’s optimizing deepagents for peak performance on OSS models.
“Jason Zhou built an AI agent that reads a support ticket and autonomously submits a PR in just 10 minutes, instantly automating customer crediting.”
#10 𝕏 Jason Zhou built an AI agent that reads a support ticket and autonomously submits a PR in just 10 minutes, instantly automating customer crediting. #11 𝕏 claire vo 🖤 GPT-5.5 ran a 6-hour autonomous validation, migrated 2M+ records with just one unhandled exception, closed all security issues for a zero-issue pen test, and even reverse-engineered a Divoom MiniToo Bluetooth speaker’s image encoding.
“#7 𝕏 Jason Zhou built a Crewlet agent that detected referral farming by spotting a spike in temp email signups and tracing them to one referral link in the database.”
#7 𝕏 Jason Zhou built a Crewlet agent that detected referral farming by spotting a spike in temp email signups and tracing them to one referral link in the database. He then flagged the fake accounts and reset their credits via a SQL script.
“Jason Zhou launched direct Codex integration in Claude Code, enabling CC code reviews via four simple plugin commands—/plugin marketplace add openai/codex-plugin-cc, /plugin install codex@openai-codex, /reload-plugins, and /codex:setup.”
“#3 𝕏 Jason Zhou highlights RTK (Rust Token Killer), an open-source tool that cuts Claude Code tokens by up to 60% by stripping noise, merging repeated content, and removing blank lines and progress bars.”
“Jason Zhou says December 2025’s LLM breakthrough—fueled by memory environments, verification loops and atomic tooling—enables always-on, long-running autonomous agents.”
GenAI PM Daily March 06, 2026 | Today's top 25 insights for PM Builders, ranked by relevance from Blogs, X, LinkedIn, and YouTube.
OpenAI Introduces GPT-5.4 Model
#1 📝 OpenAI News Introducing GPT-5.4 - Announcement of GPT-5.4 as a new product release, highlighting improvements and new capabilities over prior models. The post introduces features and potential applications of GPT-5.4. Also covered by: @There's An AI For That, @Kevin Weil 🇺🇸
#16 𝕏 Jason Zhou says December 2025’s LLM breakthrough—fueled by memory environments, verification loops and atomic tooling—enables always-on, long-running autonomous agents.
“𝕏 Jason Zhou hails Anthropic’s advanced tool calling—featuring programmatic invocation, dynamic filtering, built-in search and real-world use examples—as underrated gold in his quick 3-minute breakdown.”
GenAI PM Daily February 23, 2026 | Today's top 12 insights for PM Builders, ranked by relevance from X, LinkedIn, Blogs, and YouTube. Anthropic Launches Advanced Tool Calling #1 𝕏 Jason Zhou hails Anthropic’s advanced tool calling—featuring programmatic invocation, dynamic filtering, built-in search and real-world use examples—as underrated gold in his quick 3-minute breakdown. #8 𝕏 Jason Zhou shows that providing LLMs with concrete tool-use examples for complex tools with many optional fields and dependencies boosts JSON output accuracy from 72% to 90% in Anthropic’s benchmarks.
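As a rough illustration of the tool-example tactic in the February 23 items, the Python sketch below attaches concrete example inputs to a tool definition, checks that each example respects the tool's required and optional fields, and renders them into prompt text. Everything here is hypothetical: the `create_ticket` tool, the dictionary layout, and the helper names are illustrative assumptions, not Anthropic's tool schema or API.

```python
import json

# Hypothetical tool definition; the field names are illustrative, not a vendor schema.
create_ticket_tool = {
    "name": "create_ticket",
    "description": "Create a support ticket.",
    "parameters": {
        "required": ["title"],
        "optional": ["priority", "assignee", "due_date"],
    },
    "examples": [
        {"title": "Login page returns 500"},
        {"title": "Refund request", "priority": "high", "assignee": "billing-team"},
    ],
}


def validate_examples(tool: dict) -> list:
    """Return a list of problems found in the tool's example inputs."""
    required = set(tool["parameters"]["required"])
    allowed = required | set(tool["parameters"]["optional"])
    problems = []
    for i, example in enumerate(tool["examples"]):
        keys = set(example)
        missing = required - keys
        unknown = keys - allowed
        if missing:
            problems.append(f"example {i}: missing {sorted(missing)}")
        if unknown:
            problems.append(f"example {i}: unknown {sorted(unknown)}")
    return problems


def render_tool_prompt(tool: dict) -> str:
    """Render the description followed by one JSON example per line."""
    lines = [f"{tool['name']}: {tool['description']}", "Example inputs:"]
    lines += [json.dumps(example, sort_keys=True) for example in tool["examples"]]
    return "\n".join(lines)


print(validate_examples(create_ticket_tool))  # [] when every example is well-formed
print(render_tool_prompt(create_ticket_tool))
```

Validating the examples before showing them to a model is a design choice of this sketch: a malformed example would teach the model the wrong shape, so it is cheaper to catch it at definition time.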
Related
Anthropic's coding assistant used for programming and automation tasks. The newsletter references it for building a custom approval device and for writing and research workflows inside AI agents.
AI company behind Claude. The newsletter references Claude usage and later notes Anthropic may have reached product-market fit.
Anthropic's model family used for agent orchestration and developer workflows. In this newsletter it is highlighted as powering CodeRabbit's agent orchestration system.
OpenAI's coding agent/tool used here for self-improving tax workflows and long-running autonomous loops. It is presented as capable of iterative task execution with plugins and goal-based runs.
An AI software engineering agent used for cloud-based automation and code changes. In the newsletter it’s used for scheduled automations, tests, and reviewing/merging code.
A Gemini model variant used here to power agentic workflow examples and multi-agent systems. It is relevant to AI PMs as an example of frontier model capability enabling more complex automated workflows.
OpenAI's coding assistant referenced as a runtime for NVIDIA-Verified Agent Skills. It appears alongside Claude and Cursor.ai as an interoperable platform.
A W3C-backed browser extension that exposes website functionality to MCP-capable agents. It lets developers register site functions as structured tools in the browser.
An architecture where multiple specialized agents collaborate instead of one general-purpose agent. The newsletter includes debate over whether this is necessary versus using a single tool-loaded agent.
A company referenced for experimenting with Slack bot-based monitoring and collaboration. It is cited as an example of per-channel task outcome tracking in workplace AI workflows.
An OpenAI model variant discussed here for its ability to collaborate with HarmonicMath on near-autonomous proof generation. For AI PMs, it highlights stronger reasoning and math capabilities in advanced LLMs.