Codex
OpenAI's coding agent/tool used here for self-improving tax workflows and long-running autonomous loops. It is presented as capable of iterative task execution with plugins and goal-based runs.
Key Highlights
- Codex is evolving from a coding assistant into a goal-driven agent that can execute multi-hour or multi-day workflows.
- OpenAI positioned Codex as enterprise-ready with sandboxing, approval gates, RBAC, auditable workspaces, and hybrid deployment options.
- A Tax AI pilot used a Codex-driven improvement loop to raise the share of returns reaching at least 75% correct field completion from about 25% to 86% within six weeks.
- Codex now spans multiple surfaces including app, CLI, IDE extensions, and ChatGPT mobile with remote monitoring.
- For AI PMs, Codex is a strong reference point for designing closed-loop, autonomous product workflows with human oversight.
Overview
Codex is OpenAI’s coding agent and developer tool for planning, editing, testing, and iterating on software tasks with increasing autonomy. Across the mentions here, it appears not just as a code completion product, but as an agentic system that can run goal-based workflows for hours, use tools and plugins, operate in IDEs and the CLI, and support long-running execution loops. It is also described as powering self-improving operational systems, including a tax workflow where structured traces and practitioner feedback fed a Codex-driven improvement loop.
For AI Product Managers, Codex matters because it signals a shift from “AI helps write code” to “AI executes bounded engineering and ops work over time.” The coverage highlights practical capabilities that matter in production: autonomous goal mode, enterprise governance controls, mobile remote monitoring, sandboxed execution, hybrid/on-prem deployment paths, and use cases that extend beyond coding into backlog cleanup, email triage, testing, and workflow automation. In short, Codex is becoming an execution layer for software and adjacent knowledge work, not just a developer assistant.
Key Developments
- 2026-05-12: OpenAI launched Daybreak, a frontier AI platform that combines its top models and Codex with security partners to accelerate cyber defense and continuous software protection.
- 2026-05-14: OpenAI detailed a custom Windows sandbox for Codex after finding existing approaches too restrictive or unsafe. The default sandbox allows broad reads, limits writes to the workspace, and blocks internet access unless explicitly enabled.
- 2026-05-15: OpenAI launched a preview of Codex in the ChatGPT mobile app on iOS and Android, enabling users to connect to laptops or managed remote environments and monitor live screenshots, terminal output, diffs, tests, and approvals via secure relay. OpenAI said more than 4 million people use Codex weekly.
- 2026-05-19: OpenAI and Dell Technologies announced a partnership to bring Codex to hybrid and on-prem enterprise environments, connecting it with Dell’s on-prem AI Data Platform and exploring integration with the Dell AI Factory.
- 2026-05-20: OpenAI said it became a C2PA Conforming Generator Product and would add provenance signals including C2PA-compatible Content Credentials and SynthID watermarking for images generated via ChatGPT, Codex, and the OpenAI API.
- 2026-05-22: Sam Altman announced the new Codex, positioning it as an AI-driven coding assistant designed to accelerate developer workflows.
- 2026-05-23: OpenAI launched Goal mode in the Codex app, IDE extension, and CLI, enabling hands-off, long-running, goal-driven coding sessions that can run for hours or days.
- 2026-05-23: OpenAI said Gartner named it a leader in enterprise coding agents, citing Codex usage, enterprise customers, stronger tool use, faster performance, governance features such as approval gates and RBAC, auditable workspaces, OS-level sandboxing, and deployment options including Amazon Bedrock.
- 2026-05-25: Ryan Carson described using Codex alongside OpenClaw and Devin to automate executive assistant work, sales prospecting, and software delivery, reportedly shipping more than 10 pull requests per day.
- 2026-05-26: Claire Vo shared that Codex planned and executed a full core-app refactor, published HTML docs, ran iterative browser smoke tests, handled linting and regression fixes, and even cleaned out about 4,000 emails.
- 2026-05-28: In a Tax AI pilot processing 7,000 returns, a Codex-driven improvement loop used practitioner feedback and structured production traces to raise the share of returns reaching at least 75% correct field completion from about 25% at launch to 86% within six weeks.
- 2026-05-28: A podcast demo showed Codex’s `/goal` command running autonomous loops for roughly six hours, eliminating thousands of Sentry errors, cutting an inbox from 3,900 emails to 68 unread, and organizing hundreds of Linear tasks.
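The default sandbox behavior described in the 2026-05-14 item (broad reads, writes limited to the workspace, no internet unless explicitly enabled) can be illustrated with a minimal Python sketch. The `SandboxPolicy` class and its method names are hypothetical and chosen only for illustration; they are not Codex's configuration format or API, and a real sandbox enforces these rules at the OS level rather than in application code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy object mirroring the described defaults."""

    workspace: Path
    network_enabled: bool = False  # internet stays off unless explicitly enabled

    def allows_read(self, path: Path) -> bool:
        # Reads are broadly allowed under the described default.
        return True

    def allows_write(self, path: Path) -> bool:
        # Writes are allowed only inside the workspace. resolve() collapses
        # ".." segments so a path cannot escape via "workspace/../elsewhere".
        target = path.resolve()
        root = self.workspace.resolve()
        return target == root or root in target.parents

    def allows_network(self) -> bool:
        return self.network_enabled


policy = SandboxPolicy(workspace=Path("/tmp/project"))
print(policy.allows_write(Path("/tmp/project/src/app.py")))
print(policy.allows_write(Path("/tmp/project/../other/app.py")))
print(policy.allows_network())
```

Modeling the policy as an immutable value keeps the defaults explicit: enabling network access requires constructing a new policy, which gives an approval gate a natural place to sit.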
Relevance to AI PMs
1. Useful for designing agentic product workflows, not just coding features. Codex is repeatedly shown executing multi-step goals across code, testing, documentation, bug fixing, inbox cleanup, and task management. AI PMs can use this pattern to scope products around outcomes and approval checkpoints rather than single prompts.
2. Important benchmark for enterprise readiness. The mentions emphasize capabilities AI PMs often need to evaluate before rollout: sandboxing, RBAC, approval gates, auditable workspaces, hybrid/on-prem deployment, mobile oversight, and provenance features. Codex provides a concrete example of what “production-grade agent tooling” now includes.
3. Strong case study for self-improving systems. The Tax AI example is especially relevant to PMs building closed-loop AI products: collect structured traces, incorporate expert feedback, let the agent iterate on failures, and expand task complexity over time. That is a practical template for improving reliability in high-stakes workflows.
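A closed loop like the Tax AI one depends on a measurable quality signal. As a rough sketch, the headline metric (the share of returns reaching at least 75% correct field completion) could be computed from structured traces as follows. The `ReturnTrace` record, its fields, and both function names are assumptions for illustration; the source does not describe Tax AI's actual trace schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReturnTrace:
    """Hypothetical structured trace for one drafted return."""

    return_id: str
    fields_total: int
    fields_correct: int  # as judged from practitioner feedback

    @property
    def completion(self) -> float:
        # Fraction of fields completed correctly; 0.0 for an empty return.
        return self.fields_correct / self.fields_total if self.fields_total else 0.0


def share_meeting_threshold(traces: List[ReturnTrace], threshold: float = 0.75) -> float:
    """Share of returns whose correct-field completion is at least `threshold`."""
    if not traces:
        return 0.0
    passing = sum(1 for t in traces if t.completion >= threshold)
    return passing / len(traces)


def failures_for_next_iteration(traces: List[ReturnTrace], threshold: float = 0.75) -> List[ReturnTrace]:
    """Returns below the threshold, worst first, as input to the next loop pass."""
    failing = [t for t in traces if t.completion < threshold]
    return sorted(failing, key=lambda t: t.completion)


# Made-up example data, not figures from the pilot.
traces = [
    ReturnTrace("r1", fields_total=40, fields_correct=36),
    ReturnTrace("r2", fields_total=40, fields_correct=20),
    ReturnTrace("r3", fields_total=40, fields_correct=30),
    ReturnTrace("r4", fields_total=40, fields_correct=10),
]
print(share_meeting_threshold(traces))
print([t.return_id for t in failures_for_next_iteration(traces)])
```

Tracking the same number at launch and after each iteration is what makes a claim like "25% to 86% within six weeks" checkable, and the sorted failure list gives the agent and the reviewing practitioners a shared work queue.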
Related
- OpenAI: Codex is presented as OpenAI’s coding agent/tool and is tightly linked to OpenAI’s broader product stack and deployment strategy.
- ChatGPT / ChatGPT mobile / ChatGPT Enterprise / chatgpt-pro: Codex appears inside ChatGPT surfaces, including mobile, and overlaps with enterprise workspace and governance capabilities.
- Codex CLI / codex-cli / Codex app / IDE extension: These are the main interfaces through which users run Codex, including the newer goal-based autonomous workflows.
- GPT-5.3 Codex / gpt-53-codex / gpt-55 / gpt-54-thinking / gpt-54-pro: These related model names suggest Codex is part of a broader evolving model family and tooling stack.
- openai-codex / codeex: Likely alternate references or naming variants associated with Codex.
- Claude Code / Anthropic / Claude / Claude Opus / Claw Code / OpenClaw / Gemini CLI / Cursor / Devin / OpenCode: These are adjacent coding-agent or developer-tool competitors/comparators in the same agentic software workflow space.
- Sentry / Gmail / Linear: Examples of external systems Codex is described interacting with in longer-running autonomous workflows.
- Tax AI: A prominent case study showing Codex used in a self-improving operational loop for tax return drafting.
- Dell Technologies / Dell AI Factory / Amazon Bedrock: Important deployment and infrastructure partners that expand how enterprises can run Codex in governed environments.
- C2PA / SynthID / OpenAI API / OpenAI Compliance Platform / OpenTelemetry: Related ecosystem pieces around provenance, integration, observability, API usage, and enterprise governance.
Newsletter Mentions (51)
#1 📝 OpenAI News Building self-improving tax agents with Codex - Tax AI processed 7,000 tax returns in a Crete pilot, saved practitioners about one-third of their time, drafts returns with up to 97% accuracy, and increased throughput by roughly 50%. At launch only ~25% of returns reached ≥75% correct field completion, rising to 86% within six weeks as the system used practitioner feedback, structured production traces, and a Codex-driven improvement loop to autonomously iterate and expand into more complex filings.
#17 ▶️ I let Codex run for 6 hours. Here’s what happened. How I AI Podcast The episode demonstrates how to use OpenAI Codex’s /goal command to execute multi-hour autonomous loops: eliminating thousands of Sentry errors in ChatPRD, cleaning 3,900 emails to 68 unread, and organizing hundreds of Linear tasks.
#21 𝕏 claire vo 🖤 After 20 years of coding, Codex took over: planning and executing a full core-app refactor (publishing HTML docs, running iterative loops with browser smoke tests, linting and regression fixes) and even cleaning out 4,000 emails, leaving me to just ask “OK, what’s...
#22 𝕏 Logan Kilpatrick announced that Lyria 3 is now available via the API, enabling developers to build with it.
▶️ How This 5x Founder Runs His Startup Solo With AI Agents (OpenClaw, Codex, Devin) | Ryan Carson Peter Yang Ryan Carson demonstrates how he leverages OpenClaw's ClawChief cron jobs and markdown skills together with Codex and cloud-based Devin to automate his executive assistant workflow, nightly sales prospecting via the Firecrawl API, and ship over 10 pull requests per day. The “executive assistant sweep” cron in OpenClaw’s ClawChief setup runs every 15 minutes to check Gmail via the Google CLI, sync Todoist tasks, parse and book Calendly links, ping updates in Slack threads, and proactively follow up on emails.
#5 𝕏 OpenAI launched Goal mode in the Codex app, IDE extension, and CLI, enabling hands-off, long-running goal-driven coding that can run autonomously for hours or even days.
#21 📝 OpenAI News OpenAI named a Leader in enterprise coding agents by Gartner - OpenAI was named a Leader in Gartner’s 2026 Magic Quadrant for Enterprise AI Coding Agents, citing Codex (used by more than 4 million people weekly and by customers including Cisco, Datadog, Dell, and NVIDIA) and highlighting recent improvements such as GPT‑5.5, stronger tool use, faster performance, enterprise governance features (approval gates, RBAC, customizable policies, OS‑level sandboxing, auditable workspaces), and expanded deployment options including Codex on Amazon Bedrock and GSI partners like Accenture, Capgemini, Cognizant, Infosys, PwC, and TCS.
#2 𝕏 Sam Altman ships the new Codex today, delivering an AI-driven coding assistant to help developers accelerate their workflows.
#7 📝 OpenAI News Advancing content provenance for a safer, more transparent AI ecosystem - On May 19, 2026, OpenAI announced it has become a C2PA Conforming Generator Product and is strengthening provenance by adding C2PA-compatible Content Credentials, adopting Google DeepMind’s SynthID invisible watermarking for images generated via ChatGPT, Codex, and the OpenAI API, and previewing a public verification tool to detect those signals. The company says the multi-layered approach—combining metadata and watermarking—addresses metadata loss (noting prior visible watermarks in Sora and audio watermarking in Voice Engine), the verification tool will initially only cover OpenAI-generated content, and it will avoid definitive conclusions when no provenance signals are detected.
OpenAI connects Codex with Dell on-prem AI Data Platform #1 📝 OpenAI News OpenAI and Dell Technologies partner to bring Codex to hybrid and on-premises enterprise environments - Codex—used by more than 4 million developers weekly—will connect with Dell’s on‑premises AI Data Platform and is being explored for integration with the Dell AI Factory so enterprises can deploy Codex and Codex‑powered agents closer to internal codebases, documentation, business systems, and workflows in hybrid or on‑premises environments. The collaboration aims to provide the controls and infrastructure to build, test, automate, analyze, and deploy AI applications (including ChatGPT Enterprise and API solutions) against governed enterprise data on Dell infrastructure.
#1 📝 OpenAI News Work with Codex from anywhere - On May 14, 2026 OpenAI launched a preview of Codex inside the ChatGPT mobile app on iOS and Android (rolling out across Free, Go, and paid plans in supported regions), letting users connect to their laptops or managed remote environments and see live state — screenshots, terminal output, diffs, test results and approvals — via a secure relay while keeping files, credentials, and permissions on the host machine; OpenAI reports more than 4 million people use Codex weekly. Remote SSH and Hooks are generally available, programmatic access tokens are offered on Enterprise and Business plans, HIPAA-compliant local-environment use is supported for eligible ChatGPT Enterprise workspaces, and support for connecting phones to the Codex app on Windows is coming soon. Also covered by: @OpenAI
OpenAI built a custom, unelevated sandbox for Codex on Windows after finding AppContainer too narrowly scoped, Windows Sandbox incompatible with acting on the user's real checkout (and unavailable on Home), and MIC unsafe because relabeling workspaces lowers their integrity; the prototype gives the sandbox a distinct identity via synthetic SIDs and uses write-restricted tokens to limit where Codex can modify files. The default Codex sandbox runs commands with reduced permissions, allowing reads broadly, writes only inside the workspace, and no internet access unless explicitly enabled.
#4 𝕏 Aravind Srinivas: Perplexity is building a secure, scalable agent runtime sandbox that handles proxy API keys securely, runs safety detection on all agent-accessed content, encrypts connector data, and decouples storage from compute.
#5 in Udi Menkes built a “second brain” in Cloud Code a month ago that now sends him daily briefs (managing his content pipeline, tracking initiatives, and suggesting actions) while he simply feeds it context, feedback, and approvals.
#1 𝕏 OpenAI launched Daybreak, a frontier AI platform combining its top models and Codex with security partners to accelerate cyber defense and enable continuous software protection.
#2 𝕏 Sam Altman launched Daybreak, OpenAI’s AI-powered program to accelerate cyber defense and provide continuous software security, and is inviting companies to partner now.
#3 📝 OpenAI News OpenAI launches the OpenAI Deployment Company to help businesses build around intelligence - OpenAI is launching the OpenAI Deployment Company, a majority‑owned subsidiary backed by more than $4 billion of initial investment and a partnership of 19 firms led by TPG (with Advent, Bain Capital, and Brookfield as co‑leads), to embed Forward Deployed Engineers into organizations and help build production AI systems.
Related
Anthropic's coding assistant used for programming and automation tasks. The newsletter references it for building a custom approval device and for writing and research workflows inside AI agents.
AI company behind Claude. The newsletter references Claude usage and later notes Anthropic may have reached product-market fit.
AI company behind Codex and other products. The newsletter references its Codex-based tax agents and the OpenAI Foundation's initial commitment.
Anthropic's model family used for agent orchestration and developer workflows. In this newsletter it is highlighted as powering CodeRabbit's agent orchestration system.
An AI coding editor and automation platform. The newsletter highlights multi-repository support for automations across codebases.
A creator mentioned again as raising seed funding and choosing AI agents for onboarding and role learning. He is also the source credit on the Ryan Carson item.
A newsletter/podcast operator cited for summarizing Dan Shipper’s view on AI, work, and value creation. He connects the discussion to skill commoditization and recombination.
An AI agent workflow system used to automate founder and operator tasks with cron jobs, skills, and integrations. The newsletter cites it as part of a solo-founder operating stack alongside Codex and Devin.
A general-purpose AI chat product used here as an example of a platform that adds tools, memory, skills, and context on top of a model. The newsletter argues the harness matters more than the base model.
A practitioner who used Claude and Cursor to generate a design system from GitHub repos. Relevant to PMs for rapid product and design-system iteration.
Well-known AI researcher and builder, mentioned here as joining Anthropic to use Claude for research acceleration. Relevant to AI PMs as a signal of AI-powered research workflows and talent movement.
A company shipping verified agent skills and broader AI infrastructure/tools. The mention signals ecosystem support for cross-platform agent capabilities.
CEO of OpenAI and a prominent AI industry leader. Here he is quoted announcing the OpenAI Foundation's initial $250M commitment.
Henry Shi is a technical staff member at Anthropic Labs and co-runner of the AI Product Management Certification. He is described as a former co-founder of Super.com.
Rohan Varma is an AI product operator and instructor mentioned as a co-runner of the AI Product Management Certification. He is described as formerly the first PM at Cursor and now at Codex.
An AI practitioner cited for observing model behavior around tool calls and context budgeting. The newsletter credits him with the Sonnet 4.5 insight.
An AI software engineering agent used for cloud-based automation and code changes. In the newsletter it’s used for scheduled automations, tests, and reviewing/merging code.
A discovery or directory platform that is described here as launching LlamaParse.
OpenAI product leader/executive who publicly praised GPT-5.2 in the newsletter. Useful context for AI PMs tracking product and model reception.
A project and ticket management tool used here as the system of record for agent workflows. PMs can use it to route tasks to coding agents and track review states.
A frontier coding-capable model referenced in a benchmark comparison. The newsletter says it outperformed earlier coding models but still lagged behind human senior engineers in Every’s test.
A newer OpenAI model release with improved natural dialogue, longer context, and stronger tool use. It is discussed as a model now available in Cursor and chatprd.
OpenAI's coding assistant referenced as a runtime for NVIDIA-Verified Agent Skills. It appears alongside Claude and Cursor.ai as an interoperable platform.
OpenAI’s coding-focused model/release highlighted for benchmark performance, steerability, and speed improvements. The newsletter frames it as a strong coding agent option with multiple benchmark scores.
A coding agent mentioned as supporting context forking, where users can rewind or branch from prior turns.
A vibe-coding tool mentioned alongside Cloud Code in Notion’s prototyping workflow. It supports direct code-based iteration for AI feature exploration.
A founder and writer cited for doing writing, research, and email inside AI agents. The newsletter uses him as an example of agent-native knowledge work.
Google DeepMind’s watermarking technology for AI-generated and other digital content. It is positioned here as a cross-industry standard for content provenance.
Google’s command-line interface for working with Gemini in developer workflows. It is mentioned as a compatible tool alongside agent skills in antigravity.
Google's email product, referenced here as gaining Gemini-powered AI Inbox and Overviews features. For PMs, it is an example of AI being embedded into a mature productivity workflow.
A paid ChatGPT subscription tier with expanded model access and higher usage limits. For AI PMs, this is a packaging and monetization lever that affects power users and workflow depth.
A Python-derived clone created from leaked Claude Code TypeScript. It is described as a fast-growing GitHub repo.
Amazon Bedrock is AWS's managed platform for building and running generative AI applications and agents.
OpenAI leader and product/engineering voice associated here with confirming Codex’s unification with the main model. The newsletter cites him via Simon Willison’s note.
OpenTelemetry is an observability standard for traces, logs, and metrics. The newsletter mentions Codex exporting agent-aware telemetry through it for auditing and monitoring.
Anthropic’s Claude model used locally in Paperclip’s agent orchestration demo. It is used for task execution, company simulation, and coding workflows.