Andrej Karpathy
An AI researcher and founder known for practical prompting advice. Here he recommends ending prompts with an instruction to format the response as HTML or as a slideshow, so the output renders richly in a browser.
Key Highlights
- Karpathy is cited here as a practical voice on prompting, agent workflows, and the product implications of modern LLMs.
- He recommends asking LLMs to output HTML or slideshow-style responses to create richer, browser-rendered experiences.
- His menugen experiments illustrate both the promise of code-free app creation and the operational pain of deploying agent systems.
- He framed OpenClaw’s breakout as an interface-led adoption moment that exposed non-technical users to advanced agents.
- His comments on Read endpoints emphasize that pricing, documentation, and developer experience can determine whether AI features get adopted.
Overview
Andrej Karpathy is an AI researcher, engineer, educator, and founder whose commentary often shapes how practitioners think about large language models, agentic systems, developer tooling, and product design. In this corpus, he appears less as a purely academic figure and more as a practical guide to how modern AI systems are actually being used: prompting models to emit richer HTML outputs, building code-free apps with LLMs, orchestrating agent workflows, and identifying where infrastructure, pricing, and documentation still break down.
For AI Product Managers, Karpathy matters because he consistently surfaces patterns that sit at the intersection of model capability and product usability. His observations connect frontier model behavior to concrete product questions: how to improve outputs with simple prompting conventions, how non-technical users adopt agentic AI, what operational bottlenecks emerge when LLMs build software, and how pricing or API ergonomics can block experimentation. That makes him a useful reference point for PMs trying to translate AI progress into product strategy, workflow design, and user experience decisions.
Key Developments
- 2026-03-24 — Karpathy’s “auto research loop” was referenced in an automated red-team workflow using Claude Code, refined by Codex 5.4. The loop used repeated experiments, scoring, and git-based iteration to test a vibecoded paywalled site, showing how his agent-loop ideas were being adapted into practical security workflows.
- 2026-03-27 — He was noted as having built menugen about a year earlier to orchestrate LLM agents for app development, but ran into classic DevOps issues such as reproducible environments, secrets management, and monitoring. This highlighted the gap between impressive agent demos and production-ready systems.
- 2026-04-05 — Karpathy praised Farzapedia as a personal Wikipedia built on LLMs, emphasizing explicit, inspectable memory and a file-over-app design philosophy. He also highlighted BYOAI-style personalization associated with FarzaTV, pointing to a more user-controlled model of AI products.
- 2026-04-06 — He said the new Read endpoints looked promising but criticized their cost, reporting that 30 minutes of experimentation cost him $200. He also flagged documentation scattered across short pages and the absence of any mention of XMCP, underscoring the importance of pricing clarity and developer experience.
- 2026-04-10 — Karpathy argued that OpenClaw had a breakout moment because it gave many non-technical users their first hands-on experience with advanced agentic AI, rather than limiting them to the standard ChatGPT-style interface. This was a notable framing of adoption: the interface shift mattered as much as the model.
- 2026-04-11 — A reflection from Simon Willison was explicitly inspired by a Karpathy post about how different domains and reward functions drive divergent model improvements. The mention reinforced Karpathy’s role as a commentator on why model capability can vary substantially by modality, product surface, or optimization target.
- 2026-05-01 — At Sequoia Ascent 2026, Karpathy showed that LLMs could build entirely code-free apps, including menugen for image-to-image tasks, and could even replace bash-script installation flows with natural-language setup. This showcased a product future in which the model increasingly becomes both builder and interface.
- 2026-05-12 — He recommended ending prompts with instructions like “structure your response as HTML” or even as a slideshow, so outputs can be rendered richly in a browser. This practical prompting advice stood out as a simple way to increase the usefulness and presentation quality of LLM responses.
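The HTML prompting convention described above can be sketched in a few lines. This is a minimal sketch, assuming the model's reply arrives as a plain string from whatever LLM client you use; no specific API is called. The helper names (`with_html_instruction`, `extract_html`, `save_for_browser`) are hypothetical, only the "Structure your response as HTML" wording comes from the quoted advice, and the slideshow phrasing is an illustrative assumption.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Formatting instructions appended to the END of a prompt. The "html" wording
# follows the quoted advice; the "slideshow" wording is a hypothetical variant.
SUFFIXES = {
    "html": "Structure your response as HTML.",
    "slideshow": "Structure your response as an HTML slideshow, one slide per section.",
}

# Build the Markdown fence marker indirectly so this listing nests cleanly.
TICKS = "`" * 3
# Models often wrap HTML in a fenced block; capture the body of the first fence.
_FENCED = re.compile(TICKS + r"(?:html)?[ \t]*\n(.*?)" + TICKS, re.DOTALL | re.IGNORECASE)


def with_html_instruction(prompt: str, style: str = "html") -> str:
    """Return the prompt with the formatting instruction appended at the end."""
    if style not in SUFFIXES:
        raise ValueError(f"unknown style: {style!r}")
    return f"{prompt.rstrip()}\n\n{SUFFIXES[style]}"


def extract_html(reply: str) -> str:
    """Return the HTML inside the first code fence, or the whole reply if unfenced."""
    match = _FENCED.search(reply)
    return (match.group(1) if match else reply).strip()


def save_for_browser(html: str, directory: Optional[str] = None) -> Path:
    """Write the HTML to response.html in `directory` (a fresh temp dir by default)."""
    out_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = out_dir / "response.html"
    path.write_text(html, encoding="utf-8")
    return path
```

A caller could pass `save_for_browser(extract_html(reply)).as_uri()` to the standard-library `webbrowser.open` to view the result. The sketch stops short of opening a browser so its only side effect is writing one file.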
Relevance to AI PMs
1. Prompt and output design can materially improve user experience. Karpathy’s HTML/slideshow prompting advice is a reminder that PMs should not only optimize for model correctness, but also for output format. Structured rendering can make responses more legible, interactive, and product-ready without retraining the model.
2. Agent demos are easy; operational reliability is hard. His experience with menugen and agent orchestration highlights the real work needed after a compelling prototype: environment reproducibility, secret handling, monitoring, and evaluation loops. AI PMs should plan for this infrastructure early if they want agent features to survive beyond demos.
3. Adoption often depends on interface breakthroughs, not just better models. His OpenClaw observation suggests that new AI products can break out when they let new user segments directly experience advanced capabilities. PMs should watch for moments when a better interface, workflow, or abstraction unlocks demand from non-technical users.
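As one concrete instance of the secret-handling point in item 2, here is a minimal sketch that loads credentials from environment variables and fails fast at startup instead of letting agent-generated code hard-code keys. The variable names `LLM_API_KEY` and `PAYMENTS_API_KEY` and the `load_secrets` helper are hypothetical placeholders; the source does not describe menugen's actual configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Hypothetical names: replace with whatever keys your agent actually needs.
REQUIRED_SECRETS: Tuple[str, ...] = ("LLM_API_KEY", "PAYMENTS_API_KEY")


class MissingSecretsError(RuntimeError):
    """Raised when one or more required secrets are absent or empty."""


@dataclass(frozen=True)
class Secrets:
    values: Mapping[str, str]

    def __repr__(self) -> str:
        # Never print secret values in logs; show only which keys are loaded.
        return f"Secrets(keys={sorted(self.values)})"


def load_secrets(
    env: Optional[Mapping[str, str]] = None,
    required: Tuple[str, ...] = REQUIRED_SECRETS,
) -> Secrets:
    """Read required secrets from the environment and fail fast if any are missing."""
    source = os.environ if env is None else env
    missing = [name for name in required if not source.get(name)]
    if missing:
        raise MissingSecretsError(f"missing secrets: {', '.join(missing)}")
    return Secrets({name: source[name] for name in required})
```

Failing at startup with the list of missing names surfaces configuration gaps before an agent run starts, and the custom `__repr__` keeps secret values out of accidental log lines.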
Related
- OpenClaw / agentic-ai — Karpathy connected OpenClaw’s momentum to broader adoption of agentic AI by non-technical users.
- menugen — A recurring example tied to his experiments with LLM-orchestrated app creation and code-free software workflows.
- Farzapedia / FarzaTV / BYOAI — Connected through his praise for explicit memory, personalization, and file-centric AI product design.
- Read endpoints / XMCP — Mentioned in his critique of pricing, documentation quality, and platform ergonomics.
- Simon Willison — Referenced as someone whose reflection was inspired by Karpathy’s thinking on domain-specific model improvement.
- Claude / Codex / Claude Code / autoresearch — Related through adaptations of his auto research loop ideas into iterative agent workflows.
- HTML / llm-prompts — Directly tied to his practical advice on formatting LLM outputs for richer rendering.
- OpenAI, Google, Tesla, nanoGPT, micrograd, GPT-2, TinyStories, mechanistic interpretability — Broader entities commonly associated with Karpathy’s work and influence across AI research, education, and applied systems thinking.
Newsletter Mentions (31)
“Andrej Karpathy recommends ending your LLM prompts with “structure your response as HTML” (or even as slideshow) so you can view rich, browser-rendered outputs.”
#13 𝕏 Andrej Karpathy recommends ending your LLM prompts with “structure your response as HTML” (or even as slideshow) so you can view rich, browser-rendered outputs. #14 𝕏 clem 🤗 found that open-weight AI on an unchanged 128 GB MacBook Pro soared from a score of 10 (Llama 3 70B) to 47 (DeepSeek V4 Flash on mixed-Q2 GGUF) in 24 months—4.7× better, doubling every 10.7 months.
“Andrej Karpathy showed at Sequoia Ascent 2026 that LLMs can build entirely code-free apps like menugen for image-to-image tasks, replace bash scripts with natural-language install.”
#12 𝕏 Andrej Karpathy showed at Sequoia Ascent 2026 that LLMs can build entirely code-free apps like menugen for image-to-image tasks, replace bash scripts with natural-language install.
“This reflection was inspired by an Andrej Karpathy tweet about how different domains and reward functions drive divergent model improvements.”
#11 📝 Simon Willison Voice mode is weaker - Simon observes that OpenAI's voice mode appears to run on an older, weaker model, leading to surprising differences in capability depending on access point. This reflection was inspired by an Andrej Karpathy tweet about how different domains and reward functions drive divergent model improvements.
“#23 𝕏 Andrej Karpathy suggests OpenClaw’s breakout moment came because it was the first time many non-technical users—who until then equated AI with the ChatGPT website—actually got hands-on with advanced agentic models.”
#23 𝕏 Andrej Karpathy suggests OpenClaw’s breakout moment came because it was the first time many non-technical users—who until then equated AI with the ChatGPT website—actually got hands-on with advanced agentic models.
“Andrej Karpathy thinks the new Read endpoints are promising but warns that 30 minutes of hacking around cost him $200 due to steep pricing.”
#9 𝕏 Andrej Karpathy thinks the new Read endpoints are promising but warns that 30 minutes of hacking around cost him $200 due to steep pricing. He also criticizes the scattered short‐page docs and the lack of any mention of XMCP.
“#6 𝕏 Andrej Karpathy praises Farzapedia as a personal Wikipedia built on LLMs with explicit, inspectable memory and file-over-app integration.”
#6 𝕏 Andrej Karpathy praises Farzapedia as a personal Wikipedia built on LLMs with explicit, inspectable memory and file-over-app integration. He highlights its BYOAI personalization features showcased by @FarzaTV. #7 𝕏 Benoit Berthoux points to a16z spend data—HubSpot’s biggest YoY median increase and Figma’s 25% lift among top buyers—to show AI is stratifying SaaS, not killing it.
“Andrej Karpathy built menugen about a year ago to orchestrate LLM agents for app development, only to hit classic DevOps pain points around reproducible environments, secrets management, and monitoring.”
#15 𝕏 Andrej Karpathy built menugen about a year ago to orchestrate LLM agents for app development, only to hit classic DevOps pain points around reproducible environments, secrets management, and monitoring.
“Runs 16 automated 5-minute red-team attack attempts via Karpathy’s auto research loop using Claude Code, refined by Codex 5.4, to test and confirm security of a vibecoded paywalled site.”
#13 ▶️ Autoresearch Claude Code Hacker - Can It Breach My Vibecoded Site? All About AI Runs 16 automated 5-minute red-team attack attempts via Karpathy’s auto research loop using Claude Code, refined by Codex 5.4, to test and confirm security of a vibecoded paywalled site. Implements Karpathy’s auto research loop with attack.sh and evaluate.sh scripts in Claude Code, updating program.md each iteration, running 16 experiments (5-minute max each) scored 0–100, and using Git commits to keep higher-scoring attacks.
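The loop in this mention is a keep-if-better search, which can be sketched as below. This is a minimal sketch under stated assumptions: `propose` stands in for the agent editing the attack (attack.sh and program.md in the source), `evaluate` stands in for evaluate.sh returning a score from 0 to 100, and holding the best candidate in memory stands in for the git commit/revert step. All function names are hypothetical, and the 5-minute cap is only checked after each run returns rather than enforced by killing it.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Experiment:
    index: int
    candidate: str
    score: float
    kept: bool


@dataclass
class LoopResult:
    best_candidate: Optional[str] = None
    best_score: float = float("-inf")
    history: List[Experiment] = field(default_factory=list)


def auto_research_loop(
    propose: Callable[[Optional[str], float], str],
    evaluate: Callable[[str], float],
    n_experiments: int = 16,
    time_limit_s: float = 300.0,
) -> LoopResult:
    """Keep-if-better search: propose a change, score it 0-100, keep only improvements."""
    result = LoopResult()
    for i in range(n_experiments):
        # The proposer sees the current best, like an agent reading program.md.
        candidate = propose(result.best_candidate, result.best_score)
        start = time.monotonic()
        score = evaluate(candidate)
        # Assumption: a run that overshoots the budget counts as a failure (score 0).
        # Nothing is killed here; the cap is only checked after the run returns.
        if time.monotonic() - start > time_limit_s:
            score = 0.0
        score = max(0.0, min(100.0, score))  # clamp to the 0-100 scale
        kept = score > result.best_score
        if kept:
            # Stand-in for "git commit": remember the improved candidate.
            result.best_candidate, result.best_score = candidate, score
        # Otherwise the candidate is dropped, standing in for a git revert.
        result.history.append(Experiment(i, candidate, score, kept))
    return result


def toy_demo(seed: int = 0) -> LoopResult:
    """Toy stand-in: candidates are integers; the score peaks at a hidden target of 42."""
    rng = random.Random(seed)

    def propose(best: Optional[str], best_score: float) -> str:
        base = 0 if best is None else int(best)
        return str(base + rng.randint(-5, 10))

    def evaluate(candidate: str) -> float:
        return 100.0 - abs(int(candidate) - 42)

    return auto_research_loop(propose, evaluate)
```

Because only strict improvements are kept, the kept scores rise monotonically and the final best equals the highest score seen, which mirrors the source's use of Git commits to keep higher-scoring attacks.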
Related
A coding environment for Claude mentioned for its keyboard shortcut that opens a full-featured editor for prompt writing. It is highlighted as making long prompts far easier to manage.
A company mentioned as one of the embedding/re-ranking providers being replaced by ZeroEntropy at GBrain. It also appears in the earlier AI visibility context as a source behind ChatGPT.
Anthropic's AI assistant/model used here in multiple contexts: as the product being built next, as a system used to cluster feedback into synthetic evals, and as a tool that non-technical staff use.
Developer and writer known for his AI tooling commentary and the `llm` project. He is credited here with the 0.32a2 release note.
OpenAI’s coding agent/product that can run against local or remote development environments and surface live state for review and approval. For AI PMs, it’s a strong example of agentic coding workflows moving into mobile and enterprise contexts.
An agent referenced as benefiting from GBrain’s memory layers. It serves as an example of agent systems becoming more personalized and context-aware.
The company behind Gemini, referenced through a Gemini API quickstart guide. It is relevant for model access and developer onboarding.
A named individual cited for commentary on Cline and a Computer Use agent. He is presented as a source of hands-on evaluation of agentic coding tools.
Autonomous or semi-autonomous software systems that can act across tools and workflows. The newsletter frames agents as buyers, tool consumers, and the primary audience for protocols like MCP.
Anthropic’s latest Opus-class model release with a 1 million-token context window. It is positioned for long-context planning, coding, and agentic task execution.
A cloud-based coding environment used to build a personal AI assistant or ‘second brain.’ It is described as managing briefs, tracking initiatives, and suggesting actions.
The class of models discussed as having a blind spot with continuous, high-dimensional, noisy data. This concept is used to frame a limitation in current AI capabilities.
A training system or project demonstrated by Andrej Karpathy for low-cost LLM training. For AI PMs, it highlights aggressive cost compression in model development.
Simon Willison’s command-line LLM tool for interacting with models and APIs. This release adds support for OpenAI’s Responses endpoint and better reasoning-token handling.
An approach to AI systems where agents perform tasks autonomously with tools and browser interaction. The newsletter frames 2026 as a year focused less on novelty and more on trust in deployed agentic systems.
AI company that builds frontier models and enterprise AI products. In this newsletter it is associated with previewing Workflows, an orchestration layer for business processes.
Social platform referenced as a source of examples, discussion, and scraping/monetization concerns. In this newsletter it is part of the agent workflow stack and content source.
A small single-GPU repo for autonomous short training loops. It demonstrates an AI agent iterating on hyperparameters while humans only adjust the prompt.
A personal Wikipedia-style product built on LLMs with inspectable memory and file-over-app integration. It is framed as a personalized knowledge tool with BYOAI features.
A minimal GPT training codebase often used to study and teach transformer internals. Here it is discussed as being reduced to atomic operations for clarity.