LangSmith
LangChain’s platform for observability, evaluation, and collaboration around AI agents. Here it is described as an org-wide platform that improves cross-functional workflows and feedback loops.
Key Highlights
- LangSmith is positioned as an org-wide platform for observability, evaluation, and collaboration around AI agents.
- Its core PM value is turning agent traces, experiments, and feedback into a continuous product improvement loop.
- Recent updates emphasized experiment comparison, debugging tools, tracing, and structured cross-functional evaluation workflows.
- LangSmith is increasingly framed as infrastructure for production-grade agent systems, not just a developer debugging tool.
Overview
LangSmith is LangChain’s platform for observability, evaluation, debugging, and collaboration for AI agents and LLM-powered workflows. Across the mentions here, it is framed not just as a developer tracing tool, but as an org-wide platform for building, measuring, and improving agent systems. Its core value is helping teams inspect what agents actually did, compare versions, collect feedback, and tighten iteration loops across engineering, product, design, and domain experts.
For AI Product Managers, LangSmith matters because agent quality is rarely judged by a single output. PMs need visibility into multi-step behavior, tool usage, failure modes, and user feedback over time. LangSmith shows up in that workflow as the system for tracing real runs, evaluating changes, benchmarking experiments, and operationalizing continuous improvement. In practice, that makes it useful for moving from “demo works” to repeatable product quality.
Key Developments
- 2026-02-05: LangSmith launched a redesigned Experiment Comparison View for side-by-side benchmarking of agent and LLM pipelines.
- 2026-02-07: Harrison Chase described LangSmith as supporting dedicated evaluation workflows for engineers, UX designers, and domain experts to collaboratively assess AI agent performance.
- 2026-03-04: LangSmith introduced AI-agent debugging tools demonstrated with a LangChain deepagents example, showing tracing and tuning of tool calls such as Python REPL, vector databases, and memory components.
- 2026-04-01: Harrison Chase highlighted LangSmith as part of a continual agent improvement loop centered on traces and iterative refinement, drawing on LangChain’s “agent improvement loop” guide.
- 2026-04-08: LangSmith’s tracing and evaluation platform was highlighted as a way for teams to track, diagnose, and optimize agent behavior in real-world conditions. On the same date, LangSmith Fleet was also mentioned as integrating with Arcade.dev for access to 8,000+ tools and faster no-code agent building.
- 2026-04-11: Harrison Chase positioned LangSmith as the “Databricks of agent abstractions,” paired with the idea of agent harnesses as stable building blocks for production agent systems.
- 2026-05-06: Harrison Chase argued that observability alone is insufficient; the observability platform should also collect feedback data, and even generate feedback automatically, to power a continuous agent improvement loop.
- 2026-05-10: LangSmith was described as an org-wide platform for building AI agents that speeds cross-functional collaboration and tightens feedback loops.
Relevance to AI PMs
- Turn vague agent quality issues into actionable product work. LangSmith helps PMs see step-by-step traces, tool calls, and failure patterns, making it easier to prioritize fixes based on where agents actually break in live workflows.
- Run structured evaluation before and after changes. With experiment comparison and evaluation workflows, PMs can define regression checks, compare prompt or system changes, and judge whether a release improved task success, latency, or reliability.
- Improve collaboration across functions. The repeated positioning of LangSmith as a shared platform for engineers, UX, and domain experts makes it useful for organizing review loops, collecting feedback, and aligning on what “good” agent behavior should look like.
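The before-and-after evaluation described above can be illustrated with a minimal sketch in plain Python. It does not use the LangSmith SDK: the `RunResult` schema, the `summarize` and `compare` helpers, the thresholds, and the sample data are all hypothetical, standing in for per-example results that a team might export from two experiments (a baseline and a candidate release).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One evaluated example from an experiment (hypothetical schema)."""
    example_id: str
    success: bool      # did the agent complete the task?
    latency_s: float   # end-to-end latency in seconds


def summarize(results):
    """Aggregate task success rate and mean latency for one experiment."""
    return {
        "success_rate": mean(1.0 if r.success else 0.0 for r in results),
        "mean_latency_s": mean(r.latency_s for r in results),
    }


def compare(baseline, candidate, max_success_drop=0.0, max_latency_increase_s=0.5):
    """Compare a candidate experiment against a baseline.

    Returns aggregate deltas, the examples that passed in the baseline but
    fail in the candidate, and an overall pass/fail based on the thresholds.
    """
    base, cand = summarize(baseline), summarize(candidate)
    base_by_id = {r.example_id: r for r in baseline}

    # Per-example regressions: succeeded before, fails now.
    regressed = sorted(
        r.example_id
        for r in candidate
        if r.example_id in base_by_id
        and base_by_id[r.example_id].success
        and not r.success
    )

    success_delta = cand["success_rate"] - base["success_rate"]
    latency_delta = cand["mean_latency_s"] - base["mean_latency_s"]
    passed = (
        success_delta >= -max_success_drop
        and latency_delta <= max_latency_increase_s
    )
    return {
        "success_delta": success_delta,
        "latency_delta_s": latency_delta,
        "regressed_examples": regressed,
        "passed": passed,
    }


# Illustrative data only; not real experiment output.
baseline = [
    RunResult("ex1", True, 2.0),
    RunResult("ex2", True, 3.0),
    RunResult("ex3", False, 4.0),
]
candidate = [
    RunResult("ex1", True, 1.5),
    RunResult("ex2", False, 2.5),
    RunResult("ex3", True, 3.5),
]

report = compare(baseline, candidate)
print(report)
```

In this sample the aggregate check passes even though one example regresses. That is why the sketch reports per-example regressions separately: an unchanged success rate can hide a newly broken workflow that a PM would still want to review before release.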
Related
- LangChain: LangSmith is part of the broader LangChain ecosystem and is repeatedly connected to LangChain guides, examples, and agent workflows.
- Harrison Chase: LangSmith’s positioning and major product updates are strongly associated with Harrison Chase, who frequently explains its role in observability, evaluation, and improvement loops.
- Agent harnesses: LangSmith is described alongside agent harnesses as infrastructure for stable, repeatable agent abstractions in production.
- Arcade.dev: LangSmith Fleet was mentioned as integrating with Arcade.dev to expand tool access for agent builders.
- deepagents: A deepagents example was used to showcase LangSmith’s debugging and trace inspection capabilities.
- Agent and LLM pipelines: LangSmith’s experiment comparison view directly supports benchmarking across these pipelines.
- Agent observability: Observability is a core LangSmith use case, though mentions emphasize that feedback capture and evaluation must sit alongside tracing.
- AI agents: LangSmith is consistently framed as infrastructure for building, monitoring, and improving AI agent systems.
- Cross-functional collaboration: One of the strongest themes in the mentions is LangSmith’s role in connecting product, engineering, design, and subject-matter experts around shared feedback loops.
Newsletter Mentions (8)
“#7 𝕏 Harrison Chase calls LangSmith an org-wide platform for building AI agents that speeds up cross-functional collaboration and tightens feedback loops.”
GenAI PM Daily, May 10, 2026
Today's top 11 insights for PM Builders, ranked by relevance from X, Blogs, and LinkedIn.
PromptLayer’s multi-step agent evaluation framework
#1 𝕏 Jason Zhou launched `/goal` support in CodeX and Hermes agents for one-step autonomous coding, advising use of interview mode, clear stop conditions, and a goal-buddy to manage state and goal files.
#2 📝 PromptLayer Blog: What Is Agent Evaluation? A Practical Guide for AI Teams. Agent evaluation tests whether an AI agent reliably completes tasks across real inputs, edge cases, and new versions by scoring not just final outputs but multi-step behavior via black-box, trajectory, and component-level evaluations, using metrics like task completion rate, tool selection accuracy, unsupported-claim rate, latency/cost per step, and regression pass rate. PromptLayer offers tracing with span-level context, reusable datasets, batch evaluations, backtesting, regression testing, automated evaluation triggers on new prompt versions, and flexible pipelines including code execution, human input, conversation simulation, regex checks, and LLM assertions.
#3 LinkedIn: Udi Menkes built his new product’s entire data flow in a single interactive HTML file, complete with diagrams, in-page navigation, and color-coded complexity, letting his team understand it in minutes instead of hours.
#4 𝕏 Garry Tan suggests diagramming your AI agent codebases and architecture in plain ASCII, then relentlessly questioning each component to clarify design and accelerate product development.
#5 𝕏 Boris Cherny says Claude Code’s switch to a native installer means npm-only stats undercount its real usage. On Thursday it hit its second-highest signup day ever with 15× growth since Jan 1; now you can ask Claude to debug your SQL.
#6 𝕏 Boris Cherny is enhancing Claude Code’s UX for snappier performance and adding debug logs so users can self-serve hang diagnostics.
#7 𝕏 Harrison Chase calls LangSmith an org-wide platform for building AI agents that speeds up cross-functional collaboration and tightens feedback loops.
#8 𝕏 Santiago showcases a step-by-step guide for constructing Python-powered multi-agent systems from scratch, leveraging MCP and A2A patterns to incrementally add complexity and enable collaborative AI agents.
#9 𝕏 Garry Tan spends $2K/mo on Openclaw AI tokens to turbocharge product development and startup insights. He’s “tokenmaxxing” now with a goal to make these capabilities affordable for everyone in 18 months.
#10 𝕏 Harrison Chase argues that treating AI agents as systems to measure and iteratively improve isn’t just a technical challenge; it demands intentional human collaboration and team processes.
#11 LinkedIn: Peter Yang warns that unedited AI-generated markdown can compound small errors over time: what starts as 5% “slop” quickly balloons into an overwhelming pile of confusing, unverified content.
“Harrison Chase argues that agent observability in LangSmith is only half the battle—you must embed feedback data collection (and even automated feedback generation) directly into your observability platform to power a continuous AI-agent improvement loop.”
#10 𝕏 Harrison Chase argues that agent observability in LangSmith is only half the battle—you must embed feedback data collection (and even automated feedback generation) directly into your observability platform to power a continuous AI-agent improvement loop.
“Harrison Chase likens agent harnesses to Spark and positions LangSmith as the Databricks of agent abstractions, quoting @bllchmbrs’ analogy of them as stable building blocks.”
#21 𝕏 Harrison Chase likens agent harnesses to Spark and positions LangSmith as the Databricks of agent abstractions, quoting @bllchmbrs’ analogy of them as stable building blocks.
“Harrison Chase unveils LangSmith’s tracing and evaluation platform—spotlighted on new SF & NYC billboards—to help teams track, diagnose, and optimize agent behavior in real-world conditions.”
#8 𝕏 Harrison Chase announced that LangSmith Fleet now integrates with Arcade.dev, offering enterprise-grade access to 8,000+ tools and enabling you to build no-code Claude Cowork/OpenClaw–style agents in minutes. #9 𝕏 Harrison Chase unveils LangSmith’s tracing and evaluation platform—spotlighted on new SF & NYC billboards—to help teams track, diagnose, and optimize agent behavior in real-world conditions.
“Harrison Chase explains how to power a continual agent improvement loop with Langsmith, using trace-centered iteration from LangChain’s “agent improvement loop” guide.”
𝕏 Harrison Chase explains how to power a continual agent improvement loop with Langsmith, using trace-centered iteration from LangChain’s “agent improvement loop” guide.
“Harrison Chase walked through LangSmith’s new AI-agent debugging tools using a Langchain deepagents example—showing how to trace and tweak tool calls (Python REPL, vector DB, memory) and introspect step-by-step reasoning.”
The newsletter notes new AI-agent debugging tools in LangSmith and ties them to a deepagents example.
“Harrison Chase built LangSmith with dedicated evaluation workflows for engineers, UX designers, and domain experts to collaboratively assess AI agent performance.”
#16 𝕏 Harrison Chase built LangSmith with dedicated evaluation workflows for engineers, UX designers, and domain experts to collaboratively assess AI agent performance.
“#7 𝕏 Harrison Chase launched a redesigned Experiment Comparison View in LangSmith to enable side-by-side benchmarking of agent and LLM pipelines.”
#7 𝕏 Harrison Chase launched a redesigned Experiment Comparison View in LangSmith to enable side-by-side benchmarking of agent and LLM pipelines.
Related
A founder or leader associated with LangSmith and AI agent development. He emphasizes platform use, collaboration, and process-oriented measurement of agents.
An LLM application framework mentioned in the context of autonomous web-browsing agents and integrations.
Autonomous or semi-autonomous software systems that can act across tools and workflows. The newsletter frames agents as buyers, tool consumers, and the primary audience for protocols like MCP.
An open-source agent framework associated with Harrison Chase. In the newsletter it is being optimized for open-source models as closed-model costs rise.