Langfuse vs LangSmith for Production Observability
Choose Langfuse for self-hosted, framework-neutral traces. Choose LangSmith for managed LangChain evals, review, alerting, and deployment.

Choose Langfuse when you need framework-neutral tracing, self-hosting, and cost visibility across a mixed AI stack. Choose LangSmith when your production loop is already built around LangChain or LangGraph and you want tracing, evals, human review, alerting, and agent deployment in one managed platform.
The Verdict
Langfuse is the better default for teams that need observability to belong to their own stack. It is strongest when your agents, RAG flows, model calls, and product workflows span several frameworks, when trace data cannot live in a vendor cloud, or when you want pricing that scales around usage units and your own infrastructure choices. Its current public pricing starts with a free Hobby plan, then Core at $29/month, Pro at $199/month, and Enterprise at $2499/month, with 100k units included on paid cloud plans and additional usage at $8 per 100k units, lower with volume.
LangSmith is the better default when LangChain or LangGraph is already the application framework, and the team wants less platform assembly. It gives you tracing, online and offline evals, prompt tools, annotation queues, monitoring and alerting, and adjacent LangSmith products for deployment, Fleet, Engine, and Sandboxes. Its Developer plan is $0 per seat per month with 5k base traces per month, and Plus is $39 per seat per month with 10k base traces per month.
The production choice is not open source versus proprietary in the abstract. The choice is whether the observability layer should be a self-owned data plane with open instrumentation, or a managed agent engineering system tied tightly to the LangChain ecosystem.
The Axis That Actually Separates Them
The split is control versus lifecycle integration. Langfuse gives you more control over where telemetry lives and how the observability layer plugs into the rest of your platform. LangSmith gives you a more integrated managed loop around traces, evals, review, alerting, and deployment.
Langfuse can be deployed locally, in cloud infrastructure, within a VPC, or on-premises, with internet access optional. Its self-hosted architecture uses Postgres for transactional workloads, ClickHouse for traces, observations, and scores, Redis or Valkey for queues and cache, and S3 or blob storage for events, multimodal inputs, and large exports. That is real control, and it is also real infrastructure.
LangSmith is simpler when the application is already LangChain or LangGraph. Its docs say LangSmith tracing can be enabled for LangChain or LangGraph with a single environment variable, and the quickstart uses LANGSMITH_TRACING=true and LANGSMITH_API_KEY. For other providers, it supports wrappers for OpenAI, Anthropic, and Google Gemini, plus manual tracing with @traceable.
Langfuse vs LangSmith: Production Comparison


Langfuse: Best When Observability Has To Belong To Your Stack
Langfuse is the stronger choice when observability is a platform component you expect to operate, extend, and connect to internal systems. It captures prompts, model responses, token usage, latency, tool calls, retrieval steps, timing, inputs, outputs, and metadata. That is the minimum viable trace for production AI: enough context to explain why a request failed, which model path it used, what it cost, and which retrieval or tool step changed the answer.
Langfuse is also the clearer choice when data residency or customer contracts make vendor-hosted traces difficult. The core product is MIT-licensed outside the /ee folders, and Langfuse says all product features are freely available under the MIT license. Enterprise modules such as SCIM, extended audit logging, and data retention policies require a commercial license when self-hosted. That boundary matters: a team can run the core system without treating every production trace as a SaaS procurement decision, while still paying for enterprise controls when those controls are needed.
Self-hosting is not a shortcut. The Langfuse production stack includes Postgres, ClickHouse, Redis or Valkey, and S3 or blob storage. Docker Compose is useful for testing and low-scale deployments, but production-scale self-hosting means Kubernetes Helm, Terraform on AWS, Azure, or GCP, Railway, or a comparable operations setup. If the team does not already run databases, object storage, backups, migrations, and observability for observability itself, the managed cloud plan may be the more honest path.
The cost model is straightforward enough to forecast. Hobby includes 50k units per month, 30 days data access, and 2 users. Core includes 100k units per month, 90 days data access, unlimited users, and additional usage at $8 per 100k units, lower with volume. Pro keeps 100k included units and moves to 3 years data access. The pricing calculator lists graduated tiers: 0-100k units free, 100k-1M units at $8 per 100k, 1-10M at $7 per 100k, 10-50M at $6.5 per 100k, and 50M+ at $6 per 100k.
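To make the graduated tiers concrete, here is a minimal sketch of a bill estimator built only from the figures quoted above. The function name and the pro rata billing within each tier are assumptions for illustration; actual invoicing may round usage differently.

```python
# Graduated tiers as quoted above: (upper bound in units, USD per 100k units).
# None means the tier has no upper bound.
TIERS = [
    (100_000, 0.0),
    (1_000_000, 8.0),
    (10_000_000, 7.0),
    (50_000_000, 6.5),
    (None, 6.0),
]


def langfuse_cloud_monthly_cost(units: int, plan_fee: float) -> float:
    """Estimate a monthly bill as flat plan fee plus graduated usage.

    Assumption: usage is billed pro rata inside each tier rather than
    rounded up to whole 100k blocks.
    """
    if units < 0:
        raise ValueError("units must be non-negative")
    cost = plan_fee
    lower = 0
    for upper, price_per_100k in TIERS:
        if upper is None or units < upper:
            tier_units = max(units - lower, 0)  # partial (or final) tier
        else:
            tier_units = upper - lower  # tier fully consumed
        cost += tier_units / 100_000 * price_per_100k
        if upper is None or units <= upper:
            break
        lower = upper
    return round(cost, 2)


# Core plan ($29) with 2.5M units: 900k at $8 plus 1.5M at $7 per 100k.
print(langfuse_cloud_monthly_cost(2_500_000, plan_fee=29.0))
```

Running the same function with the Pro fee and a forecast unit count gives a quick way to compare plans before committing to a retention tier.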
Langfuse cost tracking is useful, but only if the application sends the right fields. It tracks usage and cost on observations of type generation and embedding. Cost can be ingested through API, SDKs, or integrations, or inferred from the model parameter with predefined models and tokenizers. For reasoning models such as the OpenAI o1 model family, Langfuse says cost inference is not supported when no token counts are ingested. In production, that means you should send provider usage directly whenever the model response includes it.
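The "send provider usage directly" rule can be sketched as a small resolution step that runs before a generation is recorded. This is not the Langfuse SDK: the UsageRecord type, the resolve_usage function, and the dictionary keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRecord:
    input_tokens: Optional[int]
    output_tokens: Optional[int]
    source: str  # "provider", "inferred", or "missing"


def resolve_usage(provider_usage: Optional[dict],
                  inferred_usage: Optional[dict]) -> UsageRecord:
    """Prefer provider-reported token counts over locally inferred ones.

    Both arguments are plain dicts with hypothetical keys
    "input_tokens" and "output_tokens".
    """
    for source, usage in (("provider", provider_usage),
                          ("inferred", inferred_usage)):
        if usage and "input_tokens" in usage and "output_tokens" in usage:
            return UsageRecord(usage["input_tokens"],
                               usage["output_tokens"], source)
    # No usable counts: record the gap explicitly so it can be alerted on.
    return UsageRecord(None, None, "missing")


record = resolve_usage({"input_tokens": 1200, "output_tokens": 340}, None)
print(record.source)
```

Tagging the source makes it possible to measure how much of the reported cost rests on inference rather than on provider data.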
LangSmith: Best When The Agent Lifecycle Belongs In One Managed System
LangSmith is the stronger choice when the application is already built around LangChain or LangGraph and the team wants a managed loop from trace to eval to review to deployment. LangSmith defines a trace as a single execution of an application that can include many individual steps, such as LLM calls and other tracked events. Its tracing quickstart describes a trace as the complete record of every step in a request, from inputs to final output.
The setup advantage is real for LangChain and LangGraph teams. If the codebase already uses those frameworks, tracing can be turned on with one environment variable and an API key. That matters during rollout because observability fails most often when it is optional per engineer or bolted on after launch. If every run through the framework is traced consistently, the team can debug behavior, collect examples, and build evals without first designing a telemetry standard from scratch.
LangSmith's evaluation model is also production-friendly. Its docs frame evaluation as measuring quality from pre-deployment testing to production monitoring. Offline evaluations cover benchmarking, regression testing, unit testing, and backtesting. Online evaluations cover real-time monitoring, anomaly detection, and production feedback on live traffic. The docs recommend creating 5-10 examples of what good looks like for each critical component, such as retrieval, tool selection, argument formatting, or final answer quality.
That workflow is useful when a team wants evals to be an operating rhythm, not a notebook. A production agent needs failed traces turned into dataset examples, dataset examples turned into regression tests, regression tests tied to release gates, and live quality checks tied to alerting or review. LangSmith gives more of that managed surface in one place.
The tradeoff is cost and platform coupling. Plus is $39 per seat per month, includes 10k base traces per month, and supports unlimited seats and up to 3 workspaces. Base traces retain for 14 days and cost $2.50 per 1k traces. Extended traces retain for 400 days and cost $5.00 per 1k traces, with base-to-extended upgrades at $2.50 per 1k traces. If a team stores every trace as extended history, retention becomes a meaningful bill. If it keeps only failures, reviewed runs, eval examples, and release-critical traces, the economics are easier to control.
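A rough sketch of how those Plus numbers combine, assuming every trace is ingested as a base trace, the 10k monthly allowance offsets base traces only, and selected traces are later upgraded to extended retention. The function name and those billing assumptions are illustrative, not LangSmith's invoicing logic.

```python
SEAT_PRICE = 39.0          # USD per seat per month (Plus)
INCLUDED_BASE = 10_000     # base traces included per month
BASE_PER_1K = 2.50         # USD per 1k base traces beyond the allowance
UPGRADE_PER_1K = 2.50      # USD per 1k traces upgraded base -> extended


def langsmith_plus_monthly_cost(seats: int, base_traces: int,
                                upgraded_traces: int = 0) -> float:
    """Estimate a Plus bill for tracing only (no deployment products).

    Assumption: usage is billed pro rata per trace, and upgraded traces
    are a subset of the base traces ingested that month.
    """
    if upgraded_traces > base_traces:
        raise ValueError("cannot upgrade more traces than were ingested")
    billable_base = max(base_traces - INCLUDED_BASE, 0)
    cost = (seats * SEAT_PRICE
            + billable_base / 1_000 * BASE_PER_1K
            + upgraded_traces / 1_000 * UPGRADE_PER_1K)
    return round(cost, 2)


# 5 seats, 200k traces, 10% promoted to extended retention.
print(langsmith_plus_monthly_cost(5, 200_000, upgraded_traces=20_000))
```

Varying the upgraded share shows how quickly retention, rather than seats, comes to dominate the bill.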
LangSmith also adds adjacent product pricing. Plus includes 1 free dev-sized agent deployment, but additional deployment runs cost $0.005 each. Production deployment uptime costs $0.0036 per minute, development deployment uptime costs $0.0007 per minute, additional Fleet runs cost $0.05 per Fleet run, Engine costs $1.50 per LCU, and Sandboxes cost $0.0576 per vCPU-hour, $0.0185 per GiB-hour memory, and $0.000123 per GiB-hour storage. None of those numbers are bad by themselves. They just belong in the architecture decision before LangSmith becomes the default control plane.
The Cost Line
Langfuse is usually easier to reason about when the team has many internal users and wants usage to dominate the bill. LangSmith is usually easier to justify when the team is paying for a managed lifecycle around a LangChain or LangGraph application, not just trace storage.
Here is the production cost question to ask before choosing either one:
monthly observability cost =
traced requests
x spans per request
x retention class
x review/eval sampling rate
x team access model
+ deployment/control-plane usage
+ self-host infrastructure and operations
That formula keeps the comparison honest. Langfuse Cloud charges around units, plan, and retention. LangSmith charges around seats, trace allowance, trace retention, and optional managed products. Langfuse self-hosting replaces vendor usage fees with infrastructure and operations. LangSmith Enterprise self-hosting exists, but it is part of custom Enterprise packaging rather than the self-serve path.
The dangerous cost pattern is not high traffic alone. It is retaining low-value traces in a high-cost retention class, sending untagged spans that cannot be aggregated by customer or feature, and running evals without a sampling policy. Trace everything briefly. Retain selectively. Promote only the traces that have learning value: failures, regressions, human-reviewed runs, unusual latency, high-cost requests, and examples that become dataset rows.
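The promotion rule above can be sketched as a small, tool-neutral policy function. The TraceSummary fields, the reason labels, and both thresholds are hypothetical; a real policy would take its thresholds from the team's own latency and cost baselines.

```python
from dataclasses import dataclass


@dataclass
class TraceSummary:
    failed: bool = False
    regression: bool = False
    human_reviewed: bool = False
    in_dataset: bool = False
    latency_ms: float = 0.0
    cost_usd: float = 0.0


def promotion_reasons(trace: TraceSummary,
                      latency_threshold_ms: float = 10_000,
                      cost_threshold_usd: float = 0.50) -> list:
    """Return the reasons a trace deserves long retention.

    An empty list means the trace stays in the short retention class.
    """
    reasons = []
    if trace.failed:
        reasons.append("failed")
    if trace.regression:
        reasons.append("regression")
    if trace.human_reviewed:
        reasons.append("human_reviewed")
    if trace.in_dataset:
        reasons.append("dataset_example")
    if trace.latency_ms > latency_threshold_ms:
        reasons.append("slow")
    if trace.cost_usd > cost_threshold_usd:
        reasons.append("expensive")
    return reasons


print(promotion_reasons(TraceSummary(failed=True, latency_ms=12_000)))
```

Returning reasons instead of a boolean keeps the policy auditable: the reason can be stored as a tag on the promoted trace.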
What To Log Before Either Tool Is Useful
Both products become weak if the application sends thin telemetry. A trace UI cannot fix missing business context. A production AI system should send the fields that let engineering, product, support, and compliance answer the same incident without arguing about what happened.
At minimum, log:
- run_id, trace_id, span_id, and parent_span_id
- user, tenant, workspace, or account identifiers, with privacy-safe masking
- environment, release, route, feature, and prompt version
- model provider, model name, temperature, tool policy, and fallback path
- retrieved document IDs, retrieval scores, reranker scores, and permission filters
- tool name, arguments, result status, and external system latency
- token usage, provider-reported cost, inferred cost, and total request latency
- evaluator names, evaluator versions, score values, and pass/fail thresholds
- approval state, reviewer, reason code, and handoff target for human decisions
- error class, retry count, fallback used, and final user-visible outcome
This is where many teams get the tool choice backward. They pick a vendor before they define the run contract. The run contract is the product. Langfuse or LangSmith is the database, UI, eval system, and workflow around it.
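One way to make the run contract enforceable is a check that runs in CI or at ingestion time, before any vendor is involved. The sketch below covers a subset of the fields listed above. The field names and grouping are illustrative assumptions, not a schema from either tool.

```python
# Hypothetical field names grouped by the question they help answer.
REQUIRED_FIELDS = {
    "identity": ["run_id", "trace_id", "span_id", "parent_span_id", "tenant_id"],
    "release": ["environment", "release", "route", "feature", "prompt_version"],
    "model": ["model_provider", "model_name", "temperature", "fallback_path"],
    "cost": ["input_tokens", "output_tokens", "provider_cost_usd", "latency_ms"],
    "outcome": ["error_class", "retry_count", "final_outcome"],
}


def missing_fields(record: dict) -> dict:
    """Return {group: [absent field names]} for a single run record.

    Only key presence is checked, so a key set to None still counts
    (a root span has no parent, a successful run has no error class).
    """
    gaps = {}
    for group, names in REQUIRED_FIELDS.items():
        absent = [name for name in names if name not in record]
        if absent:
            gaps[group] = absent
    return gaps


record = {
    "run_id": "r-1", "trace_id": "t-1", "span_id": "s-1",
    "parent_span_id": None, "tenant_id": "acme",
    "environment": "prod", "release": "2026.06.1", "route": "/chat",
    "feature": "support-bot", "prompt_version": "v12",
}
print(sorted(missing_fields(record)))
```

A check like this can fail a build or emit a metric when instrumentation drifts, whichever backend ends up storing the traces.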
Decision Rules
Choose Langfuse if you need self-hosting without an enterprise-only hosting gate, have mixed frameworks, want open instrumentation, need unlimited users on paid cloud plans, or expect your observability layer to connect deeply to internal analytics, billing, and security systems. It is also the better fit when the team can operate the stack or can start on Langfuse Cloud and move selected environments into its own infrastructure later.
Choose LangSmith if the application is LangChain or LangGraph-heavy, the team wants the fastest path to consistent tracing, and the managed eval, review, alerting, and deployment surface will reduce platform work. It is the better fit when the agent lifecycle matters more than framework neutrality, and when per-seat and retention pricing are acceptable for the way the team will sample, store, and review traces.
Pick neither as a substitute for an eval policy. The policy decides what gets scored online, what gets turned into a dataset, what blocks a deployment, what triggers human approval, and what gets paged. Without that policy, both tools become expensive screenshots of confusing behavior.
For teams still choosing a broader observability direction, keep the decision inside the evals and observability lane: traces, evals, cost, approval, and release gates should move together.
What is the difference between LangSmith and Langfuse?
Langfuse is stronger for framework-neutral tracing, self-hosting, open instrumentation, and cost visibility across a mixed AI stack. LangSmith is stronger when the application is already built on LangChain or LangGraph and the team wants managed tracing, evals, review, alerting, and deployment in one platform.
Is LangSmith free or paid?
LangSmith has a Developer plan at $0 per seat per month with up to 5k base traces per month and 1 seat. Its Plus plan is $39 per seat per month with up to 10k base traces per month, then pay-as-you-go usage.
What is Langfuse used for?
Langfuse is used for LLM application tracing, token and cost tracking, prompt management, datasets, experiments, evaluation scores, human annotation, online evals, and offline evals.
Can Langfuse be self-hosted for production?
Yes. Langfuse can be deployed in cloud infrastructure, inside a VPC, or on-premises, but production-scale self-hosting means operating Postgres, ClickHouse, Redis or Valkey, S3 or blob storage, migrations, backups, and throughput.
Jun 2, 2026
