Pgvector vs Pinecone for Production RAG

Choose pgvector when retrieval belongs in Postgres. Choose Pinecone when scale, namespaces, or managed ops justify a separate vector system.

Thursday, June 4, 2026

Omid Saffari

Use pgvector when retrieval belongs next to your product data and permissions. Use Pinecone when vector search has become its own production system: high-scale ingestion, strict namespace isolation, managed read capacity, and a team that wants to buy vector operations instead of owning them in Postgres.

The Verdict: Keep RAG In Postgres Until Retrieval Becomes Its Own System

The default production choice is pgvector, not because Postgres is always faster, but because most RAG failures start at the boundary between retrieved chunks, product permissions, tenant data, and source-of-truth records. If those records already live in Postgres, pgvector keeps retrieval in the same transaction model, the same backup policy, and the same SQL permission surface.

Pinecone earns the extra system boundary when retrieval is no longer a feature inside the application database. That line usually appears when the corpus is large enough that index builds and query latency compete with product traffic, when each customer needs hard namespace isolation, when the team wants managed vector capacity, or when hybrid retrieval and read scaling deserve their own control plane.

The practical rule:

Choose pgvector for the first serious RAG build, internal knowledge search, product docs search, permission-heavy SaaS retrieval, and any workflow where SQL joins are part of the answer.
Choose Pinecone when the vector layer is a platform service: many tenants, many indexes, high ingest, independent retrieval SLOs, and a team that would rather pay for managed operations than tune Postgres memory, vacuum, and indexes.

That still leaves room to start with pgvector and move later. The mistake is not starting in Postgres. The mistake is letting the application call pgvector directly from every feature without a retrieval interface, logs, evals, and stable source IDs. Build the boundary on day one, even if the first backend is just a Postgres table.

Pgvector vs Pinecone: Production Comparison Table

Decision axis	pgvector in Postgres	Pinecone	Production decision
Ownership model	Postgres extension for vector similarity search and embeddings	Managed vector database and retrieval platform	Pick pgvector when your database team owns retrieval. Pick Pinecone when retrieval needs a separate owner and budget.
Pricing unit	Postgres compute, memory, disk, backups, logs, and engineering time	Plan minimums plus storage, read units, write units, imports, backups, inference, and optional add-ons	pgvector is cheaper when you already pay for Postgres headroom. Pinecone is cleaner when vector usage needs its own bill and scale curve.
Search model	Exact nearest neighbor by default, plus HNSW and IVFFlat approximate indexes	Dense, sparse, and full-text ranking fields in managed indexes	pgvector is enough for semantic search plus SQL. Pinecone is stronger when dense, sparse, BM25, and managed retrieval operations are product requirements.
Filtering	Full SQL filters, joins, row-level policies, and transactions	Metadata filters over flat JSON, with operators such as `$eq`, `$in`, `$and`, and `$or`	pgvector wins on relational permissions. Pinecone wins on simple metadata filters at vector-system scale.
Tenant isolation	Usually modeled with tenant columns, schemas, partitions, or separate databases	Namespaces partition records inside an index and are designed for multitenancy	Pinecone is simpler when every customer needs its own retrieval partition.
Operational burden	You tune Postgres memory, index builds, autovacuum, backups, query plans, and app traffic contention	Pinecone manages vector infrastructure, with higher-tier features such as RBAC, backup and restore, SAML SSO, audit logs, private networking, and SLA options	Use Pinecone when operating vector search is distracting your team from the product.
Migration risk	Data and metadata stay in relational tables, easier to inspect and export	Retrieval data sits behind Pinecone APIs and index schemas	Keep source chunks and embeddings exportable either way.
Best first workload	Permission-aware RAG over product data, docs, tickets, and internal records	Large multi-tenant retrieval, high ingest, managed hybrid search, or standalone retrieval platform	Start with the smallest system that preserves correctness. Split when the vector layer earns it.

Supabase pgvector documentation — Supabase documents pgvector as a Postgres extension for embeddings and vector similarity search.

Choose Pgvector When SQL Is The Product Boundary

Pgvector is strongest when retrieval quality depends on the same relational facts that protect the product. A B2B SaaS assistant should not retrieve chunks with a nice cosine score and then discover after generation that half of them came from the wrong tenant, role, billing tier, workspace, or lifecycle state. Those rules already exist in Postgres. Keeping vectors there lets the retrieval query enforce them before the model sees anything.

The base implementation is simple: chunks live in a table, source records keep their IDs, permissions stay queryable, and the vector column is indexed for approximate search once exact search becomes too slow. The pgvector README documents exact and approximate nearest neighbor search, HNSW and IVFFlat indexes, and distance operators for L2, inner product, cosine, L1, Hamming, and Jaccard distance.

SQL

create extension if not exists vector;

create table rag_chunks (
  id bigserial primary key,
  tenant_id uuid not null,
  source_id uuid not null,
  source_type text not null,
  acl_group_ids uuid[] not null default '{}',
  published_at timestamptz,
  body text not null,
  embedding vector(1536) not null
);

create index rag_chunks_tenant_idx on rag_chunks (tenant_id);
create index rag_chunks_source_idx on rag_chunks (source_type, source_id);
create index rag_chunks_embedding_hnsw
  on rag_chunks using hnsw (embedding vector_cosine_ops);

The retrieval query can enforce product boundaries in the same place it ranks chunks:

SQL

select id, source_id, body, 1 - (embedding <=> $1::vector) as similarity
from rag_chunks
where tenant_id = $2
  and published_at is not null
  and acl_group_ids && $3::uuid[]
order by embedding <=> $1::vector
limit 12;

That pattern is boring in the best way. It is inspectable, permission-aware, and easy to replay in an incident review. The limitation is that approximate vector indexes and filters are not magic. The pgvector README states that with approximate indexes, filtering is applied after the index is scanned. It gives the concrete example that if a condition matches 10% of rows, HNSW with the default hnsw.ef_search of 40 returns about 4 matching rows on average. Supabase's pgvector docs warn about the same failure mode and point to iterative search when HNSW or IVFFlat plus a filter returns fewer rows than requested.

For production RAG, that means pgvector is not "set one index and forget it." You need to test filtered recall, not just global recall. A good launch gate looks like this:

Build the golden retrieval set
Create 100 to 300 representative questions with expected source chunks, tenant, role, and freshness constraints. Include failed-permission cases where the correct behavior is no answer.
Measure filtered recall
Run the same questions with the real SQL filters enabled. Track top_k, returned row count, expected chunk rank, and whether iterative scan settings change the result.
Watch Postgres pressure
Record query latency, rows scanned, index build time, CPU, memory, locks, and autovacuum behavior during ingestion. If retrieval harms product traffic, the vector feature is no longer free.
Keep the escape hatch
Return retrieval results through a service boundary with stable fields: chunk_id, source_id, score, rank, tenant_id, filters, and embedding_model. Do not let application code depend on pgvector SQL directly.

Pgvector also has concrete shape limits. The current README lists HNSW support for vector up to 2,000 dimensions, halfvec up to 4,000 dimensions, bit up to 64,000 dimensions, and sparsevec up to 1,000 non-zero elements. That is enough for common text embeddings, but it is still a database design constraint, not an implementation detail.

Choose Pinecone When Retrieval Needs Its Own Control Plane

Pinecone is the stronger choice when retrieval must scale and operate independently from the product database. Its value is not that a managed vector database is automatically better than Postgres. Its value is that vector search gets its own indexes, namespaces, capacity model, import path, metrics, and security controls.

Pinecone's current index docs describe indexes that can hold documents, dense vectors, sparse vectors, and full-text search string fields. A document schema can mix dense_vector, sparse_vector, and full-text string ranking fields, then choose the scoring signal at query time. That matters for RAG systems where exact tokens, named entities, error codes, and semantic similarity all affect relevance.

Namespaces are the main production primitive for multitenancy. Pinecone partitions records within an index by namespace, and the docs call out two benefits: isolating customer data and speeding queries by limiting the records scanned. The pricing page lists 100 namespaces per index on Starter, 1,000 on Builder, and 100,000 on Standard and Enterprise. If your RAG product needs one retrieval partition per customer, namespace limits become a real architecture input.

The metadata filter model is useful but narrower than SQL. Pinecone's metadata filter docs list operators such as $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, and $or. Only $and and $or are allowed at the top level. Each $in or $nin accepts at most 10,000 values. Metadata must be flat JSON, nested objects are not supported, null values are not supported, and metadata size is 40 KB per record.

That pushes the application shape in a different direction:

Python

from pinecone import Pinecone

pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(host=INDEX_HOST)

results = index.query(
    namespace="tenant_7f3a",
    vector=query_embedding,
    top_k=12,
    filter={
        "$and": [
            {"source_type": {"$eq": "policy"}},
            {"published": {"$eq": True}},
            {"acl_group": {"$in": ["support", "admins"]}},
        ]
    },
    include_metadata=True,
    include_values=False,
)

This works well when metadata is already denormalized into the retrieval record. It is weaker when the filter depends on live relational joins: current subscription state, account hierarchy, object ownership, or a permission graph that changes often. In those cases, either keep retrieval in pgvector or add a pre-query authorization layer that computes the allowed namespace and filter set before Pinecone runs.

Pinecone also has ingestion paths that matter at scale. Its indexing overview recommends import from object storage instead of upsert for large datasets of 10,000,000+ records. Batch upsert is positioned as useful up to 1,000 records per batch when import does not fit. If your pipeline re-embeds a large corpus weekly, those ingestion mechanics matter more than a local benchmark.

Pinecone pricing page — Pinecone pricing separates plan minimums from storage, read units, write units, imports, backups, and optional platform features.

The Cost Model That Prevents The Wrong Choice

Pgvector has no separate vector database bill, but it is not free in production. You pay through Postgres compute, memory, storage, backups, monitoring, and engineering attention. Pinecone has a clearer vector bill, but the first number on the pricing page is not the whole cost either.

Here is the current price shape from the Supabase pricing and Pinecone pricing pages:

Cost input	pgvector on Supabase as a managed Postgres example	Pinecone
Entry plan	Supabase Free is $0/month with 500 MB database size, shared CPU, 500 MB RAM, and 5 GB egress	Pinecone Starter is Free
Production starting point	Supabase Pro starts at $25/month, first project included, with 8 GB disk per project and daily backups stored for 7 days	Pinecone Builder is $20/month flat; Standard has a $50/month minimum usage charge
Scaling unit	Supabase compute add-ons range from Micro at $10/month with 1 GB RAM to 16XL at $3,730/month with 256 GB RAM	Standard storage is $0.33/GB/month, writes are $4-$4.50 per million, reads are $16-$18 per million
Backup and restore	Handled as part of the database plan and backup retention model	Standard backups are $0.10/GB/month and restore is $0.15/GB
Compliance	Supabase Team starts at $599/month; Enterprise is custom	Pinecone Enterprise has a $500/month minimum and 99.95% uptime SLA; HIPAA add-on on Standard is listed at $190/month

The simplest mistake is comparing Pinecone's read/write bill to zero. A 1,536-dimensional float32 embedding is 6,144 bytes before metadata and index overhead. That makes 100,000 raw embeddings about 614 MB and 1,000,000 raw embeddings about 6.14 GB before chunk text, metadata, indexes, and replicas. If that fits into Postgres headroom you already operate, pgvector probably wins the cost argument. If it forces a large Postgres compute tier, slows product queries, or adds operational work the team cannot absorb, Pinecone's explicit bill can be cheaper than hidden database pressure.

Use this budget worksheet before choosing:

Question	Why it matters
How many chunks will exist in 6 months, not today?	Vector systems fail after corpus growth, not at the prototype size.
What is the raw embedding size before metadata and index overhead?	It tells you whether free or entry tiers are realistic.
How many reads happen per user action?	RAG often does multiple retrieval calls per answer: rewrite, search, rerank, fallback, and citation validation.
How often do you re-embed?	Pinecone import and write units matter; Postgres index rebuild and vacuum behavior matter.
Does retrieval share a database with product writes?	The hidden cost of pgvector is contention with the rest of the application.

Treat vendor benchmarks as workload examples, not universal proof. The production number that matters is your filtered recall and p95 latency with your chunks, your metadata, your permission rules, your embedding model, and your traffic pattern.

The Migration Rule: Build A Retrieval Boundary Before You Need It

The safest architecture is not "choose forever." It is a retrieval boundary that lets pgvector and Pinecone sit behind the same contract.

That boundary should accept a normalized retrieval request:

TypeScript

type RetrievalRequest = {
  tenantId: string;
  query: string;
  queryEmbedding: number[];
  topK: number;
  filters: {
    sourceTypes?: string[];
    aclGroups?: string[];
    publishedOnly?: boolean;
    freshnessAfter?: string;
  };
};

type RetrievedChunk = {
  chunkId: string;
  sourceId: string;
  sourceType: string;
  rank: number;
  score: number;
  text: string;
  metadata: Record<string, string | number | boolean | string[]>;
};

Behind that contract, pgvector can use SQL joins and Pinecone can use namespaces and metadata filters. The application should not care. It should receive chunks, scores, source IDs, and enough metadata to cite and audit the answer.

Keep the canonical chunk store separate from the vector backend. Even if Pinecone is the serving layer, source chunks, document versions, embedding model names, and source IDs should live in your own database or object store. That is what makes re-embedding, index rebuilds, eval replay, and provider migration possible.

Pinecone has a few schema-specific risks to plan around. Its create-index docs state that full-text document schemas are currently public preview on API version 2026-01.alpha, and schema migration is not yet supported for document schemas. Changing a document schema requires deleting the index and creating a new one. That is fine if your ingest pipeline can rebuild from canonical chunks. It is painful if Pinecone is the only place your retrieval records exist.

What To Log Before Either One Goes Live

The vector database is not the safety layer. The production RAG system around it is.

Log the fields that let an engineer debug a bad answer without guessing:

query_text
query_embedding_model
retrieval_backend
tenant_id or namespace
filter payload
top_k
returned candidate count
chunk IDs and source IDs
rank and score
cited chunk IDs
retrieval latency
embedding latency
reranker latency, if present
estimated read/write or database cost
answer ID and model
user feedback or approval result

Then run evals against the same retrieval path that production uses. Do not eval unfiltered global search if production uses tenant filters. Do not eval pgvector exact search if production uses HNSW. Do not eval Pinecone without the same namespace and metadata filters your users hit.

This is also where retrieval connects to observability. If the team is already comparing trace and eval tools, the same discipline applies here: capture the retrieval step as a first-class span and preserve the evidence needed to replay it. The adjacent decision is covered in Langfuse vs LangSmith for Production Observability.

For a RAG team, the final decision rule is direct:

The implementation details will change. The boundary should not. More RAG architecture and pipeline patterns live in the RAG engineering lane.

Is pgvector better than Pinecone?

Pgvector is better when retrieval is tied to Postgres data, SQL filters, product permissions, and a corpus that fits within the database capacity your team can operate. Pinecone is better when retrieval needs managed scale, namespace isolation, dedicated vector operations, and a separate capacity model.

What are the disadvantages of pgvector?

Pgvector makes Postgres own vector indexing, memory pressure, query tuning, backups, and ingestion impact. Approximate indexes also need careful filtered-recall testing because filtering can happen after the index scan.

Is pgvector free to use?

Pgvector is open source, but a production RAG system still pays for Postgres compute, memory, disk, backups, logs, monitoring, and engineering time. The right comparison is not pgvector at $0 versus Pinecone pricing; it is total retrieval cost and operational ownership.

What is the difference between pgvector and Pinecone?

Pgvector is a Postgres extension that adds vector similarity search to relational tables. Pinecone is a managed vector database and retrieval platform with separate indexes, namespaces, metadata filtering, dense and sparse vectors, full-text ranking fields, and usage-based pricing.

Scope Your RAG Pipeline

Design a production RAG pipeline with retrieval boundaries, evals, citations, permissions, and observability before it reaches real users.

Last Updated

Jun 4, 2026

CategoryRAG

Pgvector vs Pinecone for Production RAG

The Verdict: Keep RAG In Postgres Until Retrieval Becomes Its Own System

Pgvector vs Pinecone: Production Comparison Table

Choose Pgvector When SQL Is The Product Boundary

Build the golden retrieval set

Measure filtered recall

Watch Postgres pressure

Keep the escape hatch

Choose Pinecone When Retrieval Needs Its Own Control Plane

The Cost Model That Prevents The Wrong Choice

The Migration Rule: Build A Retrieval Boundary Before You Need It

What To Log Before Either One Goes Live

Scope Your RAG Pipeline

More from RAG

Hybrid Search for Production RAG: The BM25, Vector, and Rerank Rule

RAG Evaluation Metrics Before Launch

One letter, every week. Working systems — not hot takes.