Postgres + pgvector vs Pinecone: A Production Benchmark to 50M Vectors
We benchmarked Postgres + pgvector against Pinecone at 47M vectors in production. Here's what we measured — latency, cost, ops burden, and when each wins.
Most "vector database comparison" posts you'll find online were written in 2023, when pgvector was a research curiosity and Pinecone was the default. Two years later, pgvector handles workloads that we'd assumed only Pinecone could touch. We benchmarked both in production conditions on real client data, and the results changed how we recommend vector infrastructure.
This isn't a marketing piece. We've shipped both. We have opinions. We'll show you the numbers.
TL;DR
For workloads under ~50M vectors with HNSW indexing, Postgres + pgvector beats Pinecone on cost, ops simplicity, and query flexibility, while matching it on latency. Pinecone wins clearly past 100M vectors with sustained high QPS, multi-region replication, and zero-ops requirements.
If you're starting a new RAG project and your team already operates Postgres, the default should be pgvector unless you have specific evidence otherwise. We'll explain how we got here.
The benchmark setup
We ran this benchmark on a production-grade legal document RAG system we shipped for a US client in early 2026. Document corpus: 47M chunks across 2.3M legal documents. Embedding model: OpenAI text-embedding-3-large (3072 dimensions). Workload: hybrid search (vector similarity + structured filters by date range, document type, jurisdiction).
Hardware
Postgres setup:
- AWS RDS for PostgreSQL 16.2
- db.r7g.4xlarge (16 vCPU, 128GB RAM, ARM Graviton3)
- 1TB gp3 SSD storage with 12,000 IOPS provisioned
- pgvector 0.7.0 with HNSW index
- Single-AZ for benchmark (production: Multi-AZ + read replica)
- Cost: $1,420/month base + $115/month storage + $0 egress = ~$1,535/month
Pinecone setup:
- Pinecone Standard tier
- p1.x4 pod (1 replica) for index
- 47M vectors at 3072 dimensions
- Cost: ~$650/month for the pod, plus $0.05/GB egress on retrieval queries
- Total at our query volume: ~$1,800-2,100/month effective
Workload definition
Three query patterns reflecting real production traffic:
- Plain top-K vector search. Given query embedding, return top 10 most similar vectors. No filters. (40% of production traffic.)
- Filtered top-K search. Top 10 most similar vectors WHERE jurisdiction IN ('CA', 'NY', 'TX') AND created_at > '2024-01-01'. (45% of production traffic; see the SQL sketch after this list.)
- Bulk retrieval. Get 100 vectors by their IDs (already known) for a batch processing job. (15% of production traffic.)
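In pgvector terms, the filtered pattern is a single SQL statement. A minimal sketch, using the column names from the workload description above (the rest of the schema is assumed):

```sql
-- Sketch of query pattern 2: metadata filters + HNSW ordering in one statement.
-- Table/column names follow the workload description; the full schema is assumed.
SELECT id,
       embedding <-> $1 AS distance   -- $1 = the 3072-dim query embedding
FROM document_chunks
WHERE jurisdiction IN ('CA', 'NY', 'TX')
  AND created_at > '2024-01-01'
ORDER BY embedding <-> $1
LIMIT 10;
```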
We measured each pattern at three concurrency levels: 1 concurrent query, 10 concurrent, and 50 concurrent. Each test ran for 5 minutes after a 60-second warm-up.
The results: latency
Numbers are p95 latency in milliseconds. Lower is better.
Plain top-K search (no filters)
| Concurrency | Postgres + pgvector | Pinecone | Δ |
|---|---|---|---|
| 1 query | 38ms | 41ms | +8% Pinecone slower |
| 10 concurrent | 47ms | 53ms | +13% Pinecone slower |
| 50 concurrent | 89ms | 78ms | -12% Pinecone faster |
What this shows: at low-to-medium concurrency, pgvector with HNSW is competitive. At high concurrency, Pinecone's purpose-built infrastructure starts to pull ahead, but only by about ten milliseconds, not the orders of magnitude that Pinecone's marketing implied.
Filtered top-K search
| Concurrency | Postgres + pgvector | Pinecone | Δ |
|---|---|---|---|
| 1 query | 52ms | 71ms | +37% Pinecone slower |
| 10 concurrent | 68ms | 89ms | +31% Pinecone slower |
| 50 concurrent | 124ms | 142ms | +15% Pinecone slower |
What this shows: Pinecone loses the moment filters enter the query. Postgres' query planner can reason about filters + vector index together, prune aggressively, then sort by vector distance. Pinecone has to either filter post-retrieval (slow) or use metadata indexing (also slow at scale).
This is the killer for most real RAG systems. Almost no production retrieval is unfiltered. You filter by tenant, by date range, by document type, by user permissions. Pinecone's marketing benchmarks pretend this case doesn't exist.
Bulk retrieval by ID
| Concurrency | Postgres | Pinecone |
|---|---|---|
| 1 query | 8ms | 12ms |
| 10 concurrent | 15ms | 24ms |
| 50 concurrent | 47ms | 71ms |
What this shows: Postgres dominates here. ID-based retrieval is what relational databases are built for. Pinecone's API has higher base overhead for any operation.
The results: cost
Cost calculations include compute, storage, and egress for the same 47M-vector workload sustained at production levels (~5,000 queries/min average, ~12,000 queries/min peak).
| Component | Postgres | Pinecone |
|---|---|---|
| Base compute | $1,420/mo | $650/mo |
| Storage (47M × 3072 dim) | $115/mo | included |
| Egress (50TB/mo retrieval) | $0 | $1,200/mo |
| Read replica (production HA) | $1,420/mo | $650/mo |
| Total monthly | $2,955 | $2,500 |
Pinecone comes out roughly 15% cheaper at peak production traffic. Note that this total includes Pinecone's standard egress cost; many teams don't budget for it and get surprise bills.
But here's what those numbers don't capture:
What Postgres includes "for free"
- Joins. Vector + relational metadata in one query. No N+1 lookups across two systems.
- Transactions. Insert vector + insert metadata + update audit log, all atomic (sketched below).
- PITR (point-in-time recovery). RDS gives you 35 days of replay. Pinecone backups exist, but they're manual snapshots rather than continuous recovery.
- Existing observability. Datadog, New Relic, pg_stat_statements all already work. Pinecone needs separate monitoring infrastructure.
- Existing access patterns. Your DBAs operate this. SQL is what your team writes. No new vendor in the trust boundary.
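To make the transactions point concrete, here is a minimal sketch; the audit_log table and the exact column list are assumptions, not the client's actual schema:

```sql
-- Sketch only: audit_log and the column list are assumed for illustration.
BEGIN;

-- $1..$4 = chunk id, parent document id, 3072-dim embedding, jurisdiction
INSERT INTO document_chunks (id, document_id, embedding, jurisdiction, created_at)
VALUES ($1, $2, $3::vector, $4, now());

INSERT INTO audit_log (entity_id, action, actor, at)
VALUES ($1, 'embedding_inserted', 'ingest-worker', now());

COMMIT;  -- both writes become visible atomically, or neither does
```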
What Pinecone includes "for free"
- Geographic replication. Multi-region by configuration. Postgres needs custom replication logic.
- Auto-scaling at high QPS. Pinecone scales the underlying pods transparently. Postgres needs manual instance bumps.
- Operational simplicity for non-DBA teams. If your team has zero SQL expertise, Pinecone's API is quicker to learn.
Where the cost calculus actually breaks
The Postgres path appears 15% more expensive on the static benchmark. But:
- Most teams already operate Postgres. That's a $0 marginal cost — you're not adding a new system.
- Pinecone egress costs grow with query volume. At 100k+ queries/hour, egress alone can exceed the entire Postgres compute bill.
- Multi-tenant isolation costs differ. In Postgres you can use Row-Level Security at no incremental cost (example after this list). In Pinecone you provision separate indexes per tenant, so pricing scales with tenant count.
- Hybrid search costs are hidden. If you need vector + structured filtering, Pinecone often requires a parallel Postgres anyway. You pay for both.
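To illustrate the Row-Level Security point, a minimal sketch; the tenant_id column and the app.tenant_id session variable are assumptions layered on top of the benchmark schema:

```sql
-- Sketch: per-tenant isolation on the same table that holds the vectors.
ALTER TABLE document_chunks ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON document_chunks
  USING (tenant_id = current_setting('app.tenant_id'));

-- The application sets the tenant once per connection or transaction:
SET app.tenant_id = 'tenant-acme';
-- Every vector query on document_chunks is now scoped to that tenant.
```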
For a typical mid-scale RAG (10-100M vectors, multi-tenant, requires filters), our actual TCO calculation usually favors Postgres by 30-50% once these factors are included.
Indexing: HNSW under the hood
Building the index
For 47M vectors at 3072 dimensions, HNSW build times:
- pgvector with HNSW (m=16, ef_construction=64): 4 hours 12 minutes on the r7g.4xlarge
- Pinecone bulk insert: 2 hours 50 minutes (their internal infrastructure)
Pinecone is faster to build initially. Postgres requires planning the index build during off-peak hours or using a parallel index build (pgvector 0.6+ parallelizes HNSW builds); the relevant settings are sketched below.
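For reference, the Postgres-side settings that govern HNSW build speed. The values below are starting points we'd assume for a 128GB instance, not numbers taken from the benchmark run:

```sql
-- Build-time settings sketch (assumed starting points, tune for your instance).
SET maintenance_work_mem = '32GB';          -- keep more of the graph in memory during the build
SET max_parallel_maintenance_workers = 7;   -- pgvector 0.6+ parallelizes HNSW builds across workers
-- Then run the CREATE INDEX statement shown in the next subsection.
```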
HNSW parameters that matter
CREATE INDEX docs_embedding_idx
ON document_chunks
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);
m controls graph connectivity. Higher m = better recall but larger index + slower build.
- m=8: fastest build, lower recall (~92% on our workload)
- m=16: balanced, recommended default (~98% recall)
- m=32: highest recall (~99.5%) but 2x storage and slower
ef_construction controls index build quality.
- ef_construction=32: fast but lossy
- ef_construction=64: recommended default
- ef_construction=128: best quality, doubles build time
Query time ef_search (set per query, not at index time):
SET hnsw.ef_search = 100; -- higher = more accurate, slower
The default is 40. We tuned it to 100 for high-recall queries and kept 40 for latency-sensitive lookups.
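One way to keep those two settings from leaking across a shared connection pool is to scope ef_search per transaction. A sketch:

```sql
-- Sketch: transaction-scoped ef_search, so pooled connections keep the default (40).
BEGIN;
SET LOCAL hnsw.ef_search = 100;   -- high-recall path only
SELECT id, embedding <-> $1 AS distance
FROM document_chunks
ORDER BY embedding <-> $1
LIMIT 10;
COMMIT;
```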
Storage characteristics
47M vectors × 3072 dimensions × 4 bytes (float32) = ~580GB raw embedding data.
With HNSW index overhead (m=16):
- Postgres: 1.1TB total (raw + HNSW + WAL)
- Pinecone: 740GB equivalent (their proprietary format is more compact)
This matters for cost when you're at the storage threshold. Pinecone's storage compression is real. Postgres has more overhead but stays in formats your DBAs understand.
When Pinecone clearly wins
Let's be honest about where Pinecone's purpose-built design pays off.
1. 100M+ vectors with sustained high QPS
At 100M+ vectors and 50,000+ QPS sustained, our internal model shows Pinecone hitting a sweet spot. The pod-based scaling architecture distributes the index across multiple machines transparently. Postgres at this scale requires sharding work that's painful.
2. Truly multi-region active-active
If you need under-50ms latency from Singapore AND New York AND Frankfurt simultaneously, Pinecone's multi-region replication is the operational shortcut. Building this on Postgres is possible but adds significant complexity.
3. Zero-ops requirement
Some teams genuinely have no DBA bandwidth and zero appetite for database operations. Pinecone's "submit a vector, get a result" API is simpler than even managed Postgres. If your team's relationship with infrastructure is purely consumer-grade, Pinecone removes friction.
4. Pre-built integrations
LangChain, LlamaIndex, Haystack — all have first-class Pinecone integration. They have pgvector integration too, but the docs are thinner. If you're using a high-abstraction framework AND you don't want to write integration glue, Pinecone is faster to plug in.
When Postgres + pgvector clearly wins
1. Workloads under 50M vectors
This is the sweet spot. Latency is comparable. Cost favors Postgres (especially if Postgres is already operated). Operational simplicity is significant.
2. Hybrid retrieval (vector + structured filters)
If most queries combine vector similarity with metadata filters, Postgres is decisively better. Filter and search in one query plan, pruned by the optimizer. Pinecone requires either client-side filtering (slow) or parallel relational lookup (now you operate two systems).
3. Strong transactional requirements
Inserting embeddings transactionally with metadata, audit logs, user actions, billing events: this is what relational databases were designed for. Pinecone's inserts are eventually consistent; you'll need application-level coordination.
4. Compliance and data sovereignty
For HIPAA, FedRAMP, EU data residency, on-prem requirements — Postgres deploys anywhere. Pinecone is a SaaS in specific regions. We've shipped HIPAA-compliant RAG systems where pgvector was the only viable option.
5. Vendor independence
Your team already operates Postgres. Adding pgvector is CREATE EXTENSION vector;. No new vendor in your trust boundary. No new bill. No new SOC 2 report to chase.
Benchmarking your own workload
Don't trust our benchmark for your workload. Trust your benchmark for your workload. Here's the script we use, simplified:
import time
import asyncio
import numpy as np
import asyncpg
import pinecone
# Connection setup
DB_URL = "postgresql://..."
PINECONE_KEY = "..."
INDEX_NAME = "your-index"
async def benchmark_postgres(query_vec, n_queries=1000):
    # asyncpg has no built-in codec for the pgvector type, so pass the embedding
    # as its text form ('[x1,x2,...]') and cast it to vector inside the query.
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"
    pool = await asyncpg.create_pool(DB_URL, min_size=10, max_size=10)
    latencies = []

    async def run_one():
        start = time.perf_counter()
        async with pool.acquire() as conn:
            await conn.fetch(
                """
                SELECT id, embedding <-> $1::vector AS distance
                FROM document_chunks
                ORDER BY embedding <-> $1::vector
                LIMIT 10
                """,
                vec_literal,
            )
            latencies.append((time.perf_counter() - start) * 1000)

    # The pool size (10 connections) is what caps effective concurrency here.
    await asyncio.gather(*[run_one() for _ in range(n_queries)])
    await pool.close()
    return {
        "p50": np.percentile(latencies, 50),
        "p95": np.percentile(latencies, 95),
        "p99": np.percentile(latencies, 99),
    }
def benchmark_pinecone(query_vec, n_queries=1000):
    # pinecone-client v2-style init (older clients may also require environment=...);
    # newer clients expose a Pinecone class instead: Pinecone(api_key=...).Index(...).
    pinecone.init(api_key=PINECONE_KEY)
    index = pinecone.Index(INDEX_NAME)
    latencies = []
    for _ in range(n_queries):
        start = time.perf_counter()
        index.query(vector=list(query_vec), top_k=10)
        latencies.append((time.perf_counter() - start) * 1000)
    return {
        "p50": np.percentile(latencies, 50),
        "p95": np.percentile(latencies, 95),
        "p99": np.percentile(latencies, 99),
    }
What to measure beyond latency
When benchmarking your own workload, don't stop at p95 latency. Also measure:
- Recall@K against ground truth. Generate 100 known-good query/answer pairs and measure how often the right answer appears in the top K. HNSW is approximate, so you need to confirm your ef_search setting hits the recall you need.
- Variance across sample queries. The same workload should produce consistent latencies. High variance suggests cold caches or contention.
- Cost per million queries. Combine compute cost, egress cost, and ops time. Pinecone's egress is the silent killer.
- Query plan quality (Postgres only). Run EXPLAIN ANALYZE on filter + vector queries (see the sketch after this list). If you see sequential scans, your indexes are wrong. If you see index scans on filter columns combined with HNSW, you're optimal.
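A sketch of that plan check against the filtered pattern from the benchmark; bind your query embedding as the parameter, and expect the plan shape to vary with your schema and Postgres version:

```sql
-- Sketch: verify the filtered vector query is not falling back to a sequential scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, embedding <-> $1 AS distance   -- $1 = query embedding
FROM document_chunks
WHERE jurisdiction = 'CA'
  AND created_at > '2024-01-01'
ORDER BY embedding <-> $1
LIMIT 10;
-- Look for an index scan on the hnsw index (docs_embedding_idx) in the output;
-- a Seq Scan on document_chunks means the vector index is not being used.
```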
Migrating from Pinecone to Postgres (or vice versa)
Pinecone to Postgres
We did this for a client last quarter. The migration took 3 days of engineering time and saved them $1,800/month.
Steps:
- Export embeddings from Pinecone. Use their bulk export API or query in batches.
- Set up pgvector. CREATE EXTENSION vector; CREATE TABLE chunks (id text, embedding vector(3072), metadata jsonb);
- Bulk insert with COPY. Use Postgres' COPY for fast bulk insert. Process embeddings in batches of 10,000.
- Build the HNSW index. CREATE INDEX ... USING hnsw (...); Plan for 4-8 hours on production-scale data. (The table creation, bulk load, and index build are consolidated in the SQL sketch after this list.)
- Tune ef_search. Run your eval golden set to find the right recall/latency trade-off.
- Update application code. Swap Pinecone client calls for SQL queries. Often a one-day refactor.
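The table creation, bulk load, and index build as one consolidated SQL sketch; the export file name and its tab-separated layout are placeholders, not the client's actual pipeline:

```sql
-- Consolidated sketch. Assumes each exported batch is a TSV of
-- (id, embedding in '[x1,x2,...]' text form, metadata as JSON).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id        text PRIMARY KEY,
    embedding vector(3072),
    metadata  jsonb
);

-- Run from psql; \copy streams the file from the client machine.
\copy chunks (id, embedding, metadata) FROM 'pinecone_export_0001.tsv'

-- Build the index after loading: inserting into an existing HNSW index is slower.
CREATE INDEX chunks_embedding_idx ON chunks
  USING hnsw (embedding vector_l2_ops)
  WITH (m = 16, ef_construction = 64);
```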
Postgres to Pinecone
Less common but happens at scale.
Steps:
- Provision Pinecone index with the right pod size (use their sizing calculator).
- Bulk upsert from Postgres. Use their bulk upsert endpoint. Plan for hours of throughput-limited insertion.
- Update application to issue Pinecone queries.
- Run both in parallel for 1 week. Compare results on the same query set. Cut over only when delta is acceptable.
- Keep Postgres for metadata. Don't migrate filter columns to Pinecone; query both systems in parallel instead.
What we tell clients in 2026
Our default recommendation has shifted. Two years ago we'd say "use Pinecone unless cost is the dealbreaker." Today: "use Postgres + pgvector unless scale is the dealbreaker."
The decision tree we walk through:
Are you over 100M vectors AND need >50k sustained QPS AND need multi-region active-active?
├── Yes → Pinecone
└── No
    ├── Are most queries filter + vector hybrid?
    │   └── Yes → Postgres + pgvector (decisively)
    └── Are you in a regulated industry (HIPAA, FedRAMP, EU residency)?
        ├── Yes → Postgres + pgvector (compliance requires it)
        └── No
            └── Default to Postgres + pgvector
We end up at Postgres + pgvector for ~85% of new RAG projects.
The deeper point
Vector databases as a category emerged in 2022 because Postgres had no fast approximate nearest-neighbor story; early pgvector offered only IVFFlat indexes and brute-force scans. That problem largely got solved by pgvector + HNSW in 2023-2024. Most "you need a dedicated vector database" arguments date from before that fix landed.
Pinecone is still genuinely better for a specific workload profile. But it's a smaller workload profile than the 2022-era marketing implied. For most production RAG systems being built today, the right answer is the boring answer: extend the database your team already operates.
Boring infrastructure compounds. Specialized infrastructure breaks unexpectedly.
If you're evaluating vector infrastructure for a production RAG system and want a sanity-check on your specific scale, our team has shipped both architectures across the last 18 months. We're happy to give you 30 minutes of unbiased thinking — book at justsoftlab.com/contact or explore our RAG implementation services.











