Generative AI·May 4, 2026·12 min read

Postgres + pgvector vs Pinecone: A Production Benchmark to 50M Vectors

We benchmarked Postgres + pgvector against Pinecone at 47M vectors in production. Here's what we measured — latency, cost, ops burden, and when each wins.

By JustSoftLab Team

Most "vector database comparison" posts you'll find online were written in 2023, when pgvector was a research curiosity and Pinecone was the default. Two years later, pgvector handles workloads that we'd assumed only Pinecone could touch. We benchmarked both in production conditions on real client data, and the results changed how we recommend vector infrastructure.

This isn't a marketing piece. We've shipped both. We have opinions. We'll show you the numbers.

TL;DR

For workloads under ~50M vectors with HNSW indexing, Postgres + pgvector beats Pinecone on cost, ops simplicity, and query flexibility, while matching it on latency. Pinecone wins clearly past 100M vectors with sustained high QPS, multi-region replication, and zero-ops requirements.

If you're starting a new RAG project and your team already operates Postgres, the default should be pgvector unless you have specific evidence otherwise. We'll explain how we got here.

The benchmark setup

We ran this benchmark on a production-grade legal document RAG system we shipped for a US client in early 2026. Document corpus: 47M chunks across 2.3M legal documents. Embedding model: OpenAI text-embedding-3-large (3072 dimensions). Workload: hybrid search (vector similarity + structured filters by date range, document type, jurisdiction).

Hardware

Postgres setup:

  • AWS RDS for PostgreSQL 16.2
  • db.r7g.4xlarge (16 vCPU, 128GB RAM, ARM Graviton3)
  • 1TB gp3 SSD storage with 12,000 IOPS provisioned
  • pgvector 0.7.0 with HNSW index
  • Single-AZ for benchmark (production: Multi-AZ + read replica)
  • Cost: $1,420/month base + $115/month storage + $0 egress = ~$1,535/month

Pinecone setup:

  • Pinecone Standard tier
  • p1.x4 pod (1 replica) for index
  • 47M vectors at 3072 dimensions
  • Cost: ~$650/month for the pod, plus $0.05/GB egress on retrieval queries
  • Total at our query volume: ~$1,800-2,100/month effective

Workload definition

Three query patterns reflecting real production traffic:

  1. Plain top-K vector search. Given query embedding, return top 10 most similar vectors. No filters. (40% of production traffic.)
  2. Filtered top-K search. Top 10 most similar vectors WHERE jurisdiction IN ('CA', 'NY', 'TX') AND created_at > '2024-01-01'. (45% of production traffic.)
  3. Bulk retrieval. Get 100 vectors by their IDs (already known) for a batch processing job. (15% of production traffic.)

We measured each pattern at three concurrency levels: 1 concurrent query, 10 concurrent, and 50 concurrent. Each test ran for 5 minutes after a 60-second warm-up.

The results: latency

Numbers are p95 latency in milliseconds. Lower is better.

Plain top-K search (no filters)

| Concurrency | Postgres + pgvector | Pinecone | Δ |
| --- | --- | --- | --- |
| 1 query | 38ms | 41ms | Pinecone 8% slower |
| 10 concurrent | 47ms | 53ms | Pinecone 13% slower |
| 50 concurrent | 89ms | 78ms | Pinecone 12% faster |

What this shows: at low-to-medium concurrency, pgvector with HNSW is competitive. At high concurrency, Pinecone's purpose-built infrastructure starts to pull ahead, but only by about ten milliseconds at p95, not the orders of magnitude that Pinecone's marketing implied.

Filtered top-K search (vector + metadata filters)

| Concurrency | Postgres + pgvector | Pinecone | Δ |
| --- | --- | --- | --- |
| 1 query | 52ms | 71ms | Pinecone 37% slower |
| 10 concurrent | 68ms | 89ms | Pinecone 31% slower |
| 50 concurrent | 124ms | 142ms | Pinecone 15% slower |

What this shows: Pinecone loses the moment filters enter the query. Postgres' query planner can reason about filters + vector index together, prune aggressively, then sort by vector distance. Pinecone has to either filter post-retrieval (slow) or use metadata indexing (also slow at scale).

This is the killer for most real RAG systems. Almost no production retrieval is unfiltered. You filter by tenant, by date range, by document type, by user permissions. Pinecone's marketing benchmarks pretend this case doesn't exist.
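Concretely, the filtered pattern from our workload is a single statement in Postgres (column names are from our schema; adapt them to yours):

```sql
-- Filter and vector search in one plan: the planner combines the
-- WHERE clauses with the HNSW scan instead of filtering after retrieval.
SELECT id, embedding <-> $1 AS distance
FROM document_chunks
WHERE jurisdiction IN ('CA', 'NY', 'TX')
  AND created_at > '2024-01-01'
ORDER BY embedding <-> $1
LIMIT 10;
```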

Bulk retrieval by ID

| Concurrency | Postgres + pgvector | Pinecone |
| --- | --- | --- |
| 1 query | 8ms | 12ms |
| 10 concurrent | 15ms | 24ms |
| 50 concurrent | 47ms | 71ms |

What this shows: Postgres dominates here. ID-based retrieval is what relational databases are built for. Pinecone's API has higher base overhead for any operation.

The results: cost

Cost calculations include compute, storage, and egress for the same 47M-vector workload sustained at production levels (~5,000 queries/min average, ~12,000 queries/min peak).

| Component | Postgres | Pinecone |
| --- | --- | --- |
| Base compute | $1,420/mo | $650/mo |
| Storage (47M × 3072 dim) | $115/mo | included |
| Egress (50TB/mo retrieval) | $0 | $1,200/mo |
| Read replica (production HA) | $1,420/mo | $650/mo |
| Total monthly | $2,955 | $2,500 |

Pinecone is 15% cheaper at peak production traffic. Note that this total already includes Pinecone's standard egress cost; many teams don't budget for egress and get surprise bills.

But here's what those numbers don't capture:

What Postgres includes "for free"

  • Joins. Vector + relational metadata in one query. No N+1 lookups across two systems.
  • Transactions. Insert vector + insert metadata + update audit log all atomic.
  • PITR (point-in-time recovery). RDS gives you 35 days of replay. Pinecone backups exist, but they are manual snapshots rather than continuous replay.
  • Existing observability. Datadog, New Relic, pg_stat_statements all already work. Pinecone needs separate monitoring infrastructure.
  • Existing access patterns. Your DBAs operate this. SQL is what your team writes. No new vendor in the trust boundary.

What Pinecone includes "for free"

  • Geographic replication. Multi-region by configuration. Postgres needs custom replication logic.
  • Auto-scaling at high QPS. Pinecone scales the underlying pods transparently. Postgres needs manual instance bumps.
  • Operational simplicity for non-DBA teams. If your team has zero SQL expertise, Pinecone's API is quicker to learn.

Where the cost calculus actually breaks

The Postgres path appears about 18% more expensive on the static benchmark. But:

  1. Most teams already operate Postgres. That's a $0 marginal cost — you're not adding a new system.
  2. Pinecone egress costs scale with query volume. At 100k+ queries/hour, Pinecone egress alone can exceed Postgres compute.
  3. Multi-tenant isolation costs differ. In Postgres you can use Row-Level Security at no incremental cost. In Pinecone you provision separate indexes per tenant — pricing scales with tenant count.
  4. Hybrid search costs are hidden. If you need vector + structured filtering, Pinecone often requires a parallel Postgres anyway. You pay for both.

For a typical mid-scale RAG (10-100M vectors, multi-tenant, requires filters), our actual TCO calculation usually favors Postgres by 30-50% once these factors are included.
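To make the arithmetic above reproducible, here's a minimal sketch of the static cost comparison using the figures from our tables (the dictionary names are ours, the numbers are the benchmark's):

```python
# Monthly cost components from the tables above (USD).
postgres = {"base_compute": 1420, "storage": 115, "read_replica": 1420, "egress": 0}
pinecone = {"index_pod": 650, "replica_pod": 650, "egress_50tb": 1200}

pg_total = sum(postgres.values())
pc_total = sum(pinecone.values())
pinecone_discount = (pg_total - pc_total) / pg_total  # ~15% on the static view

# Cost per million queries at the 5,000 queries/min production average.
queries_per_month = 5_000 * 60 * 24 * 30  # 216M queries
pg_per_million = pg_total / (queries_per_month / 1_000_000)

print(pg_total, pc_total, round(pinecone_discount * 100, 1), round(pg_per_million, 2))
```

The point of writing it down is that every TCO argument in this section is a one-line change to these dictionaries: add an ops-hours line, a per-tenant index line, or a steeper egress figure and rerun.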

Indexing: HNSW under the hood

Building the index

For 47M vectors at 3072 dimensions, HNSW build times:

  • pgvector with HNSW (m=16, ef_construction=64): 4 hours 12 minutes on the r7g.4xlarge
  • Pinecone bulk insert: 2 hours 50 minutes (their internal infrastructure)

Pinecone is faster to build initially. Postgres requires planning the index build during off-peak hours or using a parallel index build (pgvector 0.6.0+ supports parallel HNSW builds).

HNSW parameters that matter

CREATE INDEX docs_embedding_idx
ON document_chunks
USING hnsw (embedding vector_l2_ops)
WITH (m = 16, ef_construction = 64);

m controls graph connectivity. Higher m = better recall but larger index + slower build.

  • m=8: fastest build, lower recall (~92% on our workload)
  • m=16: balanced, recommended default (~98% recall)
  • m=32: highest recall (~99.5%) but 2x storage and slower

ef_construction controls index build quality.

  • ef_construction=32: fast but lossy
  • ef_construction=64: recommended default
  • ef_construction=128: best quality, doubles build time

Query time ef_search (set per query, not at index time):

SET hnsw.ef_search = 100;  -- higher = more accurate, slower

The default is 40. We tune ef_search to 100 for high-recall queries and 40 for latency-sensitive lookups.
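When a shared connection pool mixes both settings, scoping the override per transaction avoids leaking it to the next query on that connection. This is standard Postgres `SET LOCAL` behavior, shown as a sketch:

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- reverts automatically at COMMIT/ROLLBACK
SELECT id FROM document_chunks ORDER BY embedding <-> $1 LIMIT 10;
COMMIT;
```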

Storage characteristics

47M vectors × 3072 dimensions × 4 bytes (float32) = ~580GB raw embedding data.

With HNSW index overhead (m=16):

  • Postgres: 1.1TB total (raw + HNSW + WAL)
  • Pinecone: 740GB equivalent (their proprietary format is more compact)

This matters for cost when you're at the storage threshold. Pinecone's storage compression is real. Postgres has more overhead but stays in formats your DBAs understand.
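The raw-storage figure checks out with simple arithmetic (a sketch; pgvector 0.7.0 also offers a halfvec half-precision type that would roughly halve the raw size, which we did not benchmark):

```python
# Back-of-envelope check of the raw embedding size quoted above.
vectors = 47_000_000
dims = 3072
bytes_per_float32 = 4

raw_gb = vectors * dims * bytes_per_float32 / 1e9  # decimal gigabytes
print(round(raw_gb))  # ~578 GB before HNSW and WAL overhead
```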

When Pinecone clearly wins

Let's be honest about where Pinecone's purpose-built design pays off.

1. 100M+ vectors with sustained high QPS

At 100M+ vectors and 50,000+ QPS sustained, our internal model shows Pinecone hitting a sweet spot. The pod-based scaling architecture distributes the index across multiple machines transparently. Postgres at this scale requires sharding work that's painful.

2. Truly multi-region active-active

If you need under-50ms latency from Singapore AND New York AND Frankfurt simultaneously, Pinecone's multi-region replication is the operational shortcut. Building this on Postgres is possible but adds significant complexity.

3. Zero-ops requirement

Some teams genuinely have no DBA bandwidth and zero appetite for any database operations. Pinecone's "submit a vector, get a result" API is simpler than even managed Postgres ergonomics. If your team's relationship with infrastructure is purely consumer-grade, Pinecone removes friction.

4. Pre-built integrations

LangChain, LlamaIndex, Haystack — all have first-class Pinecone integration. They have pgvector integration too, but the docs are thinner. If you're using a high-abstraction framework AND you don't want to write integration glue, Pinecone is faster to plug in.

When Postgres + pgvector clearly wins

1. Workloads under 50M vectors

This is the sweet spot. Latency is comparable. Cost favors Postgres (especially if Postgres is already operated). Operational simplicity is significant.

2. Hybrid retrieval (vector + structured filters)

If most queries combine vector similarity with metadata filters, Postgres is decisively better. Filter and search in one query plan, pruned by the optimizer. Pinecone requires either client-side filtering (slow) or parallel relational lookup (now you operate two systems).

3. Strong transactional requirements

Inserting embeddings transactionally alongside metadata, audit logs, user actions, and billing events is exactly what relational databases were designed for. Pinecone is eventually consistent on inserts; you'll need application-level coordination.

4. Compliance and data sovereignty

For HIPAA, FedRAMP, EU data residency, on-prem requirements — Postgres deploys anywhere. Pinecone is a SaaS in specific regions. We've shipped HIPAA-compliant RAG systems where pgvector was the only viable option.

5. Vendor independence

Your team already operates Postgres. Adding pgvector is CREATE EXTENSION vector;. No new vendor in your trust boundary. No new bill. No new SOC 2 report to chase.

Benchmarking your own workload

Don't trust our benchmark for your workload. Trust your benchmark for your workload. Here's the script we use, simplified:

import time
import asyncio
import numpy as np
import asyncpg
from pgvector.asyncpg import register_vector  # teaches asyncpg the vector type
from pinecone import Pinecone  # pinecone client v3+

# Connection setup
DB_URL = "postgresql://..."
PINECONE_KEY = "..."
INDEX_NAME = "your-index"

async def benchmark_postgres(query_vec, n_queries=1000):
    # init=register_vector lets us pass query_vec (a list or numpy array)
    # directly as a pgvector parameter.
    pool = await asyncpg.create_pool(
        DB_URL, min_size=10, max_size=10, init=register_vector
    )
    latencies = []

    async def run_one():
        start = time.perf_counter()
        async with pool.acquire() as conn:
            await conn.fetch(
                """
                SELECT id, embedding <-> $1 AS distance
                FROM document_chunks
                ORDER BY embedding <-> $1
                LIMIT 10
                """,
                query_vec,
            )
        latencies.append((time.perf_counter() - start) * 1000)

    # All n_queries coroutines launch at once; the pool caps effective
    # concurrency at max_size.
    await asyncio.gather(*[run_one() for _ in range(n_queries)])
    await pool.close()
    return {
        "p50": np.percentile(latencies, 50),
        "p95": np.percentile(latencies, 95),
        "p99": np.percentile(latencies, 99),
    }

def benchmark_pinecone(query_vec, n_queries=1000):
    # Serial loop; for concurrency parity with the Postgres path,
    # fan these calls out across threads or async workers.
    index = Pinecone(api_key=PINECONE_KEY).Index(INDEX_NAME)
    latencies = []

    for _ in range(n_queries):
        start = time.perf_counter()
        index.query(vector=list(query_vec), top_k=10)
        latencies.append((time.perf_counter() - start) * 1000)

    return {
        "p50": np.percentile(latencies, 50),
        "p95": np.percentile(latencies, 95),
        "p99": np.percentile(latencies, 99),
    }

What to measure beyond latency

When benchmarking your own workload, don't stop at p95 latency. Also measure:

  1. Recall@K against ground truth. Generate 100 known-good query/answer pairs, measure how often the right answer appears in top-K. HNSW is approximate — you need to confirm your ef_search setting hits the recall you need.
  2. Variance across sample queries. Same workload should produce consistent latencies. High variance suggests cache cold-paths or contention.
  3. Cost per million queries. Combine compute cost + egress cost + ops time. Pinecone's egress is the silent killer.
  4. Query plan quality (Postgres only). Run EXPLAIN ANALYZE on filter + vector queries. If you see sequential scans, your indexes are wrong. If you see index scans on filter columns combined with HNSW, you're optimal.
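Point 1 takes only a few lines to compute. In this sketch, `golden` maps each query to its known-correct chunk id and `results` holds each system's ranked ids (both names are illustrative):

```python
def recall_at_k(results, golden, k=10):
    """Fraction of queries whose known-good chunk id appears in the top-k results."""
    hits = sum(1 for q, expected in golden.items() if expected in results.get(q, [])[:k])
    return hits / len(golden)

# Toy check: the right chunk is retrieved for 2 of 3 queries.
results = {"q1": ["a", "b"], "q2": ["c"], "q3": ["x"]}
golden = {"q1": "a", "q2": "c", "q3": "z"}
print(round(recall_at_k(results, golden), 2))  # 0.67
```

Run this once per candidate ef_search value and you have the recall/latency curve that the tuning advice above is about.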

Migrating from Pinecone to Postgres (or vice versa)

Pinecone to Postgres

We did this for a client last quarter. The migration took 3 days of engineering time and saved them $1,800/month.

Steps:

  1. Export embeddings from Pinecone. Use their bulk export API or query in batches.
  2. Set up pgvector. CREATE EXTENSION vector; CREATE TABLE chunks (id text, embedding vector(3072), metadata jsonb);
  3. Bulk insert with COPY. Use Postgres' COPY for fast bulk insert. Process embeddings in batches of 10,000.
  4. Build HNSW index. CREATE INDEX ... USING hnsw (...); Plan for 4-8 hours on production-scale data.
  5. Tune ef_search. Run your eval golden set to find the right recall/latency trade-off.
  6. Update application code. Swap Pinecone client calls for SQL queries. Often a one-day refactor.

Postgres to Pinecone

Less common but happens at scale.

Steps:

  1. Provision Pinecone index with the right pod size (use their sizing calculator).
  2. Bulk upsert from Postgres. Use their bulk upsert endpoint. Plan for hours of throughput-limited insertion.
  3. Update application to issue Pinecone queries.
  4. Run both in parallel for 1 week. Compare results on the same query set. Cut over only when delta is acceptable.
  5. Keep Postgres for metadata. Don't migrate filter columns to Pinecone. Always parallel-query.
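For step 4's parallel run, a simple per-query overlap metric on the returned ids is usually enough to quantify the delta (an illustrative helper, not a library API):

```python
def topk_overlap(ids_a, ids_b):
    """Jaccard overlap between two top-K id lists for the same query."""
    a, b = set(ids_a), set(ids_b)
    return len(a & b) / len(a | b) if a | b else 1.0

print(topk_overlap(["x", "y", "z"], ["x", "y", "z"]))  # 1.0
print(round(topk_overlap(["x", "y"], ["x", "q"]), 2))  # 0.33, one id differs
```

Averaged over your golden query set, this gives a single number to watch during the parallel week before cutover.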

What we tell clients in 2026

Our default recommendation has shifted. Two years ago we'd say "use Pinecone unless cost is the dealbreaker." Today: "use Postgres + pgvector unless scale is the dealbreaker."

The decision tree we walk through:

Are you over 100M vectors AND need >50k sustained QPS AND need multi-region active-active?
├── Yes → Pinecone
└── No
    ├── Are most queries filter + vector hybrid?
    │   └── Yes → Postgres + pgvector (decisively)
    └── Are you in regulated industry (HIPAA, FedRAMP, EU residency)?
        ├── Yes → Postgres + pgvector (compliance requires it)
        └── No
            └── Default to Postgres + pgvector

We end up at Postgres + pgvector for ~85% of new RAG projects.
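The tree above fits in a few lines of code if you want to embed it in an internal tool (the thresholds are the article's; the function name and argument names are ours):

```python
def recommend_store(vectors, sustained_qps, multi_region_active_active,
                    mostly_hybrid, regulated):
    """Encodes the decision tree above; returns the default recommendation."""
    if vectors > 100_000_000 and sustained_qps > 50_000 and multi_region_active_active:
        return "Pinecone"
    if mostly_hybrid:
        return "Postgres + pgvector"  # decisively
    if regulated:
        return "Postgres + pgvector"  # compliance requires it
    return "Postgres + pgvector"      # the default anyway

print(recommend_store(47_000_000, 5_000, False, True, False))    # our benchmark workload
print(recommend_store(200_000_000, 80_000, True, False, False))  # Pinecone's sweet spot
```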

The deeper point

Vector databases as a category emerged in 2022 because Postgres had no fast approximate vector search. That gap largely closed when pgvector gained an HNSW index in 2023 and matured through 2024. Most "you need a dedicated vector database" arguments now date from before that fix landed.

Pinecone is still genuinely better for a specific workload profile. But it's a smaller workload profile than the 2022-era marketing implied. For most production RAG systems being built today, the right answer is the boring answer: extend the database your team already operates.

Boring infrastructure compounds. Specialized infrastructure breaks unexpectedly.


If you're evaluating vector infrastructure for a production RAG system and want a sanity-check on your specific scale, our team has shipped both architectures across the last 18 months. We're happy to give you 30 minutes of unbiased thinking — book at justsoftlab.com/contact or explore our RAG implementation services.
