Vector search in Postgres with pgvector

The default advice for any project involving embeddings is now "use Pinecone." The advice is correct at scale and wrong at the size most projects actually are.

For most embedding workloads (RAG over a documentation site, semantic search across your blog, similarity matching for a product catalogue), the right answer is pgvector. Same Postgres you're already running. Same auth. Same backups. Same dev environment. One fewer vendor on the invoice.

This post is the envelope (when pgvector works, when it doesn't), the setup that's actually fast, and the operational stuff nobody mentions.

What pgvector is

pgvector is a Postgres extension that adds a vector column type plus index types for approximate nearest neighbour (ANN) search. You install it once per database, store embeddings in a normal column, and query them with new SQL operators (<-> for distance, <#> for inner product, <=> for cosine).

Installation:

sql

CREATE EXTENSION vector;

That's it. Supported on Neon, Supabase, RDS Postgres, Cloud SQL, Crunchy, and any modern self-hosted Postgres. On most managed providers it's already available; just enable it.

What it can do, and at what scale

pgvector handles vector search well in this envelope:

Up to ~1-5 million vectors per table. Beyond this, query times start climbing, and you'll want a dedicated vector DB.
Up to ~1536 dimensions. OpenAI's text-embedding-3-small is 1536, text-embedding-3-large is 3072. Both work but 3072 is closer to the upper edge.
Latency in the 5-50ms range for ANN queries with a properly built HNSW index, at low millions of rows.
Throughput of hundreds of QPS on a moderately-sized Postgres instance.

This covers most projects:

A RAG chatbot over your company's documentation (a few hundred to a few thousand vectors).
Semantic search across a blog with 10,000 articles (10K vectors).
Product similarity for a marketplace with 100K items (100K vectors).
Customer support deflection with 50K historical tickets (50K vectors).

If you're under a million vectors, you're well within the comfortable zone. If you're between 1M and 5M, you're in the zone where pgvector still works but the index choice and configuration start mattering. Above 5M, the case for a dedicated vector DB strengthens.

The setup

Schema:

sql

CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536) NOT NULL,
  metadata JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Index — and this is where most teams pick wrong:

sql

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Two index types pgvector offers: IVFFlat and HNSW. The honest comparison:

IVFFlat — faster to build, slower to query, requires you to know the data distribution ahead of time. Fine for batch updates and read-heavy workloads.
HNSW — slower to build, faster to query, works without prior knowledge of the data. Better for most use cases.

Default to HNSW unless you have a specific reason. The build time is longer (minutes vs. seconds for IVFFlat on small data, hours vs. minutes on large) but query latency is consistently lower.

The m and ef_construction parameters trade build time and memory for recall. The defaults (m=16, ef_construction=64) are reasonable for most data. Tune up if you need higher recall (closer to exact search); tune down if memory is tight.

Querying

A semantic-search query:

sql

SELECT id, content, metadata,
       embedding <=> $1 AS distance
FROM documents
WHERE metadata->>'category' = 'tutorial'
ORDER BY embedding <=> $1
LIMIT 10;

$1 is the query embedding (generated by the same model as the documents). <=> is cosine distance. The WHERE clause filters first, then ranks; the planner usually handles this efficiently if the metadata column is indexed.

In TypeScript with Drizzle:

import { sql } from "drizzle-orm";

const results = await db
  .select({
    id: documents.id,
    content: documents.content,
    distance: sql<number>`${documents.embedding} <=> ${queryEmbedding}::vector`,
  })
  .from(documents)
  .orderBy(sql`${documents.embedding} <=> ${queryEmbedding}::vector`)
  .limit(10);

A small annoyance: Drizzle's type system doesn't yet model vector columns natively (as of writing), so you do a small dance with sql template literals. Workable.

Performance numbers

Benchmarks on a real project (RAG over a documentation site, ~50K vectors at 1536 dims, on a $19/month Neon instance):

Index build time: ~90 seconds
Query latency p50: 12ms
Query latency p99: 38ms
Memory used by the index: ~600MB

For a $19 instance, those numbers are excellent. The same workload on Pinecone's starter tier is comparable in latency but costs $70/month and is one more vendor to manage.

At larger scales (we tested 500K vectors on a $99/month Neon instance):

Index build time: ~25 minutes
Query latency p50: 22ms
Query latency p99: 75ms

Still well within the envelope where pgvector is the right answer.

When to graduate to a dedicated vector DB

Specific signals:

Vector count above 5 million. Index build times become annoying, query latency starts climbing. You'll be tempted to shard; that's a sign.
You need real-time inserts at high throughput. pgvector's HNSW index has per-insert overhead. Pinecone and similar are optimised for streaming inserts.
You need multi-tenant isolation that Postgres's row-level security can't cleanly express.
You need vector operations beyond similarity (clustering, dimensionality reduction at query time). Vector DBs offer more analytics-style operations.
You're using a managed RAG framework that assumes Pinecone or similar. Sometimes the integration cost is bigger than the infrastructure cost.

If none of these apply, pgvector. If two or more apply, evaluate Pinecone, Qdrant, Weaviate, or similar.

Operational notes

Things that bite teams the first month:

Connection pooling. pgvector queries are CPU-heavy on the database. A connection pool that's too small causes queue backpressure; too large saturates CPU. Start at 20 connections, monitor, adjust.

Cold indexes. After a fresh restart, the HNSW index isn't in memory. First few queries are slow. If you have predictable traffic patterns, prewarm by running a few representative queries on startup.

Backups. pgvector data backs up with normal Postgres backups. No new backup story. This is one of the underrated wins.

Schema migrations. Adding a new column to the table doesn't rebuild the index. Adding rows does trigger index updates incrementally. Bulk inserts are slower than non-vector tables; batch your inserts and consider dropping/rebuilding the index for very large initial loads (>100K rows).

Embedding model upgrades. If you switch from text-embedding-3-small to text-embedding-3-large, you need to re-embed everything and rebuild the index. Plan for this; don't switch models casually.

Cost. The biggest hidden cost is embedding generation, not storage. OpenAI's text-embedding-3-small is $0.02 per 1M tokens. Re-embedding a corpus of 100K documents (~500 tokens each) is $1 per pass. Tracking this matters if you change models often.

A pattern that works for RAG

For most RAG applications, the architecture we ship is:

Ingest pipeline: documents → chunks → embeddings → stored in pgvector.
Query pipeline: user question → embedding → ANN search in pgvector → top 5-10 chunks → LLM with chunks in context → response.
Metadata filter: chunks tagged with source, date, category. Filter at query time to scope responses.

The infrastructure is one Postgres instance and the LLM API. No vector-DB vendor. No vector-DB SDK. No separate dashboard. The same database that holds the application's users and content also holds the embeddings.

This pattern scales until it doesn't. The "until it doesn't" usually means a million-plus vectors and high QPS, at which point the migration is a project but a tractable one.

When pgvector is overkill

If you have under 10,000 vectors, you don't even need an ANN index. Linear scan with cosine distance is fast enough.

sql

SELECT id, content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;

No index, full scan, 10ms latency on 10K rows. Add the HNSW index when you cross 50-100K vectors and start seeing latency creep up.

The smallest setup is: a vector column, no index, a linear-scan query. That's a working semantic search system in maybe 50 lines of code.

The summary

For most projects involving embeddings:

Use Postgres + pgvector unless you have a specific reason to use a dedicated vector DB.
HNSW index, default parameters.
Linear scan if under 10K vectors; HNSW above that.
Re-evaluate at 5M vectors.

The default advice has been "use Pinecone" for two years. The better default for most projects is "use the database you're already running." pgvector is mature enough that the case for adding a new vendor needs to be made affirmatively, not assumed.

If you're building a web app with vector search and want a sane setup from day one, see how we work on web apps. Postgres + pgvector, fast queries, one fewer vendor on the invoice.