Why Postgres Might Already Contain Half Your Infrastructure
Most teams run Postgres for relational data, then reach for Redis for caching, a separate service for vector search, a cron scheduler, a geospatial database, and a message queue. What if four or five of those were already sitting inside your Postgres installation, waiting to be enabled with a single CREATE EXTENSION?
This isn’t hype. The Postgres extension ecosystem has matured to the point where specialized extensions can replace entire external services — with lower operational overhead, simpler architecture, and often comparable performance. This guide covers the extensions actually worth deploying, what they replace, and where they fall short.
pgvector: Kill Your Standalone Vector Database
If you’re building AI-powered features in 2026, you’ve probably looked at Pinecone, Weaviate, or Qdrant. Before you sign up, check whether pgvector covers your use case — it does for most teams.
pgvector adds a vector data type and efficient similarity search (HNSW and IVFFlat indexes) to Postgres. Installation:
# On Ubuntu/Debian (shell, not psql)
sudo apt install postgresql-16-pgvector
-- Then in psql
CREATE EXTENSION vector;
Creating a table with embeddings and querying by cosine similarity:
CREATE TABLE documents (
id bigserial PRIMARY KEY,
content text,
embedding vector(1536) -- OpenAI ada-002 dimensions
);
-- Create an HNSW index for fast approximate search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Find the 5 most similar documents
SELECT id, content,
1 - (embedding <=> '[0.1,0.2,...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1,0.2,...]'::vector
LIMIT 5;
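HNSW is an approximate index, so recall is tunable at query time. pgvector exposes this through the hnsw.ef_search setting (default 40); a quick sketch:

```sql
-- Trade speed for recall: higher ef_search scans more of the
-- HNSW graph per query (pgvector's default is 40)
SET hnsw.ef_search = 100;
```

Setting it per-session (or per-transaction with SET LOCAL) lets latency-sensitive endpoints and recall-sensitive batch jobs share the same index.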
The killer feature is joining vector search with structured filters — something that external vector databases handle awkwardly:
-- Semantic search within a specific user's documents
SELECT d.id, d.content,
1 - (d.embedding <=> $1::vector) AS similarity
FROM documents d
WHERE d.user_id = $2
AND d.created_at > NOW() - INTERVAL '30 days'
ORDER BY d.embedding <=> $1::vector
LIMIT 10;
In a standalone vector DB, this query would require either metadata pre-filtering (where supported) or a two-step fetch-then-filter approach that sacrifices recall. In Postgres, it’s a single query with full ACID guarantees.
When pgvector isn’t enough
pgvector starts showing strain above ~5 million vectors with sub-10ms latency requirements. If you’re at that scale, dedicated systems like Qdrant or Weaviate offer better throughput. For most applications — especially early-stage — pgvector is the right starting point.
pg_cron: Replace Your External Job Scheduler
Cron jobs that touch databases are usually managed externally: a Linux crontab entry that runs a script, an AWS EventBridge rule, a Kubernetes CronJob. Each adds operational overhead and a failure domain. pg_cron runs scheduled jobs directly inside Postgres.
-- Install
CREATE EXTENSION pg_cron;
-- Schedule a cleanup job every night at 3am UTC
SELECT cron.schedule(
'cleanup-old-sessions',
'0 3 * * *',
$$DELETE FROM user_sessions WHERE expires_at < NOW() - INTERVAL '7 days'$$
);
-- Run a data rollup every hour
SELECT cron.schedule(
'hourly-stats-rollup',
'0 * * * *',
$$INSERT INTO hourly_stats SELECT date_trunc('hour', created_at), COUNT(*), SUM(amount)
FROM orders WHERE created_at >= date_trunc('hour', NOW()) - INTERVAL '1 hour'
AND created_at < date_trunc('hour', NOW())
GROUP BY 1$$
);
-- View scheduled jobs
SELECT * FROM cron.job;
-- View recent run history
SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 20;
pg_cron must be preloaded at server start: add it to shared_preload_libraries in postgresql.conf and restart Postgres:
shared_preload_libraries = 'pg_cron'
cron.database_name = 'myapp'
Practical advantages over external schedulers
- No network hop — the job runs where the data lives
- Jobs run in a transaction — failed jobs leave no partial state
- Job history is in the database — queryable, joinable with other data
- No separate credentials or connection management
The limitation: pg_cron only runs on the primary node in a replica setup. If the primary fails, scheduled jobs don’t run until failover completes. For truly critical scheduled work, keep an external fallback.
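Jobs can also be removed or audited by name. A short sketch using pg_cron’s own functions and tables (job names match the earlier examples):

```sql
-- Remove a job by the name it was scheduled under
SELECT cron.unschedule('cleanup-old-sessions');

-- Surface failed runs for alerting or debugging
SELECT j.jobname, d.status, d.return_message, d.start_time
FROM cron.job_run_details d
JOIN cron.job j ON j.jobid = d.jobid
WHERE d.status = 'failed'
ORDER BY d.start_time DESC;
```

Because the run history is a regular table, a failed-job check can itself be a pg_cron job that inserts into an alerts table.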
PostGIS: Geographic Queries Without a Separate Geo Service
PostGIS turns Postgres into a full-featured spatial database. If your application stores locations, routes, service areas, or any geographic data, PostGIS eliminates the need for a separate mapping/geo service for most backend queries.
CREATE EXTENSION postgis;
-- Store business locations
CREATE TABLE businesses (
id bigserial PRIMARY KEY,
name text,
location geometry(Point, 4326) -- WGS84 coordinates
);
-- Add a spatial index
CREATE INDEX businesses_location_idx
ON businesses USING GIST (location);
-- Find all businesses within 500 meters of a point
SELECT name,
ST_Distance(
location::geography,
ST_SetSRID(ST_MakePoint(-73.9847, 40.7490), 4326)::geography
) AS distance_meters
FROM businesses
WHERE ST_DWithin(
location::geography,
ST_SetSRID(ST_MakePoint(-73.9847, 40.7490), 4326)::geography,
500
)
ORDER BY distance_meters;
More complex geospatial operations:
-- Check if a point is inside a delivery zone polygon
SELECT z.name AS zone_name
FROM delivery_zones z
WHERE ST_Within(
ST_SetSRID(ST_MakePoint($1, $2), 4326),
z.boundary
);
pg_partman: Automated Table Partitioning
Time-series tables grow unbounded without partitioning. pg_partman automates creating and maintaining partitioned tables by time or serial range — replacing what many teams solve with custom scripts or a separate time-series database.
CREATE EXTENSION pg_partman;
-- Create a partitioned events table
CREATE TABLE events (
id bigserial,
event_type text,
payload jsonb,
created_at timestamptz NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Hand it to pg_partman: create monthly partitions, keep 12 months
SELECT partman.create_parent(
p_parent_table => 'public.events',
p_control => 'created_at',
p_type => 'native', -- pg_partman 4.x; version 5 drops this parameter
p_interval => 'monthly',
p_premake => 3
);
-- Configure retention: drop partitions older than 12 months
UPDATE partman.part_config
SET retention = '12 months',
retention_keep_table = false
WHERE parent_table = 'public.events';
-- Run maintenance (schedule this with pg_cron)
SELECT partman.run_maintenance();
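Since run_maintenance() must be called periodically, the natural pairing is pg_cron from earlier in this guide. A minimal sketch (the job name is arbitrary):

```sql
-- Create upcoming partitions and enforce retention every hour
SELECT cron.schedule(
  'partman-maintenance',
  '0 * * * *',
  $$SELECT partman.run_maintenance()$$
);
```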
pg_trgm: Full-Text Search and Fuzzy Matching
Before standing up Elasticsearch or Typesense for search, try pg_trgm. It enables trigram-based similarity search and dramatically speeds up LIKE/ILIKE queries on text columns.
CREATE EXTENSION pg_trgm;
-- Create a GIN index for fast LIKE queries
CREATE INDEX products_name_trgm_idx
ON products USING GIN (name gin_trgm_ops);
-- Fast LIKE search (uses the index)
SELECT * FROM products
WHERE name ILIKE '%wireless headphones%'
ORDER BY similarity(name, 'wireless headphones') DESC
LIMIT 20;
-- Fuzzy search: handles typos. The % operator can use the trigram
-- index; its cutoff is pg_trgm.similarity_threshold (default 0.3)
SELECT name, similarity(name, 'airpods pro') AS score
FROM products
WHERE name % 'airpods pro'
ORDER BY score DESC
LIMIT 10;
Combined with Postgres full-text search (tsvector/tsquery), this covers the majority of search use cases without Elasticsearch:
-- Full-text search with ranking
SELECT name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'wireless & headphones') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
Combining Extensions: The Real Power
The compounding benefit is that these extensions work together. A single query can do vector similarity search with geospatial filtering, scheduled via pg_cron, on a partitioned table managed by pg_partman — all with full transactional consistency.
-- Find restaurants near a location that match cuisine preferences (embeddings)
SELECT r.id, r.name,
ST_Distance(r.location::geography, $1::geography) AS distance_m,
1 - (r.cuisine_embedding <=> $2::vector) AS preference_match
FROM restaurants r
WHERE ST_DWithin(r.location::geography, $1::geography, 2000)
AND r.is_active = true
ORDER BY
(1 - (r.cuisine_embedding <=> $2::vector)) * 0.6
+ (1 - LEAST(ST_Distance(r.location::geography, $1::geography) / 2000, 1)) * 0.4
DESC
LIMIT 10;
Try doing that as a single atomic query across Pinecone, a separate geo service, and your relational database.
The Honest Trade-off Assessment
Postgres extensions are not always the right answer. Here’s an honest breakdown:
- pgvector vs. Pinecone/Qdrant: Use pgvector under ~5M vectors. Above that, dedicated vector DBs win on throughput.
- pg_cron vs. Temporal/Airflow: pg_cron is ideal for simple periodic SQL tasks. Complex multi-step workflows with retries and branching need a dedicated orchestrator.
- PostGIS vs. specialized geo APIs: PostGIS handles backend queries brilliantly. For routing, map rendering, or geocoding, you still need external services.
- pg_trgm vs. Elasticsearch: pg_trgm covers search for most apps. Elasticsearch wins at large scale, faceted search, and relevance tuning.
Getting Started: Audit Your Current Stack
The exercise worth doing: list every non-Postgres service in your stack and ask whether a Postgres extension covers 80% of the use case. For most teams with under 10 million records per service, the answer is yes far more often than expected.
Start with pgvector if you’re doing any AI/embedding work. Add pg_cron to eliminate your cron scripts. The operational simplicity — one database to monitor, back up, and scale — compounds over time. Every service you don’t run is a failure domain you don’t have.
The goal isn’t to run everything in Postgres forever. It’s to avoid premature architectural complexity while you’re still learning what your application actually needs.
