
Why Connection Pooling Exists

Every time your application opens a connection to a database, a small negotiation happens: TCP handshake, TLS negotiation (if you are using SSL, and you should be), authentication, session initialization. For PostgreSQL, this process takes roughly 50-100ms. For MySQL, it is somewhat faster but still non-trivial. When your application handles 500 requests per second and each request needs a database query, that is 500 connection setups and teardowns per second — roughly 25-50 seconds of cumulative overhead every second just on connection management.

Connection pooling solves this by maintaining a pool of pre-established connections that your application borrows and returns. The connection setup cost is paid once, and subsequent queries reuse existing connections with near-zero overhead. It sounds simple, and conceptually it is. The complexity comes from deciding where to pool, how to pool, and whether you need an external pooler at all.
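The borrow-and-return cycle can be sketched with a toy pool. This is a minimal illustration of the concept, not any real driver's API: `FakeConn` stands in for a connection whose creation is the expensive part.

```python
import queue

class FakeConn:
    """Stand-in for a real DB connection; creating one is the expensive part."""
    created = 0
    def __init__(self):
        FakeConn.created += 1

class SimplePool:
    """Toy fixed-size pool: pay the setup cost once, then borrow/return."""
    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(FakeConn())          # setup cost paid up front

    def borrow(self, timeout=5):
        return self._q.get(timeout=timeout)  # blocks if the pool is exhausted

    def give_back(self, conn):
        self._q.put(conn)

pool = SimplePool(size=3)
for _ in range(100):                         # 100 "requests"
    conn = pool.borrow()
    pool.give_back(conn)                     # reused, no new setup

print(FakeConn.created)                      # 3 connections total, not 100
```

One hundred requests were served with three connection setups; a real pool adds health checks, recycling, and overflow on top of this same cycle.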

Application-Level Pooling: The Default You Already Have

Most modern frameworks include built-in connection pooling. If you are using SQLAlchemy, Django ORM, Rails ActiveRecord, or any mature database library, you already have application-level pooling. Here is what a typical SQLAlchemy pool configuration looks like:

from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://user:pass@localhost:5432/mydb",
    pool_size=10,          # Number of persistent connections
    max_overflow=20,       # Additional connections under load
    pool_timeout=30,       # Seconds to wait for a connection
    pool_recycle=1800,     # Recycle connections after 30 minutes
    pool_pre_ping=True,    # Verify connections before use
)

# Under the hood, when you do:
with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM users WHERE id = :id"), {"id": 42})
# The connection is borrowed from the pool, used, and returned.
# No TCP handshake, no auth — just a query on an existing socket.

Application-level pooling works well for single-process applications with steady workloads. But it falls apart in several common scenarios:

  • Many application instances: If you run 50 pods in Kubernetes, each with a pool of 10, that is 500 connections to your database. PostgreSQL starts struggling above 200-300 connections, and performance degrades sharply.
  • Serverless functions: Lambda functions, Cloud Functions, and similar platforms create and destroy processes constantly. Each invocation may establish a new connection, overwhelming the database.
  • Connection storms after deploys: When all your pods restart simultaneously during a deployment, they all try to establish connections at once.
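The arithmetic behind the first bullet is worth making explicit. With the SQLAlchemy settings shown earlier, each instance can open up to pool_size + max_overflow connections, so the worst case is larger than the steady state (the numbers below are this article's example, not universal limits):

```python
pods = 50
pool_size = 10        # persistent connections per pod
max_overflow = 20     # extra connections each pod may open under load

steady_state = pods * pool_size
worst_case = pods * (pool_size + max_overflow)

print(steady_state)   # 500 connections at idle
print(worst_case)     # 1500 connections during a load spike
```

Even the steady state already exceeds what a default PostgreSQL installation handles comfortably, and the worst case is three times that.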

This is where external connection poolers come in.

PgBouncer: The PostgreSQL Workhorse

PgBouncer is the most widely deployed connection pooler for PostgreSQL. It sits between your application and the database, maintaining a smaller pool of actual database connections that serve a larger number of client connections. It is lightweight, stable, and battle-tested in production at massive scale.

Pool Modes

PgBouncer supports three pooling modes, and choosing the right one is critical:

  • Session: the client owns a server connection for the entire session. Use for apps that rely on session-level features (LISTEN/NOTIFY, session-scoped prepared statements). Watch out: minimal pooling benefit — PgBouncer is essentially acting as a proxy.
  • Transaction: the client holds a server connection only for the duration of a transaction. Use for most web applications and APIs. Watch out: session-level features cannot be used across transactions.
  • Statement: the client gets a server connection for each individual statement. Use for simple autocommit SELECT workloads, or when using PgBouncer purely for connection limiting. Watch out: multi-statement transactions are not supported.

Transaction mode is the right choice for 90% of web applications. Here is a production-ready PgBouncer configuration:

# pgbouncer.ini
[databases]
mydb = host=10.0.1.50 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Pool sizing
pool_mode = transaction
default_pool_size = 25        # Connections per user/database pair
min_pool_size = 5             # Keep at least 5 connections warm
reserve_pool_size = 5         # Emergency overflow connections
reserve_pool_timeout = 3      # Seconds before using reserve pool

# Connection limits
max_client_conn = 1000        # Max client connections to PgBouncer
max_db_connections = 50       # Max actual database connections

# Timeouts
server_idle_timeout = 300     # Close idle server connections after 5 min
client_idle_timeout = 0       # Never close idle client connections
server_connect_timeout = 15   # Timeout for connecting to Postgres
query_timeout = 120           # Kill queries running longer than 2 min

# Logging
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
stats_period = 60
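A quick consistency check on the numbers above: max_db_connections caps what PgBouncer will open against Postgres, so the per-pool settings need to fit under it. A sketch of that sanity check, with the values copied from the config:

```python
default_pool_size = 25
reserve_pool_size = 5
max_db_connections = 50
max_client_conn = 1000

# Worst case for a single user/database pair: base pool plus reserve.
peak_per_pool = default_pool_size + reserve_pool_size
assert peak_per_pool <= max_db_connections

# The point of transaction pooling: many clients share few server conns.
multiplexing_ratio = max_client_conn / max_db_connections

print(peak_per_pool, multiplexing_ratio)   # 30 20.0
```

If peak_per_pool exceeded max_db_connections, the reserve pool could never fully engage; a 20:1 client-to-server ratio is a reasonable starting point for transaction mode.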

The Prepared Statement Problem

The biggest gotcha with PgBouncer in transaction mode is prepared statements. PostgreSQL prepared statements are session-scoped, meaning they exist on a specific server connection. In transaction mode, your application may get a different server connection for each transaction, so prepared statements created in one transaction are not visible in the next.

# This WILL break with PgBouncer in transaction mode:
conn = pool.getconn()
cur = conn.cursor()
cur.execute("PREPARE get_user AS SELECT * FROM users WHERE id = $1")
# ... later, possibly on a different server connection ...
cur.execute("EXECUTE get_user(42)")  # ERROR: prepared statement does not exist

# Solution 1: Disable server-side prepared statements in your driver.
# psycopg2 does not use server-side prepared statements by default, so it
# is usually safe as-is. With psycopg 3 (SQLAlchemy's "postgresql+psycopg"
# dialect), turn off automatic statement preparation:
engine = create_engine(
    "postgresql+psycopg://user:pass@pgbouncer:6432/mydb",
    connect_args={"prepare_threshold": None},
    pool_pre_ping=True,
)

# Solution 2: Use PgBouncer 1.21+ with prepared statement support
# Add to pgbouncer.ini:
# max_prepared_statements = 100

PgBouncer 1.21 (released in late 2023) added tracking of protocol-level prepared statements in transaction mode, controlled by the max_prepared_statements setting, which largely solves this problem. If you are running an older version, upgrade.

ProxySQL: The MySQL Equivalent (and More)

ProxySQL fills the same role for MySQL that PgBouncer fills for PostgreSQL, but it is significantly more feature-rich. Beyond connection pooling, ProxySQL handles query routing, read/write splitting, query caching, and query rewriting. This additional complexity is both its strength and its weakness.

# ProxySQL configuration via SQL interface
# Connect to ProxySQL admin interface:
mysql -u admin -padmin -h 127.0.0.1 -P 6032

-- Add backend MySQL servers
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES 
  (10, '10.0.1.50', 3306, 1000),   -- Primary (write)
  (20, '10.0.1.51', 3306, 1000),   -- Replica 1 (read)
  (20, '10.0.1.52', 3306, 500);    -- Replica 2 (read, lower weight)

-- Configure connection pool
UPDATE mysql_servers SET max_connections = 100;

-- Set up read/write splitting rules
-- apply=1 stops evaluation at the first matching rule; without it, the
-- catch-all rule 3 would also match and override the destination.
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES
  (1, 1, '^SELECT .* FOR UPDATE', 10, 1),   -- SELECT FOR UPDATE -> primary
  (2, 1, '^SELECT', 20, 1),                  -- Regular SELECT -> replicas
  (3, 1, '.*', 10, 1);                       -- Everything else -> primary

-- Load configuration
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL QUERY RULES TO DISK;
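The rules are evaluated in rule_id order, first match wins (given apply=1). The matching semantics can be simulated in a few lines; this is my own sketch of the behavior for the three rules above, not ProxySQL code:

```python
import re

# (rule_id, pattern, destination_hostgroup) mirroring the rules above
RULES = [
    (1, r"^SELECT .* FOR UPDATE", 10),  # locking reads -> primary
    (2, r"^SELECT", 20),                # plain reads   -> replicas
    (3, r".*", 10),                     # everything else -> primary
]

def route(query):
    """Return the hostgroup of the first matching rule (apply=1 semantics)."""
    for _, pattern, hostgroup in RULES:
        if re.match(pattern, query):
            return hostgroup
    raise ValueError("no rule matched")

print(route("SELECT * FROM users WHERE id = 1"))             # 20 (replica)
print(route("SELECT * FROM users WHERE id = 1 FOR UPDATE"))  # 10 (primary)
print(route("UPDATE users SET name = 'x' WHERE id = 1"))     # 10 (primary)
```

Note that rule order carries the logic: if the plain ^SELECT rule came before the FOR UPDATE rule, locking reads would be misrouted to replicas.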

When ProxySQL Shines

ProxySQL is particularly valuable in these scenarios:

  • Read/write splitting: Route read queries to replicas automatically, without changing application code.
  • Connection multiplexing: Like PgBouncer, it maps many client connections to fewer server connections.
  • Query caching: Cache frequently executed read queries with TTL-based invalidation.
  • Failover: Automatically detect primary failures and promote replicas.

When You Actually Need an External Pooler

Not every application needs PgBouncer or ProxySQL. Here is a decision framework:

You probably need an external pooler if:

  • You run more than 10 application instances connecting to the same database
  • Your total connection count exceeds 100 for PostgreSQL or 500 for MySQL
  • You use serverless functions that connect to a relational database
  • You experience connection storms during deployments or scaling events
  • You need read/write splitting without application code changes (MySQL/ProxySQL)

You probably do not need an external pooler if:

  • You have fewer than 5 application instances
  • Your database connection count is well within limits
  • Your framework’s built-in pooling handles your load without issues
  • You are using a managed database with built-in pooling (e.g., Neon, PlanetScale)
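The checklist above condenses into a rough heuristic. The thresholds here are the ones quoted in this article, not database-enforced limits:

```python
def probably_needs_pooler(instances, conns_per_instance,
                          engine="postgresql", serverless=False):
    """Rough heuristic from the checklist above; thresholds are this
    article's rules of thumb, not hard limits."""
    limit = 100 if engine == "postgresql" else 500
    total = instances * conns_per_instance
    return serverless or instances > 10 or total > limit

print(probably_needs_pooler(50, 10))                 # True: 500 conns, 50 pods
print(probably_needs_pooler(3, 10))                  # False: 30 conns is fine
print(probably_needs_pooler(1, 1, serverless=True))  # True: serverless churn
```

A function like this obviously cannot capture connection storms or feature requirements such as read/write splitting; treat it as a first filter, not a verdict.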

Managed Alternatives Worth Considering

The managed database space has evolved significantly. Several providers now offer built-in connection pooling that eliminates the need for a separate pooler:

  • Neon: Built-in connection pooling via their proxy layer. Supports both pooled and direct connections. Works well for serverless.
  • Supabase: Includes PgBouncer in their managed PostgreSQL offering. Supavisor is their newer Elixir-based pooler.
  • PlanetScale: MySQL-compatible with built-in connection handling. Their Vitess-based architecture handles connection management internally.
  • AWS RDS Proxy: Managed connection pooler for RDS and Aurora. Supports both PostgreSQL and MySQL. Priced per vCPU-hour, which adds up.

Deployment Patterns

Where you deploy your pooler matters. There are three common patterns:

Pattern 1: Sidecar

Run PgBouncer as a sidecar container in each pod. This gives each application instance its own pool, reducing latency but increasing total connections to the database.

# Kubernetes deployment with PgBouncer sidecar
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:latest
        env:
        - name: DATABASE_URL
          value: "postgresql://user:pass@localhost:6432/mydb"
      - name: pgbouncer
        image: edoburu/pgbouncer:1.22
        ports:
        - containerPort: 6432
        env:
        - name: DATABASE_URL
          value: "postgresql://user:pass@postgres-primary:5432/mydb"
        - name: POOL_MODE
          value: "transaction"
        - name: DEFAULT_POOL_SIZE
          value: "5"
        - name: MAX_CLIENT_CONN
          value: "50"

Pattern 2: Centralized Pooler

Run a dedicated PgBouncer instance (or cluster) that all application instances connect to. This gives you precise control over the total connection count to the database.

Pattern 3: Per-Service Pooler

Each service gets its own PgBouncer deployment. This balances isolation with efficiency, and is the pattern I recommend for most microservice architectures.

Monitoring Your Pool

A misconfigured connection pool is worse than no pool at all. Monitor these metrics:

-- PgBouncer stats queries (run from the PgBouncer admin console)
SHOW POOLS;
-- Watch: cl_active, cl_waiting, sv_active, sv_idle
-- If cl_waiting > 0 frequently, increase pool size
-- If sv_idle is always high, decrease pool size

SHOW STATS;
-- Watch: avg_query_time, total_wait_time
-- total_wait_time > 0 means clients are waiting for connections
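Those two checks are easy to automate. The sketch below assumes a collector has already fetched SHOW POOLS rows and turned them into dicts keyed by column name; the function and the threshold of 10 idle connections are my own illustrative choices:

```python
def pool_advice(row):
    """Translate one SHOW POOLS row (as a dict of column -> value)
    into the tuning advice described above."""
    if row["cl_waiting"] > 0:
        return "clients are queueing: consider raising default_pool_size"
    if row["sv_active"] == 0 and row["sv_idle"] >= 10:
        return "many idle server conns: consider lowering default_pool_size"
    return None

healthy = {"cl_active": 12, "cl_waiting": 0, "sv_active": 12, "sv_idle": 3}
starved = {"cl_active": 25, "cl_waiting": 7, "sv_active": 25, "sv_idle": 0}

print(pool_advice(healthy))   # None
print(pool_advice(starved))   # clients are queueing: ...
```

In practice you would sample these values on the stats_period interval and alert on sustained cl_waiting rather than a single spike.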

Connection pooling is infrastructure plumbing — unglamorous but critical. Get it right and your database handles 10x the load without breaking a sweat. Get it wrong and you will spend weekends debugging mysterious connection timeout errors that only happen under load. Choose the simplest solution that solves your actual problem, not the one that looks best on an architecture diagram.

By Michael Sun

Founder and Editor-in-Chief of NovVista. Software engineer with hands-on experience in cloud infrastructure, full-stack development, and DevOps. Writes about AI tools, developer workflows, server architecture, and the practical side of technology. Based in China.
