Technical RFCs (Requests for Comments) are the most underrated tool in software engineering. A well-written RFC prevents months of wasted work by surfacing disagreements early, documenting decisions for future engineers, and building consensus before a single line of code is written. A poorly written RFC wastes everyone's time in review meetings and dies in a shared drive.
I have written over 40 RFCs across four companies, reviewed hundreds more, and watched which ones succeed and which ones stall indefinitely. The difference is rarely the technical merit of the proposal. It is the clarity of the writing, the structure of the argument, and whether the author anticipated the right objections.
Why Most RFCs Fail
The most common failure modes:
- Solution-first thinking: The RFC describes a solution in detail but barely explains the problem. Reviewers cannot evaluate a solution without understanding why it is needed.
- Missing alternatives: Proposing one approach without explaining why other approaches were rejected makes reviewers suspicious. They wonder what you did not consider.
- Scope creep: The RFC tries to solve three problems at once. Reviewers get overwhelmed and stop engaging.
- No success criteria: “Improve performance” is not measurable. “Reduce p99 latency from 800ms to 200ms” is.
- Written for the author, not the audience: The RFC presents dense technical detail without context for readers who are not deeply familiar with the system.
The RFC Template That Works
This is the template I use after years of iteration. Every section earns its place.
Header
# RFC-042: Migrate Session Storage from Redis to PostgreSQL
**Author:** Jane Smith
**Status:** In Review
**Created:** 2026-03-15
**Reviewers:** @backend-team, @security-team
**Decision deadline:** 2026-03-29
The decision deadline is critical. Without it, RFCs languish in “in review” forever. Two weeks is usually right — long enough for thorough review, short enough to create urgency.
1. Summary (3-4 sentences)
Write this last. It should be a standalone paragraph that a VP could read and understand the proposal's essence:
## Summary
This RFC proposes migrating session storage from a standalone Redis
instance to PostgreSQL, which we already run for application data.
This eliminates a separate infrastructure dependency, reduces monthly
costs by approximately $120, and simplifies our backup and recovery
procedures. The migration can be completed with zero downtime using
a dual-write strategy over three weeks.
2. Problem Statement
Describe the problem without hinting at the solution. This section determines whether reviewers agree the RFC is worth reading.
## Problem Statement
Our session management currently relies on a dedicated Redis 7.2
instance (r6g.large) running in AWS ElastiCache. This creates three
operational issues:
1. **Infrastructure complexity:** Redis is the only component in our
stack that runs as a managed service outside our primary PostgreSQL
database. It has separate monitoring, separate backup procedures,
and separate access controls. Our team of 6 engineers maintains
infrastructure for 2 database systems when we could maintain 1.
2. **Cost inefficiency:** The ElastiCache instance costs $156/month.
Our current session volume (12,000 active sessions, average size
340 bytes) would consume approximately 4MB of PostgreSQL storage —
well within our existing RDS capacity.
3. **Disaster recovery gap:** Our PostgreSQL backups run every 6 hours
with point-in-time recovery. Redis snapshots run daily. A failure
between snapshots loses up to 24 hours of session data, forcing
those users to re-authenticate.
Notice the specific numbers: 12,000 sessions, 340 bytes each, $156/month, 24-hour recovery gap. Concrete data makes the problem real. “Redis adds complexity” is opinion. “We maintain 2 database systems for a team of 6” is fact.
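Numbers like these are easy to sanity-check before reviewers do it for you. A minimal sketch in Python (my choice of language; the example RFC does not prescribe one) that reproduces the storage estimate. It counts raw session payload only and ignores PostgreSQL row overhead and index size, so treat the result as a lower bound:

```python
# Back-of-the-envelope check of the session storage estimate.
# Assumption: raw payload only; row headers and indexes are not counted.
ACTIVE_SESSIONS = 12_000
AVG_SESSION_BYTES = 340


def payload_megabytes(sessions: int, avg_bytes: int) -> float:
    """Return the raw payload size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return sessions * avg_bytes / 1_000_000


estimate_mb = payload_megabytes(ACTIVE_SESSIONS, AVG_SESSION_BYTES)
print(f"{estimate_mb:.2f} MB")  # prints "4.08 MB"
```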
3. Proposed Solution
Now describe what you want to build. Be specific enough for an engineer to implement it, but do not write the code. Focus on the design decisions and their rationale.
## Proposed Solution
### Database Schema
Create a `sessions` table in PostgreSQL:
```sql
CREATE TABLE sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id),
data JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ NOT NULL,
last_accessed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
CREATE INDEX idx_sessions_expires_at ON sessions(expires_at);
```
**Design decisions:**
- JSONB for session data: allows flexible session attributes
without schema migrations for every new field
- Separate expires_at column: enables efficient cleanup queries
without parsing JSONB
- UUID primary key: consistent with our existing ID strategy
### Session Cleanup
A scheduled job runs every 15 minutes:
```sql
DELETE FROM sessions WHERE expires_at < now();
```
At our current volume (12K sessions), this query completes in
under 5ms. At 10x volume, the index on expires_at keeps it
under 50ms.
### Performance Considerations
Current Redis session lookup: ~1ms
Expected PostgreSQL session lookup: ~3-5ms (indexed UUID lookup)
This 2-4ms increase is acceptable because:
- Session lookup happens once per request (in middleware)
- Our p50 response time is 45ms; a 4ms addition is < 10%
- Connection pooling (PgBouncer) eliminates connection overhead
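The cleanup job in this example is simple enough to prototype before the RFC goes out. The sketch below is a stand-in, not the production job: it uses Python's built-in `sqlite3` with an in-memory database in place of PostgreSQL, stores timestamps as ISO 8601 strings in UTC, and uses a hypothetical helper named `purge_expired`. It only demonstrates the delete-by-expiry logic.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete sessions whose expires_at is earlier than `now`; return rows removed."""
    cur = conn.execute(
        "DELETE FROM sessions WHERE expires_at < ?", (now.isoformat(),)
    )
    conn.commit()
    return cur.rowcount


# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, expires_at TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_sessions_expires_at ON sessions(expires_at)")

now = datetime(2026, 3, 15, 12, 0, tzinfo=timezone.utc)
rows = [
    ("expired-1", (now - timedelta(hours=1)).isoformat()),
    ("expired-2", (now - timedelta(minutes=1)).isoformat()),
    ("active-1", (now + timedelta(hours=1)).isoformat()),
]
conn.executemany("INSERT INTO sessions VALUES (?, ?)", rows)

removed = purge_expired(conn, now)
remaining = [row[0] for row in conn.execute("SELECT id FROM sessions")]
print(removed, remaining)
```

Passing `now` in as a parameter keeps the function easy to test. A real job would more likely let the database supply the current time, as the SQL in the example RFC does.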
4. Alternatives Considered
This is the section authors most often skip and reviewers care about most. Show that you evaluated other options and explain why you rejected them:
## Alternatives Considered
### A. Keep Redis, reduce instance size
We could downgrade from r6g.large to r6g.small ($78/month).
This reduces cost but does not address the infrastructure
complexity or disaster recovery issues. **Rejected** because
it solves only one of three problems.
### B. Switch to Valkey (Redis fork)
Self-hosted Valkey on our existing infrastructure would
eliminate the ElastiCache cost. However, it still requires
maintaining a separate data store, and self-hosting adds
operational burden. **Rejected** because it increases
complexity rather than reducing it.
### C. Use encrypted cookies (no server-side sessions)
JWTs or encrypted cookies eliminate server-side storage
entirely. However, our sessions contain role-based access
data that changes mid-session (when an admin modifies
permissions). Cookie-based sessions cannot be invalidated
server-side. **Rejected** for security reasons.
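A rejection like alternative C is more convincing when reviewers can see the failure mode. The sketch below is illustrative only: it uses an HMAC-signed cookie (signed rather than encrypted, which is enough to show the problem), a hypothetical `SECRET`, and a plain dictionary standing in for the server-side session store.

```python
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"  # hypothetical key, for illustration only


def sign_cookie(payload: dict) -> str:
    """Serialize the payload and append an HMAC so the client cannot alter it."""
    body = json.dumps(payload, sort_keys=True)
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}|{sig}"


def read_cookie(cookie: str) -> Optional[dict]:
    """Return the payload if the signature checks out, otherwise None."""
    body, _, sig = cookie.rpartition("|")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return json.loads(body) if hmac.compare_digest(sig, expected) else None


# Stateless: the role is baked into the cookie at login.
cookie = sign_cookie({"user": "alice", "role": "admin"})
# An admin demotes alice, but the server holds nothing it could update,
# so the cookie still verifies and still carries the old role.
role_from_cookie = read_cookie(cookie)["role"]

# Server-side: the stored session is the source of truth.
server_sessions = {"sess-1": {"user": "alice", "role": "admin"}}
server_sessions["sess-1"]["role"] = "viewer"  # takes effect on the next request
role_from_server = server_sessions["sess-1"]["role"]
del server_sessions["sess-1"]                 # or revoke the session outright
session_after_revoke = server_sessions.get("sess-1")
```

A denylist of revoked cookies would close the gap, but a denylist is itself server-side state, which is the thing this alternative was meant to remove.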
5. Migration Plan
Reviewers want to know the risk, not just the destination. A phased migration plan demonstrates that you have thought about failure modes:
## Migration Plan
### Phase 1: Dual-Write (Week 1)
- Deploy the sessions table and new session manager
- Write to both Redis and PostgreSQL
- Read from Redis (source of truth)
- Monitor PostgreSQL write latency and error rate
### Phase 2: Dual-Read Validation (Week 1-2)
- Read from both stores, compare results
- Log discrepancies without affecting users
- Fix any edge cases identified
### Phase 3: Switch Read Source (Week 2)
- Read from PostgreSQL (new source of truth)
- Continue writing to Redis as fallback
- Monitor for 48 hours
### Phase 4: Decommission Redis (Week 3)
- Remove Redis writes
- Archive Redis data
- Delete ElastiCache instance
- Update monitoring dashboards
### Rollback Plan
During phases 1-3, revert by switching the read source back to
Redis. Redis still receives every write in those phases, so it
remains a complete fallback. Rollback takes < 5 minutes (feature
flag toggle). Once Phase 4 removes Redis, this rollback path is gone.
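The phases above can be sketched in a few lines, which is often enough to convince reviewers that the rollback really is a single toggle. This is a minimal sketch under stated assumptions: two dictionaries stand in for Redis and PostgreSQL, and the class name `DualWriteSessionStore` and its `read_from` flag are hypothetical.

```python
from typing import Optional


class DualWriteSessionStore:
    """Writes to both stores; a flag selects which one serves reads."""

    def __init__(self, redis_store: dict, pg_store: dict, read_from: str = "redis"):
        self.redis_store = redis_store
        self.pg_store = pg_store
        self.read_from = read_from  # hypothetical feature flag: "redis" or "postgres"
        self.discrepancies = []

    def write(self, session_id: str, data: dict) -> None:
        # Phases 1-3: every write goes to both stores.
        self.redis_store[session_id] = dict(data)
        self.pg_store[session_id] = dict(data)

    def read(self, session_id: str) -> Optional[dict]:
        primary = self.redis_store if self.read_from == "redis" else self.pg_store
        return primary.get(session_id)

    def read_and_compare(self, session_id: str) -> Optional[dict]:
        # Phase 2: serve from the primary and log mismatches without failing the request.
        result = self.read(session_id)
        if self.redis_store.get(session_id) != self.pg_store.get(session_id):
            self.discrepancies.append(session_id)
        return result


redis_store = {"legacy": {"user": "bob"}}  # session created before dual-write began
pg_store = {}
store = DualWriteSessionStore(redis_store, pg_store)

store.write("s1", {"user": "alice"})
store.read_and_compare("s1")       # both stores agree
store.read_and_compare("legacy")   # exists only in Redis, so it is logged

store.read_from = "postgres"       # Phase 3 cut-over
after_cutover = store.read("s1")
store.read_from = "redis"          # rollback is the same toggle
after_rollback = store.read("legacy")
```

The `legacy` entry shows the kind of edge case Phase 2 exists to surface: a session that predates dual-write is missing from the new store until it is backfilled or expires.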
6. Success Criteria
## Success Criteria
This RFC is successful when:
- [ ] All sessions are stored exclusively in PostgreSQL
- [ ] Session lookup p99 latency is under 10ms
- [ ] ElastiCache instance is decommissioned
- [ ] Monthly infrastructure cost decreases by >= $100
- [ ] No user-facing session disruptions during migration
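A criterion such as "p99 latency is under 10ms" only works if everyone computes p99 the same way. One possible check, assuming latency samples in milliseconds are already being collected somewhere; the function names and the sample data are hypothetical:

```python
import statistics
from typing import Sequence


def p99(samples_ms: Sequence[float]) -> float:
    """Return the 99th percentile: the last of the 99 cut points from quantiles()."""
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[98]


def meets_latency_criterion(samples_ms: Sequence[float], threshold_ms: float = 10.0) -> bool:
    """True when the observed p99 is strictly below the threshold."""
    return p99(samples_ms) < threshold_ms


# Hypothetical samples: mostly 4ms lookups with a tail of slow ones.
mostly_fast = [4.0] * 990 + [30.0] * 10   # 1% of lookups are slow
long_tail = [4.0] * 900 + [30.0] * 100    # 10% of lookups are slow

print(meets_latency_criterion(mostly_fast))
print(meets_latency_criterion(long_tail))
```

Stating the percentile method and the measurement window in the RFC avoids an argument later about whether the criterion was met.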
7. Open Questions
Explicitly list what you do not know. This is intellectually honest and focuses reviewer attention on the genuine uncertainties:
## Open Questions
1. Should we partition the sessions table by month for easier
cleanup at scale? Current volume does not require it, but
it is easier to add partitioning now than later.
2. Do we need to encrypt session data at rest in PostgreSQL?
Redis data was not encrypted at rest. If we add encryption,
should we use PostgreSQL's pgcrypto or application-level
encryption?
3. Should the cleanup job be a cron task or a PostgreSQL
scheduled function (pg_cron)?
Writing Tips That Matter
Lead with the Why
Every design decision should include its rationale. Not "we will use JSONB" but "we will use JSONB because session data varies per feature and we want to avoid schema migrations for every new session attribute."
Write for Skimmers
Most reviewers skim first, then read deeply if interested. Structure for this behavior:
- Summary readable in 30 seconds
- Bold key points in each section
- Tables for comparisons (not prose)
- Code blocks for technical details (not inline descriptions)
Quantify Everything
Replace vague claims with measurements:
| Vague | Specific |
|---|---|
| "Improve performance" | "Reduce p99 from 800ms to 200ms" |
| "Reduce costs" | "Save $120/month ($1,440/year)" |
| "Simplify infrastructure" | "Eliminate 1 of 2 database systems" |
| "Minimal risk" | "Rollback takes < 5 minutes via feature flag" |
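The figures in the right-hand column should be derivable from inputs stated elsewhere in the RFC. A short sketch of the arithmetic behind two of them, using the numbers from the example RFC (the constant names are mine):

```python
# Inputs taken from the example RFC.
MONTHLY_SAVINGS_USD = 120
P50_RESPONSE_MS = 45
ADDED_LOOKUP_MS = 4  # upper end of the 2-4ms increase

annual_savings = MONTHLY_SAVINGS_USD * 12
added_share = ADDED_LOOKUP_MS / P50_RESPONSE_MS

print(f"${annual_savings:,}/year")                   # prints "$1,440/year"
print(f"{added_share:.1%} of p50 response time")     # prints "8.9% of p50 response time"
```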
Address the Skeptic
Before submitting, read your RFC as someone who disagrees with it. What would they challenge? Write preemptive answers. The most effective technique is the "steel man" — state the strongest objection and address it directly:
### Anticipated Objection: "PostgreSQL is slower than Redis"
True. Redis lookups take about 1ms, and PostgreSQL session
lookups will take 3-5ms. However, this difference is small
in context: our p50 response time is 45ms, and session
lookup happens once per request. The 2-4ms increase is under
10% of that p50.
If session lookup latency becomes a concern at higher volumes,
we can add an application-level cache in front of PostgreSQL
without reintroducing Redis as infrastructure.
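A fallback named in an objection response lands better when it is clearly small. Here is a minimal sketch of what an application-level cache could look like, assuming a single process and a short time-to-live. The class `TTLSessionCache`, the `load` callback, and the injectable clock are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLSessionCache:
    """Caches session lookups in process memory for a short time-to-live."""

    def __init__(self, load: Callable[[str], Optional[dict]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, session_id: str) -> Optional[dict]:
        now = self._clock()
        hit = self._entries.get(session_id)
        if hit is not None and hit[0] > now:
            return hit[1]                      # still fresh: no database round trip
        value = self._load(session_id)         # falls through to the database
        if value is not None:                  # missing sessions are not cached
            self._entries[session_id] = (now + self._ttl, value)
        return value

    def invalidate(self, session_id: str) -> None:
        """Drop a cached entry, for example after a permission change."""
        self._entries.pop(session_id, None)


# Usage with a fake database and a fake clock so the behaviour is deterministic.
db = {"s1": {"user": "alice"}}
calls = []


def load_from_db(session_id: str) -> Optional[dict]:
    calls.append(session_id)
    return db.get(session_id)


fake_now = [0.0]
cache = TTLSessionCache(load_from_db, ttl_seconds=5.0, clock=lambda: fake_now[0])
cache.get("s1")
cache.get("s1")        # served from memory
fake_now[0] = 6.0
cache.get("s1")        # entry expired, so the database is queried again
```

The trade-off is staleness: a cached session can lag the database by up to the TTL, and `invalidate` only clears the local process. Given that the example RFC rejected cookies over mid-session permission changes, that limit belongs in the RFC text as well.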
The Review Process
An RFC without a defined review process is a document, not a decision-making tool. Establish these norms:
- Announce with context: Do not just drop a link in Slack. Write: "RFC-042 proposes eliminating our Redis dependency by moving sessions to PostgreSQL. I need feedback by March 29, particularly on the performance trade-off (Section 3) and encryption question (Open Question #2)."
- Assign reviewers explicitly: "Backend team, please review the migration plan. Security team, please evaluate the encryption question." Undirected requests get no responses.
- Hold a decision meeting: After async review, schedule a 30-minute meeting. The agenda is: resolve open questions, address unresolved comments, and make a go/no-go decision. This meeting should not rehash the RFC — that is what async review is for.
- Document the decision: Update the RFC status to "Accepted" or "Rejected" with a one-paragraph rationale. Future engineers will thank you.
Common Mistakes in RFC Culture
- Requiring RFCs for everything. Not every change needs an RFC. Use them for decisions that are expensive to reverse: architecture changes, new dependencies, data model changes, API contracts.
- RFCs as permission slips. If engineers treat RFCs as bureaucratic approval processes, they will write minimal RFCs to clear the bar. RFCs should be thinking tools, not gatekeeping mechanisms.
- Infinite review cycles. Set a deadline. If reviewers have not commented by the deadline, the RFC proceeds. Silence is consent.
- Not archiving decisions. RFCs are valuable after the decision, not just before it. An engineer who later searches "why did we choose PostgreSQL for sessions?" should find RFC-042 with the full context.
The engineers who get their proposals approved are not the ones with the best ideas. They are the ones who communicate those ideas most clearly, anticipate the right objections, and make it easy for reviewers to say yes. Writing is a technical skill. Invest in it.
