The Error Message Nobody Reads
Here is an error message from a real production system: Error: operation failed (code: 4012). I found this in a support ticket where a customer had been stuck for three days trying to figure out what went wrong. The engineer who wrote it probably spent five seconds on it. The customer spent 72 hours. Multiply that by every user who hits that error, and you start to understand why error messages are one of the highest-leverage improvements a backend engineer can make.
Bad error messages are not just a UX problem; they are an operational cost center. Every cryptic error generates support tickets. Every vague exception slows down debugging during incidents. Every misleading message sends engineers down the wrong diagnostic path. Writing good error messages is not polish work you do after the “real engineering” is done. It is engineering.
The Anatomy of a Useful Error Message
A useful error message answers three questions:
- What happened? A precise description of the failure.
- Why did it happen? The condition or constraint that was violated.
- What can the user do about it? A concrete next step, or at minimum, information that helps them get unstuck.
Most error messages only answer the first question, and they answer it badly. Let us look at real examples across common scenarios:
Authentication Errors
# Bad
{"error": "unauthorized"}
# Better
{"error": "authentication_failed",
"message": "The API key provided is not valid for this environment."}
# Best
{"error": "authentication_failed",
"message": "The API key starting with 'sk_test_...' is a test key, but this request was sent to the production endpoint (api.example.com). Use your production key (starting with 'sk_live_') or send requests to api-test.example.com.",
"docs": "https://docs.example.com/authentication#environments"}
Notice the progression. The first tells you nothing actionable. The second tells you what went wrong. The third tells you exactly what you did, why it failed, and what to do instead. It even identifies the specific key format to make diagnosis instant.
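One way to produce the third message is to compare the key prefix with the environment that received the request. The sketch below is only illustrative: it assumes the `sk_test_` and `sk_live_` prefixes and the hostnames from the example above, and the function name `environment_mismatch_error` is hypothetical.

```python
from typing import Optional

# Assumed values taken from the example above; adjust to your own API.
PRODUCTION_HOST = "api.example.com"
TEST_HOST = "api-test.example.com"
DOCS_URL = "https://docs.example.com/authentication#environments"


def environment_mismatch_error(api_key: str, host: str) -> Optional[dict]:
    """Return an error body if a test key hits production, or the reverse."""
    if api_key.startswith("sk_test_") and host == PRODUCTION_HOST:
        used, needed, other_host = "test", "sk_live_", TEST_HOST
    elif api_key.startswith("sk_live_") and host == TEST_HOST:
        used, needed, other_host = "production", "sk_test_", PRODUCTION_HOST
    else:
        return None  # No mismatch; other checks decide whether the key is valid.

    # Echo only the 8-character prefix so the secret part of the key is never returned.
    prefix = api_key[:8]
    return {
        "error": "authentication_failed",
        "message": (
            f"The API key starting with '{prefix}...' is a {used} key, but this "
            f"request was sent to {host}. Use a key starting with '{needed}' "
            f"or send requests to {other_host}."
        ),
        "docs": DOCS_URL,
    }
```

Returning `None` when there is no mismatch lets this check sit in front of the normal key lookup without changing its behavior.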
Validation Errors
# Bad
{"error": "invalid request"}
# Better
{"error": "validation_error",
"message": "The 'email' field is invalid."}
# Best
{"error": "validation_error",
"field": "email",
"message": "The email address 'user@.com' is not valid: the domain part ('.com') starts with a dot. Use a complete domain after the '@' (e.g., user@example.com).",
"received": "user@.com"}
Including the received value in the error response is a small detail that saves a great deal of debugging time, as long as the value is not sensitive. Without it, the user has to work out what they actually sent, which might involve digging through logs or replaying requests.
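A small helper can make echoing the received value the default while keeping secrets out of responses. This is a minimal sketch, not a prescribed implementation: the `SENSITIVE_FIELDS` set and the `validation_error` function are hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical list of fields whose values must never be echoed back.
SENSITIVE_FIELDS = {"password", "api_key", "ssn", "card_number"}


def validation_error(field: str, received: Any, reason: str) -> dict:
    """Build a validation error that echoes the received value when it is safe."""
    sensitive = field in SENSITIVE_FIELDS
    shown = "[redacted]" if sensitive else repr(received)
    body = {
        "error": "validation_error",
        "field": field,
        "message": f"The value {shown} for '{field}' is not valid. {reason}",
    }
    if not sensitive:
        # Only attach the raw value for fields that are safe to return.
        body["received"] = received
    return body
```

Calling `validation_error("email", "user@.com", "The domain part cannot start with a dot.")` yields a body with both a readable message and a machine-readable `received` field, while a `password` field gets neither.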
Rate Limiting
# Bad
{"error": "too many requests"}
# Better
{"error": "rate_limited",
"message": "Rate limit exceeded."}
# Best
{"error": "rate_limited",
"message": "You have exceeded the rate limit of 100 requests per minute for this endpoint. Your current usage: 142 requests in the last 60 seconds.",
"limit": 100,
"window": "60s",
"current": 142,
"retry_after": 23,
"headers": {
"X-RateLimit-Limit": "100",
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": "1711540823"
},
"docs": "https://docs.example.com/rate-limits"}
Error Message Patterns for APIs
After reviewing error handling in dozens of production APIs, I have identified patterns that consistently produce useful error messages.
Pattern 1: Structured Error Response
Adopt a consistent error response structure across your entire API. Here is a format that works well:
# Python / FastAPI example (uses Pydantic v2's model_dump)
import uuid
from typing import List, Optional

from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel

app = FastAPI()

class ErrorDetail(BaseModel):
    field: Optional[str] = None
    message: str
    code: str

class ErrorResponse(BaseModel):
    error: str
    message: str
    request_id: str
    details: Optional[List[ErrorDetail]] = None
    docs: Optional[str] = None

# Attach a request ID to every request so the handler below can return it.
@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request.state.request_id = f"req_{uuid.uuid4().hex}"
    return await call_next(request)

@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
    # exc.detail is a plain string unless the caller passed a dict.
    detail = exc.detail if isinstance(exc.detail, dict) else {"message": str(exc.detail)}
    return JSONResponse(
        status_code=exc.status_code,
        content=ErrorResponse(
            error=detail.get("code", "unknown_error"),
            message=detail.get("message", "An unexpected error occurred."),
            request_id=request.state.request_id,
            details=detail.get("details"),
            docs=detail.get("docs"),
        ).model_dump(exclude_none=True),
    )

# Usage (inside a route handler):
raise HTTPException(
    status_code=422,
    detail={
        "code": "invalid_date_range",
        "message": "The end_date must be after the start_date. You provided start_date=2026-04-01 and end_date=2026-03-15.",
        "details": [
            {"field": "end_date", "message": "Must be after start_date", "code": "date_order"},
        ],
        "docs": "https://docs.example.com/api/orders#date-filtering",
    },
)
Pattern 2: Error Codes as a Contract
Bare numeric error codes are opaque to anyone without the lookup table. String error codes are self-documenting and serve as a stable API contract:
# Bad: numeric codes that require a lookup table
{"error_code": 4012} # What does this mean?
# Good: string codes that are self-explanatory
{"error": "payment_method_expired"}
{"error": "insufficient_permissions"}
{"error": "resource_not_found"}
{"error": "concurrent_modification"}
# These codes become part of your API contract.
# Clients can match on them programmatically:
match response.error:  # match/case requires Python 3.10+
    case "payment_method_expired":
        prompt_user_to_update_payment()
    case "insufficient_permissions":
        request_elevated_access()
    case "resource_not_found":
        handle_missing_resource()
    case _:
        show_generic_error(response.message)
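If the codes are a contract, it helps to define them in one place and to guard against accidental removal. The following is a sketch under assumptions: the `ErrorCode` enum, the `PUBLISHED_CODES` set, and both helper functions are hypothetical names, and the four codes are the ones listed above.

```python
from enum import Enum


class ErrorCode(str, Enum):
    """Every error code the API may return, defined once."""
    PAYMENT_METHOD_EXPIRED = "payment_method_expired"
    INSUFFICIENT_PERMISSIONS = "insufficient_permissions"
    RESOURCE_NOT_FOUND = "resource_not_found"
    CONCURRENT_MODIFICATION = "concurrent_modification"


# Codes that have already shipped to clients. Add to this set; never remove.
PUBLISHED_CODES = frozenset({
    "payment_method_expired",
    "insufficient_permissions",
    "resource_not_found",
    "concurrent_modification",
})


def error_body(code: ErrorCode, message: str) -> dict:
    """Build a response body; requiring an ErrorCode prevents ad hoc strings."""
    return {"error": code.value, "message": message}


def missing_published_codes() -> set:
    """Codes clients may depend on that no longer exist in ErrorCode."""
    return set(PUBLISHED_CODES) - {code.value for code in ErrorCode}
```

Asserting that `missing_published_codes()` is empty in the test suite turns a renamed or deleted code into a failing build rather than a broken client.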
Pattern 3: Context-Rich Internal Errors
Internal error messages (logs, traces, exception messages) should contain even more context than user-facing ones. Include everything an on-call engineer needs to diagnose the issue without additional queries:
import structlog

logger = structlog.get_logger()

async def process_order(order_id: str, user_id: str):
    order = await db.get_order(order_id)
    if not order:
        logger.error(
            "order_not_found",
            order_id=order_id,
            user_id=user_id,
            action="process_order",
            hint="Check if order was created in a different region or if it was soft-deleted",
        )
        raise OrderNotFoundError(
            f"Order {order_id} not found for user {user_id}. "
            "Checked primary database in us-east-1. "
            "Order may exist in a different region or may have been deleted."
        )
    if order.status != "pending":
        logger.warning(
            "order_invalid_state_transition",
            order_id=order_id,
            current_status=order.status,
            requested_transition="pending -> processing",
            hint="This usually indicates a duplicate webhook or race condition",
        )
        raise InvalidStateError(
            f"Cannot process order {order_id}: current status is '{order.status}', "
            "but expected 'pending'. This may indicate a duplicate request. "
            f"Last status change: {order.updated_at.isoformat()}"
        )
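The snippet above raises `OrderNotFoundError` and `InvalidStateError` without defining them. One possible shape, shown here as a sketch rather than the original author's code, is a small base class that carries a stable code and structured context alongside the human-readable message. The names `AppError` and `log_fields` are hypothetical.

```python
from typing import Any


class AppError(Exception):
    """Base error that keeps a stable code and structured context together."""

    code = "internal_error"

    def __init__(self, message: str, **context: Any) -> None:
        super().__init__(message)
        self.message = message
        self.context = context

    def log_fields(self) -> dict:
        """Key/value pairs that can be passed to a structured logger call."""
        return {"error_code": self.code, **self.context}


class OrderNotFoundError(AppError):
    code = "order_not_found"


class InvalidStateError(AppError):
    code = "order_invalid_state_transition"


# Example: the same context serves both the log entry and later handlers.
err = OrderNotFoundError(
    "Order ord_1 not found for user usr_9.",
    order_id="ord_1",
    user_id="usr_9",
    region="us-east-1",
)
```

With this shape, a handler further up the stack can log `err.log_fields()` without re-deriving the context from the message text.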
Common Anti-Patterns
Anti-Pattern 1: Swallowing Exceptions
# The worst thing you can do
try:
    result = await external_api.call(payload)
except Exception:
    return {"error": "something went wrong"}  # NEVER DO THIS

# What you should do instead
try:
    result = await external_api.call(payload)
except ConnectionError as e:
    logger.error("external_api_connection_failed",
                 endpoint=external_api.url, error=str(e))
    raise HTTPException(
        status_code=502,
        detail={
            "code": "upstream_connection_failed",
            "message": "Unable to connect to the payment processor. This is usually temporary. Please retry in 30 seconds.",
            "retry_after": 30,
        },
    )
except TimeoutError:
    logger.error("external_api_timeout",
                 endpoint=external_api.url, timeout_ms=5000)
    raise HTTPException(
        status_code=504,
        detail={
            "code": "upstream_timeout",
            "message": "The payment processor did not respond within 5 seconds. Your payment may still be processing. Check the payment status before retrying.",
        },
    )
Anti-Pattern 2: Leaking Internal Details
# Dangerous: exposes database schema and query
{"error": "ProgrammingError: column users.ssn does not exist. Query: SELECT ssn, name FROM users WHERE id = 42"}
# Safe: meaningful message without internal leakage
{"error": "internal_error",
"message": "An internal error occurred while retrieving user profile. Our team has been notified.",
"request_id": "req_abc123"}
The request_id is critical here. It gives the user something to include in a support ticket that lets your team find the detailed error in your logs without exposing sensitive information.
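The split between what is logged and what is returned can be captured in one function. This is a minimal sketch using the standard `logging` module; the function name `safe_internal_error` and the logger name are hypothetical, and the request ID is assumed to be generated elsewhere (for example, in middleware).

```python
import logging

logger = logging.getLogger("api.errors")


def safe_internal_error(exc: Exception, action: str, request_id: str) -> dict:
    """Log the full exception internally and return a sanitized response body."""
    # The exception type, message, and traceback go to the logs only.
    logger.error(
        "internal_error request_id=%s action=%s",
        request_id,
        action,
        exc_info=exc,
    )
    # The response describes the failed action but carries no internal detail.
    return {
        "error": "internal_error",
        "message": (
            f"An internal error occurred while {action}. "
            "Our team has been notified."
        ),
        "request_id": request_id,
    }
```

The exception text never reaches the response; the only link between the two sides is the `request_id`.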
Anti-Pattern 3: Boolean Error Fields
# Useless
{"success": false}
# Also useless
{"ok": false, "error": true}
# These tell the consumer nothing about what went wrong or what to do
Testing Error Messages
Error messages should be tested as deliberately as happy-path behavior. Add assertions for error message quality in your test suite:
def test_expired_api_key_returns_helpful_error():
    response = client.get(
        "/api/users",
        headers={"Authorization": "Bearer sk_test_expired_key_123"}
    )
    assert response.status_code == 401
    body = response.json()
    # Verify error structure
    assert "error" in body
    assert "message" in body
    assert "request_id" in body
    # Verify message is actually helpful
    assert "expired" in body["message"].lower()
    assert "sk_test_" in body["message"]  # References the specific key
    assert "docs" in body or "renew" in body["message"].lower()  # Actionable
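The structural checks in that test tend to repeat across endpoints, so they can be pulled into a shared helper. The sketch below assumes the envelope used throughout this article (`error`, `message`, `request_id`) and snake_case codes; `error_contract_problems` is a hypothetical name.

```python
import re

REQUIRED_KEYS = ("error", "message", "request_id")
# Assumed convention: codes look like "payment_method_expired".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def error_contract_problems(body: dict) -> list:
    """Return a list of contract violations; an empty list means the body passes."""
    problems = []
    for key in REQUIRED_KEYS:
        value = body.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty '{key}'")
    code = body.get("error")
    if isinstance(code, str) and code.strip() and not SNAKE_CASE.match(code):
        problems.append(f"error code {code!r} is not snake_case")
    return problems
```

A test can then assert `error_contract_problems(response.json()) == []` for every error path, leaving the endpoint-specific assertions to focus on whether the message is helpful.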
A Style Guide for Error Messages
Here are the rules I enforce on every team I work with:
- Never say just “invalid” or “error”. Say what is invalid and why.
- Include the offending value when it is not sensitive. “Expected an integer but received ‘abc’” is far better than “type error.”
- Use the user’s language, not yours. “The webhook URL must use HTTPS” is better than “TLS validation failed on callback_url.”
- Suggest a fix. If you know what the user should do, tell them.
- Include a request ID in every error response. This is the bridge between the user’s experience and your internal logs.
- Link to documentation for errors that have nuanced solutions.
- Never expose stack traces, queries, or internal paths to end users.
- Differentiate between client errors and server errors. The user needs to know whether to fix their request or just retry.
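Two of these rules (no bare "invalid", no leaked internals) lend themselves to a rough automated check. The sketch below is heuristic and will not catch everything: the phrase list and the patterns are assumptions chosen for illustration, and `lint_error_message` is a hypothetical name.

```python
import re

# Messages that say something failed without saying what or why (assumed list).
VAGUE_MESSAGES = {
    "invalid",
    "error",
    "invalid request",
    "operation failed",
    "something went wrong",
}

# Rough signs of leaked internals: Python tracebacks, SQL text, source paths.
LEAK_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"\bSELECT\b.+\bFROM\b"),  # case-sensitive to avoid matching prose
    re.compile(r'File "[^"]+\.py"'),
]


def lint_error_message(message: str) -> list:
    """Return style-guide findings for a user-facing error message."""
    findings = []
    normalized = message.strip().rstrip(".!").lower()
    if normalized in VAGUE_MESSAGES:
        findings.append("says only that something failed, not what or why")
    if any(pattern.search(message) for pattern in LEAK_PATTERNS):
        findings.append("appears to expose a stack trace, query, or internal path")
    return findings
```

Running this over the messages captured in tests gives a cheap first pass; the remaining rules still need human review.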
Writing useful error messages is not glamorous work. Nobody will praise you for a well-crafted 422 response. But your future self, your on-call teammates, and your users will silently thank you every time they hit an error and immediately know what to do about it. That is the kind of engineering that compounds.
