The Platform That Lets Developers Ship Without Waiting on Ops
Platform engineering is the practice of building internal developer platforms: the tools, workflows, and self-service capabilities that let application developers focus on features rather than infrastructure. It has grown quickly as a discipline, driven by a straightforward observation: once an engineering organization passes a certain size, its teams keep reinventing the same internal tooling, and that reinvention is expensive. This guide covers what platform engineering actually involves, the concrete tools used to build it, and honest guidance on when it’s worth investing in.
The Problem Platform Engineering Solves
Without a platform team, application developers often spend a large share of their time (roughly 20-30% by common estimates) on tasks that are not core to their product:
- Configuring CI/CD pipelines for new services
- Setting up Kubernetes deployments, services, and ingress
- Managing secrets (rotating credentials, updating environment variables)
- Requesting and waiting for new databases, queues, or storage buckets
- Configuring monitoring and alerting for new services
- Navigating compliance requirements for production access
A platform team builds self-service systems that reduce this overhead. The output isn’t just tools — it’s a reduction in the cognitive load and wait time that slows down product engineering.
The Internal Developer Portal: Backstage
Backstage, created at Spotify and now a CNCF incubating project, has become a widely adopted open-source foundation for internal developer portals. It provides a service catalog, software templates for bootstrapping new services, and a plugin system for integrating your internal tools into one interface.
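# Install Backstage (the wizard prompts for an app name; "my-platform" is used here)
npx @backstage/create-app@latest
cd my-platform
yarn start   # older app templates use "yarn dev"
# Or run it in a container: Backstage has no ready-made image of your app,
# so build one with the Dockerfile the generated app includes
yarn tsc
yarn build:backend
docker image build . -f packages/backend/Dockerfile --tag backstage
docker run -it -p 7007:7007 backstage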
The service catalog is the core: a single place to see every service, its owner, documentation, deployment status, and dependencies.
# catalog-info.yaml: checked into each service's repository
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  title: Order Service
  description: Handles order creation, management, and fulfillment
  annotations:  # annotation keys are defined by each plugin; check its docs
    github.com/project-slug: myorg/order-service
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/service-id: P1234
    datadog/service-name: order-service
  tags:
    - java
    - kafka
    - payments
spec:
  type: service
  owner: team-commerce
  lifecycle: production
  dependsOn:
    - component:payment-service
    - component:inventory-service
    - resource:orders-postgres
  providesApis:
    - order-api
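The providesApis entry above only resolves if an API entity named `order-api` exists in the catalog. A minimal sketch of one, assuming the OpenAPI document lives at a hypothetical `./openapi.yaml` next to `catalog-info.yaml`; it can be appended to the same file after a `---` separator. The `$text` placeholder asks the catalog to inline that file's contents as the API definition.

```yaml
---
# Appended to order-service's catalog-info.yaml (sketch; ./openapi.yaml is a hypothetical path)
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
spec:
  type: openapi
  lifecycle: production
  owner: team-commerce
  # $text reads the referenced file (relative to this catalog file) into the definition
  definition:
    $text: ./openapi.yaml
```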
Software templates allow developers to scaffold new services without reading documentation:
# template.yaml: a software template for a new microservice
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: java-microservice
  title: Java Spring Boot Microservice
  description: Creates a new Java microservice with CI/CD, monitoring, and deployment pre-configured
spec:
  type: service
  parameters:
    - title: Service Details
      required:
        - name
        - owner
      properties:
        name:
          title: Service Name
          type: string
          pattern: '^[a-z][a-z0-9-]*$'
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker
        description:
          title: Description
          type: string
  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          description: ${{ parameters.description }}
    - id: create-github-repo
      name: Create GitHub Repository
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=myorg
        defaultBranch: main
    - id: register-in-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-github-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
    # http:backstage:request is not built in: it comes from Roadie's
    # scaffolder-backend-module-http-request and calls the Backstage backend's own routes
    - id: create-datadog-monitor
      name: Create Monitoring
      action: http:backstage:request
      input:
        method: POST
        path: /api/proxy/datadog/monitor
        headers:
          content-type: application/json
        body:
          name: "Error rate - ${{ parameters.name }}"
          type: metric alert
          query: "avg(last_5m):sum:trace.servlet.request.errors{service:${{ parameters.name }}} / sum:trace.servlet.request.hits{service:${{ parameters.name }}} > 0.05"
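The monitor step posts to `/api/proxy/datadog/monitor`, which only works if the Backstage backend has a matching proxy endpoint. A sketch of that `app-config.yaml` entry, assuming Datadog's US1 site (`api.datadoghq.com`) and API and application keys supplied through environment variables named `DATADOG_API_KEY` and `DATADOG_APP_KEY` (names chosen for this example):

```yaml
# app-config.yaml (sketch): exposes Datadog's v1 API under /api/proxy/datadog
proxy:
  endpoints:
    '/datadog':
      target: https://api.datadoghq.com/api/v1
      # Only the POST used by the template's monitor step is allowed through
      allowedMethods: ['POST']
      headers:
        # Backstage substitutes ${...} from environment variables at startup
        DD-API-KEY: ${DATADOG_API_KEY}
        DD-APPLICATION-KEY: ${DATADOG_APP_KEY}
```

With this entry, a POST to `/api/proxy/datadog/monitor` is forwarded to `https://api.datadoghq.com/api/v1/monitor`, Datadog's monitor-creation endpoint, and the keys are attached server-side so they never appear in the template.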
A developer runs this template, fills in the form, and gets a new repository with CI/CD and Kubernetes manifests (whatever the skeleton directory contains), an error-rate monitor, and a catalog entry, typically within minutes and without waiting on any ops involvement.
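`fetch:template` copies `./skeleton` into the new repository and renders the `${{ values.* }}` expressions in every file, which is also how the `catalog:register` step finds something at `/catalog-info.yaml`. A minimal sketch of that skeleton file, assuming the path shown in the comment; it references only `values.name` and `values.owner`, which the `fetch:template` step passes in.

```yaml
# skeleton/catalog-info.yaml (sketch): rendered once per new service by fetch:template
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  annotations:
    github.com/project-slug: myorg/${{ values.name }}
spec:
  type: service
  # OwnerPicker yields an entity ref such as group:default/team-commerce
  owner: ${{ values.owner }}
  lifecycle: experimental
```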
Infrastructure as Self-Service: Crossplane
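Crossplane extends Kubernetes with custom resources that represent external cloud resources, so infrastructure can follow the same GitOps workflow used for applications. Platform teams use it to give developers self-service access to databases, queues, and storage without handing out cloud credentials: developers submit a high-level resource, and Crossplane's providers, which hold the credentials, create the real infrastructure.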
# A developer creates a database by submitting a Kubernetes resource
# The platform team defines what "a database" means for their environment
apiVersion: database.platform.mycompany.com/v1alpha1
kind: PostgresDatabase
metadata:
  name: order-service-db
  namespace: team-commerce
spec:
  storageGB: 20
  version: "16"
  tier: standard  # Platform team defines what 'standard' means
  backupEnabled: true
  maintenanceWindow: "sun:03:00-sun:04:00"
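# Behind the scenes, Crossplane translates this to an AWS RDS instance
# The platform team defines the Composition (Crossplane v1 patch-and-transform style)
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-database-aws
spec:
  compositeTypeRef:
    apiVersion: database.platform.mycompany.com/v1alpha1
    # Name the XRD's cluster-scoped composite kind (conventionally X-prefixed),
    # not the namespaced PostgresDatabase claim that developers submit
    kind: XPostgresDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-east-1
            engine: postgres
            instanceClass: db.t3.medium
            allocatedStorage: 20
            backupRetentionPeriod: 7
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.storageGB
          toFieldPath: spec.forProvider.allocatedStorage
        - type: FromCompositeFieldPath
          fromFieldPath: spec.version
          toFieldPath: spec.forProvider.engineVersion
        - type: FromCompositeFieldPath
          fromFieldPath: spec.maintenanceWindow
          toFieldPath: spec.forProvider.maintenanceWindow
        # "standard" means whatever instance class the platform team maps it to
        - type: FromCompositeFieldPath
          fromFieldPath: spec.tier
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                standard: db.t3.medium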
Application developers declare what they need. The platform team controls how it’s provisioned. Compliance, cost controls, and best practices are encoded in the Composition — not documented in a wiki that nobody reads.
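Both the developer's `PostgresDatabase` and the Composition rely on a CompositeResourceDefinition (XRD) that declares the new API and its schema. A sketch in Crossplane v1 style, where the XRD defines a cluster-scoped composite kind (`XPostgresDatabase`, a name chosen for this example) plus the namespaced claim kind developers actually create; a Composition's `compositeTypeRef` should name the composite kind. The limits and enum values are illustrative, and the schema is one more place where the platform team rejects requests it will not provision.

```yaml
# CompositeResourceDefinition (Crossplane v1 style sketch)
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  # Must be <plural>.<group>
  name: xpostgresdatabases.database.platform.mycompany.com
spec:
  group: database.platform.mycompany.com
  names:
    kind: XPostgresDatabase      # cluster-scoped composite resource
    plural: xpostgresdatabases
  claimNames:
    kind: PostgresDatabase       # namespaced claim that developers create
    plural: postgresdatabases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [storageGB, version, tier]
              properties:
                storageGB:
                  type: integer
                  minimum: 20            # illustrative limits
                  maximum: 500
                version:
                  type: string
                  enum: ["15", "16"]
                tier:
                  type: string
                  enum: [standard]       # extend as more tiers are defined
                backupEnabled:
                  type: boolean
                  default: true
                maintenanceWindow:
                  type: string
```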
Internal Golden Paths: Opinionated Defaults
The platform team’s most impactful output is often not a tool but a set of opinionated defaults — the “golden path” for common tasks. A golden path is the recommended way to do something, implemented as a template or tool that makes the right choice the easy choice.
Examples of golden paths worth building:
- Service template: A GitHub repository template with Dockerfile, Kubernetes manifests, CI pipeline, and observability pre-configured
- Secrets template: A standard pattern for using Vault or AWS Secrets Manager — developers get a documented, working example rather than figuring it out from scratch
- Observability template: Pre-built Grafana dashboards for new services — error rate, latency, saturation — auto-provisioned when a new service is created
- Runbook template: A standard runbook structure that every service uses, with the same sections and escalation paths
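Much of a service template's value sits in the defaults its Kubernetes manifests bake in. A sketch of a skeleton Deployment for the Spring Boot template above, assuming Spring Boot Actuator's health probe endpoints are enabled, the app listens on port 8080, and images live in a hypothetical `registry.example.com`; the replica count and resource numbers are placeholders for whatever the platform team standardizes on.

```yaml
# skeleton/k8s/deployment.yaml (sketch): golden-path defaults rendered per service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${{ values.name }}
  labels:
    app.kubernetes.io/name: ${{ values.name }}
spec:
  replicas: 2                      # no single-pod services by default
  selector:
    matchLabels:
      app.kubernetes.io/name: ${{ values.name }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${{ values.name }}
    spec:
      securityContext:
        runAsNonRoot: true         # the image must run as a non-root (numeric) user
      containers:
        - name: app
          # Hypothetical registry; CI is assumed to set the real tag on each build
          image: registry.example.com/${{ values.name }}:0.1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              memory: 1Gi
          # Spring Boot Actuator serves these when health probes are enabled
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
```

Because probes, resource requests, and non-root execution arrive pre-filled, a team gets them by changing nothing, which is what makes the path golden.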
Measuring Platform Engineering Success
Platform teams are infrastructure teams whose customers are internal developers. The right metrics therefore reflect developer productivity and satisfaction, not just platform uptime:
- Time to first deployment for a new service: From repo creation to first production deploy. Track this longitudinally.
- Developer experience score: Quarterly survey asking developers to rate friction in their workflow (1-10)
- Self-service ratio: Percentage of infrastructure requests fulfilled without a manual ops ticket
- DORA metrics: Deployment frequency, lead time for changes, change failure rate, time to restore — the platform team’s work should move these
When Platform Engineering Is Premature
Platform engineering has a cost: it requires engineers who aren’t building product features. The question is whether that investment returns more value than direct product work.
Platform engineering is premature when:
- Your engineering team is under 15-20 people
- You have fewer than 5-10 services in production
- Each team still understands the full stack end-to-end
- Infrastructure work isn’t a measurable bottleneck
It becomes worthwhile when:
- Multiple teams are duplicating infrastructure setup work
- New team onboarding takes weeks because internal tooling isn’t documented
- Developers are waiting on ops tickets to get basic resources provisioned
- Security and compliance controls are being bypassed because the compliant path is too slow
The principle is the same as for any shared infrastructure: invest in the platform when the cost of NOT having it exceeds the cost of building it. At small scale, a well-organized wiki and a few shell scripts are your platform. As you grow, the investment in self-service tooling compounds: every developer who can deploy independently is a developer not waiting on someone else.
