How Serverless Computing Changes the Way Applications Are Built
Serverless computing lets developers deploy code without managing servers. Learn how AWS Lambda, event-driven architecture, and cold starts work in serverless systems.
Code That Runs Without a Server — And Bills by the Millisecond
AWS Lambda, launched in 2014, introduced a pricing model that charged customers in 100-millisecond execution increments; since December 2020 it has billed in 1-millisecond increments. Netflix uses Lambda to run over 100 billion function invocations per month. iRobot processes data from its Roomba fleet through serverless pipelines without maintaining a single persistent server. Coca-Cola replaced a vending machine payment processing system running on 10 EC2 instances with a Lambda function that costs $4,000 per year instead of $13,000, while handling significantly higher throughput.
Serverless computing is a cloud execution model where the provider dynamically allocates machine resources to run code, manages all server-side infrastructure, and charges based on actual execution time rather than reserved capacity. The "serverless" label is a misnomer — servers still exist — but the developer has zero visibility into or responsibility for them.
How Serverless Functions Execute
The core serverless primitive is the Function as a Service (FaaS). A developer writes a function — a discrete unit of code with a defined entry point — and deploys it to the platform. The function remains dormant until an event triggers it. The platform instantiates a container or microVM to execute the function, runs it, returns the result, and (eventually) terminates the execution environment.
- Trigger sources: HTTP requests via API Gateway, queue messages (SQS, Pub/Sub), database change events, scheduled cron jobs, storage events (S3 object upload), IoT device messages
- Execution environment: AWS Lambda uses Firecracker microVMs, lightweight virtual machines that can boot in roughly 125ms and provide hardware-virtualization-based isolation; each microVM runs its own guest kernel rather than sharing the host kernel the way containers do
- Resource limits: AWS Lambda allows up to 10GB memory, 6 vCPUs (proportional to memory), 15-minute maximum execution time, 512MB to 10GB ephemeral storage
- Concurrency: When 1,000 requests arrive simultaneously, 1,000 function instances execute in parallel — scaling automatically without configuration, subject to account-level concurrency limits
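To make the "function with a defined entry point" idea concrete, here is a minimal sketch of a Python handler. It assumes the trigger is an HTTP request delivered in the API Gateway proxy event format; the greeting logic and the `name` query parameter are hypothetical and exist only for illustration.

```python
import json


def handler(event, context):
    """Entry point the platform calls once per triggering event.

    `event` is assumed to follow the API Gateway proxy format; `context`
    carries runtime metadata and is unused in this sketch.
    """
    # queryStringParameters may be missing or None when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # API Gateway proxy integrations expect a status code and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-built event; no AWS account is needed.
    print(handler({"queryStringParameters": {"name": "Lambda"}}, None))
```

Because the handler is an ordinary function, it can be exercised locally with hand-built events before it is deployed.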
The Cold Start Problem
Cold starts are the most discussed limitation of serverless computing. When a function is invoked after a period of inactivity, the platform must provision a new execution environment, download the code package, initialize the runtime (Node.js, Python, Java JVM, etc.), and run any initialization code before executing the handler. This process takes anywhere from tens of milliseconds to several seconds, depending on the runtime and package size.
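The split between initialization code and the handler can be sketched as follows. In the Python runtime, module-level code runs once when an execution environment is created, and later invocations in the same environment reuse whatever it set up. The `_CONFIG` dictionary is a hypothetical stand-in for expensive setup such as loading configuration or opening connections.

```python
import time

# Module-level code runs once per execution environment (the cold start).
# Expensive setup belongs here so that warm invocations can reuse it.
_INIT_STARTED = time.perf_counter()
_CONFIG = {"table": "orders"}  # hypothetical stand-in for expensive setup
_INIT_SECONDS = time.perf_counter() - _INIT_STARTED

_invocations = 0


def handler(event, context):
    """Report whether this invocation was the first in its environment."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,
        "invocation": _invocations,
        "init_seconds": _INIT_SECONDS,
        "table": _CONFIG["table"],
    }
```

Calling `handler` twice in the same process mimics a cold invocation followed by a warm one: only the first call reports `cold_start` as true, and the initialization cost is paid once.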
| Factor | Impact on Cold Start Duration | Mitigation |
|---|---|---|
| Runtime choice | Java/C# cold starts 1–10s; Node.js/Python 100–500ms | Use lightweight runtimes for latency-sensitive paths |
| Package size | Larger packages take longer to download and initialize | Minimize dependencies; use Lambda layers for shared libraries |
| VPC attachment | VPC-attached functions require ENI provisioning (historically +10s) | AWS resolved this with Hyperplane ENIs; now adds ~100ms |
| Memory allocation | Higher memory = more CPU = faster initialization | Allocate more memory if cold start latency is critical |
| Provisioned concurrency | Pre-warmed instances avoid cold starts for requests up to the provisioned level | Reserve minimum concurrency for latency-critical functions |
AWS Lambda's Provisioned Concurrency, announced in 2019, allows customers to pre-initialize a specified number of execution environments that remain warm and ready to respond without cold start latency — at additional cost proportional to the concurrency reserved.
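As a rough sketch of how this is configured programmatically, the helper below builds the arguments for Lambda's PutProvisionedConcurrencyConfig operation. The function name `checkout-api`, the alias `live`, and the count of 5 are hypothetical. The helper only constructs and validates the request, so it runs without AWS credentials.

```python
def provisioned_concurrency_request(function_name, qualifier, instances):
    """Build keyword arguments for Lambda's PutProvisionedConcurrencyConfig.

    Provisioned concurrency applies to a published version or alias,
    so `qualifier` must not be "$LATEST".
    """
    if qualifier == "$LATEST":
        raise ValueError("provisioned concurrency requires a version or alias")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": instances,
    }


# Hypothetical function name and alias, for illustration only.
request = provisioned_concurrency_request("checkout-api", "live", 5)
print(request)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("lambda").put_provisioned_concurrency_config(**request)
```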
Serverless vs. Traditional and Container-Based Architectures
| Characteristic | Virtual Machines | Containers (Kubernetes) | Serverless (FaaS) |
|---|---|---|---|
| Provisioning | Minutes | Seconds | Milliseconds (warm) / Seconds (cold) |
| Scaling granularity | Instances | Pods/replicas | Individual requests |
| Infrastructure management | Full responsibility | Container orchestration (kubelets, nodes) | Zero — fully managed |
| Cost model | Per-instance/hour | Per-cluster/per-node/hour | Per-invocation + per-GB-second |
| State handling | Stateful, persistent disk | Ephemeral, external state stores | Stateless; external state required |
| Maximum execution time | Unlimited | Unlimited | 15 minutes (Lambda) |
Event-Driven Architecture Patterns
Serverless functions naturally compose into event-driven architectures — systems where components react to events rather than being invoked synchronously through direct service calls. This architectural pattern differs fundamentally from request-response monoliths.
- Fan-out: One event triggers multiple functions in parallel — an image upload triggers simultaneous thumbnail generation, metadata extraction, and content moderation
- Queue-based decoupling: Functions read from SQS/Kafka queues, providing natural backpressure and retry semantics when downstream functions are slow or failing
- Choreography vs. orchestration: Services react independently to shared events (choreography) or a central orchestrator coordinates function execution sequence (orchestration via AWS Step Functions or similar)
- Saga pattern: Long-running business transactions implemented as sequences of serverless functions with compensating actions for rollback when individual steps fail
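The saga pattern in the last item can be illustrated with a small in-process sketch. Each step pairs an action with a compensating action; when a step fails, the compensations for the steps that already completed run in reverse order. The step names and the `run_saga` helper are hypothetical, and in a real system each action would be a function invocation coordinated by an orchestrator rather than a local call.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, the compensations of the steps that already
    completed run in reverse order, and the log records what happened.
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
        completed.append((name, compensate))
        log.append(f"{name}: ok")
    return True, log


def fail_shipping():
    raise RuntimeError("carrier unavailable")


# Hypothetical order workflow; each callable stands in for a function invocation.
order_steps = [
    ("reserve_stock", lambda: None, lambda: None),
    ("charge_card", lambda: None, lambda: None),
    ("book_shipping", fail_shipping, lambda: None),
]

succeeded, events = run_saga(order_steps)
print(succeeded)
for line in events:
    print(line)
```

In this run the third step fails, so the card charge and the stock reservation are compensated in that order.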
Where Serverless Excels and Where It Struggles
Serverless computing excels for event-driven workloads with variable or unpredictable traffic. API backends with spiky request patterns, data processing pipelines, webhook handlers, and scheduled batch jobs are natural fits. The cost efficiency for low-traffic workloads is particularly compelling — AWS Lambda's free tier covers 1 million invocations and 400,000 GB-seconds per month, sufficient for many low-traffic APIs to run at zero cost.
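The free-tier arithmetic can be sketched as a small estimator. The free-tier allowances come from the paragraph above; the two unit prices are assumptions based on commonly published x86 list prices and vary by region and architecture, so treat the output as illustrative rather than as a quote.

```python
FREE_REQUESTS = 1_000_000      # free tier per month (from the text above)
FREE_GB_SECONDS = 400_000      # free tier per month (from the text above)
PRICE_PER_REQUEST = 0.20 / 1_000_000   # assumed USD price per request
PRICE_PER_GB_SECOND = 0.0000166667     # assumed USD price per GB-second


def monthly_cost(requests, avg_duration_ms, memory_mb):
    """Estimate monthly compute cost in USD after the free tier."""
    # GB-seconds = invocations x duration in seconds x memory in GB.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    return (billable_requests * PRICE_PER_REQUEST
            + billable_gb_seconds * PRICE_PER_GB_SECOND)


# A low-traffic API: 500,000 requests, 120 ms each, 256 MB of memory.
print(f"{monthly_cost(500_000, 120, 256):.2f}")

# A busier API: 20 million requests, 120 ms each, 512 MB of memory.
print(f"{monthly_cost(20_000_000, 120, 512):.2f}")
```

The first workload uses 15,000 GB-seconds and stays entirely inside the free tier, while the second exceeds both allowances and is billed only on the excess.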
Serverless architecture introduces genuine constraints for certain workloads. Long-running processes that exceed the 15-minute Lambda timeout, applications requiring persistent local disk state, systems where cold start latency is unacceptable for user-facing operations, and workloads with sustained high throughput (where always-on containers become more cost-effective) all present challenges. A 2023 Prime Video engineering case study, widely summarized as the team abandoning microservices and serverless, sparked debate when the team reported reducing the cost of its audio/video monitoring service by 90% by moving from a Step Functions + Lambda architecture to a monolithic service running on EC2 and ECS. The case illustrates that serverless is not universally optimal.
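The point about sustained throughput comes down to a break-even calculation: per-request pricing grows linearly with traffic, while an always-on server costs the same regardless of load. The sketch below estimates the monthly request volume at which the two meet. The unit prices and the $30 monthly server cost are assumptions chosen for illustration, and the free tier is ignored for simplicity.

```python
# Assumed USD list prices; actual prices vary by region and architecture.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def break_even_requests(fixed_monthly_cost, avg_duration_ms, memory_mb):
    """Requests per month at which per-request billing equals a fixed cost."""
    compute_cost = (avg_duration_ms / 1000) * (memory_mb / 1024) * PRICE_PER_GB_SECOND
    cost_per_request = PRICE_PER_REQUEST + compute_cost
    return fixed_monthly_cost / cost_per_request


# Hypothetical always-on container costing $30 per month, compared with
# a function that runs for 100 ms at 512 MB per request.
threshold = break_even_requests(30.0, 100, 512)
print(f"{threshold:,.0f} requests per month")
```

Below the threshold the pay-per-use model is cheaper under these assumptions; above it, the fixed-cost server wins, provided it can actually absorb that load.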
The trajectory of serverless is toward finer-grained execution environments (Cloudflare Workers runs V8 isolates with ~0ms cold starts), edge deployment (running functions at 200+ global edge locations rather than centralized regions), and tighter integration with AI inference workloads — where per-request GPU billing makes serverless economics particularly attractive for models serving sporadic inference requests.