What Is Infrastructure as Code: Terraform, Ansible, and Automating Your Stack

What Is Infrastructure as Code?

Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure—servers, networks, databases, load balancers, storage, and all the other resources that make up a modern application stack—through machine-readable configuration files or scripts rather than through manual processes or interactive tools. Instead of logging into a cloud console to click through a wizard to create a virtual machine, or SSHing into a server to manually install software and edit configuration files, engineers define their infrastructure in code that can be version-controlled, tested, reviewed, and automatically executed by machines.

The transformation IaC represents is analogous to the shift from hand-crafting software artifacts to using build systems and package managers: it brings the rigor, automation, and reproducibility of software engineering practices to the management of infrastructure. Before IaC became widespread, infrastructure management was plagued by "snowflake servers"—uniquely configured machines whose exact state was known only to whoever had last SSH'd into them—and "configuration drift," where servers that were intended to be identical gradually diverged as different people made different manual changes over time. Reproducing an environment for testing, disaster recovery, or scaling was slow, error-prone, and dependent on institutional knowledge rather than documented process.

IaC solves these problems by making the infrastructure definition the authoritative source of truth, stored in version control alongside application code. Every change to infrastructure is made by modifying the IaC configuration and applying it through an automated process, creating a complete audit trail of who changed what and when. Environments can be created and destroyed repeatably from the same configuration—the same code that provisions a production environment can create an identical staging environment in minutes. Failed infrastructure states can be diagnosed by examining the code history, and rollbacks can be automated. These properties dramatically accelerate the pace at which teams can operate while simultaneously improving reliability and reducing the risk of environmental inconsistencies that cause mysterious production-only bugs.

Declarative vs. Imperative Approaches

IaC tools fall into two broad categories based on their fundamental approach: declarative and imperative. Declarative IaC tools ask you to describe the desired end state of your infrastructure—"I want a Virtual Private Cloud with these subnets, three EC2 instances of this type with this security group, a load balancer distributing traffic between them, and an RDS database with these parameters"—and the tool figures out what steps are needed to achieve that state from the current actual state. Terraform, AWS CloudFormation, Azure Resource Manager templates, and Pulumi (when used declaratively) are examples of declarative tools.

Imperative IaC tools ask you to describe the steps to take to create or modify infrastructure: "first create this VPC, then create these subnets, then launch these instances, then configure this software." Ansible, Chef, and Bash scripts are commonly cited examples of imperative or procedural approaches. The boundary is not sharp, though: Puppet is often grouped with these configuration management tools, but its language is declarative, and most Ansible modules describe a desired state even though a playbook runs its tasks in order. The distinction matters practically. Declarative tools generally make idempotency easier to achieve (running the same configuration multiple times produces the same result, because the tool detects what already exists and makes only the necessary changes) and express the desired final state clearly. Imperative tools offer more fine-grained control over execution order and are better suited to configuration management, that is, installing and configuring software on existing servers, where the exact sequence of steps matters and the "desired state" is harder to express as a simple set of resource declarations.
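
The idempotency contrast can be illustrated with a minimal Python sketch. It does not reflect how any particular tool is implemented: the function names and the in-memory inventory list are hypothetical stand-ins for real infrastructure. A naive imperative step repeats its action on every run, while a declarative converge step compares desired state with actual state and changes only what differs.

```python
from typing import List


def imperative_add_server(inventory: List[str], name: str) -> None:
    """Naive imperative step: always performs the action, whatever exists."""
    inventory.append(name)


def declarative_ensure_servers(inventory: List[str], desired: List[str]) -> List[str]:
    """Declarative converge: diff desired state against actual state and
    make only the changes needed. Returns the actions that were taken."""
    actions = []
    for name in desired:
        if name not in inventory:
            inventory.append(name)
            actions.append(f"create {name}")
    for name in list(inventory):  # iterate over a copy while removing
        if name not in desired:
            inventory.remove(name)
            actions.append(f"destroy {name}")
    return actions


if __name__ == "__main__":
    servers = []
    imperative_add_server(servers, "web-1")
    imperative_add_server(servers, "web-1")  # second run duplicates the server
    print(servers)

    actual = ["web-1", "old-1"]
    print(declarative_ensure_servers(actual, ["web-1", "web-2"]))
    print(declarative_ensure_servers(actual, ["web-1", "web-2"]))  # nothing left to do
```

Running the declarative function a second time returns an empty action list, which is the behavior the term idempotent describes.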

In practice, modern IaC stacks often use both types: a declarative tool like Terraform to provision cloud resources (creating the servers, networks, and managed services), and a configuration management tool like Ansible to install and configure software on the provisioned servers. This layered approach plays to the strengths of each tool type. Some newer tools, particularly Pulumi (which allows writing infrastructure definitions in general-purpose programming languages like TypeScript, Python, or Go), blur the distinction by providing declarative resource management while giving access to the full expressive power of a programming language for logic, loops, and abstractions.

Terraform: The Cloud-Agnostic Standard

Terraform, created by HashiCorp (since acquired by IBM) and first released in 2014, has become one of the most widely adopted infrastructure as code tools, particularly for cloud resource provisioning. Four design decisions have made it a default choice for multi-cloud and cloud-agnostic infrastructure management: a declarative configuration language (HCL, the HashiCorp Configuration Language); an explicit execution plan that shows what changes will be made before they are applied; a state file that tracks the current known state of managed resources; and a modular provider architecture that extends Terraform to manage virtually any cloud or API.

Terraform's workflow has three main steps. First, terraform init downloads the required provider plugins (AWS, Azure, GCP, Kubernetes, Datadog, GitHub, and thousands of others are available as Terraform providers). Second, terraform plan performs a dry run: it reads the current state of your infrastructure, compares it to your configuration, and shows you exactly what resources will be created, modified, or destroyed. This plan step is a critical safety mechanism that prevents surprises and allows code review of proposed infrastructure changes before they are applied. Third, terraform apply executes the plan, making the actual API calls to provision or modify resources, and updates the state file to reflect the new reality.
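
The plan and apply steps can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Terraform's actual algorithm: resources are plain dictionaries keyed by a made-up address, the names plan and apply are hypothetical functions, and the sketch compares the configuration only with recorded state, whereas the real tool also queries provider APIs.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]  # resource address -> attributes (hypothetical model)


def plan(state: Resources, config: Resources) -> List[Tuple[str, str]]:
    """Dry run: compute (action, address) pairs without changing anything."""
    changes = []
    for address, attrs in config.items():
        if address not in state:
            changes.append(("create", address))
        elif state[address] != attrs:
            changes.append(("update", address))
    for address in state:
        if address not in config:
            changes.append(("destroy", address))
    return changes


def apply(state: Resources, config: Resources) -> Resources:
    """Execute the plan and return the updated state."""
    new_state = dict(state)
    for action, address in plan(state, config):
        if action == "destroy":
            del new_state[address]
        else:  # create or update: record the configured attributes
            new_state[address] = dict(config[address])
    return new_state


if __name__ == "__main__":
    state = {"vm.web": {"size": "small"}, "vm.old": {"size": "small"}}
    config = {"vm.web": {"size": "large"}, "db.main": {"engine": "postgres"}}
    for action, address in plan(state, config):
        print(action, address)
    state = apply(state, config)
    print(plan(state, config))  # an empty plan: state now matches the configuration
```

Separating the read-only plan from the mutating apply is what makes the plan output safe to attach to a code review.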

Terraform modules are reusable, parameterizable configurations that encapsulate common infrastructure patterns—a module might define a standard, security-hardened VPC configuration, a Kubernetes cluster with appropriate node groups, or a serverless application's backend resources. The Terraform Registry provides thousands of community and vendor-published modules that accelerate infrastructure development by providing battle-tested implementations of common patterns. Teams typically develop internal module libraries that encode their organization's infrastructure standards and best practices, allowing product teams to consume compliant infrastructure configurations without needing deep Terraform expertise.
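
The idea of a module as a parameterized, reusable unit can be sketched as an ordinary Python function. This is an analogy rather than Terraform syntax: the function name vpc_module, the resource keys, and the flow_logs attribute are all hypothetical. Callers supply a few inputs, and the organizational standard (here, flow logs always enabled) is fixed inside the function.

```python
import ipaddress
from itertools import islice
from typing import Dict


def vpc_module(name: str, cidr: str, subnet_count: int = 3,
               subnet_prefix: int = 24) -> Dict[str, dict]:
    """Hypothetical 'module': inputs in, a set of resource definitions out."""
    network = ipaddress.ip_network(cidr)
    # Carve the VPC range into equally sized subnets and take the first few.
    subnets = list(islice(network.subnets(new_prefix=subnet_prefix), subnet_count))
    if len(subnets) < subnet_count:
        raise ValueError(f"{cidr} cannot hold {subnet_count} /{subnet_prefix} subnets")

    resources = {f"vpc.{name}": {"cidr": str(network), "flow_logs": True}}
    for index, subnet in enumerate(subnets):
        resources[f"subnet.{name}-{index}"] = {
            "vpc": f"vpc.{name}",
            "cidr": str(subnet),
        }
    return resources


if __name__ == "__main__":
    for address, attrs in vpc_module("staging", "10.0.0.0/16", subnet_count=2).items():
        print(address, attrs)
```

Two teams calling this function with different names and address ranges get structurally identical networks, which is the consistency that internal module libraries aim for.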

Terraform state management is one of the most important operational considerations for Terraform at scale. The state file records the current known state of all managed resources, including the resource IDs and metadata needed to make subsequent changes. It must be stored in a location accessible to all team members and to CI/CD pipelines, and it must be locked during apply operations so that concurrent modifications cannot corrupt it. Remote state backends (HCP Terraform, formerly Terraform Cloud; AWS S3 with DynamoDB locking; Azure Blob Storage; Google Cloud Storage) provide the necessary sharing and locking mechanisms. Managing state carefully, particularly when importing existing resources, moving resources between state files, or recovering from corrupted state, is a skill that takes time to develop and is among the more challenging aspects of operating Terraform at production scale.
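
Why locking matters can be shown with a small Python sketch. It is not how any remote backend implements locking; it uses a local lock file as a stand-in, and the names state_lock and StateLockedError are hypothetical. The key property is that acquiring the lock is atomic, so two concurrent runs can never both believe they hold it.

```python
import os
import tempfile
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another run already holds the state lock."""


@contextmanager
def state_lock(lock_path: str):
    """Hold an exclusive lock for the duration of an apply.

    O_CREAT | O_EXCL makes creation atomic: exactly one process can create
    the lock file, and every other attempt fails with FileExistsError.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise StateLockedError(f"state is locked: {lock_path}") from None
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # record who holds the lock
        yield
    finally:
        os.remove(lock_path)  # release the lock even if the apply failed


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "state.lock")
    with state_lock(lock_path):
        try:
            with state_lock(lock_path):  # a concurrent run
                pass
        except StateLockedError as err:
            print("second run refused:", err)
```

A remote backend provides the same guarantee across machines, which a local file cannot do.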

Ansible: Configuration Management and Orchestration

Ansible, created by Michael DeHaan in 2012 and acquired by Red Hat (now IBM) in 2015, takes a different approach from Terraform. While Terraform focuses on provisioning cloud resources, Ansible excels at configuration management—installing packages, configuring services, deploying application code, and executing administrative tasks on existing servers. Ansible is agentless: unlike Chef or Puppet, which require an agent running on each managed server, Ansible connects to servers via SSH (or WinRM for Windows) and executes tasks remotely, eliminating the overhead of managing agents.

Ansible configurations are written in YAML as playbooks—ordered sets of plays, each targeting a group of servers and defining a series of tasks to execute. Tasks call Ansible modules—self-contained units of functionality for common operations like installing packages (yum, apt), managing files (copy, template, file), managing services (service, systemd), configuring users (user, group), or making HTTP requests (uri). Thousands of built-in modules cover the vast majority of common system administration tasks, and custom modules can be written in Python or any language. Ansible roles provide a standard way to organize related tasks, handlers, templates, and variables into reusable units that can be shared through Ansible Galaxy, the community hub for roles and collections.

Ansible's idempotency—the property that running the same playbook multiple times produces the same result—is fundamental to its usefulness as a configuration management tool. Most Ansible modules check the current state before taking action: the package module checks if a package is already installed before attempting installation; the template module checks if the rendered template matches the existing file before writing it; the service module checks if a service is already in the desired state before starting or stopping it. This means Ansible playbooks can be run regularly as a convergence mechanism—continuously enforcing the desired configuration and correcting any drift that has occurred since the last run.
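
The check-before-acting pattern described above can be sketched in Python. This is an illustration of the idea, not Ansible's module code: the function ensure_file_content is hypothetical, and its boolean return value is modeled loosely on the changed status that Ansible reports for each task.

```python
import os
import tempfile


def ensure_file_content(path: str, content: str) -> bool:
    """Write content to path only if the file does not already match.

    Returns True if a change was made and False if the file was already
    in the desired state.
    """
    try:
        with open(path, "r", encoding="utf-8") as existing:
            if existing.read() == content:
                return False  # already converged: do nothing
    except FileNotFoundError:
        pass  # a missing file counts as drift from the desired state
    with open(path, "w", encoding="utf-8") as target:
        target.write(content)
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "app.conf")
    print(ensure_file_content(path, "port = 8080\n"))  # first run writes the file
    print(ensure_file_content(path, "port = 8080\n"))  # second run changes nothing
    with open(path, "w", encoding="utf-8") as f:
        f.write("port = 9999\n")  # simulate a manual edit (drift)
    print(ensure_file_content(path, "port = 8080\n"))  # drift is corrected
```

Because an unchanged file reports no change, a scheduled run of such tasks doubles as a drift report: any task that reports a change found something that had diverged.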

Ansible's role in modern infrastructure has evolved as container and Kubernetes adoption has grown. For containerized applications, much of the configuration management work that Ansible previously handled is now done inside container images (built with Dockerfiles) and deployed through Kubernetes manifests. But Ansible remains valuable for managing the underlying infrastructure that containers run on—Kubernetes node configuration, network devices, databases not running in containers, and legacy systems that have not been containerized. Ansible is also widely used for cloud provisioning through its extensive library of cloud modules, though Terraform has generally won that space for teams needing a single source of truth for cloud resource state.

GitOps: IaC Meets Continuous Delivery

GitOps is a paradigm that extends IaC principles to define a complete operational model for managing infrastructure and applications. In GitOps, a Git repository is the single source of truth for the desired state of the entire system—both application code and infrastructure configuration live in Git. Automated processes (typically CD pipelines or dedicated GitOps operators like Flux or Argo CD for Kubernetes environments) continuously compare the desired state in Git with the actual state in the running environment and apply any necessary changes to bring the two into sync.

The key properties that define a GitOps workflow are: declarative descriptions of the entire system in code stored in Git; automatic convergence—the system automatically applies changes from Git to the running environment; Git as the mechanism for change, review, and rollback—all changes go through pull requests with code review, and rollbacks are simply reverting commits; and observable system state—it is always possible to see what the current desired state is and whether the running system matches it. These properties make GitOps a powerful framework for both continuous delivery (new application versions deployed by merging to main and triggering automated synchronization) and operational safety (changes require review, the audit trail is complete, and rollback is fast and reliable).
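
These properties can be sketched as a toy reconciliation loop in Python. It is a simplified model, not the behavior of Flux or Argo CD: the Repo class stands in for a Git repository, the cluster is a plain dictionary mapping application names to image tags, and all names are hypothetical.

```python
from typing import Dict, List

DesiredState = Dict[str, str]  # application name -> image tag


class Repo:
    """Hypothetical stand-in for a Git repository: an ordered list of commits."""

    def __init__(self, initial: DesiredState) -> None:
        self.commits: List[DesiredState] = [dict(initial)]

    def head(self) -> DesiredState:
        return self.commits[-1]

    def commit(self, changes: DesiredState) -> None:
        self.commits.append({**self.head(), **changes})

    def revert_head(self) -> None:
        """Roll back by adding a new commit that restores the previous state."""
        if len(self.commits) < 2:
            raise ValueError("nothing to revert")
        self.commits.append(dict(self.commits[-2]))


def reconcile(repo: Repo, cluster: DesiredState) -> List[str]:
    """One pass of the operator: make the cluster match the head commit."""
    desired = repo.head()
    actions = []
    for app, tag in desired.items():
        if cluster.get(app) != tag:
            cluster[app] = tag
            actions.append(f"sync {app} -> {tag}")
    for app in list(cluster):
        if app not in desired:
            del cluster[app]
            actions.append(f"prune {app}")
    return actions


if __name__ == "__main__":
    repo, cluster = Repo({"api": "v1"}), {}
    print(reconcile(repo, cluster))  # initial deployment
    repo.commit({"api": "v2"})
    print(reconcile(repo, cluster))  # new version rolled out by a merge
    cluster["api"] = "hotfix"        # manual change made outside Git
    print(reconcile(repo, cluster))  # drift corrected back to v2
    repo.revert_head()
    print(reconcile(repo, cluster))  # rollback is just another commit
```

Note that the rollback adds a commit instead of rewriting history, so the audit trail stays complete.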

Kubernetes has been the primary context in which GitOps has been developed and adopted, with tools like Argo CD, Flux, and Rancher Fleet providing GitOps operators that watch Git repositories and synchronize Kubernetes cluster state with the manifests they contain. But GitOps principles apply equally to Terraform-managed infrastructure when combined with CI/CD pipelines that run terraform plan on pull requests and terraform apply when changes are merged. The emerging practice of combining Terraform with GitOps tooling like Atlantis (which provides a GitOps workflow for Terraform through pull request automation) or Spacelift creates a complete GitOps experience for cloud infrastructure provisioning, extending the same development workflow practices from application code to infrastructure management.
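
The plan-on-pull-request, apply-on-merge convention can be sketched as a small decision function in Python. The event names "pull_request" and "push" and the function name are assumptions chosen for illustration; real CI systems and tools such as Atlantis each have their own event model. The -input=false and -auto-approve flags are standard Terraform CLI options for non-interactive runs.

```python
from typing import List


def terraform_command_for(event: str, branch: str,
                          main_branch: str = "main") -> List[str]:
    """Choose the Terraform command a pipeline should run for a Git event.

    event is a hypothetical event name: "pull_request" for a proposed
    change, "push" for commits landing on a branch.
    """
    if event == "pull_request":
        # Proposed change: show reviewers what would happen, change nothing.
        return ["terraform", "plan", "-input=false"]
    if event == "push" and branch == main_branch:
        # Reviewed and merged: apply without an interactive prompt.
        return ["terraform", "apply", "-input=false", "-auto-approve"]
    return []  # pushes to other branches trigger nothing


if __name__ == "__main__":
    print(terraform_command_for("pull_request", "feature/add-cache"))
    print(terraform_command_for("push", "main"))
    print(terraform_command_for("push", "feature/add-cache"))
```

In a real pipeline the returned list could be passed to a process runner; the sketch stops at the decision so that it has no side effects.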

Best Practices for IaC at Scale

As organizations grow their IaC practices, several patterns consistently emerge as important for maintaining manageability, reliability, and security. Modularization—breaking infrastructure into reusable modules that encode organizational standards—prevents the proliferation of similar-but-different configurations across teams and makes it easier to update patterns consistently when requirements change. Module versioning (pinning to specific module versions in consuming configurations) prevents unexpected changes from upstream module updates while allowing deliberate upgrades with testing.
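
Version pinning can be illustrated with a simplified Python model of a pessimistic version constraint, written ~> in Terraform: "~> 1.2" accepts any 1.x release from 1.2 upward but not 2.0, and "~> 1.2.3" accepts 1.2.x from 1.2.3 upward but not 1.3.0. The sketch below is an approximation with hypothetical function names; it handles only plain numeric versions and ignores pre-release suffixes.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0) so comparison is numeric, not textual."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check version against the version part of a ~> constraint.

    Only the rightmost component of the constraint may increase.
    """
    base = parse_version(constraint)
    candidate = parse_version(version)
    if len(base) < 2:
        raise ValueError("constraint needs at least two components")
    upper = base[:-2] + (base[-2] + 1,)  # exclusive upper bound
    width = max(len(base), len(candidate))
    pad = lambda parts: parts + (0,) * (width - len(parts))
    return pad(candidate) >= pad(base) and candidate[:len(upper)] < upper


def select_version(available: List[str], constraint: str) -> Optional[str]:
    """Pick the newest available version that the constraint allows."""
    matching = [v for v in available if satisfies_pessimistic(v, constraint)]
    return max(matching, key=parse_version) if matching else None


if __name__ == "__main__":
    releases = ["1.1.0", "1.2.5", "1.10.0", "2.0.0"]
    print(select_version(releases, "1.2"))    # newest 1.x at or above 1.2
    print(select_version(releases, "1.2.3"))  # newest 1.2.x at or above 1.2.3
```

A consumer pinned this way receives compatible updates automatically while a new major version remains a deliberate upgrade.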

Testing IaC is increasingly recognized as essential for large or rapidly evolving infrastructure codebases. Infrastructure tests can be static (linting and policy checking before any infrastructure is changed—tools like tflint, Checkov, and Open Policy Agent enable this), integration tests that actually provision real infrastructure in an isolated environment and verify it behaves as expected (Terratest and Kitchen-Terraform provide frameworks for this), and drift detection that regularly compares actual infrastructure state against the IaC definition and alerts on deviations. While infrastructure testing requires more effort than application unit testing—real provisioning takes real time and costs real money—the investment pays off in catching problems before they reach production.
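
A static policy check of the kind described above can be sketched in Python. The sketch assumes the JSON representation of a saved plan as produced by terraform show -json, and relies only on its resource_changes list with the address, type, and change.actions fields. The policy itself (protected resource types must never be deleted) and the names PROTECTED_TYPES and find_violations are hypothetical; dedicated tools such as Checkov or Open Policy Agent are far more capable.

```python
import json
from typing import List

# Hypothetical policy: resources of these types must never be deleted
# (a replacement also counts, because it deletes before or after creating).
PROTECTED_TYPES = {"aws_db_instance", "aws_s3_bucket"}


def find_violations(plan_json: str) -> List[str]:
    """Return addresses of protected resources the plan would delete."""
    plan = json.loads(plan_json)
    violations = []
    for resource_change in plan.get("resource_changes", []):
        actions = resource_change.get("change", {}).get("actions", [])
        if "delete" in actions and resource_change.get("type") in PROTECTED_TYPES:
            violations.append(resource_change["address"])
    return violations


if __name__ == "__main__":
    sample_plan = json.dumps({
        "resource_changes": [
            {"address": "aws_db_instance.main", "type": "aws_db_instance",
             "change": {"actions": ["delete", "create"]}},
            {"address": "aws_instance.web", "type": "aws_instance",
             "change": {"actions": ["delete"]}},
        ]
    })
    for address in find_violations(sample_plan):
        print("policy violation: plan would delete", address)
```

Because the check reads only the plan, it can run in a pull request pipeline before any infrastructure is touched.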

Secret management is among the most critical IaC security concerns. Infrastructure code frequently needs access to credentials, API keys, and other sensitive values to configure services, and storing these in plaintext in IaC files or version control is a serious security risk. Solutions include integrating with secrets management platforms (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) to retrieve secrets at runtime, using CI/CD platform secret stores to inject secrets into pipeline environments without storing them in code, and encrypting sensitive values with tools like SOPS (originally developed at Mozilla) or git-crypt for cases where secrets must be stored in Git. Establishing a clear policy on secret handling and consistently enforcing it through automated scanning (detecting accidental credential commits with tools like GitGuardian or TruffleHog) is a baseline requirement for security-conscious IaC operations. As infrastructure footprints grow and teams scale, the investment in good IaC practices (modularization, testing, GitOps workflows, security controls) multiplies in value, enabling organizations to manage complexity that would otherwise overwhelm their operations teams.
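
The automated scanning mentioned under secret management can be sketched as a pair of regular expressions in Python. This is a toy illustration, far less thorough than dedicated scanners: the pattern names and the scan function are hypothetical, the first pattern matches the commonly documented shape of an AWS access key ID (AKIA followed by 16 uppercase letters or digits), and the second is a rough heuristic for quoted literal passwords.

```python
import re
from typing import List, Tuple

PATTERNS = {
    # Commonly documented shape of an AWS access key ID.
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Heuristic: a quoted literal assigned to "password". Values containing
    # "$" are skipped so interpolated references are not flagged.
    "hardcoded-password": re.compile(r'\bpassword\s*=\s*"[^"$]+"', re.IGNORECASE),
}


def scan(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


if __name__ == "__main__":
    sample = (
        'access_key = "AKIAIOSFODNN7EXAMPLE"\n'  # AWS's documented example key
        "password = var.db_password\n"           # a variable reference, not a secret
        'password = "hunter2"\n'
    )
    for line_number, name in scan(sample):
        print(f"line {line_number}: {name}")
```

Run as a pre-commit hook or pipeline step, a check like this rejects a commit before the credential ever reaches shared history.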
