Building a Multi-Region Disaster Recovery Setup on AWS with Terraform
Disaster Recovery (DR) is an essential part of cloud architecture. It ensures that your workloads remain available even when an entire AWS region experiences downtime. In this post, I’ll walk you through how to design and deploy a multi-region disaster recovery setup using Terraform while following AWS best practices.
Why Multi-Region DR Matters
Relying on a single AWS region can be risky. Hardware failures, outages, or network disruptions in one region can make your application unavailable. A multi-region architecture provides redundancy by replicating critical resources across regions.
In this example, we’ll use us-east-1 as the primary region and eu-west-1 as the disaster recovery region. Terraform keeps the two regions consistent and lets us automate both from one codebase.
Project Overview
We’ll deploy:
1. Two VPCs (one in each region)
2. S3 buckets with cross-region replication
3. An RDS primary instance and a read replica in another region
4. Route 53 for health checks and DNS failover
5. Optional EC2 instances or ECS services behind an Application Load Balancer
Provider Configuration
In providers.tf:
provider "aws" {
  alias  = "primary"
  region = "us-east-1"
}

provider "aws" {
  alias  = "dr"
  region = "eu-west-1"
}
Provider aliases let Terraform manage resources in multiple regions within one configuration. Each resource selects its region with the provider argument, for example provider = aws.dr.
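The project overview lists one VPC per region, so here is a minimal sketch of how the aliases could be used for that. The CIDR ranges and Name tags are assumptions chosen for illustration; pick non-overlapping ranges that fit your own network plan.

```hcl
# One VPC per region, each bound to a provider alias from providers.tf.
resource "aws_vpc" "primary" {
  provider             = aws.primary
  cidr_block           = "10.0.0.0/16" # assumed range
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "primary-vpc" # hypothetical name
  }
}

resource "aws_vpc" "dr" {
  provider             = aws.dr
  cidr_block           = "10.1.0.0/16" # assumed range, kept distinct from the primary
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "dr-vpc" # hypothetical name
  }
}
```

Keeping the two ranges distinct leaves the door open for VPC peering or a transit gateway between the regions later.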
S3 Cross-Region Replication
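In main.tf:
# Bucket names are globally unique; replace these placeholders with your own.
resource "aws_s3_bucket" "primary" {
  provider = aws.primary
  bucket   = "primary-dr-bucket"
}

resource "aws_s3_bucket" "replica" {
  provider = aws.dr
  bucket   = "replica-dr-bucket"
}

# Versioning must be enabled on both buckets before replication can be configured.
# AWS provider v4+ manages it as a separate resource; the inline versioning block is deprecated.
resource "aws_s3_bucket_versioning" "primary" {
  provider = aws.primary
  bucket   = aws_s3_bucket.primary.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_versioning" "replica" {
  provider = aws.dr
  bucket   = aws_s3_bucket.replica.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_replication_configuration" "replication" {
  provider = aws.primary
  bucket   = aws_s3_bucket.primary.id
  role     = aws_iam_role.replication.arn # IAM role that S3 assumes to replicate objects

  # Wait until versioning is active on the source and the destination.
  depends_on = [
    aws_s3_bucket_versioning.primary,
    aws_s3_bucket_versioning.replica,
  ]

  # The block is named "rule" (singular) in the AWS provider schema.
  rule {
    id     = "replication"
    status = "Enabled"

    destination {
      bucket        = aws_s3_bucket.replica.arn
      storage_class = "STANDARD"
    }
  }
}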
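With this in place, new objects written to the primary bucket are replicated asynchronously to the bucket in the disaster recovery region. Objects that existed before the replication rule was created are not copied automatically.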
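The replication configuration refers to aws_iam_role.replication, which is not shown in this post. A minimal sketch of such a role might look like the following; the role and policy names are hypothetical, and the permissions follow the usual pattern for S3 replication rather than anything taken from the repository.

```hcl
# Trust policy: allow the S3 service to assume the role.
data "aws_iam_policy_document" "replication_assume" {
  provider = aws.primary

  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["s3.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "replication" {
  provider           = aws.primary
  name               = "s3-replication-role" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.replication_assume.json
}

# Permissions: read from the source bucket, write replicas to the destination.
data "aws_iam_policy_document" "replication" {
  provider = aws.primary

  statement {
    effect    = "Allow"
    actions   = ["s3:GetReplicationConfiguration", "s3:ListBucket"]
    resources = [aws_s3_bucket.primary.arn]
  }

  statement {
    effect = "Allow"
    actions = [
      "s3:GetObjectVersionForReplication",
      "s3:GetObjectVersionAcl",
      "s3:GetObjectVersionTagging",
    ]
    resources = ["${aws_s3_bucket.primary.arn}/*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "s3:ReplicateObject",
      "s3:ReplicateDelete",
      "s3:ReplicateTags",
    ]
    resources = ["${aws_s3_bucket.replica.arn}/*"]
  }
}

resource "aws_iam_role_policy" "replication" {
  provider = aws.primary
  name     = "s3-replication-policy" # hypothetical name
  role     = aws_iam_role.replication.id
  policy   = data.aws_iam_policy_document.replication.json
}
```

Scoping each statement to the specific bucket ARNs keeps the role in line with the least-privilege practice listed later in this post.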
RDS Cross-Region Read Replica
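You can deploy an RDS instance in us-east-1 and configure a read replica in eu-west-1:
# Supply the master password through a sensitive variable (for example via
# TF_VAR_db_password) instead of hardcoding it in the configuration.
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "primary" {
  provider                = aws.primary
  identifier              = "primary-db"
  engine                  = "mysql"
  instance_class          = "db.t3.micro"
  allocated_storage       = 20
  username                = "admin"
  password                = var.db_password
  backup_retention_period = 7    # must be greater than 0 for the instance to act as a replica source
  skip_final_snapshot     = true # convenient for demos; set to false in production
}

resource "aws_db_instance" "replica" {
  provider            = aws.dr
  identifier          = "replica-db"
  replicate_source_db = aws_db_instance.primary.arn # cross-region replicas reference the source by ARN
  instance_class      = "db.t3.micro"
  skip_final_snapshot = true
}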
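If the primary database fails, you can promote the replica in the disaster recovery region to act as the new primary. Promotion is not automatic, and because replication is asynchronous, the replica may lag slightly behind the primary at the moment of failure.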
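As a rough sketch of what promotion could look like in Terraform: the AWS provider documentation notes that removing replicate_source_db from a managed replica promotes it to a standalone database. After deciding to fail over, the existing replica block might be edited to something like this. The backup_retention_period value is an assumption.

```hcl
# Edited form of the existing "replica" resource, not an additional resource.
resource "aws_db_instance" "replica" {
  provider       = aws.dr
  identifier     = "replica-db"
  instance_class = "db.t3.micro"

  # replicate_source_db has been removed. On the next terraform apply the
  # provider promotes the replica to a standalone, writable instance.

  backup_retention_period = 7 # assumed value; enables backups on the new primary
  skip_final_snapshot     = true
}
```

Promotion is one-way: once promoted, the instance cannot be turned back into a replica, so rebuilding a replica in the original region is a separate step after the outage.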
DNS Failover with Route 53
Route 53 helps direct traffic based on the health of your resources. You can configure a health check and failover policy to automatically reroute users when the primary endpoint becomes unreachable.
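resource "aws_route53_health_check" "primary" {
  provider          = aws.primary # Route 53 is global, but the resource still needs a configured provider
  fqdn              = "primary.example.com"
  type              = "HTTP"
  resource_path     = "/"
  failure_threshold = 3
}

resource "aws_route53_record" "failover" {
  provider        = aws.primary
  zone_id         = "ZXXXXXXXXXXXX"
  name            = "app.example.com"
  type            = "A"
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id # ties the record to the health check above

  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true
  }
}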
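The record above covers only the PRIMARY side. Failover routing also needs a SECONDARY record with the same name and type so Route 53 has somewhere to send traffic when the primary is unhealthy. A minimal sketch, assuming a load balancer named aws_lb.dr exists in eu-west-1 (that resource is hypothetical and not shown in this post):

```hcl
resource "aws_route53_record" "failover_secondary" {
  provider       = aws.primary
  zone_id        = "ZXXXXXXXXXXXX"   # same hosted zone as the primary record
  name           = "app.example.com" # same name and type as the primary record
  type           = "A"
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  alias {
    name                   = aws_lb.dr.dns_name # hypothetical ALB in the DR region
    zone_id                = aws_lb.dr.zone_id
    evaluate_target_health = true
  }
}
```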
Terraform Best Practices
1. Use Terraform modules to organize code by function, such as VPC, S3, and RDS.
2. Use remote state storage like S3 with DynamoDB locking to avoid state conflicts.
3. Apply the principle of least privilege in IAM policies.
4. Always enable versioning and encryption for data resources.
5. Automate testing and validation using Terraform Cloud or GitHub Actions.
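For the remote state recommendation, a backend block along these lines is one common shape. The bucket name, state key, and table name are all hypothetical, and both the bucket and the DynamoDB table (with a LockID string partition key) need to exist before terraform init is run.

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"         # hypothetical state bucket
    key            = "multi-region-dr/terraform.tfstate" # hypothetical state path
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"                   # hypothetical lock table
    encrypt        = true                                # encrypt state at rest
  }
}
```

For a DR project it may be worth keeping the state bucket outside the primary region, or replicating it, so Terraform stays usable during the very outage it is meant to handle.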
Testing the Failover
1. Deploy the infrastructure using terraform apply.
2. Simulate a regional failure by shutting down resources in us-east-1.
3. Verify that Route 53 redirects traffic to the disaster recovery region.
You can find the full project setup and code in my GitHub repository:
https://github.com/Copubah/aws-multi-region-disaster-recovery