From Nick Basinger




Blog Post

Infrastructure as Code (IAC)

Best Practices for Achieving High Availability with Terraform in the Cloud (AWS, GCP, Azure)

If you haven't been source controlling your environment with an Infrastructure-as-Code (IAC) process or tool, you are missing out on much of the adaptability and flexibility of the cloud you are using. Terraform and other IAC tools such as CloudFormation and Ansible make it easier and more comfortable to optimize your infrastructure the way you should. With AWS and other public cloud providers releasing new features every week, it is easy to feel exhausted trying to keep up with the platform you run on. Terraform gives you flexibility with your cloud infrastructure, because you are no longer held back by one-time architectural decisions of the past.

Terraform lets you quickly import your current infrastructure (AWS, Azure, legacy/on-premise) and keep separate state files for your different regions or zones, so you can manage each part of your infrastructure separately and safely. Combined with a version control system such as Git (hosted on Bitbucket, for example), you always have the option to revert changes. Terraform has also been gaining momentum in the open-source community: as of this writing (07/14/2018), the most recent commit to its Git repository was one day old.

One key feature of Terraform that CloudFormation lacks is the ability to take control of your existing infrastructure safely. As of this writing, CloudFormation requires you to rebuild your entire environment, and Ansible is not 100% idempotent, meaning an accidental update to your infrastructure can occur. With Terraform, the 'terraform plan' command shows you exactly which changes will occur before you make them. It does this by maintaining a Terraform state file (.tfstate) and checking it against your configuration files (.tf).
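The idea behind 'terraform plan' can be illustrated with a toy model in Python. This is only a sketch of the state-versus-config comparison, not how Terraform is actually implemented; the resource addresses and attributes are hypothetical, and real Terraform additionally distinguishes in-place updates from replacements.

```python
from typing import Any, Dict, List, Tuple

# Maps a resource address to its attributes (a toy stand-in for .tfstate / .tf).
Resources = Dict[str, Dict[str, Any]]


def plan(state: Resources, config: Resources) -> List[Tuple[str, str]]:
    """Return (action, address) pairs that would make state match config."""
    actions = []
    for address in sorted(config.keys() | state.keys()):
        if address not in state:
            actions.append(("create", address))
        elif address not in config:
            actions.append(("destroy", address))
        elif state[address] != config[address]:
            actions.append(("change", address))
    return actions


# Hypothetical resources, for illustration only.
state = {
    "aws_subnet.app": {"cidr_block": "10.0.1.0/24", "availability_zone": "us-east-1a"},
    "aws_subnet.old": {"cidr_block": "10.0.9.0/24", "availability_zone": "us-east-1a"},
}
config = {
    "aws_subnet.app": {"cidr_block": "10.0.1.0/24", "availability_zone": "us-east-1b"},
    "aws_subnet.db": {"cidr_block": "10.0.2.0/24", "availability_zone": "us-east-1b"},
}

for action, address in plan(state, config):
    print(f"{action:8} {address}")
```

Nothing is modified while the plan is computed, which is what makes it safe to review the list before applying anything.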

Unfortunately, as with any other IAC tool, manual changes to infrastructure controlled by Terraform can cause issues, and they must be resolved before you continue using the tool. Put IAM controls on changes to your infrastructure (VPCs, subnets, Route 53, etc.); this could be as simple as allowing only the Terraform IAM system user ("my_terraform_user") to change those particular resources. CloudFormation has released, or will release this year, the ability to see whether your environment is out of sync with your CloudFormation-controlled infrastructure (reference: CloudFormation Deep Dive at the Washington DC Summit). For instance, if someone manually changes a subnet association that CloudFormation is supposed to manage, CloudFormation will tell you exactly what the manual change was. Hopefully Terraform will offer a feature like this soon.
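As a rough picture of what such an out-of-sync report involves, here is a minimal Python sketch that compares the attributes a tool recorded for one resource with what is actually live. It is an illustration only and does not use any CloudFormation or Terraform API; the attribute names and IDs are made up.

```python
from typing import Any, Dict, List


def detect_drift(recorded: Dict[str, Any], live: Dict[str, Any]) -> List[str]:
    """List every attribute whose live value differs from the recorded one."""
    findings = []
    for key in sorted(recorded.keys() | live.keys()):
        before = recorded.get(key)
        after = live.get(key)
        if before != after:
            findings.append(f"{key}: {before!r} -> {after!r}")
    return findings


# Hypothetical subnet association that someone changed by hand.
recorded = {"subnet_id": "subnet-aaaa", "route_table_id": "rtb-1111"}
live = {"subnet_id": "subnet-aaaa", "route_table_id": "rtb-2222"}

print(detect_drift(recorded, live))
```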

If you are set on using CloudFormation as the IAC tool for your current infrastructure, you must rebuild parts of your infrastructure in small chunks using CloudFormation, verify that they work, and then cut over to the new resources (Blue/Green). As of this writing you are also limited to 200 resources per CloudFormation stack, which becomes a problem when dealing with security groups. If you have 150 security groups, you also need ingress and egress rules for them, which puts you well above the CloudFormation hard limit.

To quickly evolve your environment toward High Availability (HA) with multiple Availability Zones (AZs), Terraform can be your very best ally. You can import your current infrastructure, for example Tier 1 (VPCs, IGWs, VGWs, subnets) and Tier 2 (route table associations, NGWs, security groups). Then you can change your subnets' AZ attribute and/or, if you are able to, break your subnets up into more subnets and place them in different AZs. This helps achieve higher availability, because if one AZ goes down, your app will still function.

If running in multiple AZs at once is just not an option for you, a great alternative is a process that quickly switches your app from one AZ to another when the current AZ fails. This can be implemented using Lambda, EC2, and/or local scripts that respond to failing health checks or to alerts that an AWS AZ is unhealthy. This way, even though your app doesn't use more than one AZ "concurrently", it does so "non-concurrently", because you can switch over to a completely new, healthy AZ anytime you need to, with minimal downtime. Always think of a good second option when you are unable to achieve concurrent HA, and always test the switchover under good conditions to confirm that your app works in a different AZ.
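Breaking one subnet into several smaller ones spread across AZs, as described above, comes down to simple CIDR arithmetic. The following Python sketch uses only the standard library; the CIDR block and zone names are assumptions chosen for illustration, and the resulting values would still have to be written into your Terraform config by hand or by a script.

```python
import ipaddress
from typing import Dict, List


def spread_subnets(cidr: str, zones: List[str]) -> Dict[str, str]:
    """Split cidr into equal-sized subnets and assign one to each zone."""
    network = ipaddress.ip_network(cidr)
    # Extra prefix bits needed so that there are at least len(zones) pieces.
    extra_bits = (len(zones) - 1).bit_length()
    pieces = network.subnets(prefixlen_diff=extra_bits)
    # zip stops at the shorter input, so surplus pieces are left unassigned.
    return {zone: str(piece) for zone, piece in zip(zones, pieces)}


# Hypothetical /24 spread over three zones.
layout = spread_subnets("10.0.0.0/24", ["us-east-1a", "us-east-1b", "us-east-1c"])
for zone, subnet in layout.items():
    print(zone, subnet)
```

Because subnet sizes are powers of two, three zones yield four /26 blocks here, and the fourth stays free for later use.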

If you are seeking to migrate to another region, Terraform can help with this as well. Import your current region's resources, again separating them logically into different tiers. Then copy this folder to a new folder named after the new region, and deploy into the new region with 'terraform apply'. Done testing for the day? Just tear it all down with 'terraform destroy'. Just be careful to specify the correct region when destroying anything!

This isn't a silver bullet yet. Importing every resource into Terraform may not make the most sense and can still be a headache for resources like route-table routes. Resources such as DynamoDB and S3 have an option to replicate to new regions with the click of a button or a simple API call. Your migration will therefore need different tools, but using Terraform for the first- and second-tier infrastructure will be key to moving over to a new region quickly. As you build out the config for the new region, moving from simpler tiers to more complicated ones, you will probably discover better ways to migrate the most complicated resource types alongside Terraform. Terraform is also constantly improving, so importing more complicated resources will most likely become easier in the near future.
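The folder copy can be scripted. The Python sketch below is one possible approach under an assumption the text above does not spell out: state files and the local .terraform directory are left behind, so the new region's folder starts with no state rather than inheriting records of the old region's resources. The folder names are hypothetical, and the region in the copied config still has to be changed before running 'terraform apply'.

```python
import shutil
import tempfile
from pathlib import Path


def clone_config(source: Path, target: Path) -> None:
    """Copy a Terraform folder to target, skipping state files and local cache."""
    shutil.copytree(
        source,
        target,
        ignore=shutil.ignore_patterns("*.tfstate", "*.tfstate.*", ".terraform"),
    )


# Demonstration in a throwaway directory; folder names are made up.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "us-east-1" / "tier1"
    source.mkdir(parents=True)
    (source / "main.tf").write_text('provider "aws" { region = "us-east-1" }\n')
    (source / "terraform.tfstate").write_text("{}")

    target = Path(tmp) / "us-west-2" / "tier1"
    clone_config(source, target)
    copied = sorted(p.name for p in target.iterdir())

print(copied)
```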

In conclusion, take the extra time to adopt an IAC tool such as Terraform, because it lets you adapt to unexpected changes in scope and to the new products that AWS and other cloud providers continuously deliver. It will help make your environment highly resilient (HA/fault tolerance), and you may be surprised how many of your automated tasks can be achieved and improved with Terraform. Source controlling your Terraform state and config files (Git/Bitbucket) gives you a version-controlled, elastic Infrastructure-as-Code environment. As you progress through your IAC journey, you will find more and more good fruit along the way as you take greater control of the elasticity and adaptability that the public cloud has to offer.

More Stories By Nick Basinger

Systems Engineer and Certified AWS Solutions Architect Professional (SAP). Cloud, Python, Node.js, Java, Ansible, Terraform, AWS, Azure, GCP, reading Kurt Vonnegut, learning new things, reading stories to my two kids, hanging out with my family.