Terraform
Terraform is an open-source infrastructure as code software tool created by HashiCorp. It enables users to define and provision datacenter infrastructure using a high-level configuration language known as HashiCorp Configuration Language (HCL), or optionally JSON. Terraform supports a number of cloud infrastructure providers such as Amazon Web Services, IBM Cloud, Google Cloud Platform, DigitalOcean, Linode, Microsoft Azure, Oracle Cloud Infrastructure, OVH and VMware vSphere, as well as OpenNebula and OpenStack.
Tools⚑
- tfschema: A binary that allows you to see the attributes of the resources of the different providers. Sometimes complex attributes aren't shown in the docs with an example; here you'll see them clearly.
```bash
tfschema resource list aws | grep aws_iam_user
> aws_iam_user
> aws_iam_user_group_membership
> aws_iam_user_login_profile
> aws_iam_user_policy
> aws_iam_user_policy_attachment
> aws_iam_user_ssh_key

tfschema resource show aws_iam_user
+----------------------+-------------+----------+----------+----------+-----------+
| ATTRIBUTE            | TYPE        | REQUIRED | OPTIONAL | COMPUTED | SENSITIVE |
+----------------------+-------------+----------+----------+----------+-----------+
| arn                  | string      | false    | false    | true     | false     |
| force_destroy        | bool        | false    | true     | false    | false     |
| id                   | string      | false    | true     | true     | false     |
| name                 | string      | true     | false    | false    | false     |
| path                 | string      | false    | true     | false    | false     |
| permissions_boundary | string      | false    | true     | false    | false     |
| tags                 | map(string) | false    | true     | false    | false     |
| unique_id            | string      | false    | false    | true     | false     |
+----------------------+-------------+----------+----------+----------+-----------+

# Open the documentation of the resource in the browser
tfschema resource browse aws_iam_user
```
- terraforming: Tool to export existing resources to Terraform code.
- terraboard: Web dashboard to visualize and query Terraform tfstate files; you can search, compare and see the most active ones. There are deployments for k8s.
```bash
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
export AWS_DEFAULT_REGION=eu-west-1
export AWS_BUCKET=terraform-tfstate-20180119
export TERRABOARD_LOG_LEVEL=debug
docker network create terranet
docker run -ti --rm --name db -e POSTGRES_USER=gorm -e POSTGRES_DB=gorm -e POSTGRES_PASSWORD="mypassword" --net terranet postgres
docker run -ti --rm -p 8080:8080 -e AWS_REGION="$AWS_DEFAULT_REGION" -e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" -e AWS_BUCKET="$AWS_BUCKET" -e DB_PASSWORD="mypassword" --net terranet camptocamp/terraboard:latest
```
- tfenv: Install different versions of Terraform.

```bash
git clone https://github.com/tfutils/tfenv.git ~/.tfenv
echo 'export PATH="$HOME/.tfenv/bin:$PATH"' >> ~/.bashrc
echo 'export PATH="$HOME/.tfenv/bin:$PATH"' >> ~/.zshrc
tfenv list-remote
tfenv install 0.12.8
terraform version
tfenv install 0.11.15
terraform version
tfenv use 0.12.8
terraform version
```
- landscape: A program to reformat the `terraform plan` output into a nicer version, really useful when it's shown as JSON. Right now it only works for Terraform 0.11.

```bash
terraform plan | landscape
```
- k2tf: Program to convert k8s YAML manifests to HCL; see the usage sketch below.
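A minimal usage sketch, assuming the `-f`/`-o` flags of k2tf's CLI (check `k2tf -h` for your version):

```bash
# Convert a manifest file to HCL
k2tf -f deployment.yaml -o deployment.tf

# Or pipe straight from the cluster
kubectl get deployments -o yaml | k2tf -o deployments.tf
```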
Editor Plugins⚑
For Vim:
- vim-terraform: Execute tf from vim and autoformat when saving.
- vim-terraform-completion: Linter and autocompletion.
Good practices and maintenance⚑
- fmt: Formats the code following HashiCorp best practices.

```bash
terraform fmt
```
- validate: Tests that the syntax is correct.

```bash
terraform validate
```
- terraform-docs: Generates a markdown table with the inputs and outputs.

```bash
terraform-docs markdown table *.tf > README.md
```

```markdown
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| broker_numbers | Number of brokers | number | `"3"` | no |
| broker_size | AWS instance type for the brokers | string | `"kafka.m5.large"` | no |
| ebs_size | Size of the brokers disks | string | `"300"` | no |
| kafka_version | Kafka version | string | `"2.1.0"` | no |

## Outputs

| Name | Description |
|------|-------------|
| brokers_masked_endpoints | Brokers masked endpoints |
| brokers_real_endpoints | Brokers real endpoints |
| zookeeper_masked_endpoints | Zookeeper masked endpoints |
| zookeeper_real_endpoints | Zookeeper real endpoints |
```
- Terraform lint (tflint): Only works with some AWS resources. It validates the code against a third-party API. For example:

```hcl
resource "aws_instance" "foo" {
  ami           = "ami-0ff8a91507f77f867"
  instance_type = "t1.2xlarge" # invalid type!
}
```

The code is syntactically valid, but the instance type `t1.2xlarge` doesn't exist in AWS. This check catches that kind of issue.

```bash
wget https://github.com/wata727/tflint/releases/download/v0.11.1/tflint_darwin_amd64.zip
unzip tflint_darwin_amd64.zip
sudo install tflint /usr/local/bin/
tflint -v
```
We can automate all the above to run before each commit using the pre-commit framework.

```bash
sudo pip install pre-commit
cd $terraform_project
cat > .pre-commit-config.yaml <<EOF
repos:
  - repo: git://github.com/antonbabenko/pre-commit-terraform
    rev: v1.19.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_docs
      - id: terraform_tflint
EOF
pre-commit install
pre-commit run terraform_fmt
pre-commit run terraform_validate --files dynamo.tf
pre-commit run -a
```
Tests⚑
Static analysis⚑
Linters⚑
- conftest
- tflint
- `terraform validate`
Dry run⚑
- `terraform plan`
- hashicorp sentinel
- terraform-compliance
Unit tests⚑
There is no real unit testing in infrastructure code, as you need to deploy it in a real environment.

- terratest (works for k8s and Terraform). Some sample code in:
  - github.com/gruntwork-io/infrastructure-as-code-testing-talk
  - gruntwork.io
E2E test⚑
- Too slow and too brittle to be worth it
- Use incremental e2e testing
Variables⚑
It's a good practice to name the resource type before its particularization, so you can search all the elements of that type; for example, instead of `client_cidr` and `operations_cidr` use `cidr_client` and `cidr_operations`.
variable "list_example"{
description = "An example of a list"
type = "list"
default = [1, 2, 3]
}
variable "map_example"{
description = "An example of a dictionary"
type = "map"
default = {
key1 = "value1"
key2 = "value2"
}
}
For maps inside maps or lists, investigate `zipmap` (see the sketch below). To access a variable you use `"${var.list_example}"`.
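A minimal sketch of what `zipmap` does, with made-up variable names:

```hcl
# zipmap builds a map by pairing a list of keys with a list of values,
# e.g. zipmap(["admin", "dev"], ["alice", "bob"]) => {admin = "alice", dev = "bob"}
output "user_by_role" {
  value = "${zipmap(var.role_names, var.user_names)}"
}
```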
For secret variables we use:

```hcl
variable "db_password" {
  description = "The password for the database"
}
```

It has no default value; we save the password in our keystore and pass it as an environment variable:

```bash
export TF_VAR_db_password="{{ your password }}"
terragrunt plan
```

As a reminder, Terraform stores all variables in its state file in plain text, including this database password, which is why your terragrunt config should always enable encryption for remote state storage in S3.
Interpolation of variables⚑
You can't interpolate in variable definitions, so instead of

```hcl
variable "sistemas_gpg" {
  description = "Sistemas public GPG key for Zena"
  type        = "string"
  default     = "${file("sistemas_zena.pub")}"
}
```

use a local and reference it with `"${local.sistemas_gpg}"`:

```hcl
locals {
  sistemas_gpg = "${file("sistemas_zena.pub")}"
}
```
Show information of the resources⚑
Get information of the infrastructure. Output variables show up in the console after you run `terraform apply`; you can also use `terraform output [{{ output_name }}]` to see the value of a specific output without applying any changes.

```hcl
output "public_ip" {
  value = "${aws_instance.example.public_ip}"
}
```

```bash
> terraform apply
aws_security_group.instance: Refreshing state... (ID: sg-db91dba1)
aws_instance.example: Refreshing state... (ID: i-61744350)

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

public_ip = 54.174.13.5
```
Data source⚑
A data source represents a piece of read-only information that is fetched from the provider every time you run Terraform. It does not create anything new.

```hcl
data "aws_availability_zones" "all" {}
```

And you reference it with `"${data.aws_availability_zones.all.names}"`.
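A sketch of using it from a resource (this reuses the launch configuration defined in the lifecycle section below):

```hcl
# Spread the autoscaling group across every available zone.
resource "aws_autoscaling_group" "example" {
  launch_configuration = "${aws_launch_configuration.example.id}"
  availability_zones   = ["${data.aws_availability_zones.all.names}"]
  min_size             = 2
  max_size             = 4
}
```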
Read-only state source⚑
With `terraform_remote_state` you can fetch the Terraform state file stored by another set of templates in a completely read-only manner.

From an app template we can read the info of the database with:

```hcl
data "terraform_remote_state" "db" {
  backend = "s3"

  config {
    bucket = "(YOUR_BUCKET_NAME)"
    key    = "stage/data-stores/mysql/terraform.tfstate"
    region = "us-east-1"
  }
}
```

And you would access the variables inside the database Terraform file with `data.terraform_remote_state.db.outputs.port`.

To share variables through the state, you need to set them in the `outputs.tf` file of the sharing side.
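A sketch of that `outputs.tf` on the database side (the attribute names follow the `aws_db_instance` resource); only what is exported here is readable through the remote state:

```hcl
output "address" {
  value = "${aws_db_instance.main.address}"
}

output "port" {
  value = "${aws_db_instance.main.port}"
}
```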
Template_file source⚑
It is used to load templates. It has two parameters: `template`, a string, and `vars`, a map of variables. It has one output attribute called `rendered`, which is the result of rendering the template. For example:

```bash
# File: user-data.sh
#!/bin/bash
cat > index.html <<EOF
<h1>Hello, World</h1>
<p>DB address: ${db_address}</p>
<p>DB port: ${db_port}</p>
EOF

nohup busybox httpd -f -p "${server_port}" &
```

```hcl
data "template_file" "user_data" {
  template = "${file("user-data.sh")}"

  vars {
    server_port = "${var.server_port}"
    db_address  = "${data.terraform_remote_state.db.address}"
    db_port     = "${data.terraform_remote_state.db.port}"
  }
}
```
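The rendered result is then used wherever the user data is needed; a sketch reusing the launch configuration from the next section:

```hcl
resource "aws_launch_configuration" "example" {
  image_id      = "ami-40d28157"
  instance_type = "t2.micro"

  # The template, rendered with the vars above
  user_data = "${data.template_file.user_data.rendered}"
}
```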
Resource lifecycle⚑
The `lifecycle` parameter is a meta-parameter; it exists on almost every resource in Terraform. You can add a `lifecycle` block to any resource to configure how that resource should be created, updated or destroyed.

The available options are:

- `create_before_destroy`: If set to true, Terraform creates the replacement resource before destroying the original one.
- `prevent_destroy`: If set to true, any attempt to delete that resource (`terraform destroy`) will fail; to delete it you first have to remove the `prevent_destroy` setting.
resource "aws_launch_configuration" "example" {
image_id = "ami-40d28157"
instance_type = "t2.micro"
security_groups = ["${aws_security_group.instance.id}"]
user_data = <<-EOF
#!/bin/bash
echo "Hello, World" > index.html
nohup busybox httpd -f -p "${var.server_port}" &
EOF
lifecycle {
create_before_destroy = true
}
}
If you set `create_before_destroy` on a resource X, you also have to set it on every resource that X depends on (if you forget, you'll get errors about cyclical dependencies). In the case of the launch configuration, that means you need to set `create_before_destroy` to true on the security group:
resource "aws_security_group" "instance" {
name = "terraform-example-instance"
ingress {
from_port = "${var.server_port}"
to_port = "${var.server_port}"
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
lifecycle {
create_before_destroy = true
}
}
Use collaboratively⚑
Share state⚑
The best option is to store the state in an S3 bucket. First create it:
resource "aws_s3_bucket" "terraform_state" {
bucket = "terraform-up-and-running-state"
versioning {
enabled = true
}
lifecycle {
prevent_destroy = true
}
}
And then configure Terraform:

```bash
terraform remote config \
    -backend=s3 \
    -backend-config="bucket=(YOUR_BUCKET_NAME)" \
    -backend-config="key=global/s3/terraform.tfstate" \
    -backend-config="region=us-east-1" \
    -backend-config="encrypt=true"
```

This way Terraform will automatically pull the latest state from this bucket and push the latest state after running a command.
Lock terraform⚑
To avoid several people running Terraform at the same time, we'd use terragrunt, a wrapper for Terraform that manages remote state for you automatically and provides locking by using DynamoDB (in the free tier).

Inside the `terraform_config.tf` you create the DynamoDB table and then configure your `s3` backend to use it:
resource "aws_dynamodb_table" "terraform_statelock" {
name = "global-s3"
read_capacity = 20
write_capacity = 20
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
terraform {
backend "s3" {
bucket = "grupo-zena-tfstate"
key = "global/s3/terraform.tfstate"
region = "eu-west-1"
encrypt = "true"
dynamodb_table = "global-s3"
}
}
You'll probably need to execute a `terraform apply` with the `dynamodb_table` line commented out first.

If you need to release a stuck lock, execute:

```bash
terraform force-unlock {{ unlock_id }}
```

You get the `unlock_id` from the error shown when any `terraform` command fails to acquire the lock.
Modules⚑
In Terraform you can put code inside of a module and reuse it in multiple places throughout your code.

The provider resource should be specified by the user and not in the modules.

Whenever you add a module to your Terraform templates or modify its `source` parameter, you need to run a get command before you run `plan` or `apply`:

```bash
terraform get
```

To extract output variables of a module into the parent tf file, use `${module.{{ module_name }}.{{ output_name }}}`, for example as sketched below.
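A sketch, assuming a module named `bastion` (defined as in the next section) that exports a `public_ip` output:

```hcl
# Re-export the module's output from the parent template
output "bastion_public_ip" {
  value = "${module.bastion.public_ip}"
}
```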
Basics⚑
Any set of Terraform templates in a directory is a module.
The good practice is to have a directory called `modules` in your parent project directory, where you git clone the desired modules. Then, for example, inside `pro/services/bastion/main.tf` you'd call one with:
provider "aws" {
region = "eu-west-1"
}
module "bastion" {
source = "../../../modules/services/bastion/"
}
Outputs⚑
Modules encapsulate their resources. A resource in one module cannot directly depend on resources or attributes in other modules, unless those are exported through outputs. These outputs can be referenced in other places in your configuration, for example:
resource "aws_instance" "client" {
ami = "ami-408c7f28"
instance_type = "t1.micro"
availability_zone = "${module.consul.server_availability_zone}"
}
Import⚑
You can import the different parts with `terraform import {{ resource_type }}.{{ resource_name }} {{ resource_id }}`.
For examples see the documentation of the desired resource.
Bulk import⚑
But if you want to bulk import resources, I suggest using terraforming; see the sketch below.
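A usage sketch, assuming terraforming's per-service subcommands and `--tfstate` flag (check `terraforming --help` for your version):

```bash
# Export existing S3 buckets as HCL code
terraforming s3 > s3.tf

# Export the matching state
terraforming s3 --tfstate > terraform.tfstate
```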
Bad points⚑
- Manually added resources won't be managed by Terraform, therefore you can't use it to enforce the desired state, as shown in this bug.
- If you modify the launch configuration of an autoscaling group, the instances don't get a rolling update; you have to do it manually.
- They call the dictionaries `map`... (/゚Д゚)/
- The conditionals are really ugly. You need to use `count`.
- You can't split long strings xD
Best practices⚑
Name the resources with `_` instead of `-` so the editor's completion works :)
VPC⚑
Don't use the default VPC.
Security groups⚑
Instead of using `aws_security_group` to define the ingress and egress rules, use it only to create the empty security group, and use `aws_security_group_rule` to add the rules; otherwise you'll run into dependency cycles.

The name of an egress security group rule must follow `egress_from_{{ source }}_to_{{ destination }}`; the name of an ingress security group rule must follow `ingress_to_{{ destination }}_from_{{ source }}`.

Also set the order of the arguments so that they mirror the name.
For an ingress rule:

```hcl
security_group_id = ...
cidr_blocks       = ...
```

And an egress rule should look like:

```hcl
security_group_id = ...
cidr_blocks       = ...
```
Imagine you want to filter the traffic from A -> B: the egress rule from A to B should go beside the ingress rule from B to A. A sketch of the whole convention follows.
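A sketch with made-up names, ports and CIDRs:

```hcl
# ingress_to_{{ destination }}_from_{{ source }}: destination first,
# source second, mirroring the name.
resource "aws_security_group_rule" "ingress_to_web_from_office" {
  security_group_id = "${aws_security_group.web.id}"
  cidr_blocks       = ["10.0.1.0/24"]

  type      = "ingress"
  from_port = 443
  to_port   = 443
  protocol  = "tcp"
}
```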
Default security group⚑
You can't manage the default security group of a VPC, therefore you have to adopt it and strip all its rules with the `aws_default_security_group` resource, as sketched below.
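A minimal sketch (the VPC reference is made up); with no ingress or egress blocks, all rules are removed:

```hcl
# Adopt the VPC's default security group and leave it empty.
resource "aws_default_security_group" "default" {
  vpc_id = "${aws_vpc.main.id}"
}
```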
IAM⚑
You have to generate a GPG key and export it in base64:

```bash
gpg --export {{ gpg_id }} | base64
```

To see the secrets you have to decrypt them:

```bash
terraform output secret | base64 --decode | gpg -d
```
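A sketch of where that key plugs in, assuming an `aws_iam_access_key` whose secret is encrypted with the exported key:

```hcl
resource "aws_iam_user" "example" {
  name = "example"
}

resource "aws_iam_access_key" "example" {
  user    = "${aws_iam_user.example.name}"
  pgp_key = "{{ base64_gpg_export }}"
}

# This is the "secret" output decrypted in the pipeline above
output "secret" {
  value = "${aws_iam_access_key.example.encrypted_secret}"
}
```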
Sensitive information⚑
There are several approaches here:

- First, rely on the S3 encryption to protect the information in your state file.
- Second, use the Vault provider to protect the state file.
- Third (but I won't use it) would be to use terrahelp.
RDS credentials⚑
The RDS credentials are saved in plaintext both in the definition and in the state file, see this bug for more information. The value of `password` is not compared against the value of the password in the cloud, so as long as the string in the code and the state file remains the same, it won't try to change it.

As a workaround, you can create the RDS with a fake password like `changeme`, and once the resource is created, run an `aws` command to change it. That way, the value in your code and the state is not the real one, but Terraform won't try to change it.

Inspired by this gist and the `local-exec` docs, you could do:
resource "aws_db_instance" "main" {
username = "postgres"
password = "changeme"
...
}
resource "null_resource" "master_password" {
triggers {
db_host = aws_db_instance.main.address
}
provisioner "local-exec" {
command = "pass generate rds_main_password; aws rds modify-db-instance --db-instance-identifier $INSTANCE --master-user-password $(pass show rds_main_password)"
environment = {
INSTANCE = aws_db_instance.main.identifier
}
}
}
The password is stored in your `pass` repository, which can be shared with the team.

If you're wondering why I added such a long line, well, it's because of HCL! As you can't split long strings, marvelous isn't it? xD
Loops⚑
You can't use nested lists or dictionaries, see this 2015 bug.
Loop over a variable⚑
variable "vpn_egress_tcp_ports" {
description = "VPN egress tcp ports "
type = "list"
default = [50, 51, 500, 4500]
}
resource "aws_security_group_rule" "ingress_tcp_from_ops_to_vpn_instance"{
count = "${length(var.vpn_egress_tcp_ports)}"
type = "ingress"
from_port = "${element(var.vpn_egress_tcp_ports, count.index)}"
to_port = "${element(var.vpn_egress_tcp_ports, count.index)}"
protocol = "tcp"
cidr_blocks = [ "${var.cidr}"]
security_group_id = "${aws_security_group.pro_ins_vpn.id}"
}
Refactoring⚑
Refactoring in Terraform is ugly business.
Refactoring in modules⚑
If you try to refactor your Terraform code into modules, Terraform will try to destroy and recreate all the elements of the module, unless you move them within the state file first (see below).
Refactoring the state file⚑
```bash
terraform state mv -state-out=other.tfstate module.web module.web
```
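Similarly, a sketch of moving a plain resource under a module within the same state, so the refactor into modules doesn't destroy it (names made up):

```bash
# Tell Terraform the old aws_instance.web is now managed by module.web
terraform state mv aws_instance.web module.web.aws_instance.web
```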
Google cloud integration⚑
You configure it in the Terraform directory:
```hcl
// Configure the Google Cloud provider
provider "google" {
  credentials = "${file("account.json")}"
  project     = "my-gce-project"
  region      = "us-central1"
}
```
To download the JSON go to the Google Developers Console, go to Credentials, then Create credentials and finally Service account key. Select Compute engine default service account and select JSON as the key type.
Ignore the change of an attribute⚑
Sometimes you don't care whether some attributes of a resource change. If that's the case, use the `lifecycle` statement:
resource "aws_instance" "example" {
# ...
lifecycle {
ignore_changes = [
# Ignore changes to tags, e.g. because a management agent
# updates these based on some ruleset managed elsewhere.
tags,
]
}
}
Define the default value of an variable that contains an object as empty⚑
variable "database" {
type = object({
size = number
instance_type = string
storage_type = string
engine = string
engine_version = string
parameter_group_name = string
multi_az = bool
})
default = null
Conditionals⚑
Elif⚑
```hcl
locals {
  test = "${condition ? value : (elif-condition ? elif-value : else-value)}"
}
```
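A concrete sketch with a made-up `env` variable:

```hcl
# Map an environment name to an instance size with a nested ternary,
# HCL's only form of "elif".
locals {
  instance_type = "${var.env == "pro" ? "t2.large" : (var.env == "stage" ? "t2.medium" : "t2.micro")}"
}
```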
Do a conditional if a variable is not null⚑
resource "aws_db_instance" "instance" {
count = var.database == null ? 0 : 1
...
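With `count` the resource becomes a list, so any reference has to handle the zero-instance case too; a sketch:

```hcl
# Index the counted resource and fall back to null when it wasn't created
output "db_address" {
  value = var.database == null ? null : aws_db_instance.instance[0].address
}
```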
Debugging⚑
You can set the `TF_LOG` environment variable to one of the log levels `TRACE`, `DEBUG`, `INFO`, `WARN` or `ERROR` to change the verbosity of the logs.

To remove the debug traces run `unset TF_LOG`.
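For example, to capture a verbose trace to a file (`TF_LOG_PATH` is the companion variable that redirects the logs):

```bash
export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform.log
terraform plan
```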