
Conversation

ldionne (Member) commented Oct 31, 2025:

This patch adds a Terraform configuration file that should allow deploying to an EC2 instance. It requires a few secrets to be made available to GitHub Actions.

}

resource "aws_instance" "docker_server" {
  ami = "ami-0c97bd51d598d45e4" # Amazon Linux 2023 kernel-6.12 AMI in us-west-2
ldionne (Member, Author):

@boomanaiden154 Are we OK with hardcoding the AMI? What do you folks usually do?

Reply:

Not familiar with how AWS does things. Hard coding it doesn't seem like a big deal. But we want to be able to change it, which would probably force instance recreation. I think we should do what I suggested above where the instance is a clean slate on every boot but mounts a persistent volume that has the DB info.
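A minimal sketch of that persistent-volume idea, assuming one EBS volume in the instance's availability zone (the resource names and size here are hypothetical, not part of this patch):

# Hypothetical sketch: a DB volume that survives instance recreation.
resource "aws_ebs_volume" "lnt_db" {
  availability_zone = aws_instance.docker_server.availability_zone
  size              = 64    # GB, illustrative only
  type              = "gp3"
}

resource "aws_volume_attachment" "lnt_db" {
  device_name = "/dev/sdf"
  volume_id   = aws_ebs_volume.lnt_db.id
  instance_id = aws_instance.docker_server.id
}

Recreating aws_instance.docker_server (e.g. after an AMI bump) would then leave aws_ebs_volume.lnt_db intact; only the attachment is recreated.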

resource "aws_instance" "docker_server" {
  ami           = "ami-0c97bd51d598d45e4" # Amazon Linux 2023 kernel-6.12 AMI in us-west-2
  instance_type = "t2.micro"
  key_name      = "test-key-name" # TODO
ldionne (Member, Author):

I'm not sure what to put here, I presume this needs to match a key in the LLVM Foundation's actual AWS account.

Reply:

Any keys should be specified in the provider. I think this is a different type of key.
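For what it's worth, key_name on an aws_instance names an EC2 SSH key pair, which is indeed a different kind of key from the provider's API credentials. A hedged sketch of managing one in Terraform (the pair name and key path are hypothetical):

resource "aws_key_pair" "deployer" {
  key_name   = "lnt-deployer"                # hypothetical key pair name
  public_key = file("~/.ssh/id_ed25519.pub") # hypothetical local public key
}

# The instance would then reference it:
#   key_name = aws_key_pair.deployer.key_name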


on:
  push:
    tags:
ldionne (Member, Author):

I am deploying on tags at the moment: I don't think we want to re-deploy at every commit since we risk bringing down the instance. Actually, I even wonder whether that should be a manually triggered job. WDYT?

Reply:

We want to tag images by commit SHA, but explicitly version them in the terraform. That means we get a new image per commit, but only redeploy when we explicitly bump the commit of the images we're running.
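One way that could look in the Terraform, as a sketch (the variable name and registry path are hypothetical):

variable "lnt_image_tag" {
  description = "Commit SHA of the image to run; bumping this is what triggers a redeploy"
  type        = string
}

# Interpolated wherever the image is referenced, e.g. in the rendered compose file:
#   image: ghcr.io/llvm/lnt:${var.lnt_image_tag}   # hypothetical registry path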

- name: Initialize Terraform
  run: terraform init

- name: Apply Terraform changes
ldionne (Member, Author):

If the instance already exists and then we re-deploy it, what's going to happen? My understanding is that we'd start over from scratch with an empty EC2 instance, which means we would lose all of the existing data stored on the instance. Is that not the case?

Do you understand the mechanism by which the data we store in VOLUMES in the Docker container ends up being persisted across re-deployments of the EC2 instance? I don't.

Reply (Contributor):

I'm pretty sure it just calls whatever AWS API calls it needs to update the instance to match your terraform file, it won't get destroyed. Terraform holds state about this type of stuff.

The volume will just be stored on the root block device since we haven't attached any EBS storage or anything.

ldionne (Member, Author):

> I'm pretty sure it just calls whatever AWS API calls it needs to update the instance to match your terraform file, it won't get destroyed. Terraform holds state about this type of stuff.

I see. But the Terraform state is not kept across invocations of the Github Action, so I don't really understand how Terraform can tell that we even already have an instance.

ldionne (Member, Author) commented Oct 31, 2025:

CC @lukel97 I went ahead and gave this a shot; I was curious to understand the whole pipeline.

ldionne (Member, Author) commented Oct 31, 2025:

@boomanaiden154 I also created the appropriately-named secrets in the GitHub Actions settings of this repository; however, they all have fake values at the moment.

lukel97 (Contributor) left a comment:

Thanks for fleshing this out. I'm not sure if you've tried deploying this to a test AWS account yet, but it looks like it's missing a security group/ingress rules etc., so the web server won't be reachable by any public traffic IIUC.

I've also got some terraform files written here, it would be good to collaborate on this. I don't want us to step on each other's toes, so I'll just leave review comments for now, but let me know if you'd rather have me just commit directly to the branch.
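For reference, the missing piece would look roughly like this (a sketch with hypothetical names, assuming HTTPS on port 443 and the default VPC):

resource "aws_security_group" "lnt_web" {
  name = "lnt-web" # hypothetical

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # allow public HTTPS traffic in
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"          # allow all outbound
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# and on the instance:
#   vpc_security_group_ids = [aws_security_group.lnt_web.id]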


ldionne (Member, Author) commented Oct 31, 2025:

> I've also got some terraform files written here, it would be good to collaborate on this. I don't want us to step on each other's toes so I'll just leave review comments for now but let me know if you'd rather have me just commit directly to the branch.

Please feel free to commit directly to the branch. Sorry, I didn't know you had started on this already.

I did try to deploy an EC2 instance in my personal account, however that account is blocked right now (IDK why) so I haven't gotten very far. This was intended to be a starting point.

Feel free to push whatever changes you have to the branch.


resource "aws_instance" "docker_server" {
  ami           = "ami-0c97bd51d598d45e4" # Amazon Linux 2023 kernel-6.12 AMI in us-west-2
  instance_type = "t2.micro"
lukel97 (Contributor):

The default block storage on these devices is tiny (~8GB IIRC?); you probably want to expand it by a few GB more.

Suggested change:
   instance_type = "t2.micro"
+  root_block_device {
+    volume_size = 64
+    volume_type = "gp3"
+  }

Reply:

A couple GB boot disk should be fine, but slightly bigger might be good. The DB should probably be on a separate volume.


LNT_DB_PASSWORD=${__db_password__}
LNT_AUTH_TOKEN=${__auth_token__}
docker compose --file compose.yaml up
lukel97 (Contributor):

I think we need to daemonize this, otherwise cloud-init will never finish.

Suggested change:
-  docker compose --file compose.yaml up
+  docker compose --file compose.yaml up -d


LNT_DB_PASSWORD=${__db_password__}
LNT_AUTH_TOKEN=${__auth_token__}
docker compose --file compose.yaml up
lukel97 (Contributor):

IIUC these user data scripts are only called when the instance is first initialized, not e.g. when it's rebooted. So we probably want to change the docker-compose restart policy to unless-stopped so the containers get relaunched on a reboot.

Reply:

This depends upon how we set it up. I was thinking it might be better to set up the machine to be a clean slate on every boot, and mount a persistent volume that actually contains the DB. That makes it super easy to change system software inside TF.
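A rough sketch of that clean-slate setup, assuming the EBS volume idea above and a hypothetical device name and mount point:

resource "aws_instance" "docker_server" {
  # ... ami, instance_type, etc. as in the patch ...

  user_data = <<-EOF
    #!/bin/bash
    blkid /dev/sdf || mkfs -t ext4 /dev/sdf # format only if the volume is still blank
    mkdir -p /var/lib/lnt-db
    mount /dev/sdf /var/lib/lnt-db          # the DB container bind-mounts this path
  EOF
}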

lukel97 (Contributor) commented Oct 31, 2025:

> I did try to deploy an EC2 instance in my personal account, however that account is blocked right now (IDK why) so I haven't gotten very far. This was intended to be a starting point.

Hah, my AWS account was also blocked, I'm currently waiting for AWS support to verify my identity. I feel your pain :)



sudo usermod -a -G docker ec2-user
sudo chkconfig docker on

LNT_DB_PASSWORD=${__db_password__}

Comment:

Where are these env variables coming from?
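If the script is rendered with Terraform's templatefile() function (which the ${__db_password__} placeholders suggest, though that's an assumption), the values would be supplied like this:

# Hypothetical fragment: rendering the startup script and filling in the placeholders.
user_data = templatefile("${path.module}/user_data.sh.tpl", {
  __db_password__ = var.lnt_db_password
  __auth_token__  = var.lnt_auth_token
})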



provider "aws" {
  region = "us-west-2"
}

Comment:

We also need a way to store the Terraform state. We use a GCS bucket in the premerge cluster to do this. https://github.com/llvm/llvm-zorg/blob/87d07e600970abf419046d2ab6083b2d64240bce/premerge/main.tf#L31

Otherwise state isn't saved across checkouts, which means things won't work.
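The AWS-native equivalent would be a remote backend such as an S3 bucket, sketched here with a hypothetical bucket name (the bucket has to exist before terraform init):

terraform {
  backend "s3" {
    bucket = "llvm-lnt-terraform-state" # hypothetical bucket
    key    = "lnt/terraform.tfstate"
    region = "us-west-2"
  }
}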

  region = "us-west-2"
}

variable "lnt_db_password" {

Comment:

These should probably be data resources that reference secrets stored inside AWS's secret manager.

https://github.com/llvm/llvm-zorg/blob/87d07e600970abf419046d2ab6083b2d64240bce/premerge/main.tf#L113 is how we set this up for premerge. Not sure exactly how to do this for AWS.
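The rough AWS equivalent would be Secrets Manager data sources, sketched here with hypothetical secret names:

data "aws_secretsmanager_secret" "lnt_db_password" {
  name = "lnt/db_password" # hypothetical secret name
}

data "aws_secretsmanager_secret_version" "lnt_db_password" {
  secret_id = data.aws_secretsmanager_secret.lnt_db_password.id
}

# Referenced as:
#   data.aws_secretsmanager_secret_version.lnt_db_password.secret_string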

