AWS Fargate gRPC cluster example #5
base: main
```dockerfile
FROM authzed/spicedb:v1.12.0

# Build arguments are copied into runtime environment variables that SpiceDB reads.
# Note: ENV values are stored in the image configuration (visible via `docker inspect`),
# so a real preshared key or connection URI should not be baked into the image this way.
ARG SPICEDB_GRPC_PRESHARED_KEY
ENV SPICEDB_GRPC_PRESHARED_KEY=${SPICEDB_GRPC_PRESHARED_KEY}
ARG SPICEDB_DATASTORE_ENGINE
ENV SPICEDB_DATASTORE_ENGINE=${SPICEDB_DATASTORE_ENGINE}
ARG SPICEDB_DATASTORE_CONN_URI
ENV SPICEDB_DATASTORE_CONN_URI=${SPICEDB_DATASTORE_CONN_URI}

ENTRYPOINT ["spicedb", "serve"]

# 50051: gRPC API, 8080: dashboard, 9090: metrics
EXPOSE 50051/tcp 8080/tcp 9090/tcp
```
````markdown
# SpiceDB AWS Fargate cluster
This reference config creates a Fargate cluster that exposes gRPC port `50051` over a TLS (SSL) endpoint.

## Steps to run
1. Fargate requires a VPC. Usually, one VPC is enough for multiple services.
   The following steps reference the IDs of SubnetA, SubnetB, and the VPC itself.
   SubnetA and SubnetB must be in different Availability Zones, because the Application Load Balancer requires subnets in at least two.
2. Create the cluster, substituting your own values for SubnetA, SubnetB, and VPC:
   ```
   aws cloudformation create-stack \
     --stack-name permissions-staging \
     --template-body file://./aws/fargate.yaml \
     --parameters ParameterKey=SubnetA,ParameterValue=subnet-0dd.........d9 \
                  ParameterKey=SubnetB,ParameterValue=subnet-062........a5 \
                  ParameterKey=VPC,ParameterValue=vpc-09a........57 \
     --capabilities CAPABILITY_NAMED_IAM
   ```

TODO:
1. Parameterize: AWSAccountId, CertificateId, HostedZoneName, ServiceName, etc.
2. Create the ECR repository along with the rest of the stack resources.
3. Route traffic on ports 8080 and 9090 for the SpiceDB dashboard and metrics.
````
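For TODO items 1 and 2, a minimal sketch, assuming the stack should own the ECR repository and derive account- and region-specific values from CloudFormation pseudo parameters instead of hard-coded defaults. `RepositoryName`, `ImageTag`, `CertificateId`, and the `Repository` resource are hypothetical names, not part of this PR; the fragments would replace the `Image` and `Certificate` parameters in `aws/fargate.yaml`.

```yaml
# Hypothetical fragments to merge into aws/fargate.yaml (not part of this PR).
Parameters:
  RepositoryName:
    Type: String
    Default: permissions-staging
  ImageTag:
    Type: String
    Default: latest
  # Only the certificate ID; the full ARN is assembled below.
  CertificateId:
    Type: String

Resources:
  # TODO 2: the stack owns the ECR repository.
  Repository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: !Ref RepositoryName

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      # ...other properties unchanged...
      ContainerDefinitions:
        - Name: !Ref ServiceName
          # TODO 1: RepositoryUri already contains <account>.dkr.ecr.<region>.amazonaws.com/<name>,
          # so no account ID or region is hard-coded.
          Image: !Sub '${Repository.RepositoryUri}:${ImageTag}'
          # ...PortMappings and LogConfiguration unchanged...

  LoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      # ...other properties unchanged...
      Certificates:
        # The certificate must live in the stack's region, so AWS::Region is used here.
        - CertificateArn: !Sub 'arn:${AWS::Partition}:acm:${AWS::Region}:${AWS::AccountId}:certificate/${CertificateId}'
```

One caveat with this design: if the repository is created in the same stack, it starts empty, so the service cannot start tasks until an image is pushed. Deploying the repository in a separate stack first, or starting with a `DesiredCount` of 0, avoids that ordering problem.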
```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudFormation template for a SpiceDB cluster.
Parameters:
  VPC:
    Type: AWS::EC2::VPC::Id
  SubnetA:
    Type: AWS::EC2::Subnet::Id
  SubnetB:
    Type: AWS::EC2::Subnet::Id
  Certificate:
    Type: String
    # Update with the certificate ARN from Certificate Manager, which must exist in the same region.
    # In our case, it is permissions-staging.domain.com
    Default: 'arn:aws:acm:us-east-1:5xxxxxxxxx3:certificate/6e603b3c-....-....-....-72b1d8d4711b'
  Image:
    Type: String
    # Update with the Docker image. "You can use images in the Docker Hub registry or specify other repositories (repository-url/image:tag)."
    Default: 5xxxxxxxxx3.dkr.ecr.us-east-1.amazonaws.com/permissions-staging:latest
  ServiceName:
    Type: String
    # Update with the name of the service
    Default: perms-stg-service
  ContainerPort:
    Type: Number
    Default: 50051
  LoadBalancerPort:
    Type: Number
    Default: 50051
  HealthCheckPath:
    Type: String
    Default: '/grpc.health.v1.Health/Check'
  HostedZoneName:
    Type: String
    Default: domain.com
  Subdomain:
    Type: String
    Default: permissions-staging
  # for autoscaling
  MinContainers:
    Type: Number
    Default: 1
  # for autoscaling
  MaxContainers:
    Type: Number
    Default: 2
  # target CPU utilization (%)
  AutoScalingTargetValue:
    Type: Number
    Default: 50
```
Comment on lines +47 to +49:

Autoscaling could be disruptive to a SpiceDB cluster, because the hash ring has to reconfigure, which could hurt your deployment's cache hit ratio.

Could you please rephrase your comment as a question? By the way, I do have a question here, since I know next to nothing about the project: does SpiceDB support dynamic sizing/scaling or not?

My initial suggestion was not to enable autoscaling. While SpiceDB supports adding and removing nodes, doing so comes at the cost of reorganizing the hash ring, which leads to a lower cache hit rate. However, after discussing it with the team, there are things we could explore to reduce the impact of SpiceDB instances coming and going, and we certainly want autoscaling to be fully supported without any performance impact. So you can dismiss my comment!
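Given the hash-ring concern raised in this thread, one way to keep autoscaling available but off by default is a CloudFormation condition. A sketch, assuming the rest of the template stays as in this PR; `EnableAutoScaling` and `AutoScalingEnabled` are hypothetical names, and the longer `ScaleInCooldown` is an assumption meant to make membership changes (and the ring reshuffles they cause) less frequent, not a SpiceDB recommendation.

```yaml
# Hypothetical fragments to merge into aws/fargate.yaml (not part of this PR).
Parameters:
  EnableAutoScaling:
    Type: String
    AllowedValues: ['true', 'false']
    Default: 'false'

Conditions:
  AutoScalingEnabled: !Equals [!Ref EnableAutoScaling, 'true']

Resources:
  # Each autoscaling resource is created only when the condition holds;
  # otherwise the Service keeps its fixed DesiredCount.
  AutoScalingRole:
    Type: AWS::IAM::Role
    Condition: AutoScalingEnabled
    # ...Properties unchanged...
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Condition: AutoScalingEnabled
    # ...Properties unchanged...
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Condition: AutoScalingEnabled
    Properties:
      # ...PolicyName, PolicyType, ScalingTargetId unchanged...
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        # Assumption: wait longer before removing tasks so the ring changes less often.
        ScaleInCooldown: 300
        ScaleOutCooldown: 60
        TargetValue: !Ref AutoScalingTargetValue
```

Opting in would then be one more entry on the `create-stack` command line: `ParameterKey=EnableAutoScaling,ParameterValue=true`.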
```yaml
Resources:
  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Join ['', [!Ref ServiceName, Cluster]]
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    # Makes sure the log group is created before it is used.
    DependsOn: LogGroup
    Properties:
      # Name of the task definition. Subsequent versions of the task definition are grouped together under this name.
      Family: !Join ['', [!Ref ServiceName, TaskDefinition]]
      # awsvpc is required for Fargate
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      # 256 (.25 vCPU) - Available memory values: 0.5GB, 1GB, 2GB
      # 512 (.5 vCPU) - Available memory values: 1GB, 2GB, 3GB, 4GB
      # 1024 (1 vCPU) - Available memory values: 2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
      # 2048 (2 vCPU) - Available memory values: Between 4GB and 16GB in 1GB increments
      # 4096 (4 vCPU) - Available memory values: Between 8GB and 30GB in 1GB increments
      Cpu: 256
      # 0.5GB, 1GB, 2GB - Available cpu values: 256 (.25 vCPU)
      # 1GB, 2GB, 3GB, 4GB - Available cpu values: 512 (.5 vCPU)
      # 2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB - Available cpu values: 1024 (1 vCPU)
      # Between 4GB and 16GB in 1GB increments - Available cpu values: 2048 (2 vCPU)
      # Between 8GB and 30GB in 1GB increments - Available cpu values: 4096 (4 vCPU)
      Memory: 512
      # A role needed by ECS.
      # "The ARN of the task execution role that containers in this task can assume. All containers in this task are granted the permissions that are specified in this role."
      # "There is an optional task execution IAM role that you can specify with Fargate to allow your Fargate tasks to make API calls to Amazon ECR."
      ExecutionRoleArn: !Ref ExecutionRole
      # "The Amazon Resource Name (ARN) of an AWS Identity and Access Management (IAM) role that grants containers in the task permission to call AWS APIs on your behalf."
      TaskRoleArn: !Ref TaskRole
      ContainerDefinitions:
        - Name: !Ref ServiceName
          Image: !Ref Image
          PortMappings:
            - ContainerPort: !Ref ContainerPort
          # Send logs to CloudWatch Logs
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-region: !Ref AWS::Region
              awslogs-group: !Ref LogGroup
              awslogs-stream-prefix: ecs
  # A role needed by ECS
  ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['', [!Ref ServiceName, ExecutionRole]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy'
  # A role for the containers
  TaskRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['', [!Ref ServiceName, TaskRole]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: 'sts:AssumeRole'
      # ManagedPolicyArns:
      #   -
      # Policies:
      #   -
  # A role needed for auto scaling
  AutoScalingRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['', [!Ref ServiceName, AutoScalingRole]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              # Application Auto Scaling (not the ECS tasks) assumes this role.
              Service: application-autoscaling.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceAutoscaleRole'
  ContainerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Join ['', [!Ref ServiceName, ContainerSecurityGroup]]
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: !Ref ContainerPort
          ToPort: !Ref ContainerPort
          SourceSecurityGroupId: !Ref LoadBalancerSecurityGroup
  LoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription:
        !Join ['', [!Ref ServiceName, LoadBalancerSecurityGroup]]
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: !Ref LoadBalancerPort
          ToPort: !Ref LoadBalancerPort
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
  Service:
    Type: AWS::ECS::Service
    # This dependency is needed so that the load balancer is set up correctly in time
    DependsOn:
      - LoadBalancerListener
    Properties:
      ServiceName: !Ref ServiceName
      Cluster: !Ref Cluster
      TaskDefinition: !Ref TaskDefinition
      DeploymentConfiguration:
        MinimumHealthyPercent: 100
        MaximumPercent: 200
      DesiredCount: 2
      # This may need to be adjusted if the container takes a while to start up
      HealthCheckGracePeriodSeconds: 30
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          # change to DISABLED if you're using private subnets that have access to a NAT gateway
          AssignPublicIp: ENABLED
          Subnets:
            - !Ref SubnetA
            - !Ref SubnetB
          SecurityGroups:
            - !Ref ContainerSecurityGroup
      LoadBalancers:
        - ContainerName: !Ref ServiceName
          ContainerPort: !Ref ContainerPort
          TargetGroupArn: !Ref TargetGroup
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 300
      HealthCheckPath: !Ref HealthCheckPath
      HealthCheckTimeoutSeconds: 5
      UnhealthyThresholdCount: 2
      HealthyThresholdCount: 2
      HealthCheckEnabled: true
      HealthCheckPort: 'traffic-port'
      HealthCheckProtocol: HTTP
      # end of gRPC-specific health check configuration
      Name: !Join ['', [!Ref ServiceName, TargetGroup]]
      Port: !Ref ContainerPort
      Protocol: HTTP
      ProtocolVersion: GRPC
      Matcher:
        GrpcCode: 0
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 300 # default is 300
      TargetType: ip
      VpcId: !Ref VPC
  LoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref LoadBalancer
      Port: !Ref LoadBalancerPort
      Protocol: HTTPS
      SslPolicy: "ELBSecurityPolicy-2016-08"
      Certificates:
        - CertificateArn: !Ref Certificate
      DefaultActions:
        - Order: 1
          TargetGroupArn: !Ref TargetGroup
          Type: "forward"
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      LoadBalancerAttributes:
        # this is the default, but is specified here in case it needs to be changed
        - Key: idle_timeout.timeout_seconds
          Value: 60
      Name: !Join ['', [!Ref ServiceName, LoadBalancer]]
      # "internal" is also an option
      Scheme: internet-facing
      SecurityGroups:
        - !Ref LoadBalancerSecurityGroup
      Subnets:
        - !Ref SubnetA
        - !Ref SubnetB
  DNSRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: !Join ['', [!Ref HostedZoneName, .]]
      Name: !Join ['', [!Ref Subdomain, ., !Ref HostedZoneName, .]]
      Type: A
      AliasTarget:
        DNSName: !GetAtt LoadBalancer.DNSName
        HostedZoneId: !GetAtt LoadBalancer.CanonicalHostedZoneID
  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Join ['', [/ecs/, !Ref Subdomain]]
      RetentionInDays: 1
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MinCapacity: !Ref MinContainers
      MaxCapacity: !Ref MaxContainers
      ResourceId: !Join ['/', [service, !Ref Cluster, !GetAtt Service.Name]]
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
      # "The Amazon Resource Name (ARN) of an AWS Identity and Access Management (IAM) role that allows Application Auto Scaling to modify your scalable target."
      RoleARN: !GetAtt AutoScalingRole.Arn
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Join ['', [!Ref ServiceName, AutoScalingPolicy]]
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleInCooldown: 10
        ScaleOutCooldown: 10
        # Keep things at or lower than 50% CPU utilization, for example
        TargetValue: !Ref AutoScalingTargetValue

Outputs:
  Endpoint:
    Description: Endpoint
    Value: !Join ['', ['https://', !Ref DNSRecord]]
```
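The listener serves on `LoadBalancerPort` (50051 by default), not 443, so the `https://` endpoint output leaves out the port a gRPC client has to dial. A sketch of an additional output, assuming clients want a `host:port` target; `GrpcTarget` is a hypothetical name. It is built from the `Subdomain` and `HostedZoneName` parameters rather than the record set, so it carries no trailing dot.

```yaml
# Hypothetical addition to the Outputs section of aws/fargate.yaml (not part of this PR).
Outputs:
  GrpcTarget:
    Description: host:port for gRPC clients (TLS terminates at the load balancer)
    Value: !Sub '${Subdomain}.${HostedZoneName}:${LoadBalancerPort}'
```

With the template's default parameters, this renders as `permissions-staging.domain.com:50051`.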
🤔 It's not clear to me why this has to be added. Why can't these be specified as environment variables that get injected into the container via the CloudFormation template? This container definition does not seem to provide anything the upstream image couldn't provide.
Sorry, I don't understand the comment; I don't know the project as well as you probably do. Is the goal of your question to understand whether my approach is the only possible approach? Please clarify.
Let me try again 😄 What I'm trying to say is that, in theory, you wouldn't need to define your own `Dockerfile`; you could use the images we, the Authzed team, push to public registries like Docker Hub, Quay, or GitHub Container Registry. My goal is to see if we can remove this `Dockerfile` and instead adjust the CloudFormation template to inject the corresponding environment variables. Does that make sense?
Yes, you are certainly welcome to pull my PR and adjust the CloudFormation template. There are also other issues worth addressing, for example, parameterization of the ECR repository, certificate, AWS account ID, etc. I started the TODO section in the README; please add anything you see fit there, as well as push further improvements. This was meant as a PoC reference implementation. I would certainly be delighted to pull any upstream changes.
Understood! I think the Dockerfile bits, at least, are not in line with our own standards, and it's only fair that we tackle those. Another thing is dispatch, which isn't really enabled. Thanks for your contribution! As soon as we get some spare cycles, we will get back to this 🙇🏻
Yes, I understand, and you are correct. I appreciate that this is a draft rather than a finished, production-quality template, and it does require someone to co-pilot it before the merge.
Perhaps in the future I will get to the point where I can carry an entire issue over the finish line by myself. Right now, I have built MAYBE somewhat useful pieces, but I don't have enough mileage to see every angle the way the core team members do. I'm just sharing small bits that I developed while trying to get the initial SpiceDB integration functional enough to cover some of my use cases.
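A minimal sketch of the approach suggested in this thread, assuming the credentials are stored in AWS Secrets Manager: the task definition runs the upstream `authzed/spicedb` image directly and injects the settings as environment variables, so the custom Dockerfile is no longer needed. `DatastoreEngine`, `PresharedKeySecretArn`, and `DatastoreConnUriSecretArn` are hypothetical parameters; the environment variable names are the ones the PR's Dockerfile already uses. It also assumes the upstream image's entrypoint is the `spicedb` binary, as in `docker run authzed/spicedb serve`.

```yaml
# Hypothetical fragments to merge into aws/fargate.yaml (not part of this PR).
Parameters:
  Image:
    Type: String
    Default: authzed/spicedb:v1.12.0
  DatastoreEngine:
    Type: String
    Default: postgres
  PresharedKeySecretArn:
    Type: String
  DatastoreConnUriSecretArn:
    Type: String

Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      # ...other properties unchanged...
      ContainerDefinitions:
        - Name: !Ref ServiceName
          Image: !Ref Image
          # Assumption: the image entrypoint is `spicedb`, so only the subcommand is passed.
          Command: ['serve']
          Environment:
            - Name: SPICEDB_DATASTORE_ENGINE
              Value: !Ref DatastoreEngine
          # ECS resolves these at task start; the values are not stored in the image
          # or in the task definition itself.
          Secrets:
            - Name: SPICEDB_GRPC_PRESHARED_KEY
              ValueFrom: !Ref PresharedKeySecretArn
            - Name: SPICEDB_DATASTORE_CONN_URI
              ValueFrom: !Ref DatastoreConnUriSecretArn
          # ...PortMappings and LogConfiguration unchanged...

  ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      # ...RoleName, AssumeRolePolicyDocument, ManagedPolicyArns unchanged...
      # The execution role (not the task role) fetches the values referenced in Secrets.
      Policies:
        - PolicyName: ReadSpiceDBSecrets
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: secretsmanager:GetSecretValue
                Resource:
                  - !Ref PresharedKeySecretArn
                  - !Ref DatastoreConnUriSecretArn
```

Keeping the connection URI in `Secrets` rather than `Environment` is deliberate: it usually embeds the datastore password, so it gets the same treatment as the preshared key.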