
DevOps Engineer Professional

10/2019 - 6/2020


Exam Objectives

  • Implement and manage continuous delivery systems and methodologies on AWS
  • Implement and automate security controls, governance processes, and compliance validation
  • Define and deploy monitoring, metrics, and logging systems on AWS
  • Implement systems that are highly available, scalable, and self-healing on the AWS platform
  • Design, manage, and maintain tools to automate operational processes

Content

Domain 1: SDLC Automation

  • Apply concepts required to automate a CI/CD pipeline
  • Determine source control strategies and how to implement them
  • Apply concepts required to automate and integrate testing
  • Apply concepts required to build and manage artifacts securely
  • Determine deployment/delivery strategies (e.g., A/B, Blue/green, Canary, Red/black) and how to implement them using AWS Services

Domain 2: Configuration Management and Infrastructure as Code

  • Determine deployment services based on deployment needs
  • Determine application and infrastructure deployment models based on business needs
  • Apply security concepts in the automation of resource provisioning
  • Determine how to implement lifecycle hooks on a deployment
  • Apply concepts required to manage systems using AWS configuration management tools and services

Domain 3: Monitoring and Logging

  • Determine how to set up the aggregation, storage, and analysis of logs and metrics
  • Apply concepts required to automate monitoring and event management of an environment
  • Apply concepts required to audit, log, and monitor operating systems, infrastructures, and applications
  • Determine how to implement tagging and other metadata strategies

Domain 4: Policies and Standards Automation

  • Apply concepts required to enforce standards for logging, metrics, monitoring, testing, and security
  • Determine how to optimize cost through automation
  • Apply concepts required to implement governance strategies

Domain 5: Incident and Event Response

  • Troubleshoot issues and determine how to restore operations
  • Determine how to automate event management and alerting
  • Apply concepts required to implement automated healing
  • Apply concepts required to set up event-driven automated action

Domain 6: High Availability, Fault Tolerance, and Disaster Recovery

  • Determine appropriate use of multi-AZ versus multi-region architectures
  • Determine how to implement high availability, scalability, and fault tolerance
  • Determine the right services based on business needs (e.g., RTO/RPO, cost)
  • Determine how to design and automate disaster recovery strategies
  • Evaluate a deployment for points of failure

Concepts

Deployment Strategies

Overview

General Strategies

| Strategy | Deploy time | Downtime | Testing | Deployment costs | Impact of failed deployment | Rollback process |
|---|---|---|---|---|---|---|
| Single Target Deployment | 🕑 | complete deploy | limited | no extra costs | downtime | redeploy |
| All At Once | 🕑 | complete deploy | limited | no extra costs | downtime | redeploy |
| Minimum In Service | 🕑🕑 | none | can test new version while old is still active | no extra costs | no downtime | redeploy |
| Rolling | 🕑🕑🕑 | usually none | can test new version while old is still active | no extra costs | no downtime | redeploy |
| Rolling With Extra Batches | 🕑🕑🕑 | usually none | can test new version while old is still active | little extra costs | no downtime | redeploy |
| Blue/Green | 🕑🕑🕑🕑 | none | can test prior to cutover | extra costs for new stack | no downtime | revert cutover |

Strategies per AWS service

| Strategy | Auto Scaling Group | CodeDeploy EC2/On-Premises | CodeDeploy ECS | CodeDeploy Lambda | Elastic Beanstalk | OpsWorks |
|---|---|---|---|---|---|---|
| Single Target Deployment | . | . | . | . | redeploy | . |
| All At Once | AutoScalingReplacingUpdate | All-at-once | . | . | all at once | . |
| Minimum In Service | . | . | . | . | rolling | . |
| Rolling | AutoScalingRollingUpdate | One-at-a-time | . | . | rolling | . |
| Rolling With Extra Batches | . | . | . | . | rolling with extra batches | . |
| Blue/Green | . | Traffic is shifted to a replacement set of instances: all-at-once, half-at-a-time, one-at-a-time | Traffic is shifted to a replacement task set: canary, linear, all-at-once | Traffic is shifted to a new Lambda version: canary, linear, all-at-once | immutable comes close; or: create new environment and use DNS | create new environment and use DNS |
| Canary | . | . | see above: canary | see above: canary | Traffic Splitting | . |

Single target deployment

| System | Deploy |
|---|---|
| v1 | Initial State |
| v1-2 | Deployment Stage |
| v2 | Final State |

  • When initiated, a new application version is installed on the (single) target server
  • Practically not in use any more

| Pros | Cons |
|---|---|
| Simple & very few moving parts | Downtime |
| Deployment is faster than other methods | Limited testing |

All-at-once deployment

| System | Deploy |
|---|---|
| v1 v1 v1 v1 v1 | Initial State |
| v1-2 v1-2 v1-2 v1-2 v1-2 | Deployment Stage |
| v2 v2 v2 v2 v2 | Final State |

  • Single build stage triggers multiple target environments

| Pros | Cons |
|---|---|
| Deployment is relatively fast | Downtime (like STD) |
| . | Limited testing (like STD) |
| . | Everything in-flight - can't stop deployment/rollback if targets fail |
| . | More complicated than STD, often requires orchestration |

Minimum in-service style deployment

| System | Deploy | Notes |
|---|---|---|
| v1 v1 v1 v1 v1 | Initial State | . |
| v1 v1 v1-2 v1-2 v1-2 | Deployment Stage 1 | Minimum targets required for operational state: 2 |
| v1-2 v1-2 v2 v2 v2 | Deployment Stage 2 | . |
| v2 v2 v2 v2 v2 | Final State | . |

  • Orchestration engines know how many targets are required for a minimum operational state
  • System ensures that this number of instances is active while completing the rest of the deployment as quickly as possible
  • Happens to as many targets as possible
  • Suitable for large environments

| Pros | Cons |
|---|---|
| No downtime | Many moving parts, requires orchestration |
| Deployment happens in (two) stages | . |
| Generally quicker & with fewer stages than rolling deployments | . |

Rolling deployment

| System | Deploy | Notes |
|---|---|---|
| v1 v1 v1 v1 v1 | Initial State | . |
| v1-2 v1-2 v1 v1 v1 | Deployment Stage 1 | Deploy first set of targets |
| v2 v2 v1-2 v1-2 v1 | Deployment Stage 2 | Only if health checks succeed: deploy next set of targets |
| v2 v2 v2 v2 v1-2 | Deployment Stage 3 | Only if health checks succeed: deploy next set of targets |
| v2 v2 v2 v2 v2 | Final State | . |

  • Do x deployments at once, then move on to the next x
  • Flexible on failing health checks - roll back a stage or the whole deployment
  • Was considered the cheapest and least risky method until hourly and consumption-based billing entered the market

| Pros | Cons |
|---|---|
| No downtime (if number of per-stage deployments is small enough) | Does not necessarily maintain overall application health |
| Can be paused to allow for multi-version testing | Many moving parts, requires orchestration |
| . | Can be the least efficient deployment method in terms of time taken |

Rolling deployment with extra batches

| System | Deploy | Notes |
|---|---|---|
| v1 v1 v1 v1 | Initial State | . |
| v1 v1 v1 v1 . . | Deployment Stage 1 | Deploy new batch of servers |
| v1 v1 v1 v1 v2 v2 | Deployment Stage 2 | Deploy new version to new servers |
| . . v1 v1 v2 v2 | Deployment Stage 3 | Undeploy first batch |
| v2 v2 v1 v1 v2 v2 | Deployment Stage 4 | Deploy new version to first batch |
| v2 v2 . . v2 v2 | Deployment Stage 5 | Undeploy second batch |
| v2 v2 v2 v2 | Deployment Stage 6 | Decommission servers from second batch |

  • Deploy new batch of servers first
  • Very similar to rolling deployment

| Pros | Cons |
|---|---|
| No downtime (if number of per-stage deployments is small enough) | Does not necessarily maintain overall application health |
| Can be paused to allow for multi-version testing | Many moving parts, requires orchestration |
| . | Can be the least efficient deployment method in terms of time taken |

Blue/green deployment

| System | Deploy | Notes |
|---|---|---|
| (v1 v1 v1) () | Initial State | Blue environment, all traffic goes here |
| (v1 v1 v1) (v v v) | Deployment Stage 1 | Bring up green environment |
| (v1 v1 v1) (v2 v2 v2) | Deployment Stage 2 | Deploy application into green environment |
| (v1 v1 v1) (v2 v2 v2) | Deployment Stage 3 | Cutover - direct traffic from blue to green |
| () (v2 v2 v2) | Final State | Blue environment removed |

  • Deploys to a separate environment to avoid outage risks
  • Different cutover techniques:
    • DNS routing
    • Swap Auto Scaling Group behind load balancer
    • Elastic Beanstalk immutable: merge new ASG into old one
    • Update Auto Scaling launch configuration
    • Swap environment of an AWS Elastic Beanstalk application
    • Clone stack in AWS OpsWorks and update DNS

| Pros | Cons |
|---|---|
| Rapid all-at-once deployment process, no need to wait for per-target health checks | Requires advanced orchestration tooling |
| Can test health prior to cutover | Significant cost for a second environment (mitigated by modern billing models) |
| Clean & controlled cutover (various options) | . |
| Easy rollback | . |
| Can be fully automated using advanced templating | . |
| By far the best method in terms of risk mitigation and minimal user impact | . |

Red/black deployment

... are just like blue/green, but they happen at a much faster rate.

Example:

  1. DNS -> LB -> ASG1
  2. DNS -> LB -> ASG2

A/B testing

  • Like blue/green deployment, but only shift a percentage of the traffic over, not everything at once
  • Gradually increase the percentage of traffic sent to the new environment
  • Allows for a different end goal:
    • Measure user feedback and decide whether a feature should be rolled out or rolled back
    • This can be decided well before the new environment gets 100% of the traffic
    • Could roll out a feature to only 10% of the users and switch back after metrics have been collected
    • -> different from blue/green deployment
  • Achieved via AWS Route 53 weighted routing (see the sketch below)

| Pros | Cons |
|---|---|
| . | Different versions across the environment |
| . | DNS switching is affected by caches and other DNS-related issues |
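A minimal CloudFormation sketch of the weighted-routing approach; the hosted zone ID, record name, and ALB DNS names are hypothetical placeholders:

```yaml
Resources:
  # 90% of lookups resolve to the blue environment...
  BlueRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: Z1234567890ABC        # hypothetical hosted zone
      Name: app.example.com
      Type: CNAME
      TTL: '60'
      SetIdentifier: blue
      Weight: 90
      ResourceRecords:
        - blue-alb-123.eu-west-1.elb.amazonaws.com
  # ...and 10% to the green environment; adjust weights to shift traffic
  GreenRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: Z1234567890ABC
      Name: app.example.com
      Type: CNAME
      TTL: '60'
      SetIdentifier: green
      Weight: 10
      ResourceRecords:
        - green-alb-456.eu-west-1.elb.amazonaws.com
```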

Canary deployment

  • Like A/B testing, but gradually increases percentage of traffic to green environment

EC2 Concepts

Instance Profile

  • A container for an IAM role that you can use to pass role information to an EC2 instance when the instance starts.
  • An EC2 Instance cannot be assigned a role directly, but it can be assigned an Instance Profile which contains a role.
  • If you use the AWS Management Console to create a role for Amazon EC2, the console automatically creates an instance profile and gives it the same name as the role.
  • If you manage your roles from the AWS CLI or the AWS API, you create roles and instance profiles as separate actions.
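A CloudFormation sketch of the role/instance-profile relationship described above (resource names and AMI ID are illustrative):

```yaml
Resources:
  Ec2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:            # who may assume the role
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: { Service: ec2.amazonaws.com }
            Action: sts:AssumeRole
  Ec2InstanceProfile:
    Type: AWS::IAM::InstanceProfile        # the container that is actually attached
    Properties:
      Roles: [!Ref Ec2Role]
  AppInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0ff8a91507f77f867       # example AMI ID
      IamInstanceProfile: !Ref Ec2InstanceProfile
```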

Load Balancing with ELB/ALB

ELB/ALB Logs

  • Access Logging is an optional feature of Elastic Load Balancing that is disabled by default.
  • After you enable access logging for your load balancer, Elastic Load Balancing captures the logs and stores them in the Amazon S3 bucket as compressed files.
  • You can disable access logging at any time.
  • Logs are published at 5- or 60-minute intervals
  • There's no additional charge
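Enabling access logging is a load balancer attribute; a sketch for an ALB, assuming a bucket (my-alb-logs) whose policy already allows log delivery:

```yaml
LoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    Subnets: [subnet-11111111, subnet-22222222]   # hypothetical subnets
    LoadBalancerAttributes:
      - Key: access_logs.s3.enabled
        Value: 'true'
      - Key: access_logs.s3.bucket
        Value: my-alb-logs                        # hypothetical bucket
```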

ELB/ALB Health Checks

  • If you have associated your Auto Scaling Group with a Classic Load Balancer, you can use the load balancer health check to determine the health state of instances in your Auto Scaling Group. By default, an Auto Scaling Group periodically checks the health state of each instance.
  • Your Application Load Balancer periodically sends requests to its registered targets to test their status. These tests are called health checks.
  • The status of the instances that are healthy at the time of the health check is InService. The status of any instances that are unhealthy at the time of the health check is OutOfService.

ELB Security

  • Need end-to-end security
  • Encrypt all communication
  • Use HTTPS (layer 7) or SSL (layer 4)
  • Need to deploy an X.509 certificate on ELB
  • Can configure back-end authentication
    • Once configured, ELB only communicates with an instance if it has a matching public key

Auto Scaling

Overview

Amazon EC2 Auto Scaling helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You create collections of EC2 instances, called Auto Scaling Groups. You can specify the minimum number of instances in each Auto Scaling Group, and Amazon EC2 Auto Scaling ensures that your group never goes below this size. You can specify the maximum number of instances in each Auto Scaling Group, and Amazon EC2 Auto Scaling ensures that your group never goes above this size. If you specify the desired capacity, either when you create the group or at any time thereafter, Amazon EC2 Auto Scaling ensures that your group has this many instances. If you specify scaling policies, then Amazon EC2 Auto Scaling can launch or terminate instances as demand on your application increases or decreases.

  • Auto scaling can play a major role in deployments
  • Need to avoid downtime during deployments
  • How long does it take to deploy code and configure an instance?
  • How do you test a new launch configuration?
  • How do you phase out older launch configurations?
  • Use lifecycle hooks for custom actions
  • CloudFormation init scripts
  • Cloud init scripts
  • Scale out/scale in
  • Can launch spot instances as well as on-demand instances (also configure ratio between these instance options)
  • On AWS: Service - FAQs - User Guide

Components

Auto Scaling Group
  • Contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management
  • To use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies
Launch Configuration
  • Instance configuration template that an Auto Scaling Group uses to launch EC2 instances
  • One Launch Configuration per ASG, can be used in many ASGs though
  • Can't be modified, needs to be recreated
  • To change instance type, copy the old Launch Configuration, change instance type, double max and desired, wait till new values have propagated, revert max and desired
Launch Template
  • Similar to Launch Configuration
  • Launch Templates can be used to launch regular instances as well as Spot Fleets.
  • Allows to have multiple versions of the same template
  • Can source another template to build a hierarchy
  • With versioning, you can create a subset of the full set of parameters and then reuse it to create other templates or template versions
  • AWS recommends using Launch Templates instead of Launch Configurations to ensure that you can use the latest features of Amazon EC2
Termination Policy
  • To specify which instances to terminate first during scale in, configure a Termination Policy for the Auto Scaling Group.
  • Policies will be applied to the AZ with the most instances
  • Can be combined with instance protection to prevent termination of specific instances, this starts as soon as the instance is in service.
    • Instances can still be terminated manually (unless termination protection has been enabled)
    • Unhealthy instance will still be replaced
    • Spot instance interruptions can still occur
      • Instance protection can also be applied to an Auto Scaling Group - protecting the whole group: protect from scale in
  • Can specify multiple policies, will be executed in order until an instance has been found
  • The default policy, when last in a list of multiple policies, acts like a catch-all and will always find an instance
    • Determine which AZ has most instances and at least one instance that's not protected from scale in
    • [For ASG with multiple instance types and purchase options]: Try to align remaining instances to allocation strategy
    • [For ASG that uses Launch Templates]: Terminate one of the instances with the oldest Launch Template
    • [For ASG that uses Launch Configuration]: Terminate one of the instances with the oldest Launch Configuration
    • If there are multiple instances to choose from, pick the one nearest to the next billing hour
    • Choose one at random
| # | Policy | Usage |
|---|---|---|
| 0 | Default | Designed to help ensure that your instances span Availability Zones evenly for high availability; behaves like 3 -> 4 -> random |
| 1 | OldestInstance | Useful when upgrading to a new EC2 instance type |
| 2 | NewestInstance | Useful when testing a new launch configuration |
| 3 | OldestLaunchConfiguration | Useful when updating a group and phasing out instances |
| 4 | ClosestToNextInstanceHour | Next billing hour - useful to maximize instance use |
| 5 | OldestLaunchTemplate | Useful when you're updating a group and phasing out the instances from a previous configuration |
| 6 | AllocationStrategy | Useful when preferred instance types have changed |
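Termination policies are set on the Auto Scaling Group and evaluated in order; a sketch (launch configuration assumed defined elsewhere):

```yaml
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: '2'
    MaxSize: '6'
    LaunchConfigurationName: !Ref LaunchConfig    # assumed defined elsewhere
    AvailabilityZones: !GetAZs ''
    TerminationPolicies:           # evaluated in order until an instance is found
      - OldestLaunchConfiguration
      - ClosestToNextInstanceHour
      - Default
```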
Auto Scaling Lifecycle Hooks

The EC2 instances in an Auto Scaling Group have a path, or lifecycle, that differs from that of other EC2 instances. The lifecycle starts when the Auto Scaling Group launches an instance and puts it into service. The lifecycle ends when you terminate the instance, or the Auto Scaling group takes the instance out of service and terminates it.

Allows to cater for applications that take longer to deploy/tear-down.

After Lifecycle Hooks are added to the instance:

  • ASG responds to scale-out/scale-in events
  • Lifecycle Hook puts instance into pending:wait/terminating:wait state, instance is paused until we continue or timeout
    • This can be extended by configuring a heartbeat
  • Custom actions are performed through one or more of these options:
    • CloudWatch Events target to invoke Lambda function
    • Notification target for Lifecycle Hook is defined
    • Script on instance runs as instance starts, script can control lifecycle actions
  • Can also notify SQS or SNS, but Lambda is the preferred option
  • Going into pending:proceed/terminating:proceed after that.
  • By default, the instance remains in a wait state for one hour, and then the Auto Scaling Group continues the launch or terminate process
| Event | State transitions |
|---|---|
| Scale out | scale out -> Pending -> Pending:Wait -> Pending:Proceed -> InService |
| Scale in | scale in -> Terminating -> Terminating:Wait -> Terminating:Proceed -> Terminated |
| Troubleshoot | InService -> Standby |
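A sketch of a launch lifecycle hook that pauses new instances in Pending:Wait; something (a script on the instance, or a Lambda triggered via CloudWatch Events) must then call complete-lifecycle-action before the heartbeat timeout:

```yaml
LaunchHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref AutoScalingGroup   # assumed defined elsewhere
    LifecycleTransition: autoscaling:EC2_INSTANCE_LAUNCHING
    HeartbeatTimeout: 600        # seconds in Pending:Wait before DefaultResult applies
    DefaultResult: ABANDON       # terminate the instance if nobody signals success
```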
Scaling

Scaling is the ability to increase or decrease the compute capacity of your application. Scaling starts with an event, or scaling action, which instructs an Auto Scaling Group to either launch or terminate Amazon EC2 instances.

  • Manual scaling
    • Specify min/max/desired
  • Scheduled scaling
    • Specify time and date
  • Scaling Policies
    • Target Tracking Scaling Policy
      • With target tracking scaling policies, you select a scaling metric and set a target value. Amazon EC2 Auto Scaling creates and manages the CloudWatch alarms that trigger the scaling policy and calculates the scaling adjustment based on the metric and the target value.
    • Simple Scaling Policy
      • With simple and step scaling policies, you choose scaling metrics and threshold values for the CloudWatch alarms that trigger the scaling process.
      • Both require you to create CloudWatch alarms for the scaling policies.
      • Both require you to specify the high and low thresholds for the alarms.
      • Both require you to define whether to add or remove instances, and how many, or set the group to an exact size.
    • Step Scaling Policy
      • The main difference between the policy types is the step adjustments that you get with step scaling policies. When step adjustments are applied, and they increase or decrease the current capacity of your Auto Scaling Group, the adjustments vary based on the size of the alarm breach.
      • We recommend that you use step scaling policies instead of simple scaling policies, even if you have a single scaling adjustment
    • After a scaling activity is started, the policy continues to respond to additional alarms, even while a scaling activity or health check replacement is in progress.
    • Therefore, all alarms that are breached are evaluated by Amazon EC2 Auto Scaling as it receives the alarm messages.
    • However, scaling actions from previous alarms are taken into account (thereby not changing the absolute outcome of the scaling action)
  • Predictive scaling
    • AWS using data collection from actual EC2 usage
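A target tracking policy sketch keeping average CPU at 50%; Auto Scaling creates and manages the backing CloudWatch alarms automatically:

```yaml
CpuTargetTracking:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref AutoScalingGroup   # assumed defined elsewhere
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ASGAverageCPUUtilization
      TargetValue: 50.0          # scale out above, scale in below this average
```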

Protect instances from scale in by enabling instance scale-in protection, e.g. per API call

  • Long running workers
  • 'Special' instances, e.g. master of a cluster
Notifications
  • Can send SNS notifications
    • Success/failure on instance launch/termination
  • Better to integrate with CloudWatch Events
  • No direct integration with CloudWatch Logs
    • However if the CloudWatch agent is installed on the instances they will send logs
Health Checks

Amazon EC2 Auto Scaling can determine the health status of an instance using one or more of the following:

  • Status checks provided by Amazon EC2 to identify hardware and software issues that may impair an instance. The default health checks for an Auto Scaling group are EC2 status checks only.
  • Health checks provided by Elastic Load Balancing (ELB). These health checks are disabled by default but can be enabled.
  • Your custom health checks.

Integration with other services

ALB

ALB -> Target Group <- ASG

  • Configure ASG
    • Target Group from ALB
    • Health Check Type ELB
  • Can configure Slow Start Mode on Target Group level (up to 15min), so that new instances don't get the full load immediately
  • Should redirect from http (80) to https (443)
  • Can put instances into Standby so that they don't receive traffic.
  • Can put instances into Scale In Protection so that they don't get terminated on Scale In.
CodeDeploy
  • Install CodeDeploy agent on instances as per UserData
  • Create CodeDeploy Application and Deployment and tie it to the Auto Scaling Group
    • CodeDeploy will create Deployments for existing and new instances
  • Can choose between in-place and blue-green deployments
    • Blue-green will provision new Auto Scaling Group
      • Must have a load balancer configured so that traffic can be switched over
  • When deploying a new application version this will not be considered 'latest' until it has succeeded
    • So for Auto Scaling events, the old version is still getting deployed
CloudFormation
  • CreationPolicy
    • Wait for notification from instances that they created successfully
    • Instances use cfn-signal in UserData section
  • UpdatePolicy
    • If Launch Configuration or Launch Template are changing, deployed instances will not update unless defined in UpdatePolicy
    • Can configure policy for rolling, replacing and scheduled updates (see the sketch below)
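A sketch of an UpdatePolicy for a rolling update of an Auto Scaling Group, waiting for cfn-signal from the new instances:

```yaml
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    ...
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MinInstancesInService: 1   # keep at least one old instance serving
      MaxBatchSize: 1            # replace one instance at a time
      PauseTime: PT5M
      WaitOnResourceSignals: true
```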
SQS
  • It's a common pattern to have instances from within an ASG consuming messages from SQS
  • Can implement custom metric in CloudWatch to control Auto Scaling behaviour
    • E.g. size of individual backlog on instances

Deployment Concepts

| Name | Before | Intermediate | After |
|---|---|---|---|
| In Place | [ASG [Instance 1]] | . | [ASG [Instance 2]] |
| Rolling | [ASG [Instance 1]] | [ASG [Instance 1,2]] | [ASG [Instance 2]] |
| Replace | [ALB [ASG1 [...]]] | [ALB [ASG1 [...]] [ASG2 [...]]] | [ALB [ASG2 [...]]] |
| Blue/Green | [R53 [ALB1 [ASG1 [...]]]] | [R53 [ALB1 [ASG1 [...]]]] [ALB2 [...]] | [R53 [ALB2 [ASG2 [...]]]] |

Troubleshooting

Possible Problems
  • Attempting to use wrong subnet
  • AZ no longer available or supported (outage)
  • Security group does not exist
  • Associated keypair does not exist
  • Auto scaling configuration is not working correctly
  • Instance type specification does not exist in that AZ
  • Invalid EBS device mapping
  • Attempt to attach EBS block device to instance-store AMI
  • AMI issues
  • Attempt to use placement groups with instance types that don't support that
  • AWS running out of capacity in that AZ
  • If an instance is stopped (e.g. for updates), Auto Scaling considers it unhealthy and will terminate and replace it rather than restart it. Suspend the Auto Scaling processes first.
Suspending ASG processes

You can suspend and then resume one or more of the scaling processes for your Auto Scaling Group. This can be useful for investigating a configuration problem or other issues with your web application and making changes to your application without invoking the scaling processes.

| Process | Effect when suspended |
|---|---|
| Launch | Disrupts other processes, as there is no more scale out |
| Terminate | Disrupts other processes, as there is no more scale in |
| HealthCheck | . |
| ReplaceUnhealthy | . |
| AZRebalance | . |
| AlarmNotification | Suspends actions normally triggered by alarms |
| ScheduledAction | . |
| AddToLoadBalancer | Instances will not be added to the load balancer automatically later |

On-Premises strategies

EC2 and On-Premises VMs

  • Can download Amazon Linux 2 AMI in VM format to run on-premises
  • Can import existing VMs into EC2

AWS Application Discovery Service

  • Gather information about On-premises instances to plan a migration
  • Server utilization and dependency mappings
  • Track with AWS Migration Hub

AWS Database Migration Service

  • Replicate
    • On-prem -> AWS
    • AWS -> On-prem
    • AWS -> AWS

AWS Server Migration Service

  • Incremental replication of on-prem instances into AWS

Cost Allocation Tags

Overview

A tag is a label that you or AWS assigns to an AWS resource. Each tag consists of a key and a value. For each resource, each tag key must be unique, and each tag key can have only one value. You can use tags to organize your resources, and cost allocation tags to track your AWS costs on a detailed level. After you activate cost allocation tags, AWS uses the cost allocation tags to organize your resource costs on your cost allocation report, to make it easier for you to categorize and track your AWS costs. AWS provides two types of cost allocation tags: AWS-generated tags and user-defined tags. AWS defines, creates, and applies the AWS-generated tags for you, and you define, create, and apply user-defined tags. You must activate both types of tags separately before they can appear in Cost Explorer or on a cost allocation report.


Data/Network Protection

Data Protection

In Transit

  • TLS for transit encryption
  • ACM to manage SSL/TLS certificates
  • Load Balancers
    • ELB/ALB/NLB provide SSL termination
    • Can have multiple SSL certificates per ALB
    • Optional SSL/TLS encryption between ALB and EC2
  • CloudFront with SSL
  • All AWS services expose https endpoint
    • S3 also has http (shouldn't use it)

At Rest

  • S3
    • SSE-S3: server-side encryption using AWS-managed keys
    • SSE-KMS: server-side encryption using your own KMS key
    • SSE-C: server-side encryption using your own (customer-provided) key
    • Client-side encryption: already-encrypted data is sent to AWS
    • Can enable default encryption on S3 buckets
    • Can enforce encryption via bucket policy
    • Glacier is encrypted by default
  • Other services:
    • Easy to configure for EBS, EFS, RDS, ElastiCache, DynamoDB, ...
      • Usually either service encryption key or own KMS key
  • Data categories:
    • PHI - protected health information
    • PII - personally-identifying information
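A sketch of enabling default encryption on a bucket (SSE-KMS here; drop KMSMasterKeyID and use AES256 for SSE-S3):

```yaml
EncryptedBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketEncryption:
      ServerSideEncryptionConfiguration:
        - ServerSideEncryptionByDefault:
            SSEAlgorithm: aws:kms
            KMSMasterKeyID: !Ref DataKey   # assumed KMS key defined elsewhere
```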

Network Protection

  • Direct Connect
    • Private direct connection between on-site and AWS
  • Public Internet: Use VPN
    • Site-to-site VPN that supports Internet Protocol Security (IPsec)
  • Network ACLs for instance protection
  • WAF - Web Application Firewall
  • Security Groups
  • System Firewalls running on EC2 instances

Multi AZ

Services where multi AZ needs to be enabled manually

  • Assign AZ
    • ELB, EFS, ASG, Elastic Beanstalk
  • Synchronous database for failover in different AZ
    • RDS, ElastiCache, Aurora (for DB itself, data is already multi AZ)
    • Elasticsearch

Services that are implicitly multi AZ

  • S3 (with the exception of One Zone Infrequent Access)
  • DynamoDB
  • All of AWS' proprietary services

Multi Region

Services that have a concept of multi region

| Service | Multi-region feature |
|---|---|
| DynamoDB | Global Tables: multi-way replication, implemented via Streams |
| AWS Config | Aggregators: multi-region as well as multi-account |
| RDS | Cross-region read replicas |
| Aurora | Global Database: one region is master, others for reads & DR |
| EBS/AMI/RDS | Snapshots |
| VPC | Peering: private traffic between VPCs across regions |
| Route 53 | Uses a global network of DNS servers |
| S3 | Cross-region replication |
| CloudFront | Global CDN at edge locations |
| Lambda@Edge | For global Lambda functions at edge locations |
| CloudFormation | StackSets |
| CodePipeline | Actions can be region-specific -> multi-region deploys |

Multi Region with Route 53

  • Deploy stacks behind ALB in different regions
  • Use Route 53 routing
    • Latency
    • Geo-proximity
  • Configure health checks
    • Trigger automated DNS failover
    • E.g. base health checks on CloudWatch Alarms
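A failover-routing sketch: a health check on the primary region's ALB drives automated DNS failover to the DR region (zone ID and DNS names are placeholders):

```yaml
PrimaryHealthCheck:
  Type: AWS::Route53::HealthCheck
  Properties:
    HealthCheckConfig:
      Type: HTTPS
      FullyQualifiedDomainName: primary-alb.eu-west-1.elb.amazonaws.com
      ResourcePath: /health
PrimaryRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: Z1234567890ABC
    Name: app.example.com
    Type: CNAME
    TTL: '60'
    SetIdentifier: primary
    Failover: PRIMARY
    HealthCheckId: !Ref PrimaryHealthCheck   # unhealthy -> Route 53 serves SECONDARY
    ResourceRecords: [primary-alb.eu-west-1.elb.amazonaws.com]
SecondaryRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: Z1234567890ABC
    Name: app.example.com
    Type: CNAME
    TTL: '60'
    SetIdentifier: secondary
    Failover: SECONDARY
    ResourceRecords: [dr-alb.us-east-1.elb.amazonaws.com]
```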

Multi Account

Services that have a concept of multi account

| Service | Multi-account feature |
|---|---|
| IAM | Define IAM trust to enable cross-account actions; use STS to assume roles in different accounts |
| CodePipeline | Trigger CodeDeploy across accounts |
| AWS Config | Aggregate across accounts |
| CloudWatch Events | Use the event bus to share events across accounts |
| CloudWatch Logs | Use log destinations to send events into a logging account |
| CloudFormation | StackSets can be deployed across accounts |
| CloudTrail | Can deliver trails into a cross-account bucket |

Disaster Recovery

  • DR is about preparing for and recovering from a disaster
  • Recovery Point Objective - RPO
    • How often do you run backups? How much data will be lost (since last backup)
  • Recovery Time Objective - RTO
    • How much downtime is acceptable?
| From | To | Comment |
|---|---|---|
| On-prem | On-prem | Traditional DR, very expensive |
| On-prem | Cloud | Hybrid recovery |
| Cloud Region A | Cloud Region B | . |

| Strategy | RPO | RTO | Costs | Comment | What to do for DR |
|---|---|---|---|---|---|
| Backup & Restore | High | High | $ | Regular backups | Restore |
| Pilot Light | Medium | Medium | $$ | Core system is always running | Add non-critical systems |
| Warm Standby | Low | Low | $$$ | Full system at minimum size always running | Add resources |
| Multi Site/Hot Site | Lowest | Lowest | $$$$ | Full system at production size always running | Only switch traffic |

Security Automation & Compliance

| Service | What it does | Will warn about (example) |
|---|---|---|
| Amazon Inspector | Application and service security; scans EC2 instances for CVEs; network scans | Root login via SSH not disabled |
| Config | Ensures instances have proper AWS configuration, e.g. no open SSH port; tracks audit and compliance over time | Checks whether an Amazon SNS topic is encrypted with KMS |
| GuardDuty | Scans accounts and workloads | Instance has bitcoin activity, unusual console logins (e.g. new location) |
| Macie | Protects data | SSH private key uploaded to S3 |
| Security Hub | Aggregates views from GuardDuty, Amazon Inspector, Macie, IAM Access Analyzer, AWS Firewall Manager; also integrates 3rd-party services | Whatever was integrated with Security Hub |
| Service Catalog | Restricts how instances are launched by minimizing configuration | . |
| Systems Manager | Runs automations, patches, commands, inventory at scale | . |
| Trusted Advisor | Scans accounts; recommends cost optimisations, fault tolerance, performance, service limits, security | Open security groups, EBS snapshot permissions |

Notifications

| Service | SNS (native) | CloudWatch/EventBridge Events | CloudWatch Metrics/Alarms | Comment |
|---|---|---|---|---|
| Amazon Inspector | + | - | + (every 5 min) | Notify SNS on assessment run and findings |
| API Gateway | - | - | + (API monitoring) | . |
| Auto Scaling Lifecycle Hooks | + (SNS or SQS) | + | + | . |
| CloudFormation | + | - | - | . |
| CloudTrail | + | - | - | . |
| CodeBuild | - | + | + | . |
| CodeCommit | + (trigger to SNS or Lambda; notification to SNS or Chatbot (Slack)) | + | - | . |
| CodeDeploy | + (trigger to SNS; notification to SNS or Chatbot (Slack)) | + | - | . |
| CodePipeline | + (notification to SNS or Chatbot (Slack)) | + | - | . |
| Config | + (all events only) | + | - | . |
| ECS | - | + | + | . |
| Elastic Beanstalk | + | - | + (minimal, environment health only) | . |
| GuardDuty | - | + | - | . |
| Kinesis | - | - | - | . |
| Lambda | - | - | + | . |
| Macie | - | + | - | . |
| OpsWorks | - | + | + | . |
| S3 | + (event notifications: SNS, SQS, Lambda) | - | + | documentation |
| Server Migration Service | - | + | - | . |
| Service Catalog | + | - | + | . |
| Systems Manager | + | + (various CloudWatch Events) | + (Run Command metrics) | . |
| Trusted Advisor | - | + | + | documentation |
  • All services have API calls delivered to EventBridge via CloudTrail.
  • EventBridge supports many services.
  • All services that publish CloudWatch Metrics.

External Tools

Jenkins

  • Can replace CodeBuild, CodePipeline, CodeDeploy
    • Tight integration with those services
  • Master/Slave setup
    • Master and slaves can run on the same instance, but usually run on separate instances
    • Can have multiple masters, each with its respective set of slaves
  • Must manage Multi-AZ, deploy on EC2, ...
  • Jenkinsfile to configure CI/CD
  • Many AWS plugins

Integrating into CodePipeline

  • CodePipeline can send build jobs to Jenkins instead of CodeBuild
  • Jenkins can pull from CodeCommit and e.g. upload build results to ECR, invoke Lambda, ...
  • Direct Jenkins support in CodePipeline, requires CodePipeline-plugin on the Jenkins end

Plugins

  • EC2-Plugin
    • Allows Jenkins to start agents on EC2 on demand, and terminate them when they become unused
    • Also supports spot instances
  • CodeBuild-Plugin
    • Send builds to CodeBuild
    • Official AWS plugin
  • ECS-Plugin
    • Launch slaves into ECS

Services

Amazon EMR

Overview

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Use cases:

  • Machine learning
  • Extract transform load (ETL)
  • Clickstream analysis (from S3, using Apache Spark and Apache Hive)
  • Real-time streaming
  • Interactive analytics
  • Genomics

Amazon Inspector (Core Service)

Overview

Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposure, vulnerabilities, and deviations from best practices. After performing an assessment, Amazon Inspector produces a detailed list of security findings prioritized by level of severity. These findings can be reviewed directly or as part of detailed assessment reports which are available via the Amazon Inspector console or API.

Amazon Inspector security assessments help you check for unintended network accessibility of your Amazon EC2 instances and for vulnerabilities on those EC2 instances. Amazon Inspector assessments are offered to you as pre-defined rules packages mapped to common security best practices and vulnerability definitions. Examples of built-in rules include checking for access to your EC2 instances from the internet, remote root login being enabled, or vulnerable software versions installed. These rules are regularly updated by AWS security researchers.

  • Network Assessments
    • Does not require agent
  • Host Assessments
    • Requires agent
  • Can automate assessments via scheduled CloudWatch Events
    • Can use tags to find instances to assess
    • Cannot launch AMI, requires instance
  • Assessment templates can notify SNS
    • Could trigger Lambda to remediate EC2 findings via SSM documents
  • On AWS: Service - FAQs - User Guide

API Gateway (Core Service)

Overview

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. With a few clicks in the AWS Management Console, you can create REST and WebSocket APIs that act as a “front door” for applications to access data, business logic, or functionality from your backend services, such as workloads running on Amazon Elastic Compute Cloud (Amazon EC2), code running on AWS Lambda, any web application, or real-time communication applications.

Benefits

  • RESTful (stateless) or Websocket (stateful) APIs
  • Powerful, flexible authentication mechanisms, such as AWS IAM policies, Lambda authorizer functions, and Amazon Cognito user pools.
  • Developer portal for publishing your APIs.
  • Canary release deployments for safely rolling out changes.
  • CloudTrail logging and monitoring of API usage and API changes.
  • CloudWatch access logging and execution logging, including the ability to set alarms.
  • Ability to use AWS CloudFormation templates to enable API creation
  • Support for custom domain names.
  • Integration with AWS WAF for protecting your APIs against common web exploits.
  • Integration with AWS X-Ray for understanding and triaging performance latencies.

Concepts

Endpoint

A hostname for an API in API Gateway that is deployed to a specific region. The hostname is of the form {api-id}.execute-api.{region}.amazonaws.com.

The following types of API endpoints are supported:

  • Regional - deployed to the specified region and intended to serve clients in the same AWS region.
  • Edge Optimized - deployed to the specified region while using a CloudFront distribution to facilitate client access typically from across AWS regions
  • Private - exposed through interface VPC endpoints

Stage

A logical reference to a lifecycle state of your REST or WebSocket API (for example, dev, prod, beta, v2).

  • API stages are identified by API ID and stage name.
  • Each stage has its own configuration parameters.
  • Can be rolled back in history.
  • Have stage variables, that are like environment variables for API Gateway.
    • Can be used to configure endpoints that the stage talks to.
    • Accessible from Lambda context as well.
  • Can enable canary deployments for a stage (usually PROD)
    • A canary release attaches a new version to an existing stage deployment and randomly shifts a portion of traffic over
    • Logs and metrics are generated separately for all canary requests
    • This is blue/green for API Gateway/Lambda
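A sketch of a stage with canary settings: 10% of requests hit the canary, which overrides the lambdaAlias stage variable (resource names are illustrative):

```yaml
ProdStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    RestApiId: !Ref RestApi          # assumed defined elsewhere
    DeploymentId: !Ref Deployment    # assumed defined elsewhere
    StageName: prod
    Variables:
      lambdaAlias: PROD
    CanarySetting:
      PercentTraffic: 10             # random 10% of traffic goes to the canary
      StageVariableOverrides:
        lambdaAlias: CANARY          # canary requests hit the CANARY alias
```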

Deployment

After creating your API, you must deploy it to make it callable by your users. To deploy an API, you create an API deployment and associate it with a stage. Each stage is a snapshot of the API and is made available for client apps to call.

Canary Deployments

  • Use stage variables for canary deployments
    • Integrate Lambda via alias: GetStartedLambdaProxyIntegration:${stageVariables.lambdaAlias}
    • Overwrite stage variable in canary deployment
  • Could also use Lambda's canary functionality with weighted aliases

Integration

  • Lambda Proxy - request is passed through straight to a Lambda
    • Proxy Lambda deals with complete http request
  • Lambda Non-Proxy/Custom
    • Allows integration of mapping template
    • Can transform request as well as response
    • Allows to evolve the API while keeping Lambda function static
  • Any service
    • All AWS services support dedicated APIs to expose their features. However, the application protocols or programming interfaces are likely to differ from service to service. An API Gateway API with the AWS integration has the advantage of providing a consistent application protocol for your client to access different AWS services.

Mapping Template

  • A script in Velocity Template Language (VTL) that transforms a request body from the frontend data format to the backend data format.
  • Cannot add default values to fields, only add new static fields

Model

A data schema specifying the data structure of a request or response payload.

Throttling

  • Account-wide limit of 10,000 requests per second.
    • Applies at service/account level
  • Can create usage plan:
    • Rate, burst, quota
    • Can associate with stage/method/API key (to limit certain clients)
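A usage plan sketch combining rate, burst, and quota, associated with a stage and an API key to limit a specific client:

```yaml
BasicUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    ApiStages:
      - ApiId: !Ref RestApi          # assumed defined elsewhere
        Stage: prod
    Throttle:
      RateLimit: 100                 # steady-state requests per second
      BurstLimit: 200
    Quota:
      Limit: 100000                  # requests per period
      Period: MONTH
BasicUsagePlanKey:
  Type: AWS::ApiGateway::UsagePlanKey
  Properties:
    KeyId: !Ref ClientApiKey         # assumed API key defined elsewhere
    KeyType: API_KEY
    UsagePlanId: !Ref BasicUsagePlan
```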

Athena

Overview

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning.


CloudFormation (Core Service)

Overview

AWS CloudFormation provides a common language for you to describe and provision all the infrastructure resources in your cloud environment. CloudFormation allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts. This file serves as the single source of truth for your cloud environment.

AWS CloudFormation is available at no additional charge, and you pay only for the AWS resources needed to run your applications.

  • Allows to create and provision resources in a reusable template fashion
  • Declarative - no need for ordering and orchestration
  • Separation of concerns - different stacks for different purposes
  • On AWS: Service - FAQs - User Guide

Components

Template

A CloudFormation template is a JSON or YAML formatted text file

| Element | Comment |
|---|---|
| AWSTemplateFormatVersion | "2010-09-09" |
| Description | . |
| Metadata | Details about the template |
| Parameters | Values to pass in right before template creation |
| Mappings | Maps keys to values (e.g. different values for different regions) |
| Conditions | Check values before deciding what to do |
| Resources | Creates resources - the only mandatory section in a template |
| Outputs | Values to be exposed from the console or from API calls |
Parameters
  • Type: String, Number, List, CommaDelimitedList, AWS-specific types like AWS::EC2::KeyPair::KeyName, SSM parameter types (AWS::SSM::Parameter::Value<...>)
    • SSM SecureString is not supported
  • Description, Default Value, Allowed Values, Allowed Pattern
  • Validation: regular expression/MinLength/MaxLength/MinValue/MaxValue
  • Can set NoEcho for secrets, will be masked with ****
  • Can set UsePreviousValue for stack updates
  • Pseudo parameters are parameters that are predefined by AWS CloudFormation. You do not declare them in your template. Use them the same way as you would a parameter, as the argument for the Ref function.
    • AWS::AccountId, AWS::NotificationARNs, AWS::NoValue, AWS::Partition, AWS::Region, AWS::StackId, AWS::StackName, AWS::URLSuffix
  • Reference a parameter within the template !Ref myParam
  • Usage of parameters might make it hard to instantiate stacks
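A sketch of a Parameters section exercising the features above (types, defaults, validation, NoEcho, and an SSM parameter type); names and values are illustrative:

```yaml
Parameters:
  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues: [t3.micro, t3.small, t3.medium]
  DbPassword:
    Type: String
    NoEcho: true                 # masked with **** in console/API output
    MinLength: 8
  KeyName:
    Type: AWS::EC2::KeyPair::KeyName
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
```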
Mappings
  • Fixed (hardcoded) variables within the CloudFormation template

```yaml
Mappings:
  RegionMap:
    us-east-1:
      HVM64: ami-0ff8a91507f77f867
      HVMG2: ami-0a584ac55a7631c0c
    us-west-1:
      HVM64: ami-0bdb828fd58c52235
      HVMG2: ami-066ee5fd4a9ef77f1
Resources:
  myEC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", HVM64]
```
Conditions

```yaml
Conditions:
  CreateProdResources: !Equals [!Ref EnvType, prod]
```

  • Can use (and combine) Fn::And, Fn::Equals, Fn::If, Fn::Not, Fn::Or
Resources
  • Over 220 types of resources, AWS::aws-product-name::data-type-name
  • AWS figures out creation, update and delete of resources for us
    • Update can impact a resource in 3 possible ways
      • No interruption
        • E.g. change ProvisionedThroughput of a DynamoDB table
      • Some interruption
        • E.g. change InstanceType of an EC2 instance
      • Replacement
        • E.g. change AvailabilityZone of an EC2 instance
        • E.g. change ImageId of an EC2 instance
        • E.g. change Tablename of a DynamoDB table
  • Can toggle creation with Condition:

```yaml
Resources:
  MountPoint:
    Type: "AWS::EC2::VolumeAttachment"
    Condition: CreateProdResources
```
Outputs
  • Can be
    • constructed value/parameter reference/pseudo parameter/output from a function like fn::getAtt or Ref
  • Can be used in a different stack (cross stack references)
    • !ImportValue NameOfTheExport
    • Cannot delete stack if its outputs are used in another stack
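A cross-stack reference sketch: the first template exports a value, the second imports it (the export name is arbitrary but must be unique per region):

```yaml
# Exporting stack
Outputs:
  VpcId:
    Value: !Ref Vpc                # assumed VPC resource in this template
    Export:
      Name: network-VpcId

# Importing stack
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: App SG in the shared VPC
      VpcId: !ImportValue network-VpcId
```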
Intrinsic Functions
  • Used to pass in values that are not available until runtime
  • Usable in resources, outputs, metadata attributes and update policy attributes (auto scaling). You can also use intrinsic functions to conditionally create stack resources.
  • Most intrinsic functions have a short and a long form (not Ref):
    • Fn::GetAtt: [ logicalNameOfResource, attributeName ]
    • !GetAtt logicalNameOfResource.attributeName
| Name | Attributes | Description |
|---|---|---|
| Ref | logicalName | Returns the value of the specified parameter; for resources, typically the physical ID |
| Fn::Base64 | valueToEncode | Converts plain text to base64 |
| Fn::Cidr | ipBlock, count, cidrBits | Returns an array of CIDR address blocks; the number of blocks returned depends on the count parameter |
| Fn::FindInMap | MapName, TopLevelKey, SecondLevelKey | Returns the value corresponding to keys in a two-level map declared in the Mappings section |
| Fn::GetAtt | logicalNameOfResource, attributeName | Returns the value of an attribute from an object, either from the same or a nested template |
| Fn::GetAZs | region | Returns an array of Availability Zones for a specified region; if region is omitted, returns AZs of the region the template is applied in |
| Fn::If | boolean, string1, string2 | Returns string1 if boolean is true, string2 otherwise |
| Fn::And, Fn::Equals, Fn::Or, Fn::Not | . | Good for condition elements |
| Fn::ImportValue | sharedValueToImport | Returns the value of an output exported by another stack; you can't delete a stack if another stack references one of its outputs, and you can't modify or remove an output value that is referenced by another stack |
| Fn::Join | delimiter, [comma-delimited list of values] | Joins a set of values into a single value separated by the specified delimiter |
| Fn::Select | index, listOfObjects | Returns a single object from a list of objects by index |
| Fn::Split | delimiter, source string | Splits a string into a list of string values so that you can select an element from the result |
| Fn::Sub | String, or { key: value, ... } | Substitutes variables in an input string with values that you specify, e.g. !Sub 'arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:vpc/${vpc}' |
| Fn::Transform | Name: String, Parameters: { key: value, ... } | Specifies a macro to perform custom processing on part of a stack template |

Stacks

  • Related resources are managed in a single unit called a stack
    • All the resources in a stack are defined by the stack's CloudFormation template
    • Controls lifecycle of managed resources
    • Stack has name & id
    • Can be updated directly or via change set
    • Will rollback stack if it fails to create (can be disabled via API/console)
    • Possible to detect stack drift, if supported by the created resources
    • Can enable termination protection
    • Can send stack events to SNS topic
  • A stack policy is an IAM-style policy document that governs who can do what
    • Defines which actions can be performed on specified resources.

    • With CloudFormation stack policies you can protect all or certain resources in your stacks from being unintentionally updated or deleted during the update process.

    • Check stack policy if updates are allowed

      • No policy present: All updates are allowed -> This differs from IAM default!
      • Once a policy is applied
        • It cannot be updated or removed from the stack (deleted)
        • All resources that are not explicitly allowed are denied
        • The default deny can be explicitly overwritten
      • Policy format JSON
        • Contains policy documents
    • Don't confuse with DeletionPolicy, UpdatePolicy, UpdateRollbackPolicy attributes

| Element | Notes |
|---|---|
| Effect | . |
| Principal | Must be a wildcard for stack policies |
| Action | Update:Modify, Update:Replace, Update:Delete, Update:* |
| Resource, NotResource | . |
| Condition | Typically evaluates based on resource type |
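An example stack policy in the JSON format described above: allow all updates, but deny replacement or deletion of a hypothetical production database resource.

```json
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "Update:*",
      "Resource": "*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["Update:Replace", "Update:Delete"],
      "Resource": "LogicalResourceId/ProductionDatabase"
    }
  ]
}
```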

Processes

Stack Creation
  1. Template upload into S3 bucket
  2. Template syntax check
    • CloudFormation will check for any IAM resources being created, and require CAPABILITY_IAM / CAPABILITY_NAMED_IAM if so
    • Will raise InsufficientCapabilities otherwise
  3. Stack name & parameter verification & ingestion (apply default values)
  4. Template processing & stack creation
    • Resource ordering
      • Natural ordering
        • CloudFormation knows about 'natural' dependencies between resources.
      • DependsOn
        • Also DependsOn attribute
        • Allows to direct CloudFormation on how to handle more complex dependencies
        • Applies to creation as well as deletion & rollback
        • DependsOn can be a single resource or a list of resources
        • Will error on circular dependencies
        • DependsOn is problematic if the target resource needs more complex setup than just stack creation
      • -> Wait conditions allow further control over ordering and timing
    • Resource creation
      • Will try to create as many resources as possible in parallel
      • Includes pausing and waiting for other resources to be created first
      • Associate the CreationPolicy attribute with a resource to prevent its status from reaching create complete until AWS CloudFormation receives a specified number of success signals or the timeout period is exceeded.
    • Output creation
  5. Stack completion or rollback
    • Rollback settings can be provided while creating the stack
      • onFailure - ROLLBACK | DELETE | DO_NOTHING
    • Can try to manually resolve problems if in state UPDATE_ROLLBACK_FAILED
Stack Updates
  • Direct updates
    • You submit changes and AWS CloudFormation immediately deploys them
  • Change sets
    • You can preview the changes AWS CloudFormation will make to your stack, and then decide whether to apply those changes by executing the change set
    • Change sets are JSON-formatted documents that summarize the changes AWS CloudFormation will make to a stack
  • Use the UpdatePolicy attribute to specify how AWS CloudFormation handles updates to the AWS::AutoScaling::AutoScalingGroup, AWS::ElastiCache::ReplicationGroup, AWS::Elasticsearch::Domain or AWS::Lambda::Alias resources.
    • Values depend on resource type, e.g. ASG replacing vs rolling update
  • Use the UpdateReplacePolicy attribute to retain or (in some cases) backup the existing physical instance of a resource when it is replaced during a stack update operation.
  • On Failure, the stack will rollback automatically to the last known working state
  • Interruption while updating
    • Update can impact a resource in 3 possible ways
      • No interruption
        • E.g. change ProvisionedThroughput of a DynamoDB table
      • Some interruption
        • E.g. change EbsOptimized of an EC2 instance (EBS-backed)
        • E.g. change InstanceType of an EC2 instance (EBS-backed)
      • Replacement
        • E.g. change AvailabilityZone of an EC2 instance
        • E.g. change ImageId of an EC2 instance
        • E.g. change Tablename of a DynamoDB table
Stack Deletion
  • Specify the stack to delete, and AWS CloudFormation deletes the stack and all the resources in that stack.
  • With the DeletionPolicy attribute you can preserve or (in some cases) backup a resource when its stack is deleted.
  • If AWS CloudFormation cannot delete a resource, the stack will not be deleted.
  • A stack can have termination protection enabled, which will prevent it from being deleted accidentally
  • Resource Deletion policy
    • Policy/statement that is associated with every resource of a stack
    • Controls what happens if stack is deleted
    • DeletionPolicy
      • Delete
        • (default)
        • Creates a transient environment - immutable architecture
      • Retain
        • Obviously needs further cleanup - non-immutable architecture
      • Snapshot
        • Takes snapshot prior to deletion
        • Some resource types only
          • AWS::EC2::Volume
          • AWS::ElastiCache::CacheCluster
          • AWS::ElastiCache::ReplicationGroup
          • AWS::Neptune::DBCluster
          • AWS::RDS::DBCluster
          • AWS::RDS::DBInstance
          • AWS::Redshift::Cluster
        • Allow data recovery at a later stage
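A sketch of the attribute on a resource type that supports snapshots:

```yaml
Database:
  Type: AWS::RDS::DBInstance
  DeletionPolicy: Snapshot    # take a final snapshot before the resource is deleted
  Properties:
    ...
```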

StackSets

StackSets lets you create stacks in multiple AWS accounts across multiple regions by using a single CloudFormation template. All the resources included in each stack are defined by the stack set's AWS CloudFormation template. As you create the stack set, you specify the template to use, as well as any parameters and capabilities that template requires.

  • Can roll out from the Organizations master account
    • To all accounts
    • To all accounts of an OU
Concept
  • Stack sets are created in a region of an administrator account
  • A stack instance is a reference to a stack in a target account within a region
    • Can exist without a stack, e.g. if stack failed to create, then the stack instance shows the reason for that
  • Operations: Create, Update, Delete
  • For updates, can choose to overwrite parameters only for some accounts/regions
| Operation option | Description |
|---|---|
| Maximum concurrent accounts | Maximum number or percentage of target accounts in which an operation is performed at one time |
| Failure tolerance | Maximum number or percentage of stack operation failures that can occur, per region, beyond which AWS CloudFormation stops an operation automatically |
| Retain stack (delete operations only) | Keep stacks and their resources running even after they have been removed from a stack set |

Concepts

Running code on instance boot

| Script | Purpose |
|---|---|
| cfn-init | Use to retrieve and interpret resource metadata, install packages, create files, and start services. |
| cfn-signal | Use to signal with a CreationPolicy or WaitCondition, so you can synchronize other resources in the stack when the prerequisite resource or application is ready. |
| cfn-get-metadata | Use to retrieve metadata for a resource or path to a specific key. |
| cfn-hup | Use to check for updates to metadata and execute custom hooks when changes are detected. |
Define code and scripts to run
  • CloudFormation User Data
    • Scripts and commands to be passed to a launching EC2 instance.
    • Failing user data scripts don't fail the CFN stack
    • Logged to /var/log/cloud-init-output.log
    • Needs to be base64-encoded

```yaml
UserData:
  Fn::Base64: |
    #!/bin/bash -x
    ...
```
  • cfn-init
    • Use the AWS::CloudFormation::Init type to include metadata on an Amazon EC2 instance for the cfn init helper script. If your template calls the cfn-init script, the script looks for resource metadata rooted in the AWS::CloudFormation::Init metadata key.
    • Different sections: packages, groups, users, sources, files, commands, services
    • Need to make sure aws-cfn-bootstrap is in place and up to date
    • Logged to /var/log/cfn-init.log
    • Can use WaitCondition/cfn-signal to make CloudFormation wait for successful finish of code
  • By default, user data scripts and cloud-init directives run only during the boot cycle when you first launch an instance.
  • Can use cfn-hup to detect changes in resource metadata and run user-specified actions when a change is detected
  • Use cfn-get-metadata to fetch a metadata block from AWS CloudFormation and print it to standard out
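A sketch tying AWS::CloudFormation::Init metadata to a cfn-init call in UserData (package choice and AMI ID are illustrative):

```yaml
WebServer:
  Type: AWS::EC2::Instance
  Metadata:
    AWS::CloudFormation::Init:
      config:
        packages:
          yum:
            httpd: []                    # install Apache
        services:
          sysvinit:
            httpd:
              enabled: true
              ensureRunning: true
  Properties:
    ImageId: ami-0ff8a91507f77f867       # example AMI ID
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash -xe
        yum update -y aws-cfn-bootstrap
        /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource WebServer --region ${AWS::Region}
```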
Signal outcome of installation back to CFN

Creation Policy

  • Can (only) be used for EC2 Instances and Auto Scaling Groups
  • Creation policy definition
    • Defines desired signal count & waiting period
  • Signal configuration
    • Call to cfn-signal from EC2 user-data
```yaml
AutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    ...
  CreationPolicy:
    ResourceSignal:
      Count: '3'
      Timeout: PT15M

LaunchConfig:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ...
    UserData:
      "Fn::Base64":
        !Sub |
          #!/bin/bash -xe
          yum update -y aws-cfn-bootstrap
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource AutoScalingGroup --region ${AWS::Region}
```

Wait Conditions and Handlers

  • For other resources (external to the stack)
  • Wait condition handler
    • CloudFormation resource with no properties
    • Generates signed URL to communicate success or failure
    • URL can be used by cfn-signal to send data to
      • Takes custom data as well
  • Wait condition
    • Links handler and resource
      • Know which resource they depend on
      • Hold reference to handler
      • Have response timeout
      • Have a desired count (defaults to 1)
    • Allows to define complex wait order
```yaml
WebServerGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    ...
WaitHandle:
  Type: AWS::CloudFormation::WaitConditionHandle
WaitCondition:
  Type: AWS::CloudFormation::WaitCondition
  DependsOn: "WebServerGroup"
  Properties:
    Handle:
      Ref: "WaitHandle"
    Timeout: "300"
    Count:
      Ref: "WebServerCapacity"
```

Custom Resources

  • Problems with existing CloudFormation resources:
    • Sometimes lag behind newly released AWS services and features
    • Cannot deal with non-AWS resources
    • Cannot do much logic beyond the scope of intrinsic functions
    • Cannot interact with external services
  • Solvable with custom resources (see the sketch after this list):
    • Custom resource type that is backed by SNS or Lambda
      • Type: Custom::NameOfResourceType
      • ServiceToken: arnOfSnsOrLambda
    • If stack is created, updated or deleted a payload is sent to ServiceToken
      • Payload contains any custom data that's defined with the resource together with the action type
      • This invokes a Lambda that performs any sort of custom action
    • Or an SNS topic, e.g. to communicate with on-prem resources
      • Returns outcome of operation back to CloudFormation, typically includes custom data as well
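A minimal sketch of a Lambda-backed custom resource; the type name, function, and properties are illustrative assumptions:

```yaml
# Hypothetical custom resource; CloudFormation sends Create/Update/Delete events
# (including the custom properties below) to the Lambda behind ServiceToken
MyCertificate:
  Type: Custom::Certificate
  Properties:
    ServiceToken: !GetAtt CertificateProviderFunction.Arn
    DomainName: example.com   # arbitrary custom data passed in the event payload
```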

Drift detection

  • Can detect drift on an entire stack or on a particular resource
    • Not supported by all resource types
  • CloudFormation
    • Compares the current stack configuration to the one specified in the template that was used to create or update the stack
    • Reports on any differences, providing you with detailed information on each one.

Stacks Nesting

  • Resources in a stack can be referenced by other stacks
  • How to nest (see the sketch after this list)
    • Declare resources as AWS::CloudFormation::Stack
    • Point TemplateURL to S3 URL of nested stack
    • Use Parameters to provide the nested stack with input values (defaults will be used otherwise)
    • Output values of nested stack are returned to parent (root) stack
      • !GetAtt nestedStack.Outputs.db_name
  • Benefits
    • Allows infrastructure to be split over many templates
    • Allows infrastructure reuse
    • Allows working around limits such as max resources per stack or max template size
    • Considered best practice by AWS
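A minimal nested-stack sketch; the template URL, parameter, and output names are illustrative assumptions:

```yaml
Resources:
  DatabaseStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/my-templates/database.yml   # nested template must live in S3
      Parameters:
        DbInstanceClass: db.t3.small    # input values; defaults are used otherwise
Outputs:
  DbName:
    Value: !GetAtt DatabaseStack.Outputs.DbName   # outputs of the nested stack surface to the parent
```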

Limits

| Limit | Value |
| --- | --- |
| Max stacks per region | 200 |
| Max templates per region | unlimited |
| Max template size (stored in S3) | 460 KB |
| Parameters per stack | 60 |
| Mappings per stack | 100 |
| Resources per stack | 200 |
| Outputs per stack | 60 |

CloudFront

Overview

Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web content, such as .html, .css, .js, and image files, to your users. CloudFront delivers your content through a worldwide network of data centers called edge locations. When a user requests content that you're serving with CloudFront, the user is routed to the edge location that provides the lowest latency (time delay), so that content is delivered with the best possible performance.

  • CloudFront is usually the cheapest and simplest way to add caching to a web application

Lambda@Edge

Lambda@Edge is a feature of Amazon CloudFront that lets you run code closer to users of your application, which improves performance and reduces latency. With Lambda@Edge, you don't have to provision or manage infrastructure in multiple locations around the world. You pay only for the compute time you consume - there is no charge when your code is not running.

With Lambda@Edge, you can enrich your web applications by making them globally distributed and improving their performance — all with zero server administration. Lambda@Edge runs your code in response to events generated by the Amazon CloudFront content delivery network (CDN). Just upload your code to AWS Lambda, which takes care of everything required to run and scale your code with high availability at an AWS location closest to your end user.

  • A master Lambda function is associated with a CloudFront distribution trigger and replicated to the edge locations (-> edge replica)
  • Can use Lambda@Edge for A/B testing

CloudSearch

Overview

Amazon CloudSearch is a fully managed service in the cloud that makes it easy to set up, manage, and scale a search solution for your website or application.

With Amazon CloudSearch you can search large collections of data such as web pages, document files, forum posts, or product information. You can quickly add search capabilities without having to become a search expert or worry about hardware provisioning, setup, and maintenance. As your volume of data and traffic fluctuates, Amazon CloudSearch scales to meet your needs.

CloudTrail (Core Service)

Overview

AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain account activity related to actions across your AWS infrastructure. CloudTrail provides event history of your AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. This event history simplifies security analysis, resource change tracking, and troubleshooting. In addition, you can use CloudTrail to detect unusual activity in your AWS accounts. These capabilities help simplify operational analysis and troubleshooting.

CloudTrail is enabled by default in every account. All activities in an AWS account are being recorded as CloudTrail events.

Concepts

Event

  • JSON format, who did what (API calls).
  • ~15min delay
  • Stored for 90 days

Trail

  • Can configure what type of events to log
    • Management events
    • CloudWatch Insights events
    • Data events
  • One region/all regions/organization-wide
  • Store data in nominated S3 bucket, this can be encrypted as well
    • Can be in a different region
  • Can also deliver and analyse events in a trail with CloudWatch Logs and CloudWatch Events
  • Can validate integrity of log files using digest files
  • Can deliver trails from multiple accounts into the same bucket
    • Change bucket policy to allow that

CloudWatch (Core Service)

Overview

Amazon CloudWatch is a monitoring and management service built for developers, system operators, site reliability engineers (SRE), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, understand and respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications and services that run on AWS, and on-premises servers. You can use CloudWatch to set high resolution alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to optimize your applications, and ensure they are running smoothly.

  • Access all your data from a single platform
  • Easiest way to collect custom and granular metrics for AWS resources
  • Visibility across your applications, infrastructure, and services
  • Improve total cost of ownership
  • Optimize applications and operational resources
  • Derive actionable insights from logs
  • On AWS: Service - FAQs - User Guide
  • See also: AWS Geek 2019

Concepts

CloudWatch Logs

  • Log events are records of some activity recorded by the application or resource being monitored
  • Log streams are sequences of log events from the same source
  • Log groups are groups of log streams that share the same retention, monitoring, and access control settings
  • Metric filters allow extracting metric observations from ingested events and transforming them into data points in a CloudWatch metric (see the sketch after this list)
    • Search for and match terms, phrases, or values in log events
    • Can increment the value of a CloudWatch metric
  • Retention settings can be used to specify how long log events are kept in CloudWatch Logs. Default: indefinitely
  • Logs can be exported to S3 for durable storage.
    • This can be automated, e.g. with a CloudWatch Events rule that triggers a Lambda
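A minimal metric-filter sketch in CloudFormation; the log group name, pattern, and namespace are illustrative assumptions:

```yaml
ErrorCountFilter:
  Type: AWS::Logs::MetricFilter
  Properties:
    LogGroupName: /my-app/application
    FilterPattern: '"ERROR"'          # match log events containing the term ERROR
    MetricTransformations:
      - MetricNamespace: MyApp
        MetricName: ErrorCount
        MetricValue: "1"              # increment the metric by 1 per matching event
```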
Subscriptions
  • Allow real-time delivery of log events
  • Can create subscription filters for Lambda, Elasticsearch, Kinesis Data Streams/Firehose (not supported from console); see the sketch after this block
    • Only Kinesis Stream supports cross-account
      • Need to establish log data sender and log data recipient.
      • The log group and the destination must be in the same AWS region. However, the AWS resource that the destination points to can be located in a different region.
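A minimal subscription-filter sketch; the log group, stream, and role names are illustrative assumptions:

```yaml
StreamSubscription:
  Type: AWS::Logs::SubscriptionFilter
  Properties:
    LogGroupName: /my-app/application
    FilterPattern: ""                        # empty pattern forwards every log event
    DestinationArn: !GetAtt LogStream.Arn    # Kinesis stream receiving the events
    RoleArn: !GetAtt LogsToKinesisRole.Arn   # role CloudWatch Logs assumes to write to the stream
```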
AWS-Managed Logs
| Service | Target(s) |
| --- | --- |
| Load Balancer Access Logs (ELB, ALB, NLB) | S3 |
| CloudTrail Logs | S3, CloudWatch |
| VPC Flow Logs | S3, CloudWatch |
| Route 53 Access Logs | CloudWatch |
| S3 Access Logs | S3 |
| CloudFront Access Logs | S3 |

CloudWatch Metrics

Namespaces

  • Container for CloudWatch metrics
  • Metrics in different namespaces are isolated from each other
  • The AWS namespaces typically use the following naming convention: AWS/service

Metrics

  • Metrics are the fundamental concept in CloudWatch Metrics
  • A metric represents a time-ordered set of data points that are published to CloudWatch.
  • Available metrics are based on currently used service
  • Not everything is available out of the box, e.g. no data on memory usage of EC2 instances
  • Can also create Custom Metrics
    • Publish individual data points via AWS CLI or API
    • Exist only in the region where they were created
    • Expire after 15 months if no data is published
    • aws cloudwatch put-metric-data --metric-name PageViewCount --namespace MyService --value 2 --timestamp 2016-10-20T12:00:00.000Z
  • Can also export metrics
    • get-metric-statistics --namespace <value> --metric-name <value> --start-time <value> --end-time <value>...
  • Metrics produced by AWS services are standard resolution by default.
    • When you publish a custom metric, you can define it as either standard resolution or high resolution.
    • When you publish a high-resolution metric, CloudWatch stores it with a resolution of 1 second, and you can read and retrieve it with a period of 1 second, 5 seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds.
    • Higher resolution data automatically aggregates into lower resolution data
| Resolution | Data retention |
| --- | --- |
| < 60s | 3 hours |
| 60s | 15 days |
| 300s (5 min) | 63 days |
| 3600s (1 h) | 15 months |

Time Stamps

  • Each metric data point must be associated with a time stamp.
  • Can be up to two weeks in the past and up to two hours into the future.

Dimension

  • A dimension is a name/value pair that is part of the identity of a metric.
  • You can assign up to 10 dimensions to a metric.
  • Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics.
  • For example 'ec2 instance id'

Statistics

  • Statistics are metric data aggregations over specified periods of time
  • Average, Sum, Minimum, Maximum, Sample Count, pNN.NN (value of specified percentile)
  • Can be computed for any time periods between 60-seconds and 1-day

Period

  • The length of time associated with a specific Amazon CloudWatch statistic

Aggregation

  • CloudWatch aggregates statistics according to the period length that you specify when retrieving statistics

CloudWatch Alarms

  • Based on thresholds defined on metrics, including custom metrics
    • Can only be based on a single metric
  • Can trigger Lambda, SNS, email, ...
    • Also Auto Scaling or EC2 action
    • Alarms do not raise CloudWatch Events themselves
  • High resolution alarms down to 10 seconds
  • Takes place once, at a specific point in time
    • Disable with mon-disable-alarm-actions via CLI
  • Can be added to dashboard
  • Using alarm actions, you can create alarms that automatically stop, terminate, reboot, or recover your EC2 instances (see the sketch below).
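A minimal sketch of an alarm with an EC2 recover action; the instance reference and thresholds are illustrative assumptions:

```yaml
RecoveryAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Recover the instance when the system status check fails
    Namespace: AWS/EC2
    MetricName: StatusCheckFailed_System
    Dimensions:
      - Name: InstanceId
        Value: !Ref WebServer
    Statistic: Maximum
    Period: 60
    EvaluationPeriods: 5
    Threshold: 1
    ComparisonOperator: GreaterThanOrEqualToThreshold
    AlarmActions:
      - !Sub arn:aws:automate:${AWS::Region}:ec2:recover   # built-in EC2 recover action
```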
Billing Alarms
  • Notifications on billing metrics
  • Only available in us-east-1

CloudWatch Events

  • Define actions on things that happened
  • Or schedule cron-based events (see the sketch after this list)
  • Events are recorded constantly over time
    • Targets process events
      • Lambda functions
      • Amazon EC2 instances
      • Streams in Amazon Kinesis Data Streams
      • Delivery streams in Amazon Kinesis Data Firehose
      • Log groups in Amazon CloudWatch Logs
      • Amazon ECS tasks
      • Systems Manager Run Command
      • Systems Manager Automation
      • AWS Batch jobs
      • AWS Step Functions state machines
      • Pipelines in AWS CodePipeline
      • AWS CodeBuild projects
      • Amazon Inspector assessment templates
      • Amazon SNS topics
      • Amazon SQS queues
      • Built-in targets: EC2 CreateSnapshot API call, EC2 RebootInstances API call, EC2 StopInstances API call, and EC2 TerminateInstances API call
      • The default event bus of another AWS account
    • Rules match incoming events and route them to targets
      • CloudTrail integration allows to trigger events on API calls
        • ReadOnly calls (List*, Get*, Describe*) are not supported
    • E.g. CodeCommit automatically triggers CodePipeline on new commits
  • Can deliver cross-account
    • Must be in the same region
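A minimal scheduled-rule sketch; the schedule, function, and target id are illustrative assumptions:

```yaml
NightlySnapshotRule:
  Type: AWS::Events::Rule
  Properties:
    ScheduleExpression: cron(0 2 * * ? *)   # every day at 02:00 UTC
    State: ENABLED
    Targets:
      - Arn: !GetAtt SnapshotFunction.Arn   # the function also needs an AWS::Lambda::Permission for events.amazonaws.com
        Id: snapshot-lambda
```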
EventBridge
  • CloudWatch Events in its core
  • Adding other (3rd party service partners) event sources into the mix
S3 Events
  • On bucket events send to SNS, SQS or Lambda
    • Not everything is covered by S3 notifications
    • Only object-level operations, not bucket-level
  • Can also integrate with CloudTrail, but need a trail configured for that bucket

Dashboards

  • Customizable home pages in the CloudWatch console that you can use to monitor your resources in a single correlated view
  • Even those resources that are spread across different Regions.
  • You can use CloudWatch dashboards to create customized views of the metrics and alarms for your AWS resources.

Unified CloudWatch Agent

  • Collects metrics and logs from
    • EC2 instances (Linux/Windows)
    • On-Prem instances (Linux/Windows)
  • Stores configuration in SSM parameters
    • Easy to share
    • Can configure CloudWatch Agent to boot directly from SSM parameter store
  • Offers a variety of metrics on top of the EC2 standard metrics

Key metrics

  • EC2 metrics are based on what is exposed to the hypervisor.
  • Basic Monitoring (default) submits values every 5 minutes, Detailed Monitoring every minute
| Metric | Effect |
| --- | --- |
| CPUUtilization | The total CPU resources utilized within an instance at a given time. |
| DiskReadOps, DiskWriteOps | The number of read (write) operations performed on all instance store volumes. Applicable for instance store-backed AMI instances. |
| DiskReadBytes, DiskWriteBytes | The number of bytes read (written) on all instance store volumes. Applicable for instance store-backed AMI instances. |
| NetworkIn, NetworkOut | The number of bytes received (sent) on all network interfaces by the instance. |
| NetworkPacketsIn, NetworkPacketsOut | The number of packets received (sent) on all network interfaces by the instance. |
| StatusCheckFailed, StatusCheckFailed_Instance, StatusCheckFailed_System | Reports whether the instance has passed both checks / the instance status check / the system status check in the last minute. |
  • Can not monitor memory usage, available disk space, swap usage
    • This can be achieved with the Unified CloudWatch Agent

Auto Scaling Group

| Metric | Effect |
| --- | --- |
| GroupMinSize, GroupMaxSize | The minimum/maximum size of the Auto Scaling Group. |
| GroupDesiredCapacity | The number of instances that the Auto Scaling Group attempts to maintain. |
| GroupInServiceInstances, GroupPendingInstances, GroupStandbyInstances, GroupTerminatingInstances | The number of instances that are running / pending (not yet in service) / standby (still running) / terminating as part of the Auto Scaling Group. |
| GroupTotalInstances | The total number of instances in the Auto Scaling Group. This metric identifies the number of instances that are in service, pending, and terminating. |

Elastic Load Balancer (Classic)

| Metric | Effect |
| --- | --- |
| Latency | Time it takes to receive a response. Measure max and average. |
| BackendConnectionErrors | Number of connections to registered instances that could not be established successfully. Measure sum and look at the difference between min and max. |
| SurgeQueueLength | Total number of requests waiting to get routed. Look at max and average. |
| SpilloverCount | Requests dropped because the surge queue was exceeded. Look at sum. |
| HTTPCode_ELB_4XX, HTTPCode_ELB_5XX | The number of HTTP 4XX/5XX error codes that originate from the load balancer. This count does not include any response codes generated by the targets. Look at sum. |
| HTTPCode_Backend_2XX ... HTTPCode_Backend_5XX | The number of HTTP 2XX-5XX codes that originate from the backend. Look at sum. |
| RequestCount | Number of completed requests. |
| HealthyHostCount, UnhealthyHostCount | Self-explanatory. |
  • Elastic Load Balancing reports metrics only when requests are flowing through the load balancer. If there are requests flowing through the load balancer, Elastic Load Balancing measures and sends its metrics in 60-second intervals.

SpilloverCount and SurgeQueueLength give an indication of the ELB being overloaded

  • Typically this means that the backend system cannot process requests as fast as they are coming in
    • Ideally load balance into an Auto Scaling Group.

Application Load Balancer

| Metric | Effect |
| --- | --- |
| RequestCount | Number of completed requests. |
| HealthyHostCount, UnhealthyHostCount | Self-explanatory. |
| TargetResponseTime | The time elapsed after the request leaves the load balancer until a response from the target is received. |
| HTTPCode_ELB_3XX_Count, HTTPCode_ELB_4XX_Count, HTTPCode_ELB_5XX_Count | The number of HTTP 3XX/4XX/5XX error codes that originate from the load balancer. This count does not include any response codes generated by the targets. |

Network Load Balancer

| Metric | Effect |
| --- | --- |
| ProcessedBytes | The total number of bytes processed by the load balancer, including TCP/IP headers. |
| TCP_Client_Reset_Count | The total number of reset (RST) packets sent from a client to a target. |
| TCP_ELB_Reset_Count | The total number of reset (RST) packets generated by the load balancer. |
| TCP_Target_Reset_Count | The total number of reset (RST) packets sent from a target to a client. |

Tutorials

| Service | Tutorial | Steps |
| --- | --- | --- |
| CloudWatch Events/EventBridge | Relay Events to AWS Systems Manager Run Command | Configure event: ASG, Instance Launch and Terminate, EC2-Instance-Launch lifecycle action. Target: SSM Run Command, add command(s) |
| CloudWatch Events/EventBridge | Log the State of an Amazon EC2 Instance | Configure event: EC2, Instance State Change, specific states: Running. Invoke Lambda that logs state from the incoming event |
| CloudWatch Events/EventBridge | Log the State of an Auto Scaling Group | Configure event: ASG, Instance Launch and Terminate, EC2-Instance-Launch lifecycle action. Invoke Lambda that logs state from the incoming event |
| CloudWatch Events/EventBridge | Log Amazon S3 Object-Level Operations | Configure CloudTrail trail to monitor S3 bucket(s) (no events otherwise). Implement Lambda that logs state. Configure event to trigger on PutObject. Invoke Lambda |
| CloudWatch Events/EventBridge | Use Input Transformer to Customize What Is Passed to the Event Target | . |
| CloudWatch Events/EventBridge | Log AWS API Calls | . |
| CloudWatch Events/EventBridge | Schedule Automated Amazon EBS Snapshots | . |
| CloudWatch Events/EventBridge | Set AWS Systems Manager Automation as an EventBridge Target | . |
| CloudWatch Events/EventBridge | Relay Events to a Kinesis Stream | . |
| CloudWatch Events/EventBridge | Run an Amazon ECS Task When a File Is Uploaded to an Amazon S3 Bucket | . |
| CloudWatch Events/EventBridge | Schedule Automated Builds Using AWS CodeBuild | . |
| CloudWatch Events/EventBridge | Log State Changes of Amazon EC2 Instances | . |
| CloudWatch Events/EventBridge | Download Code Bindings for Events using the EventBridge Schema Registry | . |

CodeBuild (Core Service)

Overview

AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don't need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can get started quickly by using prepackaged build environments, or you can create custom build environments that use your own build tools. With CodeBuild, you are charged by the minute for the compute resources you use.

  • Provides preconfigured environments for supported versions of Java, Ruby, Python, Go, Node.js, Android, .NET Core, PHP, and Docker
  • Build images are Amazon Linux 2, Ubuntu 18.04, Windows Server Core 2016
    • Can use build image from ECR, even cross-account
  • On AWS: Service - FAQs - User Guide

Benefits

  • Fully managed build service
    • Serverless
    • Leverages Docker under the hood
    • Can use own docker images
    • Integrates with KMS, IAM, VPC and CloudTrail
  • Continuous scaling
  • Extensible
    • Can use own build tools and runtimes
  • Can attach to VPC

Components

  • Source code from CodeCommit, S3, GitHub, Bitbucket
  • Build defined in buildspec.yml
    • Build timeouts up to 8h
    • Uses queue to process build jobs
  • Triggers can schedule a build, can also use cron expressions
  • CloudWatch integration
    • Logs (can also go to S3)
    • Metrics to monitor CodeBuild statistics
      • Can set up Alarms on top of those
    • Can schedule CloudWatch Events

How it works

  • CodeBuild is provided with a build project.
    • Defines how CodeBuild runs
      • Source code location, build environment to use, build commands
  • CodeBuild uses information from build project to create build environment
    • Build runs in container, in phases
    • submitted, queued, provisioning, download_source, install, pre_build, build, post_build, upload_artifacts, finalizing, completed
  • Download source code into build environment, use buildspec to build
| Section | Sub-sections | Notes |
| --- | --- | --- |
| version | 0.2 | . |
| run-as | . | . |
| env | variables, parameter-store, exported-variables, secrets-manager, git-credentials-helper | . |
| phases | install, pre_build, build, post_build | Every phase has a finally section. A failed build transitions to post_build, all others to finalizing |
| reports | . | . |
| artifacts | secondary-artifacts | . |
| cache | paths | . |

(A sample buildspec.yml is sketched at the end of this section.)
  • Environment variables
    • Can come from SSM/Secrets Manager
    • Precedence: start-build (CLI), project definition, buildspec.yml
  • If there is build output and CodeBuild is not managed by CodePipeline
    • Uploaded to S3, encrypted by default
    • With default configuration, artifacts are overwritten each time
    • Can provide Namespace type to save output to different location every time
  • While a build is running, the build environment sends information to CloudWatch and CodeBuild
    • Can also use console, CLI, SDK to retrieve information about running build
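A minimal buildspec.yml sketch, as referenced above; the parameter name, runtime, and commands are illustrative assumptions:

```yaml
version: 0.2
env:
  parameter-store:
    DB_PASSWORD: /myapp/db/password      # resolved from SSM Parameter Store at build time
phases:
  install:
    runtime-versions:
      java: corretto11
  build:
    commands:
      - mvn package
artifacts:
  files:
    - target/*.jar                       # uploaded to S3 (encrypted by default)
```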

CodeCommit (Core Service)

Overview

AWS CodeCommit is a fully-managed source control service that hosts secure Git-based repositories. It makes it easy for teams to collaborate on code in a secure and highly scalable ecosystem. CodeCommit eliminates the need to operate your own source control system or worry about scaling its infrastructure. You can use CodeCommit to securely store anything from source code to binaries, and it works seamlessly with your existing Git tools.

Benefits

  • Fully managed
  • Highly available
  • Faster development cycle
  • Code lives close to actual environments
  • Secure
    • Encryption, IAM integration
  • Collaborate on code
  • Use existing tools
    • CodeBuild, Jenkins, other CI tools
  • Fully enabled for automation
    • Provides notifications and triggers for all repository events

How To

Protect branches

Use IAM policy:

"Condition": {
		"StringEqualsIfExists": {
				"codecommit:References": [
						"refs/heads/master"
				]
		},
		"Null": {
				"codecommit:References": false
		}
}

Send Notifications

  • CodeStar Notifications integration
    • Uses CloudWatch Events Rules under the hood
  • Set up notification/notification rules:
    • Pick triggering event (on commit, ..., branch updated)
    • Pick SNS topic as target
    • Add subscriber to target
  • Set up CloudWatch Events directly:
    • Pick event source (AWS service)
    • Pick event type (different types, also includes CloudTrail)
    • Pick target (many different types, e.g. Lambda, SNS, SSM, Code*)

Triggers

  • Can trigger SNS or Lambda
  • Up to 10 triggers
  • Can be augmented with custom data (an uninterpreted string) that you can use to distinguish the trigger from others that run for the same event.
  • More limited in scope than Notifications. Does not use CloudWatch Events under the hood.

CodeDeploy (Core Service)

Overview

AWS CodeDeploy is a fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers. AWS CodeDeploy makes it easier for you to rapidly release new features, helps you avoid downtime during application deployment, and handles the complexity of updating your applications. You can use AWS CodeDeploy to automate software deployments, eliminating the need for error-prone manual operations. The service scales to match your deployment needs.

  • CodeDeploy can be chained into CodePipeline and can use artifacts from there
  • CodeDeploy does not provision resources
  • Automated deployments
    • EC2, on-prem, (ASG), ECS, (Fargate), Lambda
  • Minimize downtime
  • Centralized control
  • Easy to adopt
  • Integrates with AWS SAM
  • On AWS: Service - FAQs - User Guide

Components

  • Application
    • The application that should be deployed
  • Deployment
    • Revision
      • Specific version of deployable content, such as source code, post-build artifacts, web pages, executable files, and deployment scripts, along with an AppSpec file
      • AppSpec File
        • Environment variables are being exposed as well (e.g. DEPLOYMENT_ID, DEPLOYMENT_GROUP_NAME), allow to implement logic in installation process.
        • Slightly different format for EC2/ECS/Lambda.
    • Deployment group
      • Set of instances or Lambda functions, defined by tags, e.g. 'environment=prod'
      • Can define multiple deployment groups with an application, such as prod or staging
      • Can be associated with
        • CloudWatch Alarms that would stop the deployment if triggered
        • Triggers for notification
        • Rollbacks
    • Deployment configuration
      • Specifies how a deployment should proceed
      • CodeDeployDefault.OneAtATime, ..., CodeDeployDefault.LambdaCanary10Percent10Minutes
      • Can create own
        • Define minimum healthy hosts by percentage or number
        • E.g. 9 instances in total, 6 minimum healthy hosts, deploy 3 at a time. Deployment is successful after 6 hosts have been successfully deployed
| EC2/On-Premises | ECS | Lambda |
| --- | --- | --- |
| version: 0.0<br>os: operating-system-name<br>files:<br>&nbsp;&nbsp;source-destination-files-mappings<br>permissions:<br>&nbsp;&nbsp;permissions-specifications<br>hooks:<br>&nbsp;&nbsp;deployment-lifecycle-event-mappings | version: 0.0<br>resources: ecs-service-specifications<br>hooks:<br>&nbsp;&nbsp;deployment-lifecycle-event-mappings | version: 0.0<br>resources: lambda-function-specifications<br>hooks:<br>&nbsp;&nbsp;deployment-lifecycle-event-mappings |
| hooks section contains mappings that link one or more scripts | hooks section specifies Lambda validation functions | hooks section specifies Lambda validation functions |
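A minimal sketch of the EC2/On-Premises AppSpec format from the table above; the paths and script names are illustrative assumptions:

```yaml
version: 0.0
os: linux
files:
  - source: /app                        # from the revision bundle
    destination: /var/www/app
hooks:
  BeforeInstall:
    - location: scripts/stop_server.sh  # lifecycle event mapped to a script
      timeout: 60
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 60
```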

How it works

Overview

  • CodeDeploy (Agent) continuously polls CodeDeploy
  • CodeDeploy sends AppSpec file
  • Revision is pulled from S3/GitHub/BitBucket
    • Actual revision always comes from S3, needs IAM permissions for the bucket
  • Run in phases ApplicationStop, (DownloadBundle), BeforeInstall, (Install), AfterInstall, ApplicationStart, ValidateService
  • CodeDeploy (Agent) reports back success/failure

Notifications and logging

  • CodeStar Notifications integration (goes out to SNS only)
  • CloudWatch Event integration
  • Deployment triggers can notify SNS
    • Can trigger based on deployment or instance events
  • CloudWatch Log Agent on machine can push logs to CloudWatch
    • No logs for ECS/Lambda deploys

Rollback

  • Manual Rollback by simply deploying a previous version
  • Automated Rollbacks
    • Options
      • When deployment fails
        • Rollback when any deployments on any instance fails
      • When alarm thresholds are met
        • Add CloudWatch Alarm to deployment group
    • Can define cleanup file to help removing already installed files

Deploys

To EC2/On-premises

| Step | Comment |
| --- | --- |
| Create application | . |
| Specify deployment group | Tags and/or ASG name. |
| Specify deployment configuration | AllAtOnce, HalfAtATime, OneAtATime (default); In-place (default) or Blue/green |
| Upload revision | . |
| Deploy | . |
| Check results | . |
| Redeploy as needed | . |
  • In-place

    • The application on each instance in the deployment group is stopped
    • The latest application revision is installed
    • The new version of the application is started and validated.
    • You can use a load balancer so that each instance is deregistered during its deployment and then restored to service after the deployment is complete.
    • Only deployments that use the EC2/On-Premises compute platform can use in-place deployments.
  • Blue/green

    • Instances are provisioned for the replacement environment.
    • The latest application revision is installed on the replacement instances.
    • An optional wait time occurs for activities such as application testing and system verification.
    • Instances in the replacement environment are registered with an Elastic Load Balancing load balancer, causing traffic to be rerouted to them.
    • Instances in the original environment are deregistered and can be terminated or kept running for other uses.
    • If you use an EC2/On-Premises compute platform, be aware that blue/green deployments work with Amazon EC2 instances only.
Integration with Elastic Load Balancing
  • During deployments, a load balancer prevents internet traffic from being routed to instances when they are not ready, are currently being deployed to, or are no longer needed as part of an environment.
  • Blue/Green Deployments
    • Allows traffic to be routed to the new instances in a deployment group that the latest application revision has been deployed to (the replacement environment), according to the rules you specify
    • Blocks traffic from the old instances where the previous application revision was running (the original environment).
    • After instances in a replacement environment are registered with a load balancer, instances from the original environment are deregistered and, if you choose, terminated.
    • Specify a Classic Load Balancer, Application Load Balancer, or Network Load Balancer in your deployment group.
  • In-Place Deployments
    • Prevents internet traffic from being routed to an instance while it is being deployed to
    • Makes the instance available for traffic again after the deployment to that instance is complete.
    • If a load balancer isn't used during an in-place deployment, internet traffic may still be directed to an instance during the deployment process.
    • When you use a load balancer with an in-place deployment, instances in a deployment group are deregistered from a load balancer, updated with the latest application revision, and then reregistered with the load balancer as part of the same deployment group after the deployment is successful.
    • Specify a Classic Load Balancer, Application Load Balancer, or Network Load Balancer. You can specify the load balancer as part of the deployment group's configuration, or use a script provided by CodeDeploy to implement the load balancer.
Integration with Auto Scaling Groups
  • When new Amazon EC2 instances are launched as part of an Amazon EC2 Auto Scaling group, CodeDeploy can deploy your revisions to the new instances automatically.
  • During blue/green deployments on an EC2/On-Premises compute platform, you have two options for adding instances to your replacement (green) environment:
    • Use instances that already exist or that you create manually.
    • Use settings from an Amazon EC2 Auto Scaling group that you specify to define and create instances in a new Amazon EC2 Auto Scaling group.
  • If an Amazon EC2 Auto Scaling scale-up event occurs while a deployment is underway, the new instances will be updated with the application revision that was most recently deployed, not the application revision that is currently being deployed.
    • Suspend scaling during rolling deploys
    • Or redeploy
Register on-premises instances
  • Configure each on-premises instance, register it with CodeDeploy, and then tag it.
    • Can create IAM User per instance
      • Needs configuration file with AK/SAK
      • Use register or register-on-premises-instances command
      • Best for only few instances
    • Can create IAM Role
      • Needs credentials to call STS with
      • Best for many instances, also more secure
      • Setup more complicated
      • Use register-on-premises-instances command together with STS token service
  • Need to install CodeDeploy agent, obviously
  • On-prem instances cannot blue/green, as CodeDeploy cannot create new infrastructure

To Lambdas

| Step | Comment |
| --- | --- |
| Create application | . |
| Specify deployment group | Only a name; Lambdas are specified in the AppSpec file |
| Specify deployment configuration | LambdaCanary10Percent5Minutes/10/15/30; LambdaLinear10PercentEvery1Minute/2/3/10; LambdaAllAtOnce; only Blue/green |
| Specify an AppSpec file | S3 (local with AWS CLI) |
| Deploy | . |
| Check results | . |
| Redeploy as needed | . |
  • Lambda deploys a new version under an alias, and traffic is shifted between old and new version
    • (Versions and Aliases are native Lambda features)
Integration with AWS Serverless
  • Deploys new versions of your Lambda function, and automatically creates aliases that point to the new version.
  • Gradually shifts customer traffic to the new version until you're satisfied that it's working as expected, or you roll back the update.
  • Defines pre-traffic and post-traffic test functions to verify that the newly deployed code is configured correctly and your application operates as expected.
  • Rolls back the deployment if CloudWatch alarms are triggered.
DeploymentPreference:
 Type: Canary10Percent10Minutes
 Alarms:
   # A list of alarms that you want to monitor
   - !Ref AliasErrorMetricGreaterThanZeroAlarm
 Hooks:
   # Validation Lambda functions that are run before & after traffic shifting
   PreTraffic: !Ref PreTrafficLambdaFunction
   PostTraffic: !Ref PostTrafficLambdaFunction

To ECS

| Step | Comment |
| --- | --- |
| Create ECS service | Set its deployment controller to CodeDeploy |
| Create application | . |
| Specify deployment group | Specify: ECS cluster and service name; production listener, optional test listener, and target groups; deployment settings, such as when to reroute production traffic to the replacement ECS task; optional settings such as triggers, alarms and rollback behaviour |
| Specify deployment configuration | ECSCanary10Percent5Minutes/15; ECSLinear10PercentEvery1Minute/3; ECSAllAtOnce; only Blue/green |
| Specify an AppSpec file | S3 (local with AWS CLI) |
| Deploy | . |
| Check results | . |
| Redeploy as needed | . |
  • CodeDeploy reroutes traffic from the original version of a task set to a new, replacement task set
  • Target groups specified in the deployment group are used to serve traffic to the original and replacement task sets
  • After the deployment is complete, the original task set is terminated.
  • Can specify an optional test listener to serve test traffic to your replacement version before traffic is rerouted to it

CodePipeline (Core Service)

Overview

AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline automates the build, test, and deploy phases of your release process every time there is a code change, based on the release model you define. This enables you to rapidly and reliably deliver features and updates. You can easily integrate AWS CodePipeline with third-party services such as GitHub or with your own custom plugin. With AWS CodePipeline, you only pay for what you use. There are no upfront fees or long-term commitments.

Benefits

  • Rapid delivery
  • Configurable workflow
  • Get started fast
    • No CI/CD infrastructure needs to be provisioned
  • Easy to integrate
    • Plugin concept to integrate with other components
  • Integrates with CloudWatch Events
  • CodeStar Notifications integration

Components

  • stage
    • action group (run in sequence)
      • action (run in sequence or parallel)
        • Various action providers provide functionality
        • runOrder decides between sequential and parallel execution
        • region lets an action run in a different region; the source artifacts are replicated into the target region. This enables multi-region deploys
        • Can invoke Lambda as an almost arbitrary pipeline action
  • Stages create artifacts
    • Stored in S3, passed on to the next stage
      • Default setting for artifact store creates one bucket per pipeline, can also specify a custom location
      • Store must be in the same region as pipeline
      • Always encrypted, default KMS or CMK
    • Artifacts are the way different pipeline stages 'communicate' with each other
    • CodePipeline artifacts are slightly different from CodeBuild artifacts (a pipeline sketch follows below)
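A minimal two-stage pipeline sketch in CloudFormation; the role, bucket, repository, and project names are illustrative assumptions:

```yaml
Pipeline:
  Type: AWS::CodePipeline::Pipeline
  Properties:
    RoleArn: !GetAtt PipelineRole.Arn
    ArtifactStore:
      Type: S3
      Location: !Ref ArtifactBucket        # must be in the same region as the pipeline
    Stages:
      - Name: Source
        Actions:
          - Name: Checkout
            ActionTypeId: { Category: Source, Owner: AWS, Provider: CodeCommit, Version: "1" }
            Configuration: { RepositoryName: my-repo, BranchName: master }
            OutputArtifacts: [ { Name: SourceOutput } ]   # handed to the next stage via the artifact store
      - Name: Build
        Actions:
          - Name: Build
            ActionTypeId: { Category: Build, Owner: AWS, Provider: CodeBuild, Version: "1" }
            Configuration: { ProjectName: my-build }
            InputArtifacts: [ { Name: SourceOutput } ]
```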

Pipeline Actions

  • Source
    • S3
    • CodeCommit
      • -> Must have one pipeline per branch!
      • Change detection CloudWatch (recommended), or CodePipeline (poll periodically)
      • Creates CloudWatch Events rule in the background
    • Github
    • ECR
  • Build
    • CodeBuild
    • Jenkins
    • TeamCity
    • etc
  • Test
    • CodeBuild
    • Jenkins
    • Ghost Inspector
  • Deploy
    • CodeDeploy
      • Can deploy into different region
    • CloudFormation
    • Beanstalk
    • ECS
      • ServiceCatalog
    • etc
  • Invoke
    • Lambda
  • Approval
    • SNS

Scenarios

CodePipeline with

  • Amazon S3, AWS CodeCommit, and AWS CodeDeploy
  • Third-party Action Providers (GitHub and Jenkins)
  • AWS CodeStar to Build a Pipeline in a Code Project
  • Compile, Build, and Test Code with CodeBuild
  • Amazon ECS for Continuous Delivery of Container-Based Applications to the Cloud
  • Elastic Beanstalk for Continuous Delivery of Web Applications to the Cloud
  • AWS Lambda for Continuous Delivery of Lambda-Based and Serverless Applications
  • AWS CloudFormation Templates for Continuous Delivery to the Cloud

CodeStar

Overview

AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. AWS CodeStar provides a unified user interface, enabling you to easily manage your software development activities in one place. With AWS CodeStar, you can set up your entire continuous delivery toolchain in minutes, allowing you to start releasing code faster. AWS CodeStar makes it easy for your whole team to work together securely, allowing you to easily manage access and add owners, contributors, and viewers to your projects. Each AWS CodeStar project comes with a project management dashboard, including an integrated issue tracking capability powered by Atlassian JIRA Software. With the AWS CodeStar project dashboard, you can easily track progress across your entire software development process, from your backlog of work items to teams’ recent code deployments.

  • CodeStar provides a central console where you can assign project team members the roles they need to access tools and resources. These permissions are applied automatically across all AWS services used in your project, so you don't need to create or manage complex IAM policies.
    • owner, contributor, viewer
  • On AWS: Service - FAQs - User Guide

Benefits

  • Start developing on AWS in minutes
  • Manage software delivery in one place
  • Work across your team securely
  • Choose from a variety of project templates

Under the hood

Uses cfn-transform to generate cfn from template.yml


Config (Core Service)

Overview

AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources. Config continuously monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations. With Config, you can review changes in configurations and relationships between AWS resources, dive into detailed resource configuration histories, and determine your overall compliance against the configurations specified in your internal guidelines. This enables you to simplify compliance auditing, security analysis, change management, and operational troubleshooting.

  • Evaluate your AWS resource configurations for desired settings.
  • Get a snapshot of the current configurations of the supported resources that are associated with your AWS account.
  • Retrieve configurations of one or more resources that exist in your account.
  • Retrieve historical configurations of one or more resources.
  • Receive a notification whenever a resource is created, modified, or deleted.
  • View relationships between resources. For example, you might want to find all resources that use a particular security group.
  • On AWS: Service - FAQs - User Guide

Config Rules

  • Evaluate the configuration settings of AWS resources
  • A Config rule represents your ideal configuration settings
  • Predefined rules called managed rules to help you get started
  • Can also create custom rules
  • AWS Config continuously tracks the configuration changes that occur among your resources
    • Checks whether these changes violate any of the conditions in your rules.
    • If a resource violates a rule, AWS Config flags the resource and the rule as noncompliant.
  • Can remediate using AWS Systems Manager Automation Documents (see the sketch after this list)
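A minimal managed-rule sketch; S3_BUCKET_VERSIONING_ENABLED is one of the AWS managed rule identifiers, while the logical and rule names are illustrative assumptions:

```yaml
S3VersioningRule:
  Type: AWS::Config::ConfigRule
  Properties:
    ConfigRuleName: s3-bucket-versioning-enabled
    Source:
      Owner: AWS                                       # use an AWS managed rule...
      SourceIdentifier: S3_BUCKET_VERSIONING_ENABLED   # ...that checks versioning on all buckets
```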

Automation

  • SNS notification on all Config events (cannot configure which events)
  • CloudWatch Events to observe specific events/rules

Aggregation

An aggregator is an AWS Config resource type that collects AWS Config configuration and compliance data from the following:

  • Multiple accounts and multiple regions.
  • Single account and multiple regions.
  • An organization in AWS Organizations and all the accounts in that organization.
  • Is limited to 50 per account
    • "We are unable to complete the request at this time. Try again later or contact AWS Support"

DynamoDB

Overview

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multi region, multi master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second.

  • Fully managed NoSQL database
  • HA through different AZs, automatically spreads data and traffic across servers
    • Data is replicated across 3 geographically distributed facilities (AZs) per table
  • Can scale up and down depending on demand (no downtime, no performance degradation)
  • Automatic or user-controlled read/write capacity provisioning
  • No joins - create references to other tables manually (table1#something)
  • Conditional updates and concurrency control (atomic counters)
  • Option between eventually consistent or strongly consistent reads
  • Built-in monitoring
  • Big Data: Integrates with AWS Elastic MapReduce and Redshift
  • Can configure TTL to expire table entries
  • On AWS: Service - FAQs - User Guide
  • See also: AWS Geek 2018
  • See also: AWS Geek 2018

Keys and indexes

  • Partition key is also called hash attribute or primary key
  • Must be unique, used for internal hash function (unordered)
  • Used to retrieve data
  • You should design your application for uniform activity across all logical partition keys in the Table and its secondary indexes.

PK & Sort key

  • Composite PK: index composed of hashed PK (unordered) and SK (ordered)
  • Sort key is also called range attribute or range key
  • Different items can have the same PK, must have different SK

Secondary indexes

  • Associated with exactly one table, from which it obtains its data
  • Allows to query or scan data by an alternate key (other than PK/SK)
  • Only for read operations, write is not supported.

Projected attributes

  • Attributes copied from the base table into an index
  • Makes them queryable
  • Different projection types
    • KEYS_ONLY - Only the index and primary keys are projected into the index
    • ALL - All of the table attributes are projected into the index
    • INCLUDE - Only the specified table attributes are projected into the index

Local secondary index

  • Local as in "co-located on the same partition"
  • Uses the same PK, but offers different SK
  • Every partition of a local secondary index is scoped to a base table partition that has the same partition key value
  • Local secondary indexes are extra tables that DynamoDB keeps in the background
  • Can only be created together with the base table
  • Can choose eventual consistency or strong consistency at creation time
  • Can request not-projected attributes for query or scan operation
  • Consumes read/write throughput from the original table.

Global secondary index

  • Global as in "over many partitions"
  • Uses different PK and offers additional SK (or none).
  • PK does not have to be unique (unlike base table)
  • Queries on the global index can span all of the data in the base table, across all partitions
  • Can be created after the base table has already been created.
  • Only support eventual consistency
  • Have their own provisioned read/write throughput
  • Global secondary indexes are distributed across multiple partitions
  • Cannot request not-projected attributes for query or scan operation

Capacity provisioning

  • Unit for operations:
    • 1 strongly consistent read per second (up to 4KB/s)
    • 2 eventually consistent reads per second (up to 8KB/s)
    • 1 write per second (up to 1KB)
  • Algorithm

| Step | Example |
| --- | --- |
| Requirement | 300 strongly consistent reads of 11 KB per minute |
| Calculate reads/writes per second | 300 r / 60 s = 5 r/s |
| Multiply with payload factor (payload / 4 KB, rounded up) | 5 r/s * ceil(11 KB / 4 KB) = 5 * 3 = 15 CU |
| If eventually consistent, divide by 2 (round up) | 15 CU / 2 = 8 CU |

DynamoDB Accelerator (DAX)

Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. DAX does all the heavy lifting required to add in-memory acceleration to your DynamoDB tables, without requiring developers to manage cache invalidation, data population, or cluster management. Now you can focus on building great applications for your customers without worrying about performance at scale. You do not need to modify application logic, since DAX is compatible with existing DynamoDB API calls. You can enable DAX with just a few clicks in the AWS Management Console or using the AWS SDK. Just as with DynamoDB, you only pay for the capacity you provision.

DynamoDB Streams

DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.

A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.

  • Typically targets a Lambda function (see the sketch after this list)
    • Only up to 2 Lambda functions on the same stream, throttling issues otherwise
  • Underlying implementation is a Kinesis Stream
  • Can enable Global Tables
    • Requires DynamoDB Streams and an empty table
    • Replicates the table in different regions (read and write)
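A minimal sketch wiring a table's stream to a Lambda; the table and function names are illustrative assumptions, and the table must have a StreamSpecification:

```yaml
StreamMapping:
  Type: AWS::Lambda::EventSourceMapping
  Properties:
    EventSourceArn: !GetAtt MyTable.StreamArn   # requires streams enabled on the table
    FunctionName: !Ref StreamProcessor
    StartingPosition: LATEST
    BatchSize: 100                              # records per invocation
```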

Global Tables

Amazon DynamoDB global tables provide a fully managed solution for deploying a multi region, multi master database, without having to build and maintain your own replication solution. With global tables you can specify the AWS Regions where you want the table to be available. DynamoDB performs all of the necessary tasks to create identical tables in these Regions and propagate ongoing data changes to all of them.

  • A global table is a collection of one or more replica tables, all owned by a single AWS account.
  • A replica table is a single DynamoDB table that functions as a part of a global table. Each replica stores the same set of data items.
  • When you create a DynamoDB global table, it consists of multiple replica tables (one per Region) that DynamoDB treats as a single unit. Every replica has the same table name and the same primary key schema. When an application writes data to a replica table in one Region, DynamoDB propagates the write to the other replica tables in the other AWS Regions automatically.
  • You can add replica tables to the global table so that it can be available in additional Regions.

ECS

Overview

Amazon Elastic Container Service (Amazon ECS) is a highly scalable, fast, container management service that makes it easy to run, stop, and manage Docker containers on a cluster. You can host your cluster on a serverless infrastructure that is managed by Amazon ECS by launching your services or tasks using the Fargate launch type. For more control you can host your tasks on a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances that you manage by using the EC2 launch type.

Benefits

  • Containers without servers
  • Containerize Everything
  • Secure
  • Performance at Scale
  • AWS Integration

Components

[Cluster
  [Services
    [Task Definitions
      [Family]
      [Task role/execution role]
      [Network mode]
      [Container Definitions
        [Name/Image]
        [Memory/Port Mappings]
        [Health Check]
        [Environment]
        [Network Settings]
        [Storage and Logging]
        [Security]
        [Resource Limit]
        [Docker labels]
      ]
    ]
  ]
]
  • Cluster
    • Logical grouping of EC2 instances that you can place tasks on
    • Instances run ECS agent as a Docker container
    • Cluster is an Auto Scaling Group with a Launch Configuration using a special ECS AMI
  • Service
    • Runs and maintains a specified number of tasks simultaneously
    • Created on cluster-level, launch type EC2 or Fargate
    • Can be linked to ALB/NLB/ELB
    • Service type
      • Replica - places and maintains the desired number of tasks across your cluster
      • Daemon - deploys exactly one task on each active container instance that meets all of the task placement constraints
        • Good for e.g. monitoring agents that should run on every container instance
    • Deployment type
      • Rolling
        • Controlled by Amazon ECS
        • Service scheduler replaces the current running version of the container with the latest version
      • Blue/Green
        • Controlled by CodeDeploy
        • Allows to verify a new deployment of a service before sending production traffic to it
  • Task Definition
    • ECS allows running and maintaining a specified number of containers from a task definition (see the sketch after this list)
      • Group by responsibility, e.g. separate task definitions for frontend and backend
    • The task definition is a text file, in JSON format, that describes one or more containers, up to a maximum of ten, that form your application
    • Specify various parameters, eg:
      • Container image to use
      • Port to be opened & networking
      • Data volumes
    • Either for ECS or Fargate
  • Container Definitions
    • Can mark container as essential - if that container fails or stops for any reason, all other containers that are part of the task are stopped
  • Task
    • A task is the instantiation of a task definition within a cluster
    • If a task should fail or stop, the ECS scheduler launches another instance of the task definition to replace it and to maintain the desired count of tasks in service
    • Static host port mapping: Only one task per container instance allowed, e.g. mapping host port 80 to container port
    • Dynamic host port mapping: Uses randomized host ports, can work together with ALB to run multiple task instances per container instance
    • Tasks can have individual IAM roles
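A minimal task-definition sketch in CloudFormation; the family, image, and memory values are illustrative assumptions:

```yaml
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: web
    ContainerDefinitions:
      - Name: web
        Image: nginx:latest
        Memory: 256
        Essential: true              # if this container stops, all other containers in the task stop
        PortMappings:
          - ContainerPort: 80
            HostPort: 0              # 0 = dynamic host port mapping, pairs with an ALB
```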

Auto Scaling

  • Use Service Auto Scaling for

    • Target Tracking Scaling Policy
    • Step Scaling Policy
    • Scheduled Scaling
  • For ECS, we also need to scale the cluster

    • This is really tricky, but in essence there's an ASG around the EC2 instances that form the cluster
    • Could use Fargate, obviously
    • Or even Elastic Beanstalk

Logging

  • For tasks, configure logging agent with task definition
    • Typically CloudWatch, also supports Splunk
  • For cluster instances, install CloudWatch Agent
  • Various ECS-specific CloudWatch metrics available
  • Various ECS-specific CloudWatch Events available
  • Can enable CloudWatch Container Insights
    • Sends per-container metrics into CloudWatch metrics

ECR

  • Needs login: aws ecr get-login --no-include-email --region ap-south-2
  • docker pull aws_account_id.dkr.ecr.us-west-2.amazonaws.com/amazonlinux:latest

Fargate

  • Don't need to provision cluster
    • Does not need EC2 instance roles to create cluster
  • Requires VPC

Elastic Beanstalk (Core Service)

Overview

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time.

  • Allows to deploy, monitor and scale applications quickly
  • Focuses on components and performance, not configuration and specification
  • Aims to simplify or even remove infrastructure management
  • Provides different options for low cost and high availability quick starts
  • Underlying instances can be automatically patched
  • Platform-specific application source bundle (e.g. Java war for Tomcat)
    • Go
    • Java SE/with Tomcat
    • .NET on Windows Server with IIS
    • Node.js
    • PHP
    • Python
    • Ruby Standalone/Puma
    • Docker Single-/Multicontainer - runs on ECS
    • Docker Preconfigured Glassfish/Python/Go
  • On AWS: Service - FAQs - User Guide
  • See also: AWS Geek 2019

Concepts

Components

  • Application
    • A logical collection of Elastic Beanstalk components, including environments, versions, and environment configurations.
    • An application is conceptually similar to a folder.
    • Either application in the traditional sense
    • Or application component, e.g. service, frontend, backend
  • Application Version
    • Refers to a specific, labeled iteration of deployable code for a web application
    • Unique package that represents a version of the application
    • Uploaded as a zipped application source bundle
    • Each application can have many application versions
    • Can be deployed to one or more environments within an application
    • Limit of 1000 versions -> can configure application version lifecycle management
  • Environment
    • Collection of AWS resources running an application version
    • Isolated, self-contained set of components of infrastructure
    • Single-instance or multi-instance scalable
    • Application can have multiple environments
      • Either represent development stages (PROD, STAGING, ...)
      • ...or application component, e.g. service, frontend, backend
    • Environment has one application
    • Can be cloned as a whole environment
    • Can rebuild environment - complete delete and rebuild
  • Environment tier
    • Designates the type of application that the environment runs, and determines what resources Elastic Beanstalk provisions to support it
      • Type Web server environment tier
        • Represents a web application
      • Type Worker environment tier
        • Act upon output created by another environment - pulls work from SQS queue
          • Elastic Beanstalk is running a daemon process on each instance that reads from the queue
        • Ideal for long running workloads
        • Should only be loosely coupled to web server environments, eg. via SQS
        • Can also be invoked on a schedule (cron.yaml)
  • Environment configuration
    • Identifies a collection of parameters and settings that define how an environment and its associated resources behave
  • Saved configuration
    • A template that you can use as a starting point for creating unique environment configurations
  • Platform
    • Combination of an operating system, programming language runtime, web server, application server, and Elastic Beanstalk components
Resources created in the underlying CloudFormation stack, per environment type:

| Web Application (non-docker) | Web Application (docker, runs on ECS) | Worker |
| --- | --- | --- |
| AWS::AutoScaling::AutoScalingGroup | AWS::AutoScaling::AutoScalingGroup | AWS::CloudFormation::WaitConditionHandle |
| AWS::AutoScaling::LaunchConfiguration | AWS::AutoScaling::LaunchConfiguration | AWS::DynamoDB::Table |
| AWS::AutoScaling::ScalingPolicy | AWS::CloudFormation::WaitCondition | AWS::EC2::SecurityGroup |
| AWS::CloudFormation::WaitCondition | AWS::CloudFormation::WaitConditionHandle | AWS::SQS::Queue |
| AWS::CloudFormation::WaitConditionHandle | AWS::EC2::EIP | |
| AWS::CloudWatch::Alarm | AWS::EC2::SecurityGroup | |
| AWS::EC2::SecurityGroup | | |
| AWS::EC2::SecurityGroupIngress | | |
| AWS::ElasticLoadBalancing::LoadBalancer | | |

Configuration precedence

Configuration options, sorted by precedence (highest wins):

1. Settings applied directly to the environment

  • Via console, eb CLI, ...

2. Existing configuration saved into .elasticbeanstalk
  • Can use eb config ... to save/snapshot a configuration of an application
  • Saved into .elasticbeanstalk
    • Only non-default values are being saved
    • Format is *.cfg.yml
  • This can be modified, uploaded and applied
    • Good to backup existing applications
    • Can also be applied elsewhere, e.g. in a different region
3. .ebextensions in project
  • Folder is part of application source bundle
  • Allows granular configuration & customisation of the EB environment and its resources
  • Contains *.config in YAML format with different sections like
  • option_settings
    • Global configuration options
  • resources
    • Specify additional resources, allows granular configuration of these
    • Put CloudFormation here, export outputs into beanstalk environment
      • Good for e.g. DDB table, SNS topic, ...
    • However, those resources are part of the environment and would be deleted with it
      • If this is not desired, then create resources independently
      • Usually best for e.g. databases
  • commands
    • Executed on the EC2 instance
    • Run before the application and web server are set up
  • container_commands
    • Executed on the EC2 instance
    • Affect the application source code
    • Run after the application archive has been extracted, but before the application version is deployed
    • Can use leader_only to run a command on a single instance only, e.g. a database migration
  • packages, sources, files, users, groups, services
    • Various package managers are supported (yum, rubygems, ...)
  • Applied with the next eb deploy (see the sketch below)
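
A minimal sketch of a .ebextensions/*.config file combining these sections (file name, environment variable, and command are illustrative):

```yaml
# .ebextensions/01-app.config - illustrative
option_settings:
  aws:elasticbeanstalk:application:environment:
    APP_STAGE: staging                      # hypothetical environment variable

container_commands:
  01_migrate:
    command: "python manage.py migrate"     # hypothetical migration command
    leader_only: true                       # run on a single instance only
```
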
4. Default values

As the name says.

Deployment Types

  • Single instance deploys
  • HA with Load Balancer
    • All at once
    • Rolling update
    • Rolling with additional batches
    • Immutable
      • Adds new Auto Scaling Group to existing environment
    • Blue/Green - not natively supported; however:
      • Can manually create new environment
      • Either use swap URL feature or manually create DNS
    • Traffic Splitting via Application Load Balancer
  • Can configure Elastic Beanstalk to ignore health checks during deployments

Limits

| Resource | Limit |
| --- | --- |
| Applications | 75 |
| Application Versions | 1000 |
| Configuration Templates | 2000 |
| Environments | 200 |

Elasticsearch Service

Amazon Elasticsearch Service is a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost effectively at scale. You can build, monitor, and troubleshoot your applications using the tools you love, at the scale you need. The service provides support for open source Elasticsearch APIs, managed Kibana, integration with Logstash and other AWS services, and built-in alerting and SQL querying. Amazon Elasticsearch Service lets you pay only for what you use – there are no upfront costs or usage requirements. With Amazon Elasticsearch Service, you get the ELK stack you need, without the operational overhead.

Overview

  • Managed version of Elasticsearch
  • Runs on dedicated instances (not serverless)
  • Use cases:
    • Log Analytics
    • Real-time application monitoring
    • Security analysis
    • Full text search
    • Clickstream analytics
    • Indexing
  • Not a good choice for record processing
  • On AWS: Service - FAQs - User Guide

  • Elasticsearch
    • Provides search and indexing capabilities
  • Logstash
    • Log ingestion mechanism
    • Agent-based
  • Kibana
    • Provides real-time dashboards on top of data in ES

Common ingestion pipelines:

  • DynamoDB -> DynamoDB Stream -> Lambda -> AWS ES
  • CloudWatch Logs -> Subscription Filter -> Lambda -> AWS ES (real time)
  • CloudWatch Logs -> Subscription Filter -> Kinesis Firehose -> AWS ES (near real time)


GuardDuty (Core Service)

Overview

Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts and workloads. With the cloud, the collection and aggregation of account and network activities is simplified, but it can be time consuming for security teams to continuously analyze event log data for potential threats. With GuardDuty, you now have an intelligent and cost-effective option for continuous threat detection in the AWS Cloud. The service uses machine learning, anomaly detection, and integrated threat intelligence to identify and prioritize potential threats. GuardDuty analyzes tens of billions of events across multiple AWS data sources, such as AWS CloudTrail, Amazon VPC Flow Logs, and DNS log. With a few clicks in the AWS Management Console, GuardDuty can be enabled with no software or hardware to deploy or maintain. By integrating with Amazon CloudWatch Events, GuardDuty alerts are actionable, easy to aggregate across multiple accounts, and straightforward to push into existing event management and workflow systems.


Kinesis (Core Service)

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. Amazon Kinesis enables you to process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.

[various data sources]->Kinesis Streams->Kinesis Analytics->Kinesis Firehose->[S3]

Overview

Kinesis Data Stream

  • Enables you to build custom applications that process or analyze streaming data for specialized needs
  • Real-time data delivery
  • Streams are divided into ordered shards/partitions
    • Shards can evolve over time (reshard/merge)
    • Records are ordered per shard (but not across shards)
  • Data retention is 1 day (default), up to 7 days
  • Ability to process/replay data
    • Real-time processing
  • Multiple applications can consume the same stream
  • Once data is inserted into Kinesis, it can't be deleted (immutability)
  • Records
    • Data Blob, up to 1MB
    • Record Key - helps grouping into shards, should be highly distributed
    • Sequence Number - unique identifier for each record put into shards
  • Producers
    • Kinesis SDK, Kinesis Producer Library (KPL), Kinesis Agent, CloudWatch Logs
    • 3rd party libraries: Spark, Log4j Appenders, ...
  • Consumers
    • Kinesis SDK, Kinesis Client Library (KCL), Kinesis Connector Library, AWS Lambda
    • 3rd party libraries: Spark, Log4j Appenders, ...
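
A minimal producer sketch using the boto3 SDK (stream name, payload, and partition key are illustrative):

```python
import boto3

kinesis = boto3.client("kinesis")

# The partition key determines the target shard - use a well-distributed
# value so records spread evenly across shards.
kinesis.put_record(
    StreamName="clickstream",                      # hypothetical stream
    Data=b'{"user": "u-123", "action": "click"}',  # data blob of up to 1 MB
    PartitionKey="u-123",
)
```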

Kinesis Data Firehose

  • The easiest way to load streaming data into data stores and analytics tools
  • Near Real-time data delivery (~60 seconds)
  • Automatic Scaling
  • Can do data transformation through Lambda
  • Supports compression for S3
  • Pay for the amount of data going through Firehose
  • Producers
    • Kinesis Data Streams, CloudWatch Logs & Events, ...
  • Consumers (data receivers)
    • Redshift, S3, ElasticSearch, Splunk
| Kinesis Data Streams | Kinesis Firehose |
| --- | --- |
| Must manage scaling (shards) | Fully managed |
| Real time | Near real time |
| Data storage (retention period) | No data storage |
| Can write custom code for consumers/producers | Serverless transformation via Lambda |

For real-time delivery, Kinesis Data Streams is the only option.

Kinesis Data Analytics

  • The easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time
  • Performing real time analytics on Kinesis Streams using SQL
  • Managed, auto-scaling
  • Can create Kinesis Streams in real-time

Limits

| Kinesis Streams | Limit |
| --- | --- |
| Producer | 1 MB/s or 1,000 messages/s write per shard (exceeding raises ProvisionedThroughputException) |
| Consumer (classic) | 2 MB/s read per shard |
| | 5 GetRecords API calls per second per shard |
| Data retention | Up to 7 days |

Lambda (Core Service)

Overview

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume - there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service - all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.

  • Features
    • No servers
    • Continuous scaling
      • Cold Start - if no idle container is available to run the Lambda
    • Very cheap
    • Can give more RAM, which will proportionally increase CPU as well
  • Supported languages
    • Node.js, Java, C#/PowerShell, Python, Golang, Ruby
  • Can pass in environment variables
    • These can be KMS-encrypted as well (need SDK to decrypt)
  • On AWS: Service - FAQs - User Guide
  • See also: AWS Geek 2020

Managing Functions

triggers -> function & layers -> destinations

Versions

  • If you work on a Lambda function, you work on $LATEST
  • The system creates a new version of your Lambda function each time that you publish the function. The new version is a copy of the unpublished version of the function.
  • Version is code and configuration
  • Versions are immutable, you can change the function code and settings only on the unpublished version of a function.
  • Each version gets its own ARN

Aliases

  • You can create one or more aliases for your AWS Lambda function. A Lambda alias is like a pointer to a specific Lambda function version.
  • Aliases are mutable
  • Users can access the function version using the alias ARN.
  • Can create e.g. dev, test and prod.
    • Aliases can point to multiple versions with a weight - for canary-style deployments
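
A sketch of a canary-style traffic shift using boto3, assuming a function with a stable version 1 behind a prod alias (names and weights are illustrative):

```python
import boto3

lam = boto3.client("lambda")

# Publishing snapshots the current $LATEST code and configuration.
new = lam.publish_version(FunctionName="my-func")   # hypothetical function

# Keep the alias on version 1, but route 10% of traffic to the new version.
lam.update_alias(
    FunctionName="my-func",
    Name="prod",
    FunctionVersion="1",
    RoutingConfig={"AdditionalVersionWeights": {new["Version"]: 0.1}},
)
```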

Layers

  • You can configure your Lambda function to pull in additional code and content in the form of layers.
  • A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies.
  • With layers, you can use libraries in your function without needing to include them in your deployment package.

Network

  • You can configure a function to connect to private subnets in a VPC in your account.
  • Use VPC to create a private network for resources such as databases, cache instances, or internal services.
  • Connect your function to the VPC to access private resources during execution.
  • Provisioning process for Lambda takes longer

Database

  • You can use the Lambda console to create an RDS database proxy for your function.
  • A database proxy manages a pool of database connections and relays queries from a function.
  • This enables a function to reach high concurrency levels without exhausting database connections.

Invoking Functions

Synchronous/Asynchronous/Event Source Invocation

  • When you invoke a function synchronously, Lambda runs the function and waits for a response.
    • -> API Gateway, ALB, Cognito, Lex, Alexa, CloudFront (Lambda@Edge), Kinesis Data Firehose
  • When you invoke a function asynchronously, Lambda sends the event to a queue. A separate process reads events from the queue and runs your function.
    • Lambda manages the function's asynchronous invocation queue and attempts to retry failed events automatically. If the function returns an error, Lambda attempts to run it two more times
    • When all attempts to process an asynchronous invocation fail, Lambda can send the event to an Amazon SQS queue or an Amazon SNS topic.
    • -> S3, SNS, SES, CloudFormation, CloudWatch Logs & Events, CodeCommit, Config
  • An event source mapping is an AWS Lambda resource that reads from an event source and invokes a Lambda function.
    • -> Kinesis, DynamoDB, SQS
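
A sketch of both direct invocation modes via boto3 (function name and payloads are illustrative):

```python
import boto3

lam = boto3.client("lambda")

# Synchronous: the call blocks until the function returns a response.
resp = lam.invoke(
    FunctionName="my-func",              # hypothetical function
    InvocationType="RequestResponse",
    Payload=b'{"order_id": 42}',
)
print(resp["Payload"].read())

# Asynchronous: Lambda queues the event and returns immediately;
# failed events are retried two more times.
lam.invoke(FunctionName="my-func", InvocationType="Event", Payload=b"{}")
```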

Function Scaling

  • The first time you invoke your function, AWS Lambda creates an instance of the function and runs its handler method to process the event.
    • When the function returns a response, it sticks around to process additional events.
    • If you invoke the function again while the first event is being processed, Lambda creates another instance.
    • This continues until there are enough instances to serve all requests, or a concurrency limit is reached.
    • When the number of requests decreases, Lambda stops unused instances to free up scaling capacity for other functions.
  • Concurrency is invocations/s * runtime (e.g. 10/s * 4s = 40)
    • Can configure reserved concurrency (see the sketch after this list)
      • No other function can use that concurrency
    • Can configure provisioned concurrency before an increase in invocations
      • Can ensure that all requests are served by initialized instances with very low latency.
  • Default Burst concurrency limits
    • 3000 – US West (Oregon), US East (N. Virginia), Europe (Ireland)
    • 1000 – Asia Pacific (Tokyo), Europe (Frankfurt)
    • 500 – Other Regions
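
A sketch of both concurrency settings via boto3 (function name, alias, and numbers are illustrative):

```python
import boto3

lam = boto3.client("lambda")

# Reserved concurrency: caps the function at 40 concurrent executions and
# takes that capacity out of the account's shared pool.
lam.put_function_concurrency(
    FunctionName="my-func",              # hypothetical function
    ReservedConcurrentExecutions=40,
)

# Provisioned concurrency: keeps 25 initialized instances warm on the
# prod alias so requests avoid cold starts.
lam.put_provisioned_concurrency_config(
    FunctionName="my-func",
    Qualifier="prod",
    ProvisionedConcurrentExecutions=25,
)
```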

Monitoring and troubleshooting

  • AWS Lambda automatically monitors Lambda functions on your behalf and reports metrics through Amazon CloudWatch. To help you monitor your code as it executes, Lambda automatically tracks the number of requests, the execution duration per request, and the number of requests that result in an error.
  • It also publishes the associated CloudWatch metrics.
  • Need custom metric for memory usage

License Manager

Overview

AWS License Manager makes it easier to manage your software licenses from software vendors such as Microsoft, SAP, Oracle, and IBM across AWS and on-premises environments. AWS License Manager lets administrators create customized licensing rules that emulate the terms of their licensing agreements, and then enforces these rules when an instance of EC2 gets launched. Administrators can use these rules to help prevent licensing violations, such as using more licenses than an agreement stipulates. The rules in AWS License Manager enable you to help prevent a licensing breach by stopping the instance from launching or by notifying administrators about the infringement. Administrators gain control and visibility of all their licenses with the AWS License Manager dashboard and reduce the risk of non-compliance, misreporting, and additional costs due to licensing overages.


Macie

Overview

Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS. Amazon Macie recognizes sensitive data such as personally identifiable information (PII) or intellectual property, and provides you with dashboards and alerts that give visibility into how this data is being accessed or moved. The fully managed service continuously monitors data access activity for anomalies, and generates detailed alerts when it detects risk of unauthorized access or inadvertent data leaks. Amazon Macie is available to protect data stored in Amazon S3.


Managed Services

Overview

As enterprise customers move towards adopting the cloud at scale, some find their people need help and time to gain AWS skills and experience. AWS Managed Services (AMS) operates AWS on your behalf, providing a secure and compliant AWS Landing Zone, a proven enterprise operating model, on-going cost optimization, and day-to-day infrastructure management. By implementing best practices to maintain your infrastructure, AWS Managed Services helps to reduce your operational overhead and risk. AWS Managed Services automates common activities, such as change requests, monitoring, patch management, security, and backup services, and provides full-lifecycle services to provision, run, and support your infrastructure. AWS Managed Services unburdens you from infrastructure operations so you can direct resources toward differentiating your business.


OpsWorks Stacks (Core Service)

Overview

AWS OpsWorks is a configuration management service that provides managed instances of Chef and Puppet. Chef and Puppet are automation platforms that allow you to use code to automate the configurations of your servers. OpsWorks lets you use Chef and Puppet to automate how servers are configured, deployed, and managed across your Amazon EC2 instances or on-premises compute environments.

  • Declarative desired state engine
    • Automate, monitor and maintain deployments
  • AWS' implementation of Chef
    • Original Chef
    • AWS-bespoke orchestration components
    • Cookbooks define recipes
  • OpsWorks has three offerings:
    • AWS OpsWorks Stacks (<- exam relevant)
    • AWS Opsworks for Chef Automate
    • AWS OpsWorks for Puppet Enterprise
  • For Chef 11, BerkShelf is often used
    • Allows to use external cookbooks
  • On AWS: Service - FAQs - User Guide

Components

  • Stack
    • Set of resources that are managed as a group
    • Custom cookbooks need to be enabled on stack level
  • Layer
    • Represent and configure components of a stack
    • Share common configuration elements
    • E.g. load balancer layer, app layer, db layer
    • Type: OpsWorks, ECS or RDS
    • If a layer has auto healing enabled (the default setting), AWS OpsWorks Stacks automatically replaces the layer's failed instances
      • After 5 minutes, a non-responsive instance is considered failed
  • Instance
    • Units of compute within the platform
    • Must be associated with at least one layer
      • Linux or Windows, but not both
    • Can run
      • 24/7
      • Load-based
      • Time-based
        • Instances have to exist and be pre-assigned as e.g. load-based; they are not created on demand
  • App
    • Applications that are deployed on one or more instances
    • Deployed through source code repo or S3
  • Deployments
    • Deploy application code and related files to application server instances
    • Deployment operation is handled by each instance's Deploy recipes, which are determined by the instance's layer
      • Can rollback up to 4 versions

Lifecycle Events

  • Each layer has a set of five lifecycle events, each of which has an associated set of recipes that are specific to the layer
  • When an event occurs on a layer's instance, AWS OpsWorks Stacks automatically runs the appropriate set of recipes
| Event | Description |
| --- | --- |
| Setup | Occurs after a started instance has finished booting |
| Configure | Occurs on all of the stack's instances when one of the following occurs: an instance enters or leaves the online state; an Elastic IP address is associated with or disassociated from an instance; an Elastic Load Balancing load balancer is attached to or detached from a layer |
| Deploy | Occurs when you run a Deploy command |
| Undeploy | Occurs when you run an Undeploy command |
| Shutdown | Occurs after you direct AWS OpsWorks Stacks to shut an instance down, but before the associated Amazon EC2 instance is actually terminated |

Under the hood

  • CloudWatch Events integration
    • Can configure event rules to trigger alarms
  • Under the hood
    • OpsWorks agent
      • Configuration of machines
    • OpsWorks automation engine
      • Create, update & delete of various AWS components
      • Handles load balancing, auto scaling and auto healing
      • Supports lifecycle events
  • BerkShelf
    • Addresses an OpsWorks shortcoming from old versions - only one repository for recipes
    • Was added in OpsWorks 11.10 and allows installing cookbooks from many repositories

Organizations

Overview

AWS Organizations offers policy-based management for multiple AWS accounts. With Organizations, you can create groups of accounts, automate account creation, apply and manage policies for those groups. Organizations enables you to centrally manage policies across multiple accounts, without requiring custom scripts and manual processes.

Using AWS Organizations, you can create Service Control Policies (SCPs) that centrally control AWS service use across multiple AWS accounts. You can also use Organizations to help automate the creation of new accounts through APIs. Organizations helps simplify the billing for multiple accounts by enabling you to setup a single payment method for all the accounts in your organization through consolidated billing. AWS Organizations is available to all AWS customers at no additional charge.

Benefits

  • Centrally manage policies across multiple accounts
  • Control access to AWS services
  • Automate AWS account creation and management
  • Consolidated billing
    • One paying account linked to many linked accounts
    • Pricing benefits (Volumes, Storage, Instances)
  • Create account hierarchy with Organizational Units (OUs)
  • Apply SCPs across the hierarchy
  • Apply Tag Policies across the hierarchy

Service Control Policies (SCP)

Service control policies (SCPs) are one type of policy that you can use to manage your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization, allowing you to ensure your accounts stay within your organization's access control guidelines. SCPs are available only in an organization that has all features enabled. SCPs aren't available if your organization has enabled only the consolidated billing features. SCPs do not apply to the master account itself.
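
A minimal SCP sketch that denies launching EC2 instances outside a single region (region and action are illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRunInstancesOutsideEuWest1",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": { "aws:RequestedRegion": "eu-west-1" }
      }
    }
  ]
}
```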

Tag Policies

Tag policies are a type of policy that can help you standardize tags across resources in your organization's accounts. In a tag policy, you specify tagging rules applicable to resources when they are tagged.

Limits

| Resource | Limit |
| --- | --- |
| Maximum linked accounts | 20 |

Personal Health Dashboard

Overview

AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you. While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.

The dashboard displays relevant and timely information to help you manage events in progress, and provides proactive notification to help you plan for scheduled activities. With Personal Health Dashboard, alerts are triggered by changes in the health of AWS resources, giving you event visibility, and guidance to help quickly diagnose and resolve issues.


QuickSight

Overview

Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization.

As a fully managed service, QuickSight lets you easily create and publish interactive dashboards that include ML Insights. Dashboards can then be accessed from any device, and embedded into your applications, portals, and websites.

  • Data sources
    • Amazon Athena
    • Amazon Aurora
    • Amazon Redshift
    • Amazon S3
    • Various SQL databases, e.g. Snowflake
  • On AWS: Service - FAQs - User Guide

Redshift

Overview

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.

  • Queries are written in SQL

Relational Database Service

Overview

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups. It frees you to focus on your applications so you can give them the fast performance, high availability, security and compatibility they need.

Amazon RDS is available on several database instance types - optimized for memory, performance or I/O - and provides you with six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. You can use the AWS Database Migration Service to easily migrate or replicate your existing databases to Amazon RDS.

Backups

  • Automated backups
    • Enabled by default
    • Allow to recover database within a retention period (1d - 35d)
      • Transactional storage engine recommended as DB engine
      • Will take daily snapshot and use transaction logs
      • PITR down to 1 second (within retention period)
    • Backups are taken within defined window
      • Degrades performance if multi-AZ is not enabled!
      • Taken from slave if multi-AZ is enabled
    • Backups are stored internally on S3
      • Free storage space equal to DB size
      • Deleting an instance deletes all automated backups
    • Cannot be shared with other accounts (need to be turned into manual snapshots first)
  • Database Snapshots
    • Only manually, always user initiated
    • Won't be deleted with DB instance

Multi-AZ deployments

Provide enhanced availability for database instances within a single AWS Region.

  • Meant for disaster recovery, not for performance improvement (-> Read Replica)
  • Configure RDS for multi-AZ deployments and turn replication on
    • Keeps a synchronous standby replica in a different AZ
    • Automatic failover in case of planned or unplanned outage of the first AZ
      • Most likely still has downtime
      • Can force failover by rebooting
    • Other benefits
      • Patching
      • Backups
    • Aurora can replicate across 3 AZs

Read replicas

  • Read queries are routed to read replicas, reducing load on the primary DB instance (source instance)
  • To create a read replica, AWS initially creates a snapshot of the source instance
    • The Multi-AZ failover instance (if enabled) is used for snapshotting
    • After that, changes to the source instance are asynchronously replicated to the read replica
    • This implies data latency, which typically is acceptable
      • ReplicaLag can be monitored and CloudWatch alarms can be configured
    • No AWS charges for data replication in the same region
  • A single master can have up to 5 read replicas
    • Can be in different regions
    • Can have Multi-AZ enabled themselves
  • Read replicas are not the same as multi-AZ failover instances which
    • are synchronously updated
    • are designed to handle failover
    • don't receive any load unless failover actually happens
  • Often it is beneficial to have both read replicas and multi-AZ failover instances
  • Read replicas can be promoted to normal instances
    • E.g. use read replica to implement bigger changes on db level, after these have been finished promote to master instance
    • This will break replication
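
A sketch of creating a cross-region read replica and later promoting it, using boto3 (identifiers, account, and regions are illustrative):

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Cross-region replicas reference the source instance by ARN.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica",   # hypothetical identifier
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:123456789012:db:app-db",
)

# Promotion turns the replica into a standalone instance and breaks replication.
rds.promote_read_replica(DBInstanceIdentifier="app-db-replica")
```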

Route 53

Overview

Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service. You can use Route 53 to perform three main functions in any combination: domain registration, DNS routing, and health checking. If you choose to use Route 53 for all three functions, perform the steps in this order:

  • Register domain names
    • Your website needs a name, such as example.com. Route 53 lets you register a name for your website or web application, known as a domain name.
  • Route internet traffic to the resources for your domain
    • When a user opens a web browser and enters your domain name (example.com) or subdomain name (acme.example.com) in the address bar, Route 53 helps connect the browser with your website or web application.
  • Check the health of your resources
    • Route 53 sends automated requests over the internet to a resource, such as a web server, to verify that it's reachable, available, and functional. You also can choose to receive notifications when a resource becomes unavailable and choose to route internet traffic away from unhealthy resources.

Terminology

  • Hosts - Computers or services accessible within a domain
  • Name Server - Translates domain names into IP addresses
  • Zone File - Text file that contains mappings between domain names and IP addresses
  • Records - Entries in the zone file, mappings between resources and names

How it works

Basic Flow

Root Server -> TLD Server -> Domain-Level Name Server -> Zone File

Zone File & Records

The zone file stores records. Various record types exist:

| Type | Definition | Example |
| --- | --- | --- |
| SOA | Start of Authority - mandatory first entry, defines various things, e.g. name servers & admin contact | ns1.dnsimple.com admin.dnsimple.com 2013022001 86400 7200 604800 300 |
| A | Maps host name to IPv4 address | px01.vc.example.com. 198.51.100.40 |
| AAAA | Maps host name to IPv6 address | px01.vc.example.com. 2a00:1450:4014:80c:0:0:0:2004 |
| CNAME | Defines alias for host name (maps one domain name to another) | www.dnsimple.com. dnsimple.com. |
| MX | Defines mail exchange | example.com. 1800 MX mail1.example.com. 10 |
| PTR | Maps IPv4 address to host name (inverse of A record) | 10.27/1.168.192.in-addr.arpa. 1800 PTR mail.example.com. |
| SRV | Points one domain to another domain name using a specific destination port | _sip._tcp.example.com. 86400 IN SRV 0 5 5060 sipserver.example.com. |

Route53 specific:

  • Alias record
    • Amazon Route 53 alias records provide a Route 53-specific extension to DNS functionality. Alias records let you route traffic to selected AWS resources, such as CloudFront distributions and Amazon S3 buckets. They also let you route traffic from one record in a hosted zone to another record.
    • Unlike a CNAME record, you can create an alias record at the top node of a DNS namespace, also known as the zone apex. For example, if you register the DNS name example.com, the zone apex is example.com. You can't create a CNAME record for example.com, but you can create an alias record for example.com that routes traffic to www.example.com
    • Preferred choice over CNAME: alias records work at the zone apex, Route 53 answers them directly with the current IP addresses of the target, and queries to alias records pointing at AWS resources are free of charge
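
A sketch of creating a zone-apex alias record pointing at a load balancer via boto3 (zone IDs, domain, and ALB DNS name are illustrative):

```python
import boto3

r53 = boto3.client("route53")

r53.change_resource_record_sets(
    HostedZoneId="Z0HYPOTHETICAL",            # hypothetical hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "example.com.",       # zone apex - a CNAME would not work here
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "Z0ALBZONE",  # the load balancer's hosted zone ID
                    "DNSName": "my-alb-123456.eu-west-1.elb.amazonaws.com.",
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    },
)
```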

Route53 Routing Policies

  • Simple
    • Default policy, typically used if only a single resource performs functionality
  • Weighted
    • Control distribution of traffic with DNS entries
      • This can be based on a certain percentage
      • Set routing policy to weighted (instead of failover)
  • Latency
    • Control distribution of traffic based on latency.
  • Failover
    • Can set up health checks for endpoints or domains from within Route53
      • Route 53 has health checkers in locations around the world. When you create a health check that monitors an endpoint, health checkers start to send requests to the endpoint that you specify to determine whether the endpoint is healthy.
      • evaluate target health
    • DNS entries are then being associated with health checks and can be configured to failover as well (1 primary and n secondary recordsets)
  • Geolocation
    • Geolocation routing lets you choose the resources that serve your traffic based on the geographic location of your users, meaning the location that DNS queries originate from. For example, you might want all queries from Europe to be routed to an ELB load balancer in the Frankfurt region.
  • Geoproximity Routing (Traffic Flow Only)
    • Geoproximity routing lets Amazon Route 53 route traffic to your resources based on the geographic location of your users and your resources. You can also optionally choose to route more traffic or less to a given resource by specifying a value, known as a bias. A bias expands or shrinks the size of the geographic region from which traffic is routed to a resource.
  • Multivalue Answer Routing
    • Multivalue answer routing lets you configure Amazon Route 53 to return multiple values, such as IP addresses for your web servers, in response to DNS queries. You can specify multiple values for almost any record, but multivalue answer routing also lets you check the health of each resource, so Route 53 returns only values for healthy resources. It's not a substitute for a load balancer, but the ability to return multiple health-checkable IP addresses is a way to use DNS to improve availability and load balancing.

S3

Overview

Amazon Simple Storage Service (S3) is object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web. It is designed to deliver 11x9 durability and scale past trillions of objects worldwide.

  • Key-value storage (folder-like structure is only a UI representation)
  • Bucket size is unlimited. Objects from 0B to 5TB.
  • HA and scalable, transparent data partitioning
    • Data is automatically replicated to at least 3 separate data centers in the same region
  • Bucket lifecycle events can trigger SNS, SQS or AWS Lambda
    • New object created events
    • Object removal events
    • Reduced Redundancy Storage (RRS) object lost event
  • Buckets are per region, but the AWS console is global (displaying all buckets for that account)
  • Bucket names have to be globally unique, should comply with DNS naming conventions.
  • On AWS: Service - FAQs - User Guide
  • See also: AWS Geek 2018

Versioning

  • Works on bucket level (for all objects)
  • A bucket can either be unversioned (default), versioning-enabled or versioning-suspended
  • Version ids are automatically assigned to objects

Logging

  • AWS CloudTrail logs S3-API calls for bucket-level operations (and many other information) and stores them in an S3 bucket. Could also send email notifications or trigger SNS notifications for specific events.
  • S3 Server Access Logs log on object level.
    • Provide detailed records for the requests that are made to a bucket
    • Needs to be enabled on bucket level

Cross-Region Replication

  • Buckets must be in different regions
    • Can replicate cross-account
  • Must have versioning enabled
  • Only new/changed objects will be replicated

Storage classes

| Class | Durability | Availability | AZs | Costs per GB | Retrieval Fee | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| S3 Standard | 11x9 | 4x9 | >=3 | $0.023 | No | |
| S3 Intelligent Tiering | 11x9 | 3x9 | >=3 | $0.023 | No | Automatically moves objects between two access tiers based on changing access patterns |
| S3 IA (infrequent access) | 11x9 | 3x9 | >=3 | $0.0125 | Yes | For data that is accessed less frequently, but requires rapid access when needed |
| S3 One Zone IA (infrequent access) | 11x9 | 99.5% | 1 | $0.01 | Yes | For data that is accessed less frequently, but requires rapid access when needed |
| Glacier | 11x9 | | >=3 | | Yes | For archival only; retrieval comes as expedited, standard or bulk |
| Glacier Deep Archive | 11x9 | | >=3 | | Yes | Longer time span to retrieve |
| S3 RRS (reduced redundancy storage) | 4x9 | 4x9 | >=3 | $0.024 | | Deprecated |

Access Control

  • Effect – This can be either allow or deny
  • Principal – Account or user who is allowed access to the actions and resources in the statement
  • Actions – For each resource, S3 supports a set of operations
  • Resources – Buckets and objects are the resources
  • Authorization works as a union of IAM & bucket policies and bucket ACLs

Defaults

  • Bucket is owned by the AWS account that created it
    • Ownership refers to the identity and email address used to create the account
      • Bucket ownership is not transferable
  • Bucket owner gets full permission (ACL)
  • The person paying the bills always has full control.
  • A person uploading an object into a bucket owns it by default.

IAM policies

  • IAM policies (in general) specify what actions are allowed or denied on what AWS resources
  • Defined as JSON
  • Attached to IAM users, groups, or roles (so they cannot grant access to anonymous users)
  • Use if you’re more interested in “What can this user do in AWS?”

Bucket policies

  • Specify what actions are allowed or denied for which principals on the bucket that the policy is attached to
  • Defined as JSON
  • Attached only to S3 buckets. Can however effect object in buckets.
  • Contain principal element (unnecessary for IAM)
  • Use if you’re more interested in “Who can access this S3 bucket?”
  • Easiest way to grant cross-account permissions for all s3:* actions. (Cannot do this with ACLs.)
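
A minimal bucket policy sketch granting another account full access to a bucket (account ID and bucket name are illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CrossAccountFullAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-shared-bucket",
        "arn:aws:s3:::my-shared-bucket/*"
      ]
    }
  ]
}
```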

ACLs

  • Defined as XML. Legacy, not recommended any more.
  • Can
    • be attached to individual objects (bucket policies only bucket level)
    • control access to object uploaded into a bucket from a different account.
  • Cannot...
    • have conditions
    • explicitly deny actions
    • grant permissions to bucket sub-resources (e.g. lifecycle or static website configurations)
  • In addition to object ACLs there are bucket ACLs as well - used mainly for granting permission to write access log objects to a bucket.

Secrets Manager

Overview

AWS Secrets Manager helps you protect secrets needed to access your applications, services, and IT resources. The service enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. Users and applications retrieve secrets with a call to Secrets Manager APIs, eliminating the need to hardcode sensitive information in plain text. Secrets Manager offers secret rotation with built-in integration for Amazon RDS, Amazon Redshift, and Amazon DocumentDB. Also, the service is extensible to other types of secrets, including API keys and OAuth tokens. In addition, Secrets Manager enables you to control access to secrets using fine-grained permissions and audit secret rotation centrally for resources in the AWS Cloud, third-party services, and on-premises.

  • Allows for easier rotation than SSM Parameter Store
  • Can trigger Lambda
  • Deeply integrates into RDS
  • On AWS: Service - FAQs - User Guide

Automatically Rotating Your Secrets

  • Define and implement rotation with an AWS Lambda function
    • Creates a new version of the secret.
    • Stores the secret in Secrets Manager.
    • Configures the protected service to use the new version.
    • Verifies the new version.
    • Marks the new version as production ready.
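
A sketch of retrieving a secret at runtime instead of hardcoding credentials, using boto3 (secret name and fields are illustrative):

```python
import json

import boto3

sm = boto3.client("secretsmanager")

# Rotation changes the stored value; clients always fetch the current version.
secret = sm.get_secret_value(SecretId="prod/app/db")   # hypothetical secret
creds = json.loads(secret["SecretString"])             # e.g. {"username": ..., "password": ...}
print(creds["username"])
```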

Service Catalog (Core Service)

Overview

AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS. These IT services can include everything from virtual machine images, servers, software, and databases to complete multi-tier application architectures. AWS Service Catalog allows you to centrally manage commonly deployed IT services, and helps you achieve consistent governance and meet your compliance requirements, while enabling users to quickly deploy only the approved IT services they need.

  • Ensure compliance with corporate standards
  • Help employees quickly find and deploy approved IT services
  • Centrally manage IT service lifecycle
  • Connect with ITSM/ITOM software
  • Self-service for user
    • Integrates with self-service portals like ServiceNow
  • Users of Service Catalog only require IAM permissions for the product, not for the underlying services
  • On AWS: Service - FAQs - User Guide

Components

  • Admins define
    • Product
      • Defined in CloudFormation
      • Can be versioned
    • Portfolio
      • Collection of products
      • IAM permissions to govern access
  • Users choose
    • from product list
    • launches automatically

Step Functions

Overview

AWS Step Functions is a web service that enables you to coordinate the components of distributed applications and microservices using visual workflows. You build applications from individual components that each perform a discrete function, or task, allowing you to scale and change applications quickly.

Step Functions provides a reliable way to coordinate components and step through the functions of your application. Step Functions offers a graphical console to visualize the components of your application as a series of steps. It automatically triggers and tracks each step, and retries when there are errors, so your application executes in order and as expected, every time. Step Functions logs the state of each step, so when things go wrong, you can diagnose and debug problems quickly.

Step Functions manages the operations and underlying infrastructure for you to ensure your application is available at any scale.

States

| State | Description | Notes |
| --- | --- | --- |
| Pass | Passes its input to its output, without performing work | |
| Task | Represents a single unit of work performed by a state machine | Can Retry after error |
| Choice | Adds branching logic | |
| Wait | Delays the state machine from continuing for a specified time | |
| Succeed | Stops an execution successfully | |
| Fail | Stops the execution of the state machine and marks it as a failure | |
| Parallel | Creates parallel branches of execution | Can Retry after error |
| Map | Runs a set of steps for each element of an input array | |

Input and Output processing

  • InputPath
    • Selects which parts of the JSON input to pass to the task of the Task state
  • OutputPath
    • Filter the JSON output to further limit the information that's passed to the output
  • ResultPath
    • Selects what combination of the state input and the task result to pass to the output.
  • Parameters
    • Collection of key-value pairs that are passed as input

Error handling

  • By default, when a state reports an error, AWS Step Functions causes the execution to fail entirely.
  • Task and Parallel states can have a field named Retry, whose value must be an array of objects known as retriers.
  • An individual retrier represents a certain number of retries, usually at increasing time intervals.
    • ErrorEquals (Required)
    • IntervalSeconds (Optional)
    • MaxAttempts (Optional)
    • BackoffRate (Optional)
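
A sketch of a Task state with one retrier in Amazon States Language (function ARN and values are illustrative); with a BackoffRate of 2.0, retries wait 2s, 4s, and 8s:

```json
"ProcessOrder": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:my-func",
  "Retry": [
    {
      "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
      "IntervalSeconds": 2,
      "MaxAttempts": 3,
      "BackoffRate": 2.0
    }
  ],
  "Next": "Done"
}
```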

Best Practices

  • Use Timeouts to Avoid Stuck Executions
    • Specify a reasonable timeout when you create a task in your state machine
  • Use ARNs Instead of Passing Large Payloads
  • Avoid Reaching the History Quota
    • Hard quota of 25,000 entries in the execution history. To avoid reaching this quota for long-running executions, implement a pattern that uses an AWS Lambda function that can start a new execution of your state machine to split ongoing work across multiple workflow executions
  • Handle Lambda Service Exceptions
    • Lambda can occasionally experience transient service errors - proactively handle these exceptions in your state machine
  • Avoid Latency When Polling for Activity Tasks
  • Choosing Standard or Express Workflows
    • Choose Standard Workflows when you need long-running, durable, and auditable workflows,
    • Choose Express Workflows for high-volume, event processing workloads.

Systems Manager (Core Service)

Overview

AWS Systems Manager gives you visibility and control of your infrastructure on AWS. Systems Manager provides a unified user interface so you can view operational data from multiple AWS services and allows you to automate operational tasks across your AWS resources. With Systems Manager, you can group resources, like Amazon EC2 instances, Amazon S3 buckets, or Amazon RDS instances, by application, view operational data for monitoring and troubleshooting, and take action on your groups of resources. Systems Manager simplifies resource and application management, shortens the time to detect and resolve operational problems, and makes it easy to operate and manage your infrastructure securely at scale.

Group resources -> Visualize data -> Take action

  • Manage EC2 and on-prem instances at scale
    • On-prem requires generation of secret activation code/activation id
  • Get operational insights of infrastructure
  • Easily detect problems
  • Patching automation for enhanced compliance
  • Both Linux and Windows
  • Tightly integrated with CloudWatch, AWS Config
  • Free service
  • SSM Agent
    • Installed on instances
    • Need correct IAM permissions, then shows up on SSM dashboard
  • On AWS: Service - FAQs - User Guide

Components

Resources groups

  • Organize your AWS resources.
  • Make it easier to manage, monitor, and automate tasks on large numbers of resources at one time.
    • Define groups based on tags or on CloudFormation stacks
      • Same region only

Insights

  • Insights dashboards
    • Automatically aggregates and displays operational data for each resource group
  • Inventory
    • Discover and audit the software installed
  • Configuration Compliance
    • Scan your fleet of managed instances for patch compliance and configuration inconsistencies

Parameter store

  • Centralized store to manage your configuration data, whether plain-text data such as database strings or secrets such as passwords
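
A sketch of reading a SecureString parameter at application startup via boto3 (parameter name is illustrative):

```python
import boto3

ssm = boto3.client("ssm")

param = ssm.get_parameter(
    Name="/myapp/prod/db-password",   # hypothetical parameter
    WithDecryption=True,              # decrypt the SecureString via KMS
)
print(param["Parameter"]["Value"])
```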

Action & Change

  • Automation
    • Simplifies common maintenance and deployment tasks of EC2 instances and other AWS resources.
    • Build Automation workflows to configure and manage instances and AWS resources.
    • Create custom workflows or use pre-defined workflows maintained by AWS.
    • Receive notifications about Automation tasks and workflows by using Amazon CloudWatch Events.
    • Monitor Automation progress and execution details by using the Amazon EC2 or the AWS Systems Manager console.
    • Can integrate manual approval step
    • Complete list of tasks, unlike run command which is a one-off
    • E.g. create Golden AMi
  • Maintenance windows
    • Define a schedule for when to perform potentially disruptive actions on your instances
  • Change Calendar
    • Set up date and time ranges when actions you specify may or may not be performed in your AWS account

Instances & Nodes

  • Run command
    • Lets you remotely and securely manage the configuration of your managed instances
    • Commands are in document format
    • Can run on resource group, individually or tag-based
  • Session manager
    • Fully managed AWS Systems Manager capability that lets you manage your EC2 instances, on-premises instances, and virtual machines (VMs) through an interactive one-click browser-based shell or through the AWS CLI.
    • Session Manager provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys
  • Patch manager
    • Automates the process of patching managed instances with both security related and other types of updates
    • AWS predefined patch baselines per operating system
      • Can also define own patch baselines
        • Patch items in approved or rejected list, e.g. 'CVE-2020-1234567'
        • Can define own patch source
    • Define Maintenance Window when patches are possibly executed
    • Use AWS-RunPatchBaseline run command
    • Can also evaluate compliance without applying patches
  • State manager
    • Secure and scalable configuration management service that automates the process of keeping your Amazon EC2 and hybrid infrastructure in a state that you define
  • SSM Documents
    • JSON format
    • Different types:
      • Command
      • Automation
      • Policy
      • Session

Trusted Advisor (Core Service)

Overview

  • AWS Trusted Advisor is an online tool that provides you real time guidance to help you provision your resources following AWS best practices. Whether establishing new workflows, developing applications, or as part of ongoing improvement, take advantage of the recommendations provided by Trusted Advisor on a regular basis to help keep your solutions provisioned optimally.
  • Global service
  • Creates recommendations for
    • Cost optimization
    • Performance
    • Security
    • Fault tolerance
    • Service limits
  • Trusted Advisor check results are raised as CloudWatch Events
    • Automate by triggering Lambdas
    • Events are only available in us-east-1
  • Checks are refreshed on visits to the dashboard (max every 5 minutes)
    • Otherwise weekly
    • Can trigger refresh via API
  • On AWS: Service - FAQs - User Guide

X-Ray (Core Service)

Overview

AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture. With X-Ray, you can understand how your application and its underlying services are performing to identify and troubleshoot the root cause of performance issues and errors. X-Ray provides an end-to-end view of requests as they travel through your application, and shows a map of your application’s underlying components. You can use X-Ray to analyze both applications in development and in production, from simple three-tier applications to complex microservices applications consisting of thousands of services.

  • The X-Ray daemon runs on EC2 instances/Elastic Beanstalk instances/ECS
  • X-Ray SDK to send signals
  • X-Ray API collects information
    • Automation can be based on regularly polling GetServiceGraph
  • X-Ray Console displays information in service map
  • On AWS: Service - FAQs - User Guide

--

Random Information from practice questions

Aurora

  • Can have up to 15 read replicas
  • Can have a Global Database, spanning multiple regions
    • Available for both MySQL and PostgreSQL
  • Can have 'Reader Endpoint' - load-balances connections to available Aurora Replicas in an Aurora DB cluster

CloudFront

  • You must ensure that the certificate you wish to associate with your Alternate Domain Name is from a trusted CA, has a valid date and is formatted correctly. Wildcard certificates do work with Alternate Domain Names providing they match the main domain, and they also work with valid Third Party certificates. If all of these elements are correct, it may be that there was an internal CloudFront HTTP 500 being generated at the time of configuration, which should be transient and will resolve if you try again.

CloudWatch Metrics

  • aws cloudwatch put-metric-data - Publishes metric data points to Amazon CloudWatch
  • Use EstimatedCharges metric to track your estimated AWS charges

CodeBuild

  • CODEBUILD_SOURCE_VERSION - For CodeCommit, it is the commit ID or branch name associated with the version of the source code to be built.
  • MSBuild container image is required for building .NET applications

CodeCommit

  • Value for local ssh-config needs to match SSH key id from IAM Security Credentials

CodePipeline

  • 'CodePipeline Pipeline Execution State Change' is the detail-type of the CloudWatch Event raised for pipeline failures
  • AWS DeviceFarm is an action provider

Cognito

  • Amazon Cognito provides authentication, authorization, and user management for your web and mobile apps. Your users can sign in directly with a user name and password, or through a third party such as Facebook, Amazon, Google or Apple.

Direct Connect

  • Direct Connect is the only way to access your AWS resources from a Data Center without traversing the internet

EBS

  • In order to encrypt an EBS snapshot, copy the unencrypted snapshot and tick the checkbox to encrypt the target
  • Amazon Data Lifecycle Manager (DLM) for EBS Snapshots provides a simple, automated way to back up data stored on Amazon EBS volumes. You can define backup and retention schedules for EBS snapshots by creating lifecycle policies based on tags.

EC2

  • In order to get access to the CPU sockets for billing purposes, you need to use EC2 Dedicated Hosts
  • To maximize networking performance, Jumbo frames (9001 MTU) allow more than 1500 bytes of data per frame

Elastic Beanstalk

  • Dockerrun.aws.json v2 is the file to configure a multi-container Docker environment

Elastic Load Balancing

  • IPv6 is only supported by Application Load Balancers, not NLB, not Classic
  • Network Load Balancers do not use security groups. This is different from Classic Load Balancer or Application Load Balancer.

ECS

  • Adding the SHA256 digest to a docker image URL makes sure that ECS gets the latest image. Otherwise, it might still run a previous :latest.

Fargate

  • If a container image requires many network connections (e.g. WebSockets), it is better run as multiple tasks across an ECS cluster
    • Fargate: one ENI per task
    • ECS (EC2 launch type): ENIs come from the underlying instance

GitHub

  • The number of OAUTH tokens is limited and CodePipeline might stop working with older tokens

IAM

  • Create and configure an IAM SAML Identity Provider, create a role with a SAML Trusted Entity, configure AD, configure ADFS with a Relying Party, create Custom Claim Rules
  • Can store server certificates - only recommended for regions that don't support ACM
  • Access Advisor shows last services accessed per OU

Kinesis Data Streams

  • When using KCL, make sure getRecords is not throwing unhandled exceptions
  • Ensure the maxRecords value for the GetRecords call isn't set below the default setting
  • When resharding, sometimes a small shard is left over
    • This occurs when the width of a shard is very small in size in relation to other shards in the stream. This is resolved by merging with any adjacent shard.

Personal Health Dashboard

  • The AWS_RISK_CREDENTIALS_EXPOSED event is raised by the Personal Health Dashboard service.
  • Integrates with CloudWatch Events, but cannot send notifications directly

RDS

  • EngineVersion - The version number of the database engine to use.

S3

  • When encrypting at rest, SSE-S3 is more performant than SSE-KMS, as the latter gets throttled above 10,000 objects per second
  • The Amazon S3 notification feature enables you to receive notifications when certain events happen in your bucket.

Secrets Manager

  • Maximum length of a secret - 65,536 bytes
  • You should ask the external party for a DB user with at least two credential sets or the ability to create new users yourself. Otherwise, you might encounter client sign-on failures. The risk is because of the time lag that can occur between the change of the actual password and - when using Secrets Manager - the change in the corresponding secret that tells the client which password to use.

Server Migration Service

  • The Server Migration Service replication job generates an AMI after the job is finished. However, it does not automatically launch EC2 instances.

SQS

  • You can use Amazon S3 and the Amazon SQS Extended Client Library for Java to manage Amazon SQS messages. This is especially useful for storing and consuming messages up to 2 GB in size.

Trusted Advisor

  • In CloudWatch Events, filter on the 'check item refresh status' to observe only specific events

SSO

  • You can use the User Principal Name (UPN) or the DOMAIN\UserName format to authenticate with AD, but you can't use the UPN format if you have two-step verification and Context-aware verification enabled.
  • AWS Organizations and the AWS Managed Microsoft AD must be in the same account and the same region
  • AD Connector is a directory gateway with which you can redirect directory requests to your on-premises Microsoft Active Directory without caching any information in the cloud.