Understanding Cloud Architect Technology Solutions

Design and Connectivity Patterns

https://docs.microsoft.com/Azure/architecture/patterns/
Partitioning workloads
- Modularize application to functional units.
  - Each module
    - Handles portion of application's overall functionality
    - Represents set of related concerns.
  - Why?
    - Easier to design both current & future iterations of your application.
    - Modules can also be tested, distributed, and otherwise verified in isolation.
Load balancing
- Application traffic or load is distributed among various endpoints by using algorithms.
- Allows
  - Creating multiple instances of the website.
  - Instances that behave in a predictable manner.
  - Flexibility to grow or shrink the number of instances without changing the expected behaviour.
- Load balancing strategy considerations
  - Physical vs Virtual Load balancers
    - Use Virtual Load Balancers (hosted in VMs) if the company requires a very specific configuration.
  - Load balancing algorithm
    - round robin => Selects next instance for each request based on a predetermined order that includes all of the instances.
    - random choice
  - Configurations
    - Affinity/stickiness: Whether subsequent requests from the same client machine should be routed to the same service instance.
      - Required when the application has state.
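
The round robin algorithm and the affinity setting described above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer is implemented; the class names, the instance names, and the use of a CRC32 hash for stickiness are all assumptions made for the example.

```python
import itertools
import zlib


class RoundRobinBalancer:
    """Hands out instances in a fixed, predetermined order, wrapping around."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def next_instance(self):
        return next(self._cycle)


class StickyBalancer:
    """Routes a given client to the same instance on every request (affinity)."""

    def __init__(self, instances):
        self._instances = list(instances)

    def instance_for(self, client_id):
        # CRC32 is stable across processes, unlike Python's built-in hash(),
        # which is salted per run for strings.
        index = zlib.crc32(client_id.encode("utf-8")) % len(self._instances)
        return self._instances[index]


instances = ["web-1", "web-2", "web-3"]  # hypothetical instance names

round_robin = RoundRobinBalancer(instances)
first_four = [round_robin.next_instance() for _ in range(4)]

sticky = StickyBalancer(instances)
same_target = sticky.instance_for("client-42") == sticky.instance_for("client-42")
```

Note that this simple hash-based stickiness remaps clients whenever the number of instances changes, which is one reason stateful applications are harder to scale.
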
Transient fault handling
- Leads to more resilient applications.
- Implemented in .NET libraries (Entity Framework, Azure SDK, etc.)
- Transient errors => occur due to temporary interruptions in the service or to excess latency.
- Many are self-healing and can be resolved with a retry policy.
- Retry policy
  - Retry when a temporary failure occurs.
  - A break in the circuit => abort retries if it's a serious issue.
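
The "break in the circuit" idea can be sketched as a small wrapper that stops calling a service after repeated failures and only tries again after a cool-down. This is an illustrative sketch under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, the threshold, and the cool-down values are hypothetical names and numbers, not taken from any library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self._clock = clock             # injectable so the logic can be tested
        self._failures = 0
        self._opened_at = None

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; not calling the service")
            # Cool-down elapsed: let one trial call through. A single
            # failure of that trial call re-opens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # success closes the circuit fully
        return result
```

A caller would wrap each remote call, for example `breaker.call(lambda: client.fetch())` where `client.fetch` is whatever operation may fail, and treat `CircuitOpenError` as a signal to fail fast instead of retrying.
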
Queues
- Provide a degree of consistency regardless of the behaviour of the modules.
- Direct method invocation
  - Connection is severed on transient errors
  - Use 3rd party queue to persist the requests beyond a temporary failure.
    - Allows you to audit failing requests independently.
Retry pattern
- Cloud applications must be sensitive to transient faults.
  - E.g. loss of network connectivity, the temporary unavailability of a service, timeouts that arise when a service is busy.
- They're typically self-correcting: if the action that triggered a fault is repeated after a suitable delay, it's likely to succeed.
  - A DB with too many concurrent requests can throttle (requests fail until the workload eases). It fixes itself after some delay.
- Solution: Retry on temporary failures.
  - Remote service => retry after a short wait.
    - Fails again => wait and retry, with increasing delays if necessary, until a maximum number of attempts is reached.
      - Choose the delay between retries to spread requests from multiple instances of the application as evenly as possible.
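
A retry loop with a capped number of attempts and randomized, growing delays might look like the sketch below. `TransientError`, the function name `retry`, and the default delay values are assumptions for illustration; real code would catch whichever exception types its client library raises for transient faults.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, max_attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the failure
            # Exponential backoff, capped at max_delay. The random jitter
            # spreads retries from many application instances over time.
            backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, backoff))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting; in production the default `time.sleep` would be used.
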
Competing consumers pattern
- Sudden large number of requests may cause unpredictable workload.
- Single consumer => risk of being flooded, or messaging system being overloaded.
- Solution: asynchronous messaging with variable quantities of message producers and consumers
  - Business logic in the application is not blocked while the requests are being processed.
  - Handle fluctuating workloads => system can run multiple instances of the consumer service.
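
The pattern can be illustrated in-process with one queue and several consumer threads competing for its messages. This is only a sketch: the in-memory `queue.Queue` stands in for a real message broker, and `run_consumers` and its parameters are hypothetical names chosen for the example.

```python
import queue
import threading


def run_consumers(messages, handler, consumer_count=3):
    """Process messages with several consumers competing for one queue."""
    work = queue.Queue()
    for message in messages:
        work.put(message)  # the producer side: enqueue and move on

    results = []
    results_lock = threading.Lock()

    def consume():
        while True:
            try:
                message = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this consumer stops
            outcome = handler(message)
            with results_lock:
                results.append(outcome)

    workers = [threading.Thread(target=consume) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Each message is taken by exactly one consumer, and the order in which results arrive is not guaranteed. Raising `consumer_count` is the in-process analogue of running more instances of the consumer service.
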
Cache-aside pattern
- Problem: Cached data consistency
  - A strategy is needed to ensure that the data is up-to-date & handle situations where the data in cache has become stale.
- Solution: read-through and write-through caching; where the cache doesn't provide them, the application uses cache-aside.
  - Cache-aside => Loads data into the cache on demand if it's not already available in the cache.
    - Not in cache? Fetch it from the data store & add it to the cache. Data modified? Write it to the data store & invalidate the cached copy.
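
A minimal cache-aside sketch, assuming plain dictionaries as stand-ins for both the cache and the data store. The class and attribute names are hypothetical, and the update path shown (write to the store, then invalidate the cached entry) is one common variant of the pattern.

```python
class CacheAsideRepository:
    def __init__(self, store):
        self._store = store    # stand-in for the authoritative data store
        self._cache = {}       # stand-in for the cache
        self.store_reads = 0   # counts cache misses, for illustration only

    def get(self, key):
        if key in self._cache:
            return self._cache[key]      # cache hit
        self.store_reads += 1
        value = self._store[key]         # cache miss: read from the store
        self._cache[key] = value         # then populate the cache on demand
        return value

    def update(self, key, value):
        self._store[key] = value         # write to the data store first
        self._cache.pop(key, None)       # then invalidate the stale entry


repo = CacheAsideRepository({"user:1": "Ada"})
first = repo.get("user:1")    # miss: loaded from the store
second = repo.get("user:1")   # hit: served from the cache
repo.update("user:1", "Grace")
third = repo.get("user:1")    # miss again, because the entry was invalidated
```
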
Sharding pattern
- Problem: hosting large volumes of data in a traditional single-instance store
- Some limitations
  - Storage space: Upgrading disks is not easy.
  - Computing resources: They can't always be increased further.
  - Network bandwidth: Network traffic might exceed the available capacity.
  - Geography: Data may need to be stored close to users in different regions to reduce access latency.
- Scaling up can postpone these effects, but it's only a temporary solution.
- Solution: partitioning data horizontally across many nodes
  - Divide data store into horizontal partitions, or shards.
  - Shard: Same schema but distinct subset of data.
  - Sharding can be implemented in the data access code, or by a storage system that supports transparent sharding.
    - Abstracting physical location => High level of control over which shards contain which data.
      - Easier to migrate data between shards without touching application logic.
      - Tradeoff => Additional data access overhead to determine the location of each data item as it's retrieved
  - For optimal performance & scalability
    - Split data in a way that's appropriate for the types of queries the application performs.
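    - The sharding scheme is unlikely to exactly match the requirements of every query.
    - E.g.:
      - In a multitenant system => you look up data by tenant ID, but sometimes also by another attribute such as the tenant's name; choose the shard key that supports the most commonly performed queries.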
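
Routing in the data access code can be sketched with a hash of the shard key. This example assumes a hash-based strategy, which is only one of several possible ways to map keys to shards; `ShardRouter`, the shard names, and the tenant keys are hypothetical, and dictionaries stand in for the actual shard databases.

```python
import hashlib


class ShardRouter:
    """Maps each shard key to one shard; callers never see physical locations."""

    def __init__(self, shard_names):
        self._shard_names = list(shard_names)
        self._shards = {name: {} for name in self._shard_names}

    def shard_for(self, shard_key):
        # A stable hash keeps the key-to-shard mapping identical across runs.
        digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shard_names)
        return self._shard_names[index]

    def put(self, shard_key, record):
        self._shards[self.shard_for(shard_key)][shard_key] = record

    def get(self, shard_key):
        return self._shards[self.shard_for(shard_key)].get(shard_key)


router = ShardRouter(["shard-0", "shard-1", "shard-2"])
router.put("tenant-17", {"name": "Contoso"})
record = router.get("tenant-17")
```

The extra `shard_for` step on every access is the lookup overhead mentioned above. Queries on anything other than the shard key (here, the tenant's name) would have to fan out to every shard.
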

Hybrid Networking

Site-to-site connectivity (Site-to-site VPN)
- Between your on-premises site <=> VNet in Azure via IPsec tunnel.
- Resources on local network can communicate with resources on Azure VNet
  - No need for separate connection for each client computer in local network.
- Requires VPN device.
- E.g.:
  - In-office IT pros and developers have their own gateway and connect to Azure.
  - The offshore QA team has its own gateway and connects to Azure.
Point-to-site connectivity (Point-to-site VPN)
- Configured on each client computer that you want to connect to the VNet in Azure.
- No need for a VPN device
  - Instead you use a VPN client installed on each client computer.
  - The connection must be started manually from the client; it can be configured to restart automatically.
Combining site-to-site and point-to-site connectivity
- Offshore QA team connects via VPN gateway (site-to-site VPN)
- Developers & IT pros at the office connect via VPN gateway (site-to-site VPN)
- Developers working from home connect via direct VPN (point-to-site VPN)
Combining ExpressRoute and site-to-site connectivity
- Reasons
  - Multiple branch offices: it's costly to purchase peering for every location.
  - Multiple networks within the enterprise
    - Connect one to Azure using ExpressRoute for higher-risk traffic.
    - For lower-risk traffic, use site-to-site VPN
  - Use site-to-site VPN as a failover link if ExpressRoute connection fails.
Virtual network to virtual network connectivity (VNET to VNET)
- Utilizes Azure VPN gateways to connect VNets in Azure over IPsec/IKE tunnels.
- E.g.: you have the following topology (topology = how nodes connect to other networks via links)
  - In-office IT pros/developers have a VPN connection to Azure East Asia
  - Offshore QA team has a VPN connection to Azure West US
  - You set up VNet-to-VNet between Azure East Asia and Azure West US
    - Then both teams can access Azure East Asia and Azure West US
Connecting across cloud providers
- For failover, backup or migration between providers.
- Amazon Web Services (AWS) =>
  - Create EC2 VM with Openswan (VPN software)
  - Create gateway on the Azure VNet side using static routing.
  - Use gateway IP from Azure to configure Openswan for tunnel connection

Storing in cloud

Durability of data

A transaction is a set of operations.
- Seeks to achieve some or all ACID properties.
  - Atomic
    - A transaction is executed only once; all work completes or none does.
    - Why?
      - Operations in a transaction often share common intent or depend on each other.
      - Performing only subset => intent can be missed.
  - Consistent
    - A transaction preserves the consistency of data.
      - Performed on consistent state and leads to consistent state.
      - Typically, developers are responsible for maintaining consistency.
  - Isolated
    - Concurrent transactions behave as if each were the only transaction running in the system.
    - Some applications reduce isolation level for better throughput
      - High isolation => limits number of concurrent transactions
  - Durable
    - A transaction must be recoverable.
    - It must be persisted if e.g. computer crashes.
      - Special logging solves this.
- In relational database systems (RDBMS) it's a single unit of work.
- All-or-none => If it fails, the DB is rolled back and all modifications are undone.
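
The all-or-none behaviour can be demonstrated with Python's built-in `sqlite3` module. The table, account names, and amounts below are made up for illustration; the point is that when the second statement of a transaction fails, the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, source, target, amount):
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


transfer(conn, "alice", "bob", 30)       # both updates are committed together
try:
    transfer(conn, "alice", "bob", 500)  # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                                 # the credit to bob was rolled back too
```

After the failed transfer the balances are the same as after the successful one, so the database never exposes a state where only half of the transfer happened.
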

Caching

Caching aims to improve performance & scalability of a system.
It's done by temporarily copying frequently accessed data to fast storage close to the application.
Most effective when
- Same data is repeatedly read.
- Original data store =>
  - Relatively static
  - Slow compared to cache's speed
  - Subject to significant level of contention
    - Contention in DB systems =>
      - multiple processes or instances competing for access to the same index or data block at the same time
  - It's far away & network latency causes access to be slow.
Distributed applications typically implement either or both of the following when caching data:
- Private cache: Held locally on the computer that's running the application.
  - In-memory store: Accessed by a single process.
    - Quick & effective, but size is typically constrained by the host machine's memory.
  - Local file system
    - Slower than in-memory, but faster than retrieving across network.
    - Each application holds its own copy of the data.
  - Problem:
    - Snapshot of the original data at a point in the past.
      - Different application instances can hold different versions.
- Shared cache: Common source which multiple processes/machines can access.
  - All instances see the same view of data, as opposed to private caches.
  - It's highly scalable
    - Cache services use a cluster of servers and software for distribution.
    - Easy to scale by adding servers to / removing them from the cluster.
  - Disadvantages:
    - Slower to access => No longer held locally to each application instance.
    - Implementing separate cache service => increases complexity.
Caching considerations
- When?
  - The more data you have and the more users need to access it, the greater the benefit of caching => reduces load on the original data store.
  - If original data store is unavailable, cache can be used.
- How to cache data effectively?
  - Determine the most appropriate data to cache
  - Cache it at the appropriate time.
    - Add data to the cache on demand when it's retrieved first time.
    - Populate in advance
      - Seeding: when the application starts.
      - Not good for large cache as it can cause sudden high load.
- Manage data expiration.
  - Cached data becomes stale after a while.
  - Expire caches so they're removed, and retrieved on next read.
  - Set a default policy; many cache services also let you set the expiration period for individual objects programmatically when storing them.
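
Expiration with a default policy and per-object overrides can be sketched as follows. `ExpiringCache` and its parameters are hypothetical names for this illustration, not the API of any cache service; the injectable clock exists only so expiry can be shown without waiting.

```python
import time


class ExpiringCache:
    def __init__(self, default_ttl=60.0, clock=time.monotonic):
        self._default_ttl = default_ttl  # default expiration policy, in seconds
        self._clock = clock
        self._items = {}                 # key -> (value, expiry time)

    def set(self, key, value, ttl=None):
        # A per-object ttl overrides the default policy when given.
        lifetime = self._default_ttl if ttl is None else ttl
        self._items[key] = (value, self._clock() + lifetime)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]         # stale: remove so the next read reloads it
            return default
        return value
```

A miss returned by `get` is the caller's cue to fetch the data from the original store and `set` it again, which is how expired entries get refreshed on the next read.
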
Redis Cache
- Recommended by Azure, replaces Azure Cache (deprecated).
- NoSQL key-value database.
  - Unique: Allows complex data structures (e.g. lists, sets, hashes) as the values stored under its keys.
- SKUs: Basic (single node), Standard (2 nodes + SLA)

Measuring throughput

Normalized units
- Relative performance guarantees by cloud vendors.
- E.g. if your application uses 20 units, 40 units will give you approximately double the performance.
DTUs – Database throughput units (Azure SQL Database)
- Based on compute, storage and IO.
- DTUs for single databases, eDTUs for elastic pools.
- Fixed per pricing plans, e.g.: Basic = 5 DTU, Standard 2 = 50 DTU
RUs – Request units per second (Azure Cosmos DB)
- Each operation incurs a request charge, which is expressed in RUs.
  - Single request unit (normalized) => 1 read of a 1 KB document.
  - Create, replace, delete consume more processing = more request units.

Structure of data

Polyglot persistence => solutions that use a mix of data store technologies.
Structured data stores
- Most vendors use SQL.
- Have an RDBMS (relational database management system)
  - Conforms to ACID.
  - Supports schema-on-write
    - You define the data structure, all reads + writes use the same schema.
- Hard to scale out.
- E.g. Azure SQL Database, Azure Database for MySQL, Azure Database for PostgreSQL
Unstructured / semi-structured data stores
- Don't use a tabular schema of rows & columns.
- Can store data as key/value pairs, JSON documents, or as a graph (edges + vertices)
- Have no relational model.
- Graph databases => Cosmos DB with Gremlin API
  - Optimized for exploring weighted relationships between entities.
  - Store nodes (entities) and edges (relationships between nodes).
- Document databases => Azure Cosmos DB
- NoSQL => Non-SQL DBs, though most systems support SQL-compatible queries.
- Column family: HBase in HDInsight
  - Key-value pair, where key is mapped to a value that's a set of columns.
- Massively parallel & distributed solutions for ingesting, storing, and analyzing data
  - SQL Data Warehouse
  - Azure Data Lake
  - Time series data stores => Time Series Insights
    - Optimized for queries over time-based sequences of data, indexed by datetime.
- Others: Object storage => Blob storage, Shared files => File storage
