Understanding Cloud Architect Technology Solutions

Design and Connectivity Patterns

  • https://docs.microsoft.com/Azure/architecture/patterns/
  • Partitioning workloads
    • Modularize the application into functional units.
      • Each module
        • Handles a portion of the application's overall functionality.
        • Represents a set of related concerns.
      • Why?
        • Easier to design both current & future iterations of your application.
        • Modules can also be tested, distributed, and otherwise verified in isolation.
  • Load balancing
    • Application traffic (load) is distributed among multiple endpoints by an algorithm.
    • Allows
      • Creating multiple instances of the website.
      • Instances to behave in a predictable manner.
      • Flexibility to grow or shrink the number of instances without changing the expected behaviour.
    • Load balancing strategy considerations
      • Physical vs Virtual Load balancers
        • Use virtual load balancers (hosted in VMs) if the company requires a very specific configuration.
      • Load balancing algorithm
        • round robin => Selects next instance for each request based on a predetermined order that includes all of the instances.
        • random choice
      • Configurations
        • Affinity/stickiness: whether subsequent requests from the same client machine should be routed to the same service instance.
          • Required when the application is stateful.
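The round-robin algorithm and the affinity setting above can be sketched in a few lines. This is a minimal illustration only: `RoundRobinBalancer`, `sticky_instance`, and the instance names are hypothetical and not part of any Azure API.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Hands out instances in a fixed, repeating order (hypothetical helper)."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(list(instances))

    def next_instance(self):
        return next(self._cycle)


def sticky_instance(client_id, instances):
    """Affinity: hash the client ID so the same client maps to the same instance."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


instances = ["web-1", "web-2", "web-3"]  # made-up instance names
balancer = RoundRobinBalancer(instances)
picks = [balancer.next_instance() for _ in range(5)]
print(picks)  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
```

Note that the hash-based affinity shown here remaps clients whenever the instance list changes; real load balancers usually offer their own session-persistence settings.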
  • Transient fault handling
    • Leads to more resilient applications.
    • Implemented in .NET libraries (Entity Framework, Azure SDK, etc.).
    • Transient errors => occur due to temporary interruptions in the service or to excess latency.
    • Many are self-healing and can be resolved with a retry policy.
    • Retry policy
      • Retry when a temporary failure occurs.
      • Circuit breaker => abort retries if it's a serious issue.
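The idea of breaking the circuit once a fault looks serious can be sketched as follows. This is a simplified illustration under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, the threshold, and the cool-down period are all hypothetical, and the .NET libraries mentioned above implement their own policies.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the operation (hypothetical)."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds (assumed value)
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Circuit is open: fail fast instead of hitting the service.
                raise CircuitOpenError("circuit is open")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # a success closes the circuit again
        return result
```

A caller would wrap each remote call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to stop retrying for now.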
  • Queues
    • Provide a degree of consistency regardless of the behaviour of the modules.
    • Direct method invocation
      • Connection is severed on transient errors.
      • Use a 3rd-party queue instead to persist requests beyond a temporary failure.
        • Allows you to audit failing requests independently.
  • Retry pattern
    • Cloud applications must be sensitive to transient faults.
      • E.g. loss of network connectivity, the temporary unavailability of a service, timeouts that arise when a service is busy.
    • They're typically self-correcting: if the action that triggered a fault is repeated after a suitable delay, it's likely to succeed.
      • A DB with too many concurrent requests may throttle (requests fail until the workload eases). It fixes itself after some delay.
    • Solution: retry operations that fail temporarily.
      • Remote service => retry after a short wait.
        • Fails again => keep retrying with a delay, but limit the number of attempts instead of retrying in a tight loop.
          • Choose the delays to spread requests from multiple instances of the application as evenly as possible.
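A minimal sketch of the retry pattern, assuming exponential backoff with random jitter as the delay strategy (the notes only ask for a suitable delay and a cap on attempts). `TransientError`, `call_with_retry`, and the default values are hypothetical names chosen for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Retry `operation` on TransientError, giving up after `max_attempts`."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the fault to the caller
            # Backoff doubles each attempt; jitter spreads out retries
            # coming from multiple application instances.
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)


calls = {"count": 0}


def flaky():
    # Simulated service that is busy for the first two calls.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("service busy")
    return "ok"


result = call_with_retry(flaky, sleep=lambda seconds: None)  # skip real sleeping
print(result, calls["count"])  # ok 3
```

Only faults known to be transient are retried here; any other exception propagates immediately.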
  • Competing consumers pattern
    • Sudden large number of requests may cause unpredictable workload.
    • Single consumer => risk of being flooded, or messaging system being overloaded.
    • Solution: asynchronous messaging with variable quantities of message producers and consumers
      • Business logic in the application is not blocked while the requests are being processed.
      • Handle fluctuating workloads => system can run multiple instances of the consumer service.
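The competing consumers idea can be sketched with an in-process queue and several worker threads. This is only an illustration: a real system would use a message broker or a cloud queue service rather than `queue.Queue`, and `run_consumers` and the message names are hypothetical.

```python
import queue
import threading


def run_consumers(messages, consumer_count=3):
    """Let several consumers compete for messages on one shared queue."""
    work = queue.Queue()
    for message in messages:
        work.put(message)

    processed = []
    lock = threading.Lock()

    def consume(name):
        while True:
            try:
                message = work.get_nowait()  # each message goes to one consumer
            except queue.Empty:
                return  # queue drained: this consumer stops
            with lock:
                processed.append((name, message.upper()))

    workers = [
        threading.Thread(target=consume, args=(f"consumer-{i}",))
        for i in range(consumer_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return processed


results = run_consumers([f"order-{i}" for i in range(10)])
print(len(results))  # 10
```

Which consumer handles which message is not deterministic, but every message is processed exactly once, and the number of consumers can be changed without touching the producer.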
  • Cache-aside pattern
    • Problem: cached data consistency
      • A strategy is needed to keep cached data up to date & to handle situations where the data in the cache has become stale.
    • Solution: read-through and write-through caching
      • Not in cache? Fetch it from the data store & add it to the cache. Modifications to cached data => also written to the data store.
      • Cache-aside => the application emulates read-through caching itself by loading data into the cache on demand if it's not already there.
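A minimal cache-aside sketch, using plain dictionaries as stand-ins for both the cache and the data store. `CacheAsideRepository` is a hypothetical name, and invalidating the cached entry on update is one common choice assumed here, not the only option.

```python
class CacheAsideRepository:
    """Application-managed cache in front of a data store (illustrative only)."""

    def __init__(self, store):
        self._store = store   # stand-in for the real data store
        self._cache = {}      # stand-in for the cache service
        self.store_reads = 0  # counts how often the store was hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]      # cache hit
        self.store_reads += 1
        value = self._store[key]         # cache miss: read from the store
        self._cache[key] = value         # then populate the cache on demand
        return value

    def update(self, key, value):
        self._store[key] = value         # write to the data store first
        self._cache.pop(key, None)       # then invalidate the stale cache entry


repo = CacheAsideRepository({"user:1": "Ada"})
repo.get("user:1")
repo.get("user:1")
print(repo.store_reads)  # 1
repo.update("user:1", "Grace")
print(repo.get("user:1"), repo.store_reads)  # Grace 2
```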
  • Sharding pattern
    • Problem: hosting large volumes of data in a traditional single-instance store
    • Some limitations
      • Storage space: Upgrading disks is not easy.
      • Computing resources: It's not always possible to keep adding more.
      • Network bandwidth: Network traffic might exceed the capacity of the network.
      • Geography: Data may need to be stored close to users in different regions to reduce access latency.
    • Scaling up can postpone the effects of these limitations, but it's only a temporary solution.
    • Solution: partitioning data horizontally across many nodes
      • Divide data store into horizontal partitions, or shards.
      • Shard: Same schema but distinct subset of data.
      • Sharding logic can live in the data access code, or in a storage system that supports transparent sharding.
        • Abstracting physical location => high level of control over which shard contains which data.
          • Easier to migrate data between shards without touching application logic.
          • Tradeoff => additional data access overhead to determine the location of each data item as it's retrieved.
      • For optimal performance & scalability
        • Split data in a way that's appropriate for the types of queries the application performs.
        • The sharding scheme is unlikely to exactly match the requirements of every query.
        • E.g.:
          • In a multitenant system => you look up data by tenant ID, but sometimes also by e.g. the tenant's name. Choose the shard key that supports the most common queries.
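A minimal sketch of sharding in the data access code, assuming the tenant ID is chosen as the shard key and a hash decides the shard. `ShardedTenantStore` and the sample records are hypothetical; in-memory dictionaries stand in for separate database nodes.

```python
import hashlib


class ShardedTenantStore:
    """Routes each tenant's record to one shard based on the tenant ID."""

    def __init__(self, shard_count):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        self._shards = [{} for _ in range(shard_count)]  # same schema per shard

    def shard_index(self, tenant_id):
        # A stable hash keeps the tenant on the same shard across calls.
        digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, tenant_id, record):
        self._shards[self.shard_index(tenant_id)][tenant_id] = record

    def get(self, tenant_id):
        # Lookup by shard key touches exactly one shard.
        return self._shards[self.shard_index(tenant_id)].get(tenant_id)

    def find_by_name(self, name):
        # The name is not the shard key, so every shard must be scanned.
        return [r for shard in self._shards for r in shard.values() if r["name"] == name]

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


store = ShardedTenantStore(shard_count=4)
store.put("tenant-001", {"name": "Contoso"})
store.put("tenant-002", {"name": "Fabrikam"})
print(store.get("tenant-001"))  # {'name': 'Contoso'}
```

The contrast between `get` and `find_by_name` shows why the shard key should match the most common query: anything else fans out to all shards.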

Hybrid Networking

  • Site-to-site connectivity (Site-to-site VPN)
    • Between your on-premises site <=> VNet in Azure via IPsec tunnel.
    • Resources on local network can communicate with resources on Azure VNet
      • No need for separate connection for each client computer in local network.
    • Requires VPN device.
    • E.g.:
      • IT pros and developers in the office have their own gateway and connect to Azure.
      • The offshore QA team has its own gateway and connects to Azure.
  • Point-to-site connectivity (Point-to-site VPN)
    • Configured on each client computer that you want to connect to the VNet in Azure.
    • No need for a VPN device
      • Instead, you use a VPN client installed on each client computer.
      • The connection is started manually from the client; the client can be configured to restart it automatically.
  • Combining site-to-site and point-to-site connectivity
    • Offshore QA team connects via VPN gateway (site-to-site VPN)
    • Developers & IT pros at the office connect via VPN gateway (site-to-site VPN)
    • Developers working from home connect via direct VPN (point-to-site VPN)
  • Combining ExpressRoute and site-to-site connectivity
    • Reasons
      • Multiple branch offices: it's costly to purchase peering for every location.
      • Multiple networks within the enterprise
        • Connect one to Azure using ExpressRoute for higher-risk traffic.
        • For lower-risk traffic, use site-to-site VPN
      • Use site-to-site VPN as a failover link if ExpressRoute connection fails.
  • Virtual network to virtual network connectivity (VNET to VNET)
    • Uses Azure VPN gateways to connect VNets in Azure over IPsec/IKE tunnels.
    • E.g.: you have the following topology (topology = how nodes connect to other networks via links)
      • IT pros/developers in the office have a site-to-site VPN to Azure East Asia
      • Offshore QA team has a site-to-site VPN to Azure West US
      • You set up VNet-to-VNet between Azure East Asia and Azure West US
        • Then both teams can access Azure East Asia and Azure West US
  • Connecting across cloud providers
    • For failover, backup or migration between providers.
    • Amazon Web Services (AWS) =>
      • Create EC2 VM with Openswan (VPN software)
      • Create gateway on the Azure VNet side using static routing.
      • Use gateway IP from Azure to configure Openswan for tunnel connection

Storing in cloud

Durability of data

  • A transaction is a set of operations.
    • Seeks to achieve some or all ACID properties.
      • Atomic
        • A transaction is executed only once; all work completes or none does.
        • Why?
          • Operations in a transaction often share common intent or depend on each other.
          • Performing only a subset => the intent can be missed.
      • Consistent
        • A transaction preserves the consistency of data.
          • Performed on consistent state and leads to consistent state.
          • Typically, developers are responsible for maintaining consistency.
      • Isolated
        • Concurrent transactions behave as if each were the only transaction running in the system.
        • Some applications reduce isolation level for better throughput
          • High isolation => limits number of concurrent transactions
      • Durable
        • A transaction must be recoverable.
        • Committed changes must persist even if e.g. the computer crashes.
          • Special logging lets the system complete or undo unfinished operations on restart.
    • In relational database management systems (RDBMS), a transaction is a single unit of work.
    • All-or-none => if it fails, the DB is rolled back and all modifications are undone.
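The all-or-none behaviour can be illustrated with Python's built-in `sqlite3` module and an in-memory database. The `accounts` table, the `transfer` helper, and the balances are made up for the example; the credit is deliberately applied before the debit so that a failed debit shows the credit being rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed: nothing was applied


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
# True {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500), balances(conn))
# False {'alice': 70, 'bob': 80}
```

In the second call the credit to `bob` succeeds but the debit violates the constraint, so the whole transaction is rolled back and both balances stay unchanged.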

Caching

  • Caching aims to improve performance & scalability of a system.
  • It's done by temporarily copying frequently accessed data to fast storage located close to the application.
  • Most effective when
    • Same data is repeatedly read.
    • Original data store =>
      • Relatively static
      • Slow compared to cache's speed
      • Subject to significant level of contention
        • Contention in DB systems =>
          • multiple processes or instances competing for access to the same index or data block at the same time
      • Far away, so network latency makes access slow.
  • Distributed applications typically implement either or both of the following when caching data:
    • Private cache: held locally on the computer that's running the application.
      • In-memory store: accessed by a single process.
        • Quick & effective, but size is typically constrained by the memory of the host machine.
      • Local file system
        • Slower than in-memory, but faster than retrieving across network.
        • Each application holds its own copy of the data.
      • Problem:
        • Each copy is a snapshot of the original data at some point in the past.
          • Different application instances can hold different versions.
    • Shared cache: common source which multiple processes/machines can access.
      • All instances see the same view of the data, unlike with private caches.
      • It's highly scalable
        • Cache services use a cluster of servers and software for distribution.
        • Easy to scale by adding servers to / removing them from the cluster.
      • Disadvantages:
        • Slower to access => no longer held locally to each application instance.
        • Implementing a separate cache service => increases complexity.
  • Caching considerations
    • When?
      • The more data you have & the larger the number of users that need to access it => the greater the benefit of caching (less load on the original data store).
      • If the original data store is unavailable, the cache can still be used.
    • How to cache data effectively?
      • Determine the most appropriate data to cache.
      • Cache it at the appropriate time.
        • Add data to the cache on demand when it's retrieved the first time.
        • Populate in advance
          • Seeding: when the application starts.
          • Not good for a large cache as it can cause sudden high load.
    • Manage data expiration.
      • Cached data becomes stale after a while.
      • Expire cached items so they're removed, and retrieved again on next read.
      • Set a default expiration policy; many cache services also let you set the expiration period for individual objects programmatically when storing them.
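Expiration with a default policy plus a per-object override can be sketched as below. `ExpiringCache`, the TTL values, and the injected clock are assumptions made for the example; managed cache services expose their own expiration settings.

```python
import time


class ExpiringCache:
    """Cache whose entries expire after a time-to-live (illustrative only)."""

    def __init__(self, default_ttl=60.0, clock=time.monotonic):
        self._default_ttl = default_ttl  # default expiration policy, in seconds
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl=None):
        # A per-object TTL overrides the default policy when given.
        lifetime = self._default_ttl if ttl is None else ttl
        self._items[key] = (value, self._clock() + lifetime)

    def get(self, key, loader):
        entry = self._items.get(key)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]          # still fresh
        value = loader(key)          # missing or expired: reload on demand
        self.set(key, value)
        return value


now = [0.0]  # fake clock so the example does not need to sleep
loads = []


def load_from_store(key):
    loads.append(key)
    return f"value-of-{key}"


cache = ExpiringCache(default_ttl=30.0, clock=lambda: now[0])
cache.get("a", load_from_store)  # miss: loaded from the store
cache.get("a", load_from_store)  # hit: served from the cache
now[0] = 31.0
cache.get("a", load_from_store)  # expired: loaded again
print(loads)  # ['a', 'a']
```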
  • Redis Cache
    • Recommended by Azure, replaces Azure Cache (deprecated).
    • NoSQL key-value database.
      • Unique: keys can hold complex data structures (e.g. hashes, lists, sets), not just strings.
    • SKUs: Basic (single node), Standard (2 nodes + SLA)

Measuring throughput

  • Normalized units
    • Relative performance guarantees by cloud vendors.
    • If your application uses 20 units, 40 units will give you approx. double the performance.
  • DTUs – Database transaction units (Azure SQL Database)
    • Based on compute, storage and IO.
    • DTUs for single databases, eDTUs for elastic pools.
    • Fixed per pricing tier, e.g.: Basic = 5 DTU, Standard S2 = 50 DTU
  • RUs – Request units (Azure Cosmos DB), throughput is provisioned in RUs per second
    • Each operation incurs a request charge, which is expressed in RUs.
      • Single request unit (normalized) => 1 read of 1 KB document.
      • Create, replace, delete consume more processing = more request units.
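A rough way to reason about request units is to multiply each operation's rate by its charge and sum the results. In the sketch below only the read charge (1 RU per 1 KB read) comes from the notes above; the write charge of 5 RUs and the workload figures are hypothetical placeholders, since actual charges are reported by the service per request.

```python
def required_throughput(ops_per_second, charge_per_op):
    """Sum of (operations per second x RUs per operation)."""
    return sum(rate * charge_per_op[op] for op, rate in ops_per_second.items())


charges = {
    "read_1kb": 1.0,   # from the notes: 1 RU per read of a 1 KB document
    "write_1kb": 5.0,  # hypothetical: writes cost more, exact value varies
}
workload = {"read_1kb": 500, "write_1kb": 100}  # made-up operations per second

needed = required_throughput(workload, charges)
print(needed)  # 1000.0
```

Under these assumptions the workload needs about 1000 RUs per second, with the writes costing as much as five times as many reads.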

Structure of data

  • Polyglot persistence => solutions that use a mix of data store technologies.
  • Structured data stores
    • Most vendors use SQL.
    • Have an RDBMS (relational database management system)
      • Conforms to ACID.
      • Supports schema-on-write
        • You define the data structure; all reads + writes use the same schema.
    • Hard to scale out.
    • E.g. Azure SQL Database, Azure Database for MySQL, Azure Database for PostgreSQL
  • Unstructured / semi-structured data stores
    • Don't use a tabular schema of rows & columns.
    • Can store data as key/value pairs, JSON documents, or as a graph (edges + vertices).
    • Have no relational model.
    • Graph databases => Cosmos DB (Gremlin API)
      • Optimized for exploring weighted relationships between entities.
      • Stores nodes (entities) and edges (relationships between nodes).
    • Document databases => Azure Cosmos DB
    • NoSQL => non-SQL DBs, though most systems support SQL-compatible queries.
    • Column family: HBase in HDInsight
      • Key-value pairs, where a key is mapped to a value that's a set of columns.
    • Massively parallel & distributed solutions for ingesting, storing, and analyzing data
      • SQL Data Warehouse
      • Azure Data Lake
      • Time series data stores => Time Series Insights
        • Optimized for queries over time-based sequences of data, indexed by datetime.
    • Others: Object storage => Blob storage, Shared files => File storage