Well Architected considerations

If you are considering Azure Durable Functions for your solution, here are some considerations you can use as input to make it 'well architected': not only technically sound, but also cost effective, convenient to operate, performant, reliable and secure.

More information on the Well-Architected framework here

Cost optimization

  • Consider using different SKUs for the Dev/Test/Prod deployments of your Azure Functions hosting option. Consumption is probably very cost effective for non-production environments. For production, check whether non-functional requirements limit your options: network integration, maximum execution timeouts, scale limits, and perhaps whether you require a managed identity for your function app. Even better: clean up non-prod environments when you stop using them and have infra-as-code redeploy them as soon as you need them again.

  • Ensure you've not scaled out beyond what you need. The Elastic Premium plan in particular scaled well for me: out of the box it scaled to 20 instances within 10 minutes, and back down to 1 when my load test stopped.

  • The same SKU considerations apply to Azure Service Bus. For sequential processing you need the Standard or Premium SKU, but there's no need for that in non-production.
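
Cleaning up a non-production environment, as suggested in the list above, could be scripted along these lines. This is a minimal sketch under stated assumptions: each environment lives in its own resource group named with the `${prefix}-${project}-${postfix}` pattern used by the commands later in this document, the variable values are placeholders, and the `prod` postfix and `DRY_RUN` switch are hypothetical conventions of this example.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder values; the naming pattern follows the commands later in this document.
prefix="rg"
project="durable"
postfix="dev"

# Assumption: one resource group per environment, so deleting the group
# removes the whole environment.
cleanup_environment() {
  resource_group="${prefix}-${project}-${postfix}"
  # Hypothetical guard: never delete the environment whose postfix is "prod".
  if [ "${postfix}" = "prod" ]; then
    echo "refusing to delete production group ${resource_group}" >&2
    return 1
  fi
  # DRY_RUN=1 (the default in this sketch) only prints the command.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "az group delete --name ${resource_group} --yes --no-wait"
  else
    az group delete --name "${resource_group}" --yes --no-wait
  fi
}

cleanup_environment
```

Run it with `DRY_RUN=0` only once the printed command looks right; your infra-as-code deployment can then recreate the environment when it is needed again.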

Operational Excellence

  • Instrument your functions with Application Insights. There is no excuse for lacking real-time insight into how your function is doing. Portal deployment helps you configure this, and the Azure CLI also makes it very easy. I typically send Application Insights logs to the same workspace as my infrastructure logs - for potential future correlation.

  • Use the TelemetryClient inside your code for more options to write to Application Insights. Calls like TrackDependency and TrackEvent are very powerful for reducing the time it takes to figure out what's going on when there is a problem.

  • PRO TIP: Related to the Service Bus examples - consider using correlation to tie together telemetry from the different executions triggered by Azure Service Bus. Especially in an event-driven, loosely coupled system, it is hard, but very powerful, to be able to correlate the executions that a single event triggers in different systems.

az monitor log-analytics workspace create --resource-group "${prefix}-${project}-${postfix}" --workspace-name "${project}-logs-${postfix}"
az monitor app-insights component create --resource-group "${prefix}-${project}-${postfix}" --location westeurope --app "${project}-appinsights-${postfix}" --kind web --workspace "${project}-logs-${postfix}"
  • Ensure your infrastructure is sending logs to the aforementioned Log Analytics workspace. This means configuring the 'diagnostic settings', e.g.:
az monitor diagnostic-settings create --resource-group "${prefix}-${project}-${postfix}" --name SendToLogAnalytics --resource "${project}-func-${postfix}" --resource-type Microsoft.Web/sites --logs '[{"category":"FunctionAppLogs","Enabled":true}]' --metrics '[{"category":"AllMetrics","Enabled":true}]' --workspace "${project}-logs-${postfix}"
  • Create a dashboard to show the health of your function. Using the Application Dashboard feature of Application Insights is a great starting point.

  • Deploy Azure Alerts to notify you when your application is not healthy, so you can respond appropriately.

  • Think of a good naming and tagging strategy before you get started so that problematic resources can be identified quickly. For inspiration, start here

  • Ensure your infrastructure as well as your configuration are 'as code' and can be deployed programmatically. The potential tiny overhead of a longer deployment time does not outweigh the benefits of having everything-as-code and deployed by a non-human.
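
The commands in this section already follow a naming pattern built from `prefix`, `project` and `postfix`. One way to keep that pattern in a single place is a small helper, sketched below. The variable values, the `resource_name` function and the tag keys (`project`, `environment`) are assumptions made for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder values for the variables the commands in this section rely on.
prefix="rg"
project="durable"
postfix="dev"

# Builds "<project>-<kind>-<postfix>", the pattern used above for the
# workspace ("logs"), Application Insights ("appinsights") and function app ("func").
resource_name() {
  printf '%s-%s-%s' "${project}" "$1" "${postfix}"
}

resource_group="${prefix}-${project}-${postfix}"
workspace_name="$(resource_name logs)"
function_app_name="$(resource_name func)"

# Hypothetical tag keys; pick whatever your tagging strategy prescribes.
tags="project=${project} environment=${postfix}"

# The tags could then be applied when creating the resource group, e.g.:
#   az group create --name "${resource_group}" --location westeurope --tags ${tags}
echo "${resource_group} ${workspace_name} ${function_app_name} (${tags})"
```

Deriving every name from the same three variables means a problematic resource can be traced back to its project and environment from the name or the tags alone.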

Performance Efficiency

  • Especially with functions, this means looking into the characteristics of the service plan you host them on. Again, see the Azure Functions hosting options.

Reliability

Reliability is typically a tradeoff between money and uptime SLA. Given enough money, you can deploy so many redundant infra components that you achieve a very high theoretical uptime SLA.

  • At the infrastructure or hosting level, look at the specifics of your hosting plan and at its SLA. The Consumption plan is at 99.95% - for super business critical apps, consider multiple App Service Environments across multiple Availability Zones. This prevents your application from going down in case of a zonal failure.

  • If you want to prepare for a regional outage, you're going to have to deploy multi-region and use services like Traffic Manager or Azure Front Door to fail over when a region goes down.

At the application level:

  • At the function level, I would urge you to get a really good understanding of how durable functions work and of their internal queueing mechanisms. A really good starting point is this video by Jeff Hollan. It will make you understand that once an orchestration is triggered, it is in fact queued on Azure Storage. So when something happens during execution (an exception, a machine restart), it will retry in order to finish the orchestration. The same goes for the orchestrator calling activity functions! Do note the impact of this: you want to make sure things are idempotent, as they may be subject to a retry. Also consider the APIs you may be calling or consuming here, since a retry will call them again.
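
To make the idempotency point concrete, here is a minimal, language-independent sketch in shell, in keeping with the CLI snippets in this document. It is not how Durable Functions implements retries; it only shows the pattern of recording an operation ID so that a retried call becomes a no-op. The names `run_once` and `charge_customer`, the operation ID `order-42` and the local marker directory (a stand-in for a durable store) are all hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical local stand-in for a durable store of completed operation IDs.
STATE_DIR="$(mktemp -d)"
CHARGE_LOG="${STATE_DIR}/charges.log"

# Hypothetical side effect that must not happen twice for the same order.
charge_customer() {
  echo "charged $1" >> "${CHARGE_LOG}"
}

# Runs the given command at most once per operation ID.
# mkdir either creates the marker or fails, so only the first caller proceeds.
run_once() {
  marker="${STATE_DIR}/$1.done"
  shift
  if ! mkdir "${marker}" 2>/dev/null; then
    return 0            # already processed: the retry is a no-op
  fi
  if ! "$@"; then
    rmdir "${marker}"   # the work failed: release the claim so a retry can run
    return 1
  fi
}

run_once "order-42" charge_customer "order-42"
run_once "order-42" charge_customer "order-42"   # simulated retry, skipped

charge_count="$(grep -c '' "${CHARGE_LOG}")"
echo "charges recorded: ${charge_count}"
```

In a real activity function the marker would live in a store that survives restarts, and the operation ID would be derived from the orchestration input, so that a replayed activity sees the same ID.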

Security

Read this comprehensive guide on 'Securing Azure Functions'. My top picks: