Effective cost saving in Azure without moving to Platform-as-a-Service

Recently we were asked to help a company reduce the costs of their infrastructure in Microsoft Azure. They wanted to move rapidly, so working on a plan to re-architect their system to make use of the Platform-as-a-Service (PaaS) features that Azure provides wasn’t a viable option – we had to find another way.

The obvious first recommendation would have been to resize the servers in all non-Production environments to bring the compute charges down. However, this wasn’t an option either, as they used these environments to gauge the performance of the system before rolling changes out to Production – again, we had to find another way.

Using the Azure Automation feature to turn VMs on and off on a schedule

We noticed that their non-Production environments (development, test, UAT and Pre-Production) were not used all the time, yet they were powered on in Azure around the clock – meaning they were continually being charged for compute. Most of the people using these environments were based in the UK, so the core hours of operation were between 7 am and 10 pm GMT (accommodating all nightly processes).

Using this knowledge, along with our knowledge of PowerShell and Azure Automation (https://azure.microsoft.com/en-gb/services/automation), we wrote a custom runbook that would turn off all servers in each environment at a given time, and another runbook that did the opposite – turn them back on again.

We wanted to follow best practices when creating the code, so hard-coded values were out, and parameters and tags were in: parameters were used to pass in a resource group, and tags were used to identify servers that should be turned off. Tags were especially useful here, as in some resource groups we couldn’t power down all the servers.
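As a minimal sketch of the tagging side (the resource group, VM name and tag value here are placeholders, not the company’s real names), a VM that must stay on around the clock could be marked like this from an authenticated AzureRM session:

# Tag a VM so the stop runbook skips it (placeholder names throughout)
$vm = Get-AzureRmVM -ResourceGroupName "RG-UAT" -Name "sql-01"
$vm.Tags['server24x7'] = 'Yes'
Update-AzureRmVM -ResourceGroupName "RG-UAT" -VM $vm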

Below is some of the PowerShell used to turn the VMs off:


workflow Stop-ResGroup-VMs {
  param (
    [Parameter(Mandatory=$true)][string]$ResourceGroupName
  )
  # The name of the Automation credential asset this runbook uses to authenticate to Azure.
  $CredentialAssetName = 'CredentialName'
  $Cred = Get-AutomationPSCredential -Name $CredentialAssetName
  $AccountRM = Add-AzureRmAccount -Credential $Cred
  Select-AzureRmSubscription -SubscriptionName "MySubscription"

  # Get all the VMs in the resource group passed in as a parameter
  $VMs = Get-AzureRmVM -ResourceGroupName $ResourceGroupName

  if (!$VMs) {
    Write-Output "No VMs were found in resource group $ResourceGroupName."
  } else {
    # Stop the VMs in parallel, skipping any tagged as needing to run 24x7
    Foreach -parallel ($VM in $VMs) {
      if ($VM.Tags['server24x7'] -ne 'Yes') {
        Stop-AzureRmVM -ResourceGroupName $ResourceGroupName -Name $VM.Name -Force -ErrorAction SilentlyContinue
      }
    }
  }
}
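
The companion runbook that turned everything back on looked much the same. Here’s a minimal sketch (assuming the same credential asset, subscription name and tag convention as above – not the exact production code):

workflow Start-ResGroup-VMs {
  param (
    [Parameter(Mandatory=$true)][string]$ResourceGroupName
  )
  # Same Automation credential asset as the stop runbook (placeholder name)
  $Cred = Get-AutomationPSCredential -Name 'CredentialName'
  $AccountRM = Add-AzureRmAccount -Credential $Cred
  Select-AzureRmSubscription -SubscriptionName "MySubscription"

  $VMs = Get-AzureRmVM -ResourceGroupName $ResourceGroupName

  # Start everything in parallel; the 24x7 machines were never stopped,
  # so starting them again is harmless
  Foreach -parallel ($VM in $VMs) {
    Start-AzureRmVM -ResourceGroupName $ResourceGroupName -Name $VM.Name -ErrorAction SilentlyContinue
  }
}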

The ability to store secrets within the Automation blade was useful too, as we needed somewhere safe to keep the credentials used to connect to the subscription.
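For reference, a credential asset like the one the runbook reads with Get-AutomationPSCredential can also be created from PowerShell (the resource group, Automation account and asset names here are placeholders):

# Store a credential securely as an asset in the Automation account (placeholder names)
$cred = Get-Credential   # prompts for the service account's username and password
New-AzureRmAutomationCredential -ResourceGroupName "RG-Automation" `
  -AutomationAccountName "MyAutomationAccount" `
  -Name "CredentialName" `
  -Value $cred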

Once authored and pushed to Azure, the runbooks had schedules set up as follows (a sketch of scripting one of these schedules appears after the list):

  • Development and test: servers were turned off at 10 pm and on again at 7 am on weekdays, and left off all weekend.
  • UAT: servers were turned off at midnight going into the weekend, and back on again at midnight on Monday.
  • Pre-Production: servers were turned off at 10 pm every day (and stayed off at weekends), but were configured not to start back up automatically. We made turning them back on an on-demand feature via TeamCity (their CI system); turning them off again at 10 pm meant the environment was never online when not needed.
  • CI agents: these followed the same pattern as the development and test servers, as we didn’t want the CI server trying to deploy changes to servers that were switched off.
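
Creating and linking those schedules can itself be scripted. Here’s a minimal sketch for the development pattern (the Automation account, schedule and resource group names are placeholders):

# Weekday 10 pm "stop" schedule for the development environment (placeholder names)
New-AzureRmAutomationSchedule -ResourceGroupName "RG-Automation" `
  -AutomationAccountName "MyAutomationAccount" `
  -Name "Dev-Stop-10pm" `
  -StartTime (Get-Date "22:00") `  # first occurrence; must be in the future when created
  -DaysOfWeek Monday,Tuesday,Wednesday,Thursday,Friday `
  -WeekInterval 1

# Link the schedule to the stop runbook, passing in the resource group to act on
Register-AzureRmAutomationScheduledRunbook -ResourceGroupName "RG-Automation" `
  -AutomationAccountName "MyAutomationAccount" `
  -RunbookName "Stop-ResGroup-VMs" `
  -ScheduleName "Dev-Stop-10pm" `
  -Parameters @{ ResourceGroupName = "RG-Development" }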

Alerts were set up so that if these scripts failed, someone was notified and could act on it – if they failed silently for several days, the cost saving wouldn’t have amounted to much!

Using the above proved very effective. As an example (using the Azure pricing calculator – https://azure.microsoft.com/en-gb/pricing/calculator/), a standard Windows D12 VM without additional disks left on all month (744 hours) would cost around £322.17. Using a schedule like the one above to turn it on and off brings the cost down to £140.73 – well over a 50% saving. Multiply that across hundreds of servers and it mounts up to a hefty saving.
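For anyone wanting to sanity-check that figure, the arithmetic is roughly as follows (prices as quoted above; the exact saving depends on how the weekdays fall in the month):

$fullMonthCost  = 322.17                # D12 on for all 744 hours
$hourlyRate     = $fullMonthCost / 744  # ~ £0.43 per hour
$onHoursPerWeek = 15 * 5                # on 7 am to 10 pm, weekdays only
$weeksPerMonth  = 744 / (24 * 7)        # ~ 4.43 weeks
[math]::Round($hourlyRate * $onHoursPerWeek * $weeksPerMonth, 2)
# ~ £143.83 – close to the £140.73 quoted above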

They were very happy with this, as they could see the cost coming down straight away when they viewed the billing details in the portal.

Using the Azure Advisor to guide us on servers that might be underutilised

A useful tool that has recently been released in Azure is the Advisor blade (https://docs.microsoft.com/en-us/azure/advisor/advisor-overview).

It is designed to look over the resources in a subscription and make recommendations in areas such as underutilisation and cost, availability, performance and security. We decided it would be useful to run it against the subscription, so we logged into the portal and switched the blade on – this registers the Advisor with the subscription and kicks off the analysis straight away.

[Screenshot: the Azure Advisor blade in the portal]

It took about 5 minutes in total for it to work its magic and then the results were ready.

[Screenshot: Azure Advisor recommendations]

One of the best features of the Advisor is that it continually works in the background, like a virtual consultant looking over your resources. Recommendations are refreshed every hour, so you can go in and search through them at any time. If you choose not to act on a recommendation, you can snooze it so it no longer appears in the active list. The blade is extremely easy to navigate and use.

The first time it ran, it identified several VMs that were either underutilised or not used at all; the recommendation was to resize them, turn them off or remove them completely. Part of what is presented back is the potential cost saving of acting on each recommendation – this allowed us to very quickly produce a rough total of what could be saved.

After a quick check with the company’s IT director about the highlighted servers, we found they were indeed not being used – they had been provisioned for a project that never started. Having the cost on screen showed instantly what could be saved, and the decision was made to turn the machines off for now.

The Advisor is a great tool: turn it on, let it look over the resources in a subscription, and it generates feedback without a human having to click through numerous screens in the portal or write a single line of PowerShell. Certainly a must if lots of departments can create resources in your subscription and aren’t particularly good at cleaning up afterwards!

Conclusions

There are many other ways to go about cost saving in Azure, but if you are limited in what you can change, then automation and the Advisor can help.

The Advisor is a tool we’d especially recommend enabling to gain insight into how your resources are being used – and not just for cost. We trialled a few third-party tools that either only worked against an Enterprise subscription (which they didn’t have) or simply didn’t work. The Azure Advisor was certainly the best of the bunch and the least painful to use.

One thing we noticed is that it’s far too easy to get carried away with the cloud – without a careful eye on what’s going on, you can end up with a rather large and unexpected bill at the end of the month. Something nobody likes.

— Chris Taylor, DevOps Engineer, DevOpsGuys
