Enterprise Policy As Code: Our approach to deploying Azure Policy at scale

Image credit: https://www.bing.com/images/search?view=detailV2&ccid=RGK8rBQy&id=989D9BF1F6918C66C87F92590B3DD1564D8D58F2&thid=OIP.RGK8rBQywHcbzSc9BqXUYQHaD4&mediaurl=https%3a%2f%2fazure.microsoft.com%2fsvghandler%2fazure-policy%3fwidth%3d600%26height%3d315&exph=315&expw=600&q=Azure+Policy+Icon&simid=608009185947823397&FORM=IRPRST&ck=2652614ABB1F7F9CFAE767265DC7F992&selectedIndex=0


Introduction

Azure Policy is a powerful tool in the cloud administrator’s or developer’s tool belt. For those of you familiar with Active Directory’s Group Policy, it is somewhat similar, except that Azure Policy governs the resources in your Azure subscriptions. It can not only audit for certain conditions but also add and modify resources based on those conditions, and it can even perform full-blown deployments using Azure Resource Manager. Azure Policy provides an efficient, no-cost way to enforce your architectural specifications on the Azure resources that are deployed to the platform.

The Problem

Azure Policy is easily configurable using the Policy blade in the Azure portal and works really well if you are starting out small, with a limited number of users accessing, deploying and configuring resources on the platform.

There are, however, challenges that arise when you have multiple teams with differing objectives and priorities deploying across multiple subscriptions. That is when the easy assignment of policies through the portal can turn against you.

Look at it this way: Azure policies, for the most part, aim to enforce a certain standard on the platform, and sometimes that standard interferes with product development. This starts the classic tug-of-war between developers and operational stakeholders.

If the development team has the right permissions, they can remove the policy assignments, which means that the “standard” you are trying to enforce is no longer enforced. Because the assignment was removed manually in the portal, there is no change record explaining why it was removed. Months down the line, when the removal is discovered, it is too late and the resulting technical debt needs to be cleaned up.

Some of the other problems that occur when policies are not managed well:

  1. Configuration drift
  2. Lack of documentation explaining how and why policies exist and how they align with governance and architectural specifications.
  3. Security incidents because of policies that were removed and never assigned again.
  4. Operational overhead when defined architectural specifications are not followed because the policy that enforced them was removed.
  5. An insecure and inconsistent approach to developing and deploying policies, which introduces business operations risk.
  6. The tension between operational and development stakeholders.

This, however, can all be avoided.


The Solution

Enterprise Policy As Code, or EPAC for short, provides a mechanism to codify your policies and assignments and store them in source control. It also provides a mechanism to deploy the policies and make assignments at any scope you choose. It comes with a pre-defined development lifecycle and workflow, implemented as an Azure DevOps pipeline that is part of the out-of-the-box setup.

The EPAC solution has been developed by Microsoft over the course of a few years and is an elegant way to manage your Azure policies at scale. Unlike managing policy from the portal, EPAC provides a “plan” command that shows you what will change on the platform before any changes are made.
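To make the idea of a “plan” concrete, here is a minimal Python sketch of the underlying concept: compare what is defined in the repository with what is deployed, and list the differences before touching anything. This is not EPAC’s actual implementation or command surface; the `Plan` class, the `build_plan` function and the sample policy names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Changes needed to make the deployed state match the repository."""
    create: list = field(default_factory=list)
    update: list = field(default_factory=list)
    delete: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)


def build_plan(desired: dict, deployed: dict) -> Plan:
    """Compare repository definitions with deployed ones without changing anything."""
    plan = Plan()
    for name, definition in sorted(desired.items()):
        if name not in deployed:
            plan.create.append(name)      # in the repo, missing on the platform
        elif deployed[name] != definition:
            plan.update.append(name)      # present on both sides but different
        else:
            plan.unchanged.append(name)
    # Anything deployed that the repository does not know about would be removed.
    plan.delete = sorted(name for name in deployed if name not in desired)
    return plan


# Hypothetical sample data: policy name -> simplified definition.
desired = {
    "require-tag-owner": {"effect": "deny"},
    "audit-public-ip": {"effect": "audit"},
}
deployed = {
    "require-tag-owner": {"effect": "audit"},     # changed by hand in the portal
    "legacy-manual-policy": {"effect": "deny"},   # never committed to the repo
}

plan = build_plan(desired, deployed)
print("create:", plan.create)
print("update:", plan.update)
print("delete:", plan.delete)
```

Reviewing a summary like this before deployment is what lets a team catch an unintended change or deletion before it reaches the production scope.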

Azure management groups are used to separate policy development from deployment to production. Policies assigned at these scopes are inherited by all the scopes below them, and the hierarchy might look something like the following structure.

As you can see, the EPAC Development scope is separate from the Organization Platform Management Group. Policies can be applied at the EPAC Development Management Group scope without affecting the resources in the Organization Platform Management Group.
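The inheritance behaviour described above can be sketched in a few lines of Python. Only the two management group names come from this article; the root group, the subscriptions and the policy names are assumptions made up for the example, and the lookup is a simplified model rather than how Azure evaluates policy internally.

```python
# Hypothetical hierarchy: child scope -> parent scope (None marks the root).
PARENT = {
    "Tenant Root Group": None,
    "EPAC Development": "Tenant Root Group",
    "Organization Platform": "Tenant Root Group",
    "Production Subscription": "Organization Platform",
    "Sandbox Subscription": "EPAC Development",
}

# Hypothetical policy assignments made directly at each scope.
ASSIGNED = {
    "Tenant Root Group": ["audit-resource-locations"],
    "EPAC Development": ["require-tag-owner (under test)"],
    "Organization Platform": ["deny-public-ip"],
}


def effective_assignments(scope: str) -> list:
    """Collect assignments from the scope itself and every ancestor above it."""
    result = []
    current = scope
    while current is not None:
        # Prepend so that assignments from higher scopes are listed first.
        result = ASSIGNED.get(current, []) + result
        current = PARENT[current]
    return result


print(effective_assignments("Production Subscription"))
print(effective_assignments("Sandbox Subscription"))
```

In this model the policy under test applies only to the sandbox subscription, because the two management groups sit on separate branches of the hierarchy.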


Some of the key benefits

  1. Provides a consistent, automated approach to developing and deploying policies to the Azure platform, which avoids configuration drift.
  2. Policies and assignments are version controlled, and the automated deployment provides traceability on how they got there.
  3. Safer policy development practices, by ensuring that policies with modify and “deploy if not exists” effects are validated before they are deployed to the production scope.
  4. The solution is flexible: policy development can be either centralised or distributed, because multiple teams can use it to deploy at multiple scopes without affecting the policies that have already been deployed. This gives developers, architects, security and other operational stakeholders the ability to layer requirements at the same scope without the need to share a code base.
  5. Policies and assignments can be authored in the Azure portal, and the code for those assignments and policy definitions can then be moved into the codebase by using an export script. This means that teams less familiar with code can still use the solution, and it gives them a way to build their skills and familiarity with automated deployment and infrastructure as code concepts.
  6. Exclusions can also be managed programmatically, and the nature of source control means that there is traceability back to a work item and the reasoning for why an exclusion was made.
  7. The solution follows a least-privilege approach to deployment and has been developed in partnership with the Security & Compliance for Cloud Infrastructure (S&C4CI).
  8. Existing implementations of EPAC can easily be upgraded to the latest version of the solution without introducing risk.
  9. As the EPAC solution is open source, you are able to correct any issues found in your own environment and then create a pull request back to the EPAC codebase so that the fix becomes available to other users of EPAC.
  10. Exemptions are easier to implement, and they provide persistence and a breadcrumb trail via git back to who made the exemption and why it was made.
  11. The solution already has a large community developing and supporting it. 
  12. Provides an industry-standard approach to managing policy at scale, which increases supportability, reliability and performance.
  13. Reduces the time it takes to upskill personnel.


The workflow

The workflow is straightforward. It follows the GitHub Flow branching strategy and works something like this:

  • A policy developer (this can be a developer or an operations engineer) clones the repository that contains the policy definitions, policy set definitions and policy assignments. They then create a new branch where the new policies will be defined.
  • They then define the new policy definitions and assignments. This can be done by creating the policies and assignments in the portal and then using a built-in script to export them for inclusion in the codebase.
  • Once the policies are defined, the developer deploys the policies and assignments to the development scope to ensure that the configuration aligns with the requirements.
  • When they are satisfied with the changes, they push their code to the remote origin, which triggers a pipeline that deploys the changes to the EPAC Development Management Group scope. This deployment tests that the pipelines can deploy the policies to a scope successfully, and it gives other developers a chance to review the deployment and confirm whether the desired requirement has been met.
  • If the policy requirements have been met, the code is merged into the main branch via a pull request.
  • This merge triggers the second part of the pipeline, which performs a “plan” or “what if” at the Organization Platform Management Group scope. The pipeline run then waits for an approval step to be completed. This gives the developer and the peer reviewer an opportunity to review the actual changes that will be implemented in the production scope.
  • Once all the stakeholders defined in the approval step are happy with the changes, the pipeline proceeds to deploy the changes at the Organization Platform Management Group scope.
  • Once deployed, there is certainty that the policies assigned on the platform exist as code in the repository, and drift in policy is avoided by deploying regularly through the pipeline.
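The branch and approval logic in the steps above can be summarised as a small Python sketch. It is only a model of the decision flow described here, not the actual pipeline definition that ships with EPAC; the `stages_for` function and the stage labels are hypothetical.

```python
def stages_for(branch: str, approved: bool = False) -> list:
    """Return the pipeline stages that would run for a push to the given branch."""
    if branch != "main":
        # Feature branches are only ever deployed to the development scope.
        return ["deploy to EPAC Development scope"]

    # A merge into main first produces a plan and then pauses for approval.
    stages = ["plan against Organization Platform scope", "wait for approval"]
    if approved:
        # Production is changed only after the stakeholders sign off.
        stages.append("deploy to Organization Platform scope")
    return stages


print(stages_for("feature/require-owner-tag"))
print(stages_for("main"))
print(stages_for("main", approved=True))
```

The point of the model is that no path reaches the production scope without going through both the plan and the approval step.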
 

Why Arinco chose to use EPAC for our customer engagements

  1. Deploying EPAC gives our customers the ability to manage policy at scale and allows them to align with a solution that is considered industry best practice.
  2. Because Azure Policy is familiar to operational stakeholders, EPAC provides a way for teams that do not have a lot of cloud experience to safely learn how to deploy policies into their cloud platform, whilst up-skilling and gaining familiarity with automated deployments using Azure DevOps pipelines and some coding practices.
  3. Because EPAC is an open-source solution, other vendors are likely to use it and be familiar with it, so it should be easier to get support for it from your chosen vendor. It provides continuity for teams down the track and an easy way to hand over platform governance to other teams.
  4. We use it to standardize the way policies are developed and deployed. This lets our operational teams and our customers’ operational teams work together more collaboratively to enforce platform governance and specifications, with a well-defined process for changing them if needed.


Conclusion

Microsoft and the community members who contributed to the development of the EPAC solution have done a great job of providing a consistent and reliable way to manage Azure Policy at scale. In my mind, EPAC should become the standard way of managing Azure policies for all organizations.

We have found that organizations that use EPAC greatly reduce the risk of policy drift and are more likely to keep their Azure resources in compliance with their architectural and governance standards, which in turn reduces operational risk and cost in the business.

 
