Overview
The observability of cloud resources can be a challenging beast to tame, especially with the rapidly expanding Azure platform. As engineers work to keep up with the pace of deployments, ensuring that workloads have sufficient monitoring and alerting can be a difficult task. After all, stakeholders expect reliability, security, and performance from cloud-based applications.
The Problem
Despite the years that have passed since the advent of DevOps and agile, there is still often a divide between development and operations teams, each with different priorities and driving forces. Development teams want to get features out quickly, while operations teams prioritize stability. Monitoring often falls by the wayside, and it can be difficult to ensure that monitoring data is being sent to the right locations. It’s a common problem, but it’s not an insurmountable one.
Solving it
One potential solution is Azure Policy, and specifically the collection of policies that Microsoft has developed for Azure Monitor. These policies help to ensure that Azure resources have diagnostic settings and other monitoring-related settings enabled and configured correctly.
So, what are diagnostic settings? They are a feature of Azure Monitor that allows you to collect metrics and logs from various Azure resources and send them to different destinations, such as Log Analytics workspaces, Event Hubs, or Storage accounts. Diagnostic settings can help you gain insight into the performance, availability, security, and functionality of your Azure resources.
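To make the shape of a diagnostic setting more concrete, here is a minimal Python sketch that builds a dictionary resembling a Microsoft.Insights/diagnosticSettings resource. The function name `build_diagnostic_setting`, the setting name, and the placeholder workspace ID are assumptions for illustration. The "allLogs" category group and "AllMetrics" category are also assumed here; not every resource type supports them, so check what your resource exposes.

```python
import json

# Placeholder resource ID, not a real workspace.
EXAMPLE_WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)


def build_diagnostic_setting(name, workspace_id=None,
                             event_hub_rule_id=None, storage_account_id=None):
    """Return a dict shaped like a diagnostic setting with its destinations."""
    destinations = {}
    if workspace_id:
        destinations["workspaceId"] = workspace_id
    if event_hub_rule_id:
        destinations["eventHubAuthorizationRuleId"] = event_hub_rule_id
    if storage_account_id:
        destinations["storageAccountId"] = storage_account_id
    if not destinations:
        raise ValueError("at least one destination is required")

    properties = {
        # Assumed categories; supported values vary by resource type.
        "logs": [{"categoryGroup": "allLogs", "enabled": True}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    }
    properties.update(destinations)
    return {
        "type": "Microsoft.Insights/diagnosticSettings",
        "name": name,
        "properties": properties,
    }


setting = build_diagnostic_setting("send-to-workspace",
                                   workspace_id=EXAMPLE_WORKSPACE_ID)
print(json.dumps(setting, indent=2))
```

A setting can carry more than one destination, so the same helper could be called with both a workspace and a storage account if you need analysis and long-term archiving.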
Azure Policy is a service that enables you to create, assign, and manage policies that enforce compliance across your Azure environment. Azure Policy can help you automate the configuration and enforcement of diagnostic settings for your Azure resources, ensuring that you have consistent and reliable monitoring data available for analysis and alerting.
Azure Policy built-in definitions for Azure Monitor are predefined policy definitions that you can use to enable diagnostic settings for various Azure resource types. These policy definitions use the “deployIfNotExists” effect, which means that they will check if the target resource has diagnostic settings enabled and configured according to the policy parameters. If not, they will deploy a diagnostic setting resource to enable and configure it.
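The check-then-deploy behaviour described above can be illustrated with a toy model in Python. This is a sketch of the idea only, not the Azure Policy engine and not a copy of any built-in definition: the `POLICY_RULE` skeleton is heavily trimmed, the existence condition and the `logAnalytics` parameter name are assumptions, and `Resource`, `needs_remediation`, and `remediate` are hypothetical helpers.

```python
from dataclasses import dataclass, field

# Trimmed skeleton of a deployIfNotExists rule (illustrative assumption).
POLICY_RULE = {
    "if": {"field": "type", "equals": "Microsoft.Network/publicIPAddresses"},
    "then": {
        "effect": "deployIfNotExists",
        "details": {
            "type": "Microsoft.Insights/diagnosticSettings",
            "existenceCondition": {
                "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
                "equals": "[parameters('logAnalytics')]",
            },
        },
    },
}


@dataclass
class Resource:
    name: str
    type: str
    # Workspaces that existing diagnostic settings already send to.
    diagnostic_workspaces: list = field(default_factory=list)


def needs_remediation(resource, rule, workspace_id):
    """True when the rule applies and no matching diagnostic setting exists."""
    applies = resource.type == rule["if"]["equals"]
    exists = workspace_id in resource.diagnostic_workspaces
    return applies and not exists


def remediate(resources, rule, workspace_id):
    """Simulate the deployment step and return the names that were changed."""
    deployed = []
    for resource in resources:
        if needs_remediation(resource, rule, workspace_id):
            resource.diagnostic_workspaces.append(workspace_id)
            deployed.append(resource.name)
    return deployed


resources = [
    Resource("pip-web", "Microsoft.Network/publicIPAddresses"),
    Resource("pip-api", "Microsoft.Network/publicIPAddresses", ["ws-central"]),
    Resource("vm-app", "Microsoft.Compute/virtualMachines"),
]
print(remediate(resources, POLICY_RULE, "ws-central"))  # ['pip-web']
print(remediate(resources, POLICY_RULE, "ws-central"))  # []
```

The second call returns an empty list because the first one already brought the only non-compliant resource into line, which mirrors how a compliant resource is left alone on later evaluations.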
There are about 120 policies in the collection, including policies to install the Azure Monitor Agent on VMs and to enable resource logs on Public IP addresses. The “deployIfNotExists” effect on the policies will ensure that even if the infrastructure code for your resource deployment does not have diagnostic settings configured, Azure Policy will deploy them regardless. Resources that already existed before the policy was assigned are brought into line by running a remediation task.
Some Considerations
- Compliance Challenges: Azure Policy is a powerful tool for enforcing compliance across your Azure environment, but ensuring compliance with organizational policies, industry regulations, and best practices can be complex. Regularly reviewing and updating your policies to align with changing requirements is crucial to ensure ongoing compliance.
- Data Ingestion and Retention Costs: Enabling additional logging on your resources can provide valuable insights into the performance, reliability, and security of your workloads. However, it’s important to be mindful of the data ingestion and retention costs associated with logging. Consider setting appropriate retention periods, such as keeping logs for 3 months instead of 2 years for production, and even less for non-production workloads, to manage costs effectively.
- Log Noise and Alerting: Some applications may generate noisy logs that contain unnecessary data, leading to increased costs for data ingestion and retention. Evaluate the logs to determine if they need to be alerted on, and consider not ingesting unnecessary logs to save costs.
- Acting on Logs: It’s important to actively monitor and act on alert and warning logs generated by diagnostic settings. Addressing the issues identified in logs can not only improve the performance of your applications but also save costs on compute and data ingestion and retention.
- Staged Approach for Logging: When enabling additional logging, consider taking a staged approach if possible. Evaluate and monitor the logs being ingested to ensure they align with your monitoring requirements and adjust as needed to avoid unnecessary costs.
- Region Considerations: The region where you send your logs for analysis can impact your costs. Avoid sending logs to out-of-region Log Analytics workspaces unless necessary to save on transfer costs.
- Identity and Remediation: Policies that use the “deployIfNotExists” effect need a managed identity, either system-assigned or user-assigned, to perform the remediation tasks. It’s recommended to use system-assigned identities where possible as they require less maintenance over time.
- Enterprise Policy as Code: While creating and assigning policies via the Azure portal may be suitable for small enterprises, for larger environments, consider implementing Enterprise Policy as Code for better management, consistency, and scalability.
- Review Default Policies: Default policies provided by Microsoft are a good starting point, but it’s important to review and assess their applicability to your specific environment to ensure they align with your monitoring and compliance requirements.
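As a rough illustration of the Identity and Remediation point above, the following Python sketch builds the body of a policy assignment that carries a managed identity. The helper name `build_policy_assignment`, the placeholder definition ID, the region, and the `logAnalytics` parameter name are all assumptions; parameter names differ between policy definitions, so check the definition you are assigning.

```python
import json


def build_policy_assignment(policy_definition_id, location, parameters,
                            user_assigned_identity_id=None):
    """Return a dict shaped like a policy assignment request body."""
    if user_assigned_identity_id:
        identity = {
            "type": "UserAssigned",
            "userAssignedIdentities": {user_assigned_identity_id: {}},
        }
    else:
        # Default to system-assigned, which needs less upkeep over time.
        identity = {"type": "SystemAssigned"}

    return {
        "identity": identity,
        # A location is needed when the assignment has a managed identity.
        "location": location,
        "properties": {
            "policyDefinitionId": policy_definition_id,
            "parameters": {
                key: {"value": value} for key, value in parameters.items()
            },
        },
    }


assignment = build_policy_assignment(
    policy_definition_id=(
        "/providers/Microsoft.Authorization/policyDefinitions/<definition-guid>"
    ),
    location="australiaeast",  # hypothetical region
    parameters={"logAnalytics": "<workspace-resource-id>"},  # assumed name
)
print(json.dumps(assignment, indent=2))
```

Generating the assignment body from code like this is also a small step toward the policy-as-code approach mentioned above, since the same function can be reused across environments with different parameters.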
Conclusion
In summary, deploying these policies to your platform can help increase visibility into your cloud infrastructure and applications. By using Azure Policy to create diagnostic settings and the other monitoring-related configurations, you can ensure that all your resources have consistent and accurate monitoring data, ultimately leading to improved performance, reliability, and security of your cloud infrastructure and applications.