The approach below substantially reduces the complexity of using private endpoints with Azure Databricks. If you’ve found the official Databricks documentation confusing when considering Private Link for your Databricks environment, this blog post is for you.
Introduction
Azure Databricks is a powerful analytics platform that enables organisations to process large volumes of data quickly and efficiently. However, deploying Azure Databricks infrastructure within a private network environment can be complex. This blog post aims to simplify the deployment process by using Azure Private Endpoints and Private DNS Zones with Internet Fallback. By following these recommendations, you can streamline your deployment process, reduce operational overhead, and ensure a more robust and scalable data analytics environment.
One of the most critical aspects of this process is understanding how users authenticate to the Databricks workspace. Central to this is the browser_authentication endpoint (also known as the web_auth endpoint), which integrates directly with Microsoft Entra ID to facilitate secure and interactive user authentication via Single Sign-on (SSO).
‘Simplifying’ the Simplified Deployment Method
As per best practice, most production environments use Azure Private Endpoints. The current Databricks documentation states that if you deploy multiple Databricks workspaces, such as Development, Staging and Production environments, you must deploy one dedicated browser_authentication endpoint per Azure region.
This endpoint is then registered within your Databricks Private DNS Zone, alongside the databricks_ui_api sub-resource, with both sub-resources residing under a Private DNS Zone called privatelink.azuredatabricks.net.
The official documentation attaches a bolded “DO NOT DELETE!” warning to the dedicated authentication workspace. I found it unclear why this seemingly standalone workspace is required, and whether you really need it. As it turns out, you don’t, if you use a feature that Microsoft made available on Private DNS zones earlier this year.
For clarity, I’m referring to the items highlighted in red below, which you can find in the “Enable Azure Private Link as a simplified deployment – Azure Databricks | Microsoft Learn” documentation.
Introducing Private DNS Zones Internet Fallback
Enabling Internet Fallback on your privatelink.azuredatabricks.net zone allows a DNS request for australiaeast.pl-auth.azuredatabricks.net, which has no record in the private zone, to fall back to public DNS and resolve to the public endpoint. That public endpoint then handles all browser authentication requests for your Databricks workspaces within the Australia East region. This eliminates the need to deploy a dedicated private endpoint for web authentication requests when logging into your workspace.
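To make the resolution behaviour concrete, here is a minimal Python sketch of the decision described above. It is a toy model only, not how Azure DNS is implemented: the record names, the IP addresses and the helper names (resolve, fake_public_lookup) are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical contents of the privatelink.azuredatabricks.net zone.
# Only the workspace UI/API record exists; there is no pl-auth record,
# because no web authentication private endpoint was deployed.
PRIVATE_ZONE: Dict[str, str] = {
    "adb-1234567890123456.7": "10.10.0.4",
}


def fake_public_lookup(name: str) -> Optional[str]:
    """Stand-in for public DNS; returns a documentation-range address."""
    return "203.0.113.10"


def resolve(
    name: str,
    private_zone: Dict[str, str],
    public_lookup: Callable[[str], Optional[str]],
    internet_fallback: bool,
) -> Optional[str]:
    """Toy model: answer from the private zone, else fall back if allowed."""
    if name in private_zone:
        # A private record exists, so the private endpoint address wins.
        return private_zone[name]
    if internet_fallback:
        # No private record: retry against public DNS instead of failing.
        return public_lookup(name)
    # No private record and no fallback: the lookup fails.
    return None


# The workspace keeps resolving privately; the regional auth name only
# resolves when fallback is switched on.
print(resolve("adb-1234567890123456.7", PRIVATE_ZONE, fake_public_lookup, True))
print(resolve("australiaeast.pl-auth", PRIVATE_ZONE, fake_public_lookup, True))
print(resolve("australiaeast.pl-auth", PRIVATE_ZONE, fake_public_lookup, False))
```

The last call shows why a dedicated web authentication endpoint is needed when fallback is off: without a record in the private zone, the name simply does not resolve.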
Internet Fallback is enabled on the virtual network link of your Private DNS zone, privatelink.azuredatabricks.net. You can find the setting within the Virtual Network Link settings of the zone.
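If you manage the zone as code rather than through the portal, the same setting can be expressed in the virtual network link resource. The Python sketch below only builds the request body. It assumes the property is named resolutionPolicy with the value NxDomainRedirect and that API version 2024-06-01 is used, so check both against the current Private DNS REST API reference before relying on them. The function name and the virtual network ID are hypothetical placeholders.

```python
import json
from typing import Any, Dict

# Assumption: the API version that exposes the fallback setting.
API_VERSION = "2024-06-01"


def build_vnet_link_body(vnet_id: str, internet_fallback: bool = True) -> Dict[str, Any]:
    """Build a virtual network link body with or without Internet Fallback."""
    return {
        "location": "global",
        "properties": {
            "virtualNetwork": {"id": vnet_id},
            # Auto-registration is not needed for a privatelink zone.
            "registrationEnabled": False,
            # Assumption: "NxDomainRedirect" enables fallback, "Default" disables it.
            "resolutionPolicy": "NxDomainRedirect" if internet_fallback else "Default",
        },
    }


# Placeholder resource ID; substitute your own subscription, group and VNet.
example_vnet_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-network/providers/Microsoft.Network"
    "/virtualNetworks/vnet-databricks"
)

print(json.dumps(build_vnet_link_body(example_vnet_id), indent=2))
```

The body would be sent when creating or updating the link on the privatelink.azuredatabricks.net zone. Keeping it in a small function makes it easy to apply the same policy to every virtual network linked to the zone.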
The official Azure Databricks documentation on Azure Private Link as a simplified deployment doesn’t describe this fallback feature. By enabling it on your Databricks private DNS zone, you eliminate the need to host a dedicated authentication workspace per Azure region. This reduces operational overhead, removes the need for an additional private endpoint for web_auth, and sends your SSO request directly to Entra ID, which processes your authentication request when you log into your Databricks workspace.
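One way to confirm the behaviour is to resolve both hostnames from a machine inside the linked virtual network and check whether each answer is a private or a public address. The following Python sketch uses only the standard library. The function names are hypothetical, and the workspace hostname in the usage note is a placeholder you would replace with your own.

```python
import ipaddress
import socket
from typing import List


def classify(ip: str) -> str:
    """Label an address as private (for example RFC 1918) or public."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def resolve_and_classify(hostname: str) -> List[str]:
    """Resolve a hostname with the system resolver and label each address."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # getaddrinfo can return duplicates, so de-duplicate before labelling.
    addresses = sorted({info[4][0] for info in infos})
    return [f"{address} ({classify(address)})" for address in addresses]
```

With fallback enabled and no web_auth endpoint deployed, you would expect resolve_and_classify("australiaeast.pl-auth.azuredatabricks.net") to report public addresses, while your own workspace hostname (for example a placeholder such as adb-1234567890123456.7.azuredatabricks.net) should still report the private endpoint address.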
Why using a Private Endpoint for Web Authentication doesn’t add much value
Although the Databricks UI/API endpoint remains private through its dedicated Private Endpoint, user browsers must still connect to Entra ID over the internet to obtain authentication tokens. Complete removal of public internet exposure is not possible because Entra ID endpoints are inherently public, as part of Microsoft’s global services. End users regularly access these endpoints for SSO authentication across various applications, so integrating Databricks does not significantly alter the overall risk profile.
Conclusion
Enabling Private DNS Internet Fallback for Databricks streamlines your Azure Databricks deployment. This is particularly relevant if you plan to deploy multiple Databricks workspaces within your chosen Azure region and want to remove the dependency on a dedicated web authentication workspace.