TL;DR
Integrating Snowflake into Microsoft Fabric can be done through Database Mirroring, Data Ingestion, or Storage Integration with Iceberg tables. Each offers unique benefits; choose based on your latency, cost, and governance needs.
Why Integrating Snowflake with Microsoft Fabric is a Game-Changer
Given Snowflake’s prevalence as a data source for lakehouses and Power BI’s leadership in business intelligence, Microsoft Fabric offers seamless integration solutions between them. This powerful convergence simplifies the process of bringing your Snowflake data into Fabric for comprehensive analytics.
There are three options for integrating Snowflake into Microsoft Fabric:
- Database Mirroring
- Data Ingestion into Fabric from Snowflake
- Storage Integration through Iceberg Tables, which can also be hosted in OneLake
Each option brings unique benefits and trade-offs. Let’s now explore the details of each approach to help you make an informed decision.
Database Mirroring – Real-Time Snowflake Synchronisation in Fabric
Database Mirroring offers a powerful way to bring your Snowflake data directly into Fabric. This feature allows you to mirror entire Snowflake databases or select specific tables, creating a near real-time, synchronised copy within Fabric’s OneLake. Setting up mirroring requires that the account you use has permissions to work with the source tables and to create streams on them.
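As a rough illustration, the grants look something like the following. The database, schema, and role names are hypothetical, and the exact privilege list should be confirmed against the official Fabric mirroring prerequisites.

```sql
-- Illustrative grants for a role used to set up Database Mirroring.
-- Names are hypothetical; confirm the exact privileges against the
-- Fabric mirroring prerequisites for Snowflake.
GRANT USAGE ON DATABASE sales_db TO ROLE fabric_mirror_role;
GRANT USAGE ON SCHEMA sales_db.public TO ROLE fabric_mirror_role;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE fabric_mirror_role;
GRANT CREATE STREAM ON SCHEMA sales_db.public TO ROLE fabric_mirror_role;
```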
How Database Mirroring Works
Database Mirroring relies on Snowflake table streams to function. This means it sets up a Change Data Capture (CDC) mechanism that continuously tracks row-level changes in your Snowflake database. While incredibly powerful for keeping data in sync, this process can be quite resource-intensive and incurs Snowflake compute costs. As a result, I’d advise reserving this approach for critical near real-time use cases where immediate data synchronisation is absolutely essential.
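To make that mechanism concrete, here is a minimal, standalone sketch of a Snowflake table stream, the CDC primitive that mirroring relies on. Mirroring creates and manages its own streams, so you would not run this yourself; the table name is hypothetical.

```sql
-- A table stream records row-level changes (inserts, updates, deletes)
-- made to the source table since the stream was created or last consumed.
CREATE OR REPLACE STREAM orders_stream ON TABLE sales_db.public.orders;

-- Querying the stream returns the changed rows plus metadata columns
-- (e.g. METADATA$ACTION, METADATA$ISUPDATE) describing each change.
SELECT * FROM orders_stream;
```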
It’s also highly recommended to limit the scope of mirroring to only the objects truly essential for your near real-time scenarios. To keep expenditure under control, regularly run cost management reports in Snowflake; proactive monitoring will help you track the associated compute costs over time and ensure efficient resource utilisation.
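For example, a simple query against the ACCOUNT_USAGE views can show how many credits are being consumed; filter it to the warehouse serving the mirroring workload (the 30-day window here is just an example).

```sql
-- Daily credit consumption over the last 30 days, per warehouse.
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits_used
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_day
ORDER BY usage_day, warehouse_name;
```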
Network and Security Considerations for Integrating Snowflake into Microsoft Fabric
To ensure secure data transfer when mirroring, especially across cloud providers, it’s crucial to deploy an on-premises data gateway. There are trade-offs to hosting the data gateway in AWS versus Azure when the Snowflake instance is in AWS, so research your organisation’s network setup, including the availability of an ExpressRoute connection between AWS and Azure.
While it’s always ideal for your Snowflake instance and Fabric capacity to be in the same region for optimal network latency, this isn’t a strict requirement for the mirroring process to function.
For instructions on setting up the mirroring, please refer to the tutorial; the steps are straightforward to follow.
Unified Lakehouse Access
Once your data is mirrored, you can easily create a shortcut in a Lakehouse directly to that mirrored data. This is valuable because it allows you to seamlessly combine your replicated Snowflake data with existing Lakehouse tables within a common semantic model. This results in a powerful, unified view that’s perfect for comprehensive reporting and deeper insights across your entire data landscape.
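As a sketch of what that unified view enables, the following T-SQL against the Lakehouse SQL analytics endpoint joins a mirrored Snowflake table (surfaced via a shortcut) with a native Lakehouse table. All object names are hypothetical.

```sql
-- Join a shortcut to a mirrored Snowflake table with an existing Lakehouse table.
SELECT c.customer_name,
       SUM(o.order_amount) AS total_sales
FROM dbo.customers AS c           -- native Lakehouse table
JOIN dbo.snowflake_orders AS o    -- shortcut to the mirrored Snowflake table
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name;
```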
Pros of Using Database Mirroring
- Near Real-time Synchronisation of Rows
- Unified experience
- Cost efficiency in Fabric
- Free mirroring storage of 1 TB for each capacity unit.
- No ingress cost
- Background Fabric compute used to replicate data into Fabric OneLake is free.
Cons
- Egress cost if Snowflake is hosted in AWS
- Snowflake-side costs
- Limitations on supported Snowflake objects and features
- Pausing and then restarting replication restarts it from the beginning.
Batch Data Ingestion – Load Snowflake Data into Fabric with Pipelines
For ingesting Snowflake data, both Data Pipelines (Copy Data activity) and Dataflow Gen2 (query mechanism) are viable options. However, Data Pipelines are generally preferred for data engineering scenarios, offering a robust and suitable approach for data acquisition and ingestion.
Setting Up Secure and Efficient Data Ingestion
For secure data transfer, an On-Premises Data Gateway is highly recommended. When using Data Pipelines, the Copy Data activity used to ingest from Snowflake is both efficient and cost-effective in terms of Snowflake compute. The connector itself is straightforward to configure and delivers excellent performance; I’ve personally observed 6 GB of data transferred in approximately 4 minutes.
Considerations for Ongoing Maintenance
The primary drawback, however, is the creation of multiple data copies and the ongoing need for data synchronisation between Snowflake and Microsoft Fabric via data engineering pipelines.
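A common pattern for that synchronisation is watermark-based incremental loading: the Copy Data activity’s source query pulls only rows changed since the last run. The table and column names below are hypothetical, and the watermark value would typically be injected by the pipeline at runtime (for example, from a preceding Lookup activity).

```sql
-- Incremental extraction: only rows modified since the stored watermark.
SELECT *
FROM sales_db.public.orders
WHERE last_modified_ts > '2024-06-01 00:00:00';  -- replaced per run by the stored watermark
```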
Pros of Batch Data Ingestion
- Native Fabric connector that supports Copy and Lookup activities.
- Snowflake Connector supports On-Premises Data Gateway
- Optimised for large datasets.
- Low Snowflake compute cost
- Seamless integration in Pipelines with Notebooks for post-ingestion transformations.
Cons
- Duplication of data.
- Requires Data Engineering effort to keep data synchronised.
Storage Integration – Virtualised Access with Iceberg Tables
The storage integration capability strikes me as the most promising path forward. This method enables Snowflake to directly interact with external storage via an external volume, facilitating the creation of Iceberg tables. The beauty here is the flexibility: this storage can reside in Azure Blob Storage, ADLS, S3, or even directly within OneLake, moving us closer to a true “single copy” data architecture.
Figure 1: Process Flow for Setting up Shortcut to Iceberg Table
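As a minimal sketch, a Snowflake external volume pointing at Azure storage looks like the following; the storage account, container, and tenant values are placeholders.

```sql
-- External volume pointing at an Azure storage container that will hold
-- the Iceberg table data and metadata. All values are placeholders.
CREATE OR REPLACE EXTERNAL VOLUME fabric_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME             = 'azure-iceberg-location'
      STORAGE_PROVIDER = 'AZURE'
      STORAGE_BASE_URL = 'azure://<storage_account>.blob.core.windows.net/<container>/iceberg/'
      AZURE_TENANT_ID  = '<tenant_id>'
    )
  );
```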
Creating and Accessing Iceberg Tables
Once the Iceberg table is established, you can create a shortcut in a Fabric Lakehouse that points directly to its storage location. Through the magic of virtual Delta Lake metadata (a “Delta log” maintained where the shortcut is created), this Iceberg table instantly becomes visible and fully usable as a Delta table within Fabric.
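As a minimal, hypothetical example of establishing such a table on the Snowflake side, using the external volume sketched earlier:

```sql
-- Snowflake-managed Iceberg table whose data and metadata are written to
-- the external volume defined above. All names are hypothetical.
CREATE OR REPLACE ICEBERG TABLE sales_db.public.orders_iceberg (
  order_id     INT,
  customer_id  INT,
  order_amount DECIMAL(18,2),
  order_date   DATE
)
  CATALOG         = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'fabric_ext_vol'
  BASE_LOCATION   = 'orders_iceberg/';
```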
When setting up an external volume for Azure storage, a multi-tenant application is automatically generated. To enable write access to this external volume, the Service Principal associated with this application will require the Storage Blob Data Contributor role.
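One way to find the generated application, so that its Service Principal can be granted the role, is to describe the external volume (using the hypothetical volume name from above):

```sql
-- The output includes properties such as AZURE_MULTI_TENANT_APP_NAME and
-- AZURE_CONSENT_URL for the storage location; that application's Service
-- Principal needs Storage Blob Data Contributor on the target storage.
DESC EXTERNAL VOLUME fabric_ext_vol;
```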
Setting Up Permissions and Interoperability
After an Iceberg table is created within the external volume, you can then establish a shortcut in Fabric that points directly to its containing storage. For this connection to function, the workspace identity (or the specific identity used to create the shortcut) must possess Storage Blob Data Reader access to that storage.
Beyond bringing Iceberg into Fabric, the inverse is also possible: you can virtualise Delta Lake tables residing in OneLake as Iceberg tables. This effectively makes your OneLake Delta data accessible within Snowflake as Iceberg tables, creating powerful bidirectional interoperability.
Azure Storage and OneLake Integration Considerations
ADLS/Blob Storage Integration
Some items to note:
- Shortcuts in Fabric are read-only.
- You cannot write data directly to the Iceberg table via the shortcut.
- Fabric uses the shortcut to virtualise access to the Iceberg table, enabling querying through engines such as T-SQL, Spark, or Power BI, as sketched below.
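For instance, a hypothetical T-SQL query against the Lakehouse SQL analytics endpoint can read the Iceberg data through the shortcut (read-only), using the example table name from earlier:

```sql
-- Read-only query over the Iceberg table, surfaced as a Delta table via the shortcut.
SELECT TOP 100 *
FROM dbo.orders_iceberg;
```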
OneLake Integration
When leveraging OneLake integration with Snowflake, the caveats are:
- The Snowflake instance must be hosted in Azure
- The Snowflake instance and the Fabric capacity must be in the same region
- The Snowflake Service Principal requires sufficient storage permissions
It is also possible to set up an Iceberg library in a Spark notebook and operate on the Iceberg tables directly; however, I wouldn’t attempt write operations.
Pros of Storage Integration
- No Data Duplication or Movement
- Unified Data Access
- Flexibility and Scalability
Cons
- Dependent on Snowflake for the creation of Iceberg tables.
- Limitations on supported Snowflake objects and features
Comparing Options for Integrating Snowflake into Microsoft Fabric
| Feature / Criteria | Database Mirroring | Data Ingestion (Pipelines / Dataflow Gen2) | Storage Integration (Iceberg Tables) |
| --- | --- | --- | --- |
| Use Case | Near real-time data sync | Scheduled batch data loading | Virtualised access without duplication |
| Latency | Near real-time | Depends on schedule (minutes to hours) | On-demand access |
| Data Duplication | Yes (mirrored copy in Fabric OneLake) | Yes (copied into Lakehouse) | No (single copy accessed via shortcut) |
| Snowflake Compute Cost | High (due to CDC & streams) | Low (efficient batch reads) | Depends on Iceberg table operations |
| Fabric Storage Cost | Free (up to 1 TB per CU) | Yes | Minimal (no data copy, just metadata) |
| Fabric Compute Cost | Free for mirroring | Charged for pipeline execution | Charged for query/compute on Fabric side |
| On-Premises Data Gateway Needed? | Yes (especially for cross-cloud) | Yes | Depends on storage location & security setup |
| Effort to Set Up | Medium – stream setup, permissions, gateway | Low – GUI-based setup with native connectors | Medium – requires external volume & role configuration |
| Integration with Lakehouse | Shortcuts supported | Native ingestion into Lakehouse | Shortcuts supported (via Delta Lake metadata) |
| Best For | Real-time insights, critical operational data | ETL/ELT, batch reporting, large-scale movement | Unified architecture, minimal movement, data sharing |
| Limitations | Pausing loses sync; limited Snowflake feature support | Requires ongoing sync logic; data duplication | Requires Iceberg tables in Snowflake; read-only access in Fabric |
Choosing the Right Snowflake–Fabric Integration Approach
In summary, bridging your Snowflake data with Microsoft Fabric offers flexible and powerful integration paths:
- Choose Database Mirroring for critical, near real-time synchronisation
- Leverage Data Pipelines for efficient batch ingestion
- Embrace Storage Integration with Iceberg Tables for true data virtualisation
The best approach will depend on your specific latency, cost, and data governance requirements.
Need help deciding which approach is right for your business? Get in touch with our team of experts who can help you architect the right approach for your data strategy.