Data has existed for as long as humankind, and over the years the way we view, interpret, and store it has evolved with advances in technology. Modern methods have also emerged to extract the most value from untapped data. Whether it lives on a phone or in an enterprise data warehouse, data goes through different procedures to be extracted, stored, transformed, and consumed.
Let’s start by looking at how data storage and processing needs have evolved over the last few decades.
1960s – this was the beginning of an era of Database Management Systems (DBMS). Companies started shifting from punch cards to DBMS to store their data.
1970s – with DBMS gaining popularity and the increasing volume of data being generated by organisations, it became challenging to process this data before it was stored. Shortly after, the concept of Extract, Transform, and Load (ETL) emerged. This allowed organisations to work with multiple heterogeneous data sources and store the results in their DBMSs.
1980s – the limitations of Online Transactional Processing (OLTP) databases in running complex analytical queries gave rise to the concept of data warehousing. This meant that data could be ingested in huge batches and stored in a simplified manner to improve the performance of complex analytical queries.
1990s – with the increased demand for ETL tools, they became widely available at an affordable price. This helped organisations leverage the power of Business Intelligence (BI) Analytics to drive critical business decisions.
2000s – companies were introduced to the world of cloud computing. New storage techniques like data lakes were introduced for storing unstructured and semi-structured data.
Now let’s look at ETL and ELT individually to understand them better.
ETL stands for Extract, Transform, and Load. ETL was traditionally the popular choice, when incoming data had to conform to a relational format. In ETL, data is held in a staging area and goes through a set of transformations on a secondary server before being loaded into the target database. This means the data must be fully processed before it enters the data warehouse. ETL is an ideal choice for computationally expensive transformations, traditional system architectures, or data that must be manipulated before entering the target database.
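The ETL flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the source rows are made-up sample data, and an in-memory SQLite database stands in for the target data warehouse.

```python
import sqlite3

# Extract: hypothetical rows pulled from an operational source system.
def extract():
    return [
        {"order_id": 1, "amount": "19.99", "country": "au"},
        {"order_id": 2, "amount": "5.00", "country": "AU"},
        {"order_id": 2, "amount": "5.00", "country": "AU"},  # duplicate record
    ]

# Transform: runs *before* loading, on a separate processing step —
# type conversion, de-duplication, and normalisation all happen here.
def transform(rows):
    seen, clean = set(), []
    for row in rows:
        key = row["order_id"]
        if key in seen:
            continue  # drop duplicates in staging, not in the warehouse
        seen.add(key)
        clean.append((key, float(row["amount"]), row["country"].upper()))
    return clean

# Load: only fully processed rows reach the target database.
def load(rows, conn):
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
```

The key point is the ordering: by the time `load` runs, the data is already clean and typed, so the warehouse never sees the raw duplicates.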
While ETL is commonly used to process structured data, ELT is a relatively new approach that opens avenues for processing unstructured and semi-structured data. ELT stands for Extract, Load, and Transform. As the name suggests, data is first loaded into the target database and transformed later, on demand. With ELT, data is ingested with only basic checks, such as data-quality validation or duplicate removal. Since the data is loaded directly into the target system, no staging area is needed. Given the nature of the ELT pattern, the data platform can retain raw data for ad-hoc historical reporting requests.
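The same sample scenario can be reworked as ELT to show the contrast. Again this is only a sketch with invented data and SQLite standing in for the warehouse: the raw rows land first, and the transformation runs later inside the target system itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw records land in the warehouse as-is — no staging server,
# no up-front transformation, just a basic landing schema.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "19.99", "au"), (2, "5.00", "AU"), (2, "5.00", "AU")],  # raw, with a duplicate
)

# Transform: executed on demand inside the warehouse, using its own
# compute. raw_orders is kept untouched for ad-hoc historical queries.
conn.execute("""
    CREATE VIEW clean_orders AS
    SELECT DISTINCT order_id,
           CAST(amount AS REAL) AS amount,
           UPPER(country) AS country
    FROM raw_orders
""")
```

Querying `clean_orders` returns the de-duplicated, typed rows, while `raw_orders` preserves every original record for historical reporting.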
Regardless of which procedure you choose, both ETL and ELT include cleaning and filtering of data. However, the choice between them will shape your data storage, analysis, and processing capabilities.
What’s the mindset shift after the introduction of ELT?
With ELT, data is inserted straight into the data lake or the data warehouse. This is beneficial in scenarios where data availability matters more than data quality, as raw data is ingested into the target database without any transformations.
Key differences between ETL and ELT
- ELT transforms data within the data warehouse whereas ETL transforms data on a separate processing server.
- ELT allows raw data to be stored in the data warehouse, while ETL does not.
- ELT ensures that data is made available as early as possible, without being blocked by failures in the data transformation logic.
- Performing ETL is generally easier given the years of intellectual property built into the tools on the market today. These tools help individuals build intelligent data processing solutions with the least amount of code possible, and they ensure that data is encrypted throughout the process.
- ELT is a good choice when the data changes frequently, as it avoids repeatedly running complex ETL jobs.
Examples of when to use what
ETL is typically the better fit for:
- Data coming from multiple sources to be consolidated into one system
- Traditional data and updates from legacy systems
ELT suits frequently changing, high-volume feeds such as:
- Stock market data
- Medical sensor data
- Live traffic feeds
How can you achieve this on Azure?
The Azure cloud provides modern, cutting-edge technologies to satisfy your ETL or ELT needs. Using rich tools on Azure such as Data Factory, Azure Synapse Analytics, Stream Analytics, and Databricks, you can build robust end-to-end data solutions to support your business. From data encryption to high availability, these tools can cater to any custom business requirement.
How can Arinco help you do that? Contact us to find out more