Back to List

Data Lake vs Data Warehouse: Pros and Cons

Scott Hietpas Scott Hietpas  |  
Jun 18, 2019
In this blog series, Scott Hietpas, a principal consultant with Skyline Technologies’ data team, responds to some common questions on data warehouses and data lakes. For a full overview on this topic, check out the original Data Lake vs Data Warehouse webinar.
There's a lot of discussion around data lakes and data warehouses. When organizations are evaluating these two technologies, there is one big question: “Where should you store your data to do the type of analysis you need and get the most out of it?” While it may be tempting to choose only the newer data lake format, or the tried-and-true data warehouse, ultimately both systems offer both pros and cons. We are going to explore some of the big ones in this blog.

Pros of the Traditional Data Warehouse

While the traditional data warehouse has started to see some Big Data challenges (from social media and marketing tools) and the growing demands of data science, there are still a lot of benefits to this solution.


One of the core benefits of the data warehouse is that it's a well-established, proven solution. The SQL server stack is a widely available, mature toolset that has been around for a long time. If you need additional resources or a team internally to support your data storage system, it's easy to find resources that are familiar with those tools. Although new functionality can create issues, for the most part a data warehouse is well-proven and can run optimally with very little maintenance.


A data warehouse also delivers very good performance - partially due to its underlying structure. Since a data warehouse has a schema-on-write, SQL servers (or other data warehouses) already understand its underlying structure. Therefore, its query engine can understand how to work with that data to quickly return query results.


The usability of a data warehouse is due to the analytical model. Users may not be familiar with how to get data out of the source systems where the data originates. Often, those structures can be extremely complex and normalized. Hence the data warehouse is an opportunity to have a more denormalized model that makes it easier for users to understand how to slice and dice that data. The data warehouse is flexible in that it can work well on-premises or in the cloud. Whether you're using a SQL server stack on-prem, or exploring an Azure SQL database / data warehouse in the cloud, any of these data warehouse options can support this type of approach very well.

Cons of the Traditional Data Warehouse

While data warehouses have significant advantages, they also have some significant drawbacks in this era of Big Data and high data science demand. Some of the cons of the data warehouse are:

Storage Costs

Storage cost is one of the key modern drawbacks of a data warehouse. Database resources are more expensive than those of a data lake due to features. The budget-conscious company (that is, most companies) must ask if it needs to store large volumes of data in an Azure SQL database or data warehouse (which may have a higher storage cost than a data lake).


With a data warehouse, you have to wait for each business process component to be built to get value from that data. Although there's a lot of value in having an analytical model, it also takes time to have your development team support getting data into that model, typically through the development of ETL processes. If you have data that's not in your data warehouse today, and users want to report on it, then there can be additional latency to get that data into the warehouse.

Limited Exploration

Cost and warehouse model can restrict capturing data of unknown value, which can limit exploration. You might only pull certain amounts of data where you had known reporting requirements. There may be a whole other set of data that you're not bringing in that users might want to explore.

Not Designed for Big Data

Of course, the Big Data challenge is one of the major sticking points when it comes to the data warehouse. Its schema-on-write is not really optimized for that variety, velocity and volume of data. Disruptors like hybrid source systems (cloud and on-prem), and high data volume, variety and velocity coming from big data scenarios can limit the effectiveness of your data warehouse.

Pros of the Data Lake

The data lake solves some of the pros and cons of the traditional data warehouse, but it also offers drawbacks of its own. First, let’s look at its pros.

Volume, Variety and Velocity

The real strength of the data lake is that it does a good job addressing some of those Big Data challenges like volume, variety, and velocity. It’s infinitely scalable, handles structured or unstructured data, and is designed for rapid data ingestion. It can also support Internet of Things scenarios. Because it's schema-on-read, we don't have to understand the format of the data until we're ready to read it back out. We can write that data very quickly before we have that understanding and without risk of write errors.

Low Cost Storage

A data lake is also low cost (relatively speaking). That allows us to not worry as much about the types of data that we're storing. We may not know the analytical value around our data yet, but with data lake storage, we can inexpensively hang on to it for some future point where we might be able to find value for it.

Accessible Data

Landing the data in the data lake makes it easy for us to open copies of that data (or subsets of that data) to different user groups. Whether that's for self-service or data science, we can control access to that data.

Cons of the Data Lake

While these are some big positives, the data lake also has some significant cons.

Cloud Bias

There are on-premises data lake solutions (Hadoop is a very common one). However, installing a data lake solution on-prem can be much more complex, whereas spinning off a data lake in the cloud is very simple.  This may be considered a negative if it does not align with your infrastructure strategy.

Skillset Learning Curve

The data lake often comes with a new set of tools and services that need to be understood (and it’s a bit of a learning curve). That requires some additional investment, either from recruiting to get the right team members or doing internal professional development to understand those new tools.

Transition Period

When moving from a data warehouse to a data lake (or just adding a data lake to your existing system), there’s going to be a transition period. If you already have an existing data warehouse, how does the data lake fit into that? Do you rework some of that for the new needs and functionality, or not?

Investment in Best Practices and Processes

A data lake also requires significant investment not just from a skillset perspective, but also in understanding how that impacts your best practices. In addition to being able to do the work, you need to understand what your organizational standards will be around doing that work.

Not Optimized for Query Performance

Although we can very easily land data in the data lake, in order to query that data back out, the data lake doesn't have the same underlying query engine that a data warehouse does. Those queries may not perform to the level you need.

Which Option Is Right for You?

Often, we see that the data lake and the data warehouse really work better together. In my next blog, I’m going to walk through what a successful hybrid solution looks like.
Can’t wait for the next blog? Check out the Data Lake vs Data Warehouse webinar to explore this hot topic in full.
Business IntelligenceData Analytics


Love our Blogs?

Sign up to get notified of new Skyline posts.


Related Content

Blog Article
How to Find, Train, and Empower Citizen Data Scientists
Adam WidiAdam Widi  |  
Mar 09, 2021
About the author: Adam Widi is a Senior Solution Architect in Data Analytics with over 15 years of experience in Business Intelligence and Data Analytics. In his spare time, he enjoys dabbling in machine learning and data science tools and online courses.   Advertisements for artificial...
Blog Article
Power BI Tips for Star Schema and Dimensional Data Modeling
Marcus RadueMarcus Radue  |  
Feb 16, 2021
In this blog, Marcus Radue, Data Analytics Engineer, offers high-level guidance in the advantages of star schema and dimensional data modeling in Power BI reporting. For a full overview on this topic, watch the full video below.   Using a dimensional data model, especially when...
Blog Article
Using Tabular Editor’s Best Practice Analyzer to Apply Best Practices
Cory CundyCory Cundy  |  
Feb 02, 2021
In this blog, Scott Hietpas and Cory Cundy – two principal consultants in Skyline Technologies’ Data Analytics practice – explore the advantages of using Tabular Editor for tabular model development. For the full presentation and demo, check out their webinar: How to Divide and...
Blog Article
The Advantages of Using Tabular Editor for Tabular Model Development
Scott HietpasScott Hietpas  |  
Jan 19, 2021
In this blog, Scott Hietpas and Cory Cundy – two principal consultants in Skyline Technologies’ Data Analytics practice – explore the advantages of using Tabular Editor for tabular model development. For the full presentation and demo, check out their webinar: How to Divide and...
Blog Article
Power BI Governance: Refining and Maintaining Your Power BI Strategy
Marcus RadueMarcus Radue  |  
Nov 10, 2020
In this blog series, Marcus Radue, Data Analytics Engineer at Skyline Technologies, offers high-level guidance for implementing Power BI effectively in your organization. For a full overview on this topic, check out the original Power BI Governance A-Z Webinar.  Sharing content, report...