Back to List

“The Other Realtime”: Low-Latency Data Processing via DirectQuery

Matt Pluster Matt Pluster  |  
Dec 03, 2019
 
In this blog series, Matt Pluster, Director of Skyline’s Data Analytics Consulting Practice, explores data sources and processing options. For a full overview on this topic, check out the Realtime vs Near-Realtime webinar
 
So far in this blog series, we have talked about Realtime and Near-Realtime data processing. However, in the Power BI world, there is a third option that we call “the Other Realtime.” It’s low latency data analysis via a DirectQuery data model. In this case, we're usually looking at data that is more transactional in nature.
 

What is an Example of “the Other Realtime?”

In the graphic below, I have a use-case example of what “the Other Realtime” might look like. The left-hand side shows orders that have been placed and need to be shipped off from a dock. This is based on a scenario that we encountered in the past. A client needed to monitor how quickly their trucks were being filled coming out of the warehouse to make sure they met certain time limit requirements. They paid a penalty if they didn't get those trucks off the dock in the allowed time.
 
other realtime data processing example
 
Our client needed to see how quickly they were getting orders loaded onto the truck. They had to make sure they were on target to get the truck full and away from the dock within a two-hour window.
 
In a case like this, we probably have two different systems that are feeding the data. We might turn on something like transactional replication so that orders from the ERP system and shipping information from the warehouse floor picking and scanning system can be quickly replicated into an Azure SQL DB Managed Instance. The Managed Instance can have sub-databases inside of it. One of the things we need to keep in mind with DirectQuery is that it can only run against a single database instance. In this use case, we solved that problem by creating cross-database queries that loaded orders and shipment data and joined it into a single unified Azure SQL DB instance. (In the image, that's the database at the bottom.)
 
Then we connected Analysis Services via a DirectQuery to that Azure SQL DB instance, and that provided very low latency processing. It's not to the second or sub-second, but it's less than a minute from the time the data is appearing in the shipping system (which then monitors the loading of the trucks) to the time when we see it in Power BI. We get a very quick refresh rate in Analysis Services and in Power BI.
 

Advantages of “the Other Realtime”

There are a couple of advantages to using this kind of DirectQuery approach instead of a pure Realtime or Near-Realtime setup.
 

Up-to-Date Data

With “other” Realtime there's no extra management overhead to maintain a separate copy of the data, like an in-memory cache of Analysis Services. We are not doing our typical Analysis Services processing. Therefore, changes to the underlying data source can be immediately reflected in queries against the data model.
 

Large Data Sets

Another advantage of this scenario is that data sets can be larger than the memory capacity of the Analysis Services server. In Analysis Services, we're limited to the amount of RAM on the server where it lives. In typical model processing, we have to be able to hold the entire data set in memory during that processing step. That's not true with DirectQuery, although there are some other data limitations we’ll cover at the end of this blog.
 

Provider-Side Query Acceleration

DirectQuery can also take advantage of provider-side query acceleration – meaning that SQL server has the notion of memory-optimized column indexes. We're basically pushing our processing from Analysis Services down to the SQL level, which is an interesting way of approaching query acceleration.
 

Query Efficiency Optimization

Finally, if the model does contain complex formulas that could potentially require multiple queries, Analysis Services performs optimization to ensure that the query plan executed against the back-end database will be as efficient as possible.
 

“Other Realtime” Factors

While “the Other Realtime” has many advantages, it has some drawbacks and limitations. Key considerations to be aware of when using this approach include:
 

Database Limit

The query can only run against a single database instance. As we saw above, there are ways to work around this.
 

Inefficient Modeling

Complex modeling may not be efficient within a direct query model. It’s rare to have complex modeling requirements in a Realtime scenario. If you find yourself looking at complex modeling requirements, you might want to ask yourself if the use case in question would be better addressed by a Near-Realtime solution.
 

Record Limit

We have a one million record row limit by default. Again, a DirectQuery Realtime use case with one million records would be rare. If you have too many records, consider truncating inactive (historical) data from the previous day(s).
 

Supported Data Sources

Be aware that the main data sources supported by “the Other Realtime,” or DirectQuery, are Microsoft SQL Server, Azure SQL DB, Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse, and also known in the on-premise world as the Microsoft SQL Analytics Platform System – or APS), Oracle relational databases, and Teradata relational databases.
 
 
In my next blog post, I’m going to dig into some of the best ways to mitigate the risks of more traditional Realtime and Near-Realtime processing.
 
Data Analytics

 

Love our Blogs?

Sign up to get notified of new Skyline posts.

 


Related Content


Blog Article
Mitigating the Risks of Realtime or Near-Realtime Data Processing
Matt PlusterMatt Pluster  |  
Dec 10, 2019
In this blog series, Matt Pluster, Director of Skyline’s Data Analytics Consulting Practice, explores data sources and processing options. For a full overview on this topic, check out the Realtime vs Near-Realtime webinar.   In previous blogs in this series, I’ve talked about...
Blog Article
Realtime vs Near-Realtime Data: Pros and Cons
Matt PlusterMatt Pluster  |  
Nov 26, 2019
In this blog series, Matt Pluster, Director of Skyline’s Data Analytics Consulting Practice, explores data sources and processing options. For a full overview on this topic, check out the Realtime vs Near-Realtime webinar.   In this blog series, we are looking at the matchup of...
Blog Article
Two Megatrends Driving the Coming Data Tsunami (Are You Ready?)
Tim MorrowTim Morrow  |  
Oct 29, 2019
Data is growing and proliferating at an unprecedented rate. According to a 2016 report from IBM, "90% of the data in the world today has been created in the last two years alone" - and that was three years ago. Imagine how much more data we have today. ...
Blog Article
Keys to Winning When Transitioning Your Data to the Cloud
Scott HietpasScott Hietpas  |  
Oct 22, 2019
In this blog series, Scott Hietpas, a Principal Consultant with Skyline Technologies’ data team, explores the pros and cons of different data storage architecture. For a full overview on this topic, check out the original Cloud vs On-Premises Architecture Webinar.   In an On...
Blog Article
Cloud Data Storage Architecture: Pros and Cons
Scott HietpasScott Hietpas  |  
Oct 08, 2019
In this blog series, Scott Hietpas, a Principal Consultant with Skyline Technologies’ data team, explores the pros and cons of different data storage architecture. For a full overview on this topic, check out the original Cloud vs On-Premises Architecture Webinar.   Like the on...