
Mitigating the Risks of Realtime or Near-Realtime Data Processing

Matt Pluster  |
Dec 10, 2019
 
In this blog series, Matt Pluster, Director of Skyline’s Data Analytics Consulting Practice, explores data sources and processing options. For a full overview on this topic, check out the Realtime vs Near-Realtime webinar.
 
In previous blogs in this series, I’ve talked about the pros and cons of Realtime and Near-Realtime scenarios, and even shared what we in the Power BI world call “the Other Realtime” (which offers low latency data analysis via DirectQuery). Now, I want to dig into some best practices that lead to success whether you choose Realtime or Near-Realtime. 
 

Keys to Success with Realtime

 

Understanding Use Cases

It’s important to make sure you understand the use cases. If you're going to use Realtime, the data scenario should genuinely require Realtime. Is it really a Realtime use case? End users sometimes say, "I need data right away." Does that mean sub-second, or does it mean a refresh every 15 minutes? Pin that down, because the answer makes a big difference in your approach.
 

Metrics

Clear goals for your metrics are important because they help you understand the use case. Make sure you know exactly what the organization wants to measure.
 

Persist

Even though the strength of Realtime streaming is its immediacy, it can still be important to persist the data. Store a copy, especially if that data could hold analytical value now or in the future.
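As an illustration only, and assuming the stream ultimately lands in SQL Server or Azure SQL Database, a simple landing table like the hypothetical one below is often all it takes to keep the raw events around for later analysis:

```sql
-- Hypothetical landing table for raw streamed events (all names are illustrative).
-- Persisting the raw payload alongside arrival time keeps the data available for
-- historical analysis even after the Realtime dashboard has moved on.
CREATE TABLE dbo.StreamedEventArchive
(
    EventId       BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    EventTimeUtc  DATETIME2(3)  NOT NULL,        -- timestamp from the source device or stream
    ReceivedAtUtc DATETIME2(3)  NOT NULL
        CONSTRAINT DF_StreamedEventArchive_ReceivedAtUtc DEFAULT (SYSUTCDATETIME()),
    SourceSystem  NVARCHAR(100) NOT NULL,        -- e.g. sensor, POS terminal, web app
    Payload       NVARCHAR(MAX) NOT NULL         -- raw JSON payload as it arrived
);
```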
 

Technology

This entire conversation is about technology, but not all technologies are created equal or offer the same capabilities. Choose wisely. Ensure that whatever technology you choose is flexible and scalable so your solution can grow with the business.
 
While some of these Realtime factors (like knowing the use case and having clear metrics and goals) can carry over into other scenarios, Near-Realtime has a few unique keys to winning.
 

Keys to Success with Near-Realtime

 

Transactional Replication

In a Near-Realtime scenario, use transactional replication if possible. That helps move updated records into staging more frequently and more efficiently.
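The exact setup depends on your environment, but on the publisher side it usually comes down to a handful of replication system procedures. Here is a rough sketch using hypothetical database, publication, and table names (distributor and subscription configuration are omitted):

```sql
-- Rough sketch of publishing a source table for transactional replication.
-- Names are illustrative; sp_adddistributor, sp_addsubscription, and agent
-- scheduling still need to be configured separately.

-- Enable the source database for publishing.
EXEC sp_replicationdboption
     @dbname  = N'SalesSource',
     @optname = N'publish',
     @value   = N'true';

-- Create a continuously replicating transactional publication.
EXEC sp_addpublication
     @publication = N'SalesToStaging',
     @status      = N'active',
     @repl_freq   = N'continuous',
     @sync_method = N'concurrent';

-- Add the table whose changes should flow to the staging subscriber.
EXEC sp_addarticle
     @publication   = N'SalesToStaging',
     @article       = N'SalesOrderDetail',
     @source_owner  = N'dbo',
     @source_object = N'SalesOrderDetail';
```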
 

Incremental Loads

Incremental loads are highly valuable when processing data. Many organizations start out with a full truncate-and-load approach, but switching to incremental loads makes getting data into the data model far more efficient, which also lowers the latency.
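As a sketch of the pattern (table and column names are hypothetical), a watermark-driven load pulls only the rows that changed since the last run and merges them into the warehouse table instead of truncating and reloading everything:

```sql
-- Hypothetical watermark-based incremental load; all names are illustrative.
DECLARE @LastLoadedUtc DATETIME2(3);

-- 1. Read the high-water mark recorded by the previous run (default to the epoch on first run).
SELECT @LastLoadedUtc = LastLoadedUtc
FROM   etl.LoadWatermark
WHERE  TableName = N'FactSales';
SET @LastLoadedUtc = COALESCE(@LastLoadedUtc, '19000101');

-- 2. Merge only the rows that changed since then.
MERGE dw.FactSales AS tgt
USING (
    SELECT SalesOrderId, CustomerKey, ProductKey, OrderDateKey, SalesAmount
    FROM   stg.SalesOrderDetail
    WHERE  ModifiedUtc > @LastLoadedUtc
) AS src
    ON tgt.SalesOrderId = src.SalesOrderId
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerKey  = src.CustomerKey,
               tgt.ProductKey   = src.ProductKey,
               tgt.OrderDateKey = src.OrderDateKey,
               tgt.SalesAmount  = src.SalesAmount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SalesOrderId, CustomerKey, ProductKey, OrderDateKey, SalesAmount)
    VALUES (src.SalesOrderId, src.CustomerKey, src.ProductKey, src.OrderDateKey, src.SalesAmount);

-- 3. Advance the watermark for the next run.
UPDATE etl.LoadWatermark
SET    LastLoadedUtc = SYSUTCDATETIME()
WHERE  TableName = N'FactSales';
```

If the source can't expose a reliable modified timestamp, SQL Server features like Change Tracking or Change Data Capture can supply the delta instead.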
 

Data Models

Make sure your Tabular model is well designed and sourced from a physical data model in SQL Server. You don't want all your transformation taking place in Analysis Services; trying to run the ETL process inside Analysis Services is a very common mistake among those new to designing Power BI or Tabular models. The best practice is to perform your ETL in SQL Server, because SQL Server is optimized for that work.
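For example (schema and object names are hypothetical), cleansing and conforming a customer dimension belongs in a SQL Server view or stored procedure that feeds the physical model, so the Tabular model simply imports the finished table rather than repeating those transformations itself:

```sql
-- Hypothetical example: shape the data in SQL Server so the Tabular model imports
-- a finished dimension instead of doing this work inside Analysis Services.
CREATE OR ALTER VIEW etl.vDimCustomerLoad
AS
SELECT
    c.CustomerId                            AS CustomerKey,
    UPPER(LTRIM(RTRIM(c.CustomerName)))     AS CustomerName,   -- cleansing
    COALESCE(c.Region, N'Unknown')          AS Region,         -- conforming
    CASE WHEN c.AnnualRevenue >= 1000000
         THEN N'Enterprise' ELSE N'SMB' END AS Segment         -- derived attribute
FROM stg.Customer AS c;   -- this view then feeds the load of dw.DimCustomer
```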
 

Refresh Rates

It’s also important to understand your refresh rate and what’s acceptable to the business. Depending on data size, it's often possible to get end-to-end data model refresh down to the 5-15 minute range.
 

DirectQuery

Finally, in Near-Realtime scenarios, you should use DirectQuery to address those simple up-to-date reporting needs.
 

Challenges and Advantages of Low Latency Tabular Models

In the webinar I gave on this topic, one of the questions I received was, “What challenges have you seen in implementing very low latency Tabular models?” The biggest challenges are almost always upstream from the Tabular model itself. The lower the latency you need from Tabular model processing, the more important it becomes to run your ETL processes in SQL Server, to make those processes highly efficient, and to automate them almost entirely (if not entirely).
 
The second thing you want to do is build a well-formed physical data model in SQL Server. This includes adhering to Kimball data modeling best practices, especially aiming for a tightly architected star schema. From a processing perspective, if we force Tabular to do what would traditionally be considered an ETL-type process, Analysis Services doesn't perform well; it's inefficient at handling those transformations and the related processing. As stated above, have SQL Server or your Azure SQL DB instance handle the heavy transformation of the data.
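As a minimal illustration of that star schema (all names hypothetical), the physical model boils down to narrow, conformed dimension tables and a fact table keyed to them:

```sql
-- Minimal star-schema sketch; names are illustrative.
CREATE TABLE dw.DimCustomer (CustomerKey INT NOT NULL PRIMARY KEY, CustomerName NVARCHAR(200) NOT NULL, Region NVARCHAR(100) NOT NULL, Segment NVARCHAR(20) NOT NULL);
CREATE TABLE dw.DimProduct  (ProductKey  INT NOT NULL PRIMARY KEY, ProductName  NVARCHAR(200) NOT NULL, Category NVARCHAR(100) NOT NULL);
CREATE TABLE dw.DimDate     (DateKey     INT NOT NULL PRIMARY KEY, [Date] DATE NOT NULL, [Month] TINYINT NOT NULL, [Year] SMALLINT NOT NULL);

-- The fact table carries surrogate keys and measures only; the heavy transformation
-- has already happened on the way into these tables.
CREATE TABLE dw.FactSales
(
    SalesOrderId BIGINT        NOT NULL PRIMARY KEY,
    CustomerKey  INT           NOT NULL REFERENCES dw.DimCustomer (CustomerKey),
    ProductKey   INT           NOT NULL REFERENCES dw.DimProduct (ProductKey),
    OrderDateKey INT           NOT NULL REFERENCES dw.DimDate (DateKey),
    SalesAmount  DECIMAL(18,2) NOT NULL
);
```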
 
If your data model schema in SQL Server closely matches the data model in your Tabular instance, then data model processing becomes very efficient. (We coach our clients to aim for 1:1 relationships from SQL Server physical model tables to Analysis Services Tabular model tables). This is how we can get down to that optimal 5-15 minute processing of an Analysis Services Tabular model. My team has helped clients from mid-market enterprises to Fortune 500 companies address this challenge by using that approach, and we have seen tremendous success with it.
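One way to hold that 1:1 line (again, names are hypothetical) is a thin presentation layer of views, one per Tabular table, with the same grain and columns as the physical star schema, so each Tabular partition is just a straight SELECT with no reshaping left to do at processing time:

```sql
-- Hypothetical presentation views: one per Tabular table, same grain and columns
-- as the physical star schema, so model processing is a straight read.
CREATE OR ALTER VIEW pbi.FactSales AS
SELECT SalesOrderId, CustomerKey, ProductKey, OrderDateKey, SalesAmount
FROM   dw.FactSales;

CREATE OR ALTER VIEW pbi.DimProduct AS
SELECT ProductKey, ProductName, Category
FROM   dw.DimProduct;

-- Each Tabular table's partition source then maps 1:1, e.g. SELECT * FROM pbi.FactSales;
```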
 

Who’s the Winner Between Realtime and Near-Realtime?

If time is of the essence, then you want to use Realtime, either via streaming or the low-latency DirectQuery path. If you have a historical and/or flexible type of analysis need, then you want to go with a Near-Realtime data model.
 
In a situation where you have multiple use cases, it's ideal to separate them and apply the appropriate model to each. For example, you might run a Near-Realtime path for the historical analysis alongside a Realtime path for the data that needs to be available in very short order.
 
In the next and last installment in this blog series, I’m going to dig into a few general topics related to Realtime and Near-Realtime – including datasets and cloud solutions.
 