Optimize Your Power BI Ecosystem with Dynamic Report Refreshes
Out-of-the-box scheduled refreshes can be unreliable
One of the fundamental features of the Power BI Service is the ability to schedule report refreshes and avoid manual intervention. With a Pro license, a report can be refreshed up to 8 times per day, scheduled at half-hour increments. With Premium licensing, up to 48 refreshes can be scheduled per day.
For organizations that source the majority of their reports from a data warehouse, that warehouse is often updated once per day, typically in the morning. As a report creator, you want to ensure that your end users are viewing the most current data possible. That requires a bit of coordination: knowing when the data warehouse load typically completes and making sure your scheduled report refresh triggers afterward.
This leaves you with two options:
- Establish a set time each day for the Power BI scheduled refresh to trigger, with enough buffer from when the data warehouse load typically finishes
- Create multiple scheduled refreshes throughout the day for a report to ensure that at least one of the refreshes captures the final data warehouse updates
The issue with the first option is that the data warehouse load time typically varies. Let's say the DW traditionally completes around 8:00am, so you set your report refresh for 9:00am. But today, the DW completed early, at 6:30am. Users could have been viewing refreshed data as early as 6:30am, but instead they'll have to wait until 9:00am when the report refreshes.
The even worse scenario is a delay in the DW load. Let's say network slowness or an ETL failure causes the DW to finish at 10:00am. When your report refreshes at 9:00am, it's still pulling in stale data.
The way to mitigate that risk is to employ the second option: multiple scheduled refreshes. Since we know the DW typically finishes by 9:00am, we could schedule the report to refresh every hour starting at 6:00am, so that whether the DW finishes early or late, we always have the freshest data.
The issue with the second option is that it puts unnecessary strain (and compute cost) on your DW. And that's just for one report; extrapolate across potentially hundreds of reports in your organization, and it becomes a real problem.
So if the out-of-the-box scheduled refreshes aren’t ideal, what else can be done?
Leverage the Power BI REST API to trigger dynamic scheduled refreshes
The Power BI REST API includes a set of calls for triggering on-demand refreshes of the semantic models behind reports. You can use any programming language that supports REST calls to refresh a semantic model; in my experience, clients most often use PowerShell cmdlets for this.
Regardless of the scripting method you choose, the script can be triggered as the last step of an ETL job/package. When creating the script, be sure to use a service account that has admin access to all of the datasets that will be refreshed.
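As a hypothetical illustration, a SQL Server Agent job (or any scheduler) could run the script as its final step, after the warehouse load commits. The path and file name here are placeholders, not a standard:

```powershell
# Hypothetical final step of an ETL job: run the refresh script only after
# the warehouse load has committed (path and file name are placeholders)
powershell.exe -ExecutionPolicy Bypass -File "C:\Scripts\Refresh-PbiDatasets.ps1"
```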
Depending on how many datasets/reports need to be refreshed, the process may need to be batched. You wouldn't want to trigger potentially hundreds of report refreshes simultaneously; grouping them into batches of 5 or 10 avoids bogging down the data warehouse, as sketched below.
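Here's one way that batching might look in PowerShell. This is a sketch under assumptions: the Invoke-BatchedRefresh function name is mine, you're already signed in via Connect-PowerBIServiceAccount, and each entry in the dataset list is a hashtable of workspace and dataset GUIDs. Note that each POST only triggers a refresh; the pause staggers the triggers rather than waiting for completions.

```powershell
# Sketch of batched refresh triggering -- assumes an authenticated session
# (Connect-PowerBIServiceAccount) and that $Datasets holds hashtables of
# @{ GroupId = <workspace GUID>; DatasetId = <semantic model GUID> }
function Invoke-BatchedRefresh {
    param(
        [array]$Datasets,
        [int]$BatchSize = 5,       # 5-10 keeps the warehouse from being swamped
        [int]$PauseSeconds = 300   # breathing room between batches
    )
    for ($i = 0; $i -lt $Datasets.Count; $i += $BatchSize) {
        $end = [Math]::Min($i + $BatchSize - 1, $Datasets.Count - 1)
        foreach ($d in $Datasets[$i..$end]) {
            # POST to the refreshes endpoint; returns as soon as the refresh is queued
            Invoke-PowerBIRestMethod -Method Post `
                -Url "groups/$($d.GroupId)/datasets/$($d.DatasetId)/refreshes"
        }
        if ($end -lt $Datasets.Count - 1) { Start-Sleep -Seconds $PauseSeconds }
    }
}
```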
The core of the API call is a POST to an endpoint that looks like this:
```
https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes
```
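For example, a minimal raw call might look like this. It assumes $accessToken already holds a valid Azure AD bearer token for the Power BI API, and the two GUIDs are placeholders:

```powershell
# Minimal raw REST call -- assumes $accessToken is a valid bearer token
$groupId   = '11111111-1111-1111-1111-111111111111'   # placeholder workspace GUID
$datasetId = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'   # placeholder dataset GUID

# POST with no body requests a refresh with default options
Invoke-RestMethod -Method Post `
    -Uri "https://api.powerbi.com/v1.0/myorg/groups/$groupId/datasets/$datasetId/refreshes" `
    -Headers @{ Authorization = "Bearer $accessToken" }
```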
You will need the GUID of the workspace and the GUID of the dataset (semantic model). If you open the settings of any semantic model within the Power BI Service, both of those unique identifiers appear in the URL string.
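For reference, that settings URL generally follows a shape like this (the exact path segments may shift as the Service UI evolves):

```
https://app.powerbi.com/groups/{groupId}/settings/datasets/{datasetId}
```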
When you finalize the script with all of the datasets you'd like to refresh, the end result might look something like the following PowerShell sketch. It uses the MicrosoftPowerBIMgmt module's Connect-PowerBIServiceAccount and Invoke-PowerBIRestMethod cmdlets; the GUIDs below are placeholders for your own workspaces and datasets:
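```powershell
# Requires the MicrosoftPowerBIMgmt module: Install-Module MicrosoftPowerBIMgmt
# Sign in as the service account with admin access to every target dataset
Connect-PowerBIServiceAccount

# Workspace/dataset GUID pairs to refresh -- all values below are placeholders
$datasets = @(
    @{ GroupId = '11111111-1111-1111-1111-111111111111'; DatasetId = 'aaaaaaaa-0000-0000-0000-000000000001' }
    @{ GroupId = '11111111-1111-1111-1111-111111111111'; DatasetId = 'aaaaaaaa-0000-0000-0000-000000000002' }
    @{ GroupId = '22222222-2222-2222-2222-222222222222'; DatasetId = 'bbbbbbbb-0000-0000-0000-000000000001' }
)

foreach ($d in $datasets) {
    # Each POST kicks off an asynchronous refresh and returns immediately
    Invoke-PowerBIRestMethod -Method Post `
        -Url "groups/$($d.GroupId)/datasets/$($d.DatasetId)/refreshes"
    Write-Output "Refresh triggered for dataset $($d.DatasetId)"
}

Disconnect-PowerBIServiceAccount
```

Wire this script into the final ETL step (as shown earlier), and your report refreshes will track the warehouse load automatically, whether it finishes early or late.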
Set it and forget it!
Triggering dataset refreshes via the REST API at the end of your ETL process is a low-maintenance way to ensure your end users are viewing the most recent data possible, without putting unnecessary strain on your data warehouse.