Author: Kevin Feasel

Parallel Loading of Tables in Power BI Dataset Refresh

Chris Webb hits the turbo button:

Do you have a large dataset in Power BI Premium or Premium Per User? Do you have more than six tables that take a significant amount of time to refresh? If so, you may be able to speed up the performance of your dataset’s refresh by increasing the number of tables that are refreshed in parallel, using a feature that was released in August 2022 but which you may have missed.

Click through for that tip.

One Repo for Every Environment

Meagan Longoria explains an important part of source control repositories:

I’ve seen a few people start Azure Data Factory (ADF) projects assuming that we would have one source control repo per environment, meaning that you would attach a Git repo to Dev, and another Git repo to Test and another to Prod.

Microsoft recommends against this, saying:

Read on for the citation as well as the practical reason why we don’t want multiple repos. This is true not only for Azure Data Factory but for every development project. You have one repository with branches. Certain branches represent checkpoints where code goes out to a specific environment through a release tool (e.g., Azure DevOps release pipelines, GitHub Actions, etc.).

Azure Synapse Analytics R Language Support

Ryan Majidimehr has a short list of updates for Azure Synapse Analytics but it includes a big one:

Azure Synapse Analytics provides built-in R support for Apache Spark. As part of this, data scientists can leverage Azure Synapse Analytics notebooks to write and run their R code. This also includes support for SparkR and sparklyr, which allows users to interact with Spark using familiar Spark or R interfaces. To learn more, read the official how-to Use R for Apache Spark with Azure Synapse Analytics (Preview).

It’s a bit weird that R support took this long, but I’m glad it’s there now.

Appending Rows to a Pandas DataFrame

Matt Eland acquires some rows that fell off a truck:

Recently I was working on comparing the performance of different machine learning models and I wanted to add entries to a Pandas DataFrame as I evaluated each model. What I found was that adding new rows to a Pandas DataFrame was a little harder than I suspected and required some mild searching, so I wanted to preserve the two solutions I found here in case it helps someone else.

Read on for those two solutions, though as Matt points out, only one of them is a good solution.
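
As a rough sketch of the general idea, and not necessarily either of the two solutions in Matt's post, here is one common way to build up a results table as you evaluate models. The model names and accuracy values are made up for illustration, and `evaluate_models` and `append_row` are hypothetical helper names.

```python
import pandas as pd


def evaluate_models(scores):
    """Collect one dict per model, then build the DataFrame once at the end."""
    rows = []
    for name, accuracy in scores.items():
        rows.append({"model": name, "accuracy": accuracy})
    return pd.DataFrame(rows, columns=["model", "accuracy"])


def append_row(df, row):
    """Return a new DataFrame with one extra row, renumbering the index."""
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)


# Made-up scores, purely for illustration.
results = evaluate_models({"logistic_regression": 0.81, "random_forest": 0.88})
results = append_row(results, {"model": "gradient_boosting", "accuracy": 0.90})
print(results)
```

Gathering rows in a plain list and creating the DataFrame once tends to be cheaper than growing it a row at a time, because each `pd.concat` call copies the existing data.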

Backups and Restores when a NAS Requires a Password

Jana Sattainathan needs to give the daily password:

Sometimes, you have a share (like Azure Data Box via SMB as was the case for me) that you can access only with a UserName and Password. This is fine as long as you are accessing it interactively by typing it in, but how about accessing it from SQL Server for the purposes of backing up and restoring?

This is where the “NET USE” command becomes necessary.

Read on to see how that can help you out.
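
For a sense of what that command looks like, here is a minimal sketch that only assembles a `net use` argument list and does not run it. This is not Jana's script: the share path, account name, and password are placeholders, and `build_net_use_command` is a hypothetical helper.

```python
def build_net_use_command(share, username, password):
    """Assemble the arguments for mapping credentials to an SMB share.

    Follows the documented form: net use \\\\server\\share password /user:name
    """
    return [
        "net", "use", share, password,
        f"/user:{username}",
        "/persistent:no",  # do not try to restore the connection at next logon
    ]


# Placeholder values, purely for illustration.
command = build_net_use_command(
    r"\\nas01\backups", r"CONTOSO\svc_sqlbackup", "example-password"
)
print(" ".join(command))
```

Keep in mind that a password passed on the command line can be visible to anyone who can see the process arguments or the script, so treat this as an illustration of the syntax rather than a recommended way to handle credentials.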

Automatic Partition Maintenance in Power BI Incremental Refresh

Shabnam Watson goes investigating:

In this post, I am going to look at automatic partition maintenance by Power BI service for datasets with Incremental Refresh and focus on what happens to the partitions as time goes by. To do this, I am going to set up a couple of sample datasets with different Incremental Refresh (IR) policies with and without the Hybrid option, schedule automatic refreshes from the Power BI Service, and record how their partitions change over time. As a result, this post is going to get updated as time goes on as it documents how the partitions evolve.

Read on to learn more about what Incremental Refresh does and how things have changed over time. This looks like a post to come back to a few times.

Testing Azure SQL DB Hyperscale Performance

Reitse Eskens continues a series on performance testing Azure SQL DB tiers:

So far, my blogs have been on the different Azure SQL DB offerings where there are differences between DTU and CPU based. But in general, the design is recognizable. With the hyperscale tier, many things change. There are still cores and memory of course, but the rest of the design is totally different. I won’t go into all the details; you’re better off reading them here [https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale?view=azuresql] and here [https://learn.microsoft.com/en-us/azure/azure-sql/database/hyperscale-architecture?view=azuresql], but the main differences are the support of up to 100 TB of data in one database (all the other tiers max out at 4 TB), fast database restores based on file snapshots, rapid scale out and rapid scale up.

There are differences in testing this one versus the others, so buyer beware.

Row-Level Security against Power BI Shared Datasets

Teo Lachev combines two capabilities in Power BI:

In a typical engagement, I create an organizational semantic model(s) and “report packs”, such as Sales Report Pack, Inventory Report Pack, etc. These report packs are typically implemented as Power BI reports connected to the semantic model as a shared dataset using the Power BI Datasets connector. Reports sanctioned by IT are published to a dedicated workspace, such as Corporate BI. Departmental reports are deployed to their respective workspace, such as Sales, to enforce content-level security. Usually, the semantic model has row-level security (RLS) roles defined to enforce restricted access to data depending on the identity of the interactive user.

Read on to see how you can test out the results once you get it working.
