Press "Enter" to skip to content

Day: June 2, 2020

Fun with Randomness

Holger von Jouanne-Diedrich takes us through some random thoughts:

Many a student thinks that the first pic was created by some underlying pattern (because of its points clumping together in some areas while leaving others empty) and that the second one is “more” random. The truth is that technically both are not random (but only pseudo-random) but the first resembles “true” randomness more closely while the second is a low-discrepancy sequence.

While coming to the point of pseudo-randomness in a moment “true” randomness may appear to have a tendency to occur in clusters or clumps (technically called Poisson clumping). This is the effect seen (or shall I say heard) in the iPod shuffling function. Apple changed it to a more regular behaviour (in the spirit of the second picture)… which was then perceived to be more random (as with my students)!

Read the whole thing.

Comments closed

Dynamic Partition Pruning in Apache Spark 3.0

Anjali Sharma walks us through a nice improvement in Spark SQL coming with Apache Spark 3.0:

Partition pruning in Spark is a performance optimization that limits the number of files and partitions that Spark reads when querying. After partitioning the data, queries that match certain partition filter criteria improve performance by allowing Spark to only read a subset of the directories and files. When partition filters are present, the catalyst optimizer pushes down the partition filters. The scan reads only the directories that match the partition filters, thus reducing disk I/O.

However, in reality data engineers don’t just execute a single query, or single filter in their queries, and the common case is that they actually have dimensional tables, small tables that they need to join with a larger fact table. So in this case, we can no longer apply static partition pruning because the filter is on one side of the join, and the table that is more appealing and more attractive to prune is on the other side of the join. So, we have a problem now.

And that’s where dynamic partition pruning comes into play.

Comments closed

Migrating from Azure SQL DB to an Azure SQL Managed Instance

Eitan Blumin walks us through the process of moving from Azure SQL Database to an Azure SQL Managed Instance:

What we cannot do:

– You cannot backup and restore from Azure SQL DB to a Managed Instance.
– You cannot use Azure SQL DB as a source in Azure Data Migration Service (DMS).
– You cannot use Azure SQL DB as a source in Data Migration Assistant (DMA).
– You cannot use Azure SQL DB as a source in DB Mirroring, AlwaysOn, Transaction Log Shipping, or Replication.
– Implementing a custom T-SQL, SSIS, or ADF (Azure Data Factory) solution will be too complicated, and will require an unacceptable development overhead.

That last one is a bit iffy, though Eitan’s two solutions are going to be easier than a custom solution.

Comments closed

Power BI Report Page Access Control

Gilbert Quevauvilliers walks us through access control on Power BI pages:

With the update to the latest version in Power BI Desktop there is now the capability to control which users will see which pages in a report.

This is achieved using a combination of Row Level Security (RLS) and conditional formatting. This makes it a secure way of controlling access for specific users. I will demonstrate how this works below.

This is a clever solution to the problem of access control.

Comments closed

Pulling GROUP BY Above a Join

Paul White has fun with a SQL Server optimization:

When the optimizer moves a Group By above a Join it has to preserve the semantics. The new sequence of operations must be guaranteed to return the same results as the original in all possible circumstances.

One cannot just pick up a Group By and arbitrarily move it around the query tree without risking incorrect results.

Read on for the demo, including a couple ingenious tricks the optimizer has up its sleeve.

Comments closed

Optimizing Power Query Merges

Chris Webb wants to make your joins in Power Query faster:

The first question I decided to investigate was this:

Does the number of columns in a table affect the performance of a merge?

First of all, I created two identical queries called First and Second that connected to the CSV file, used the first row from the file as the headers, and set the data types to all seven columns to Whole Number.

Click through for the answer to this question. Chris promises a series out of this and I would expect there to be enough content for that.

Comments closed

The Power BI Pro’s Guide to Azure Synapse Analytics

Brett Powell gives a quick overview of a new whitepaper:

The Power BI Professional’s Guide to Azure Synapse Analytics, a white paper I wrote describing the Synapse Analytics platform and its benefits and use cases for Power BI professionals, was published a couple weeks ago. This post discusses a few themes from this paper and also shares a couple notes that weren’t included.

There are some interesting notes in here, so check those out and also get ahold of the whitepaper to understand how Power BI relates to the artist formerly known as Azure SQL Data Warehouse.

Comments closed

SQL Server on a Windows Container

Kevin Chant lives dangerously:

In this post I want to cover an interesting Windows Container with SQL Server installed experiment that I did. Because it was fairly involved, and it took a while.

In fact, this is the experiment I was talking about in my recent post about recent Azure Data Studio updates. Which you can read about in detail here.

My general philosophy is to avoid Windows containers at all costs, though I’m glad that there are some more adventurous than I.

Comments closed