Press "Enter" to skip to content

Day: April 1, 2024

Quantile Normalization in R

Steven Sanderson has achieved normality:

Before we dive into the code, let’s understand the concept behind quantile normalization. At its core, quantile normalization aims to equalize the distributions of multiple datasets by aligning their quantiles. This ensures that each dataset has the same distribution of values, making meaningful comparisons possible.

This is a bit different from normalizing individual data points in one dataset, as you can see in the post.

Comments closed

Power BI Paginated Reports over Excel and Web Services

Chris Webb shares an announcement:

By far the most exciting announcement for me this week was the new release of Power BI Report Builder that has Power Query built in, allowing you to connect to far more data sources in paginated reports than you ever could before. There’s a very detailed blog post and video showing you how this new functionality works here:

Given that this seems to be the spiritual replacement for SSRS, it’s good to see them still working on it.

Comments closed

SSMS 20 Minus Azure Data Studio

Erik Darling makes an observation:

I was quite publicly mystified by the coupling of Azure Data Studio into the SQL Server Management Studio installer. They’re different tools for different people.

This isn’t a tirade against Azure Data Studio, nor is it a victory lap since apparently the feedback item I posted was part of the decision to remove it.

This is purely informational, since the announcement that it’s not in there anymore is nearly as quiet as the announcement that it was being included back in SQL Server Management Studio 18.7, and only slightly louder than the availability of a command line switch to skip installing it.

I say this as someone whose primary SQL tool is Azure Data Studio: good. There’s a lot of functionality overlap between the two but it’s better making both of them optional.

Also, I’m subscribed to that same RSS feed Erik mentioned and fully agree with Erik’s assessment of Erin. Even if she is a Michigan fan.

Comments closed

Designing for Direct Lake Mode

Paul Turley shares some advice:

Since the introduction of Power Pivot for Excel, SQL Server Analysis Services Tabular, Azure Analysis Services and Power BI; the native mode for storing data in a semantic data model (previously called a “dataset” in Power BI) has been a proprietary file structure consisting of binary and XML files. These file structures were established in the early days of multidimensional SSAS back in 2000 and 2005. When an Import mode model is published to the Power BI service, deployed to an SSAS server or when Power BI Desktop is running, data for the model is loaded into memory where it remains as long as the service is running. When users interact with a report or when DAX queries are run against the model, results are retrieved very quickly from the data residing in memory. There are some exceptions for very large models or when many models in the service don’t all fit into memory at the same time, the service will page some or all of the model in and out of memory to make sure that the most-often used model pages remain in memory for the next user request. But, for argument’s sake, the entire semantic model sits in memory, waiting for the next report or user request.

Rather than the proprietary SSAS file structure, Direct Lake models use the native Delta-parquet files that store structured data tables for a Fabric lakehouse or warehouse in One Lake. And rather than making a copy of the data in memory, the semantic model is a metadata structure that shares the same Delta-parquet file storage. As soon as a report runs against a model, all of the model data is paged into memory which then behaves a lot like an Import mode model. This means than while the model remains in memory, performance should about the same as Import, with a few exceptions.

Read on to see what the capabilities of Direct Lake mode are today, as well as a few design considerations for your Microsoft Fabric architecture.

Comments closed

Using Azure SQL Database Elastic Pools

Josephine Bush shares some tips on using elastic pools in Azure SQL DB:

Main points about elastic pools

  • Elastic pools enable you to purchase resources for a pool shared by multiple databases to accommodate unpredictable usage periods by individual databases.
  • You can create multiple pools on a server, but you can’t add databases from different servers into the same pool.
  • Pools are well-suited for many databases with specific utilization patterns. These patterns are characterized by low average utilization with infrequent utilization spikes for a given database. Conversely, multiple databases with persistent medium-high utilization shouldn’t be placed in the same elastic pool.

All the databases in the elastic pool share the DTU and storage. This is very apparent when multiple persistent medium-high utilization databases are in the same pool, which is why Microsoft advises against it.

Read on to see information about storage, DTUs, noisy neighbors, and more.

Comments closed