Press "Enter" to skip to content

Month: June 2022

KEEPFILTERS in DAX

Marco Russo and Alberto Ferrari explain how the KEEPFILTERS function works:

KEEPFILTERS is a CALCULATE modifier used to change the way CALCULATE merges new filters with the outer filter context. Indeed, the default behavior of CALCULATE is to override existing filters. By using KEEPFILTERS you ask CALCULATE to add the new filter to the outer filter context, instead of overriding the outer filter.

Read on for the explanation and a demo.

Comments closed

Reviewing Power BI Datamarts

Teo Lachev looks at Power BI Datamarts:

As Microsoft announced here, Power BI datamarts are upon us. I can almost see an important enterprise client demanding “self-service datamarts me now or else… “, thus inspiring an opportunity for another premium feature, spearheaded with great vision and effort, but questionable practical value. In a nutshell, a Power BI datamart is a combo of Power BI Premium and a Microsoft-hosted Azure SQL Database aiming to simplify the implementation of a departmental datamart.

This take is a bit more negative than most of the others I’ve seen, so it’s worth a read in comparison to what others have written.

Comments closed

Reconciling Tag Names across Azure

Anthony Watherston has an interesting script:

During a recent cost optimization workshop with a customer, they mentioned that although they had some tagging policies in place there was no consistency of tag names across the Azure environment. This post introduces a script to remediate this and remove some confusion from your tagging strategy.

The customer was trying to ensure that all resources were being tagged with a cost centre tag. Some of this was automatic and some was done manually by people. While there was a policy in place to control this in the future, they needed a way to remediate the existing resources.

This is really useful if you have enough information to create a to-and-from mapping. It won’t automatically understand anything, so you’ll need to do the digging but it will do the renaming.

Comments closed

Testing Azure Synapse Link for SQL Server 2022

Kevin Chant gives Synapse Link for SQL Server a try:

Azure Synapse Link for SQL Server 2022 allows you to replicate your data from a SQL Server 2022 database to an Azure Synapse Analytics dedicated SQL Pool.

It is one of the options for the new Azure Synapse Link for SQL feature that was announced during Microsoft Build. You can read more about this in the Microsoft post which also announced the Public Preview of Azure Synapse Link for SQL.

Click through to see what Kevin has found so far. I think by the time this rolls out GA, it should be pretty good.

Comments closed

Downplaying Logical Reads

Erik Darling lays out an argument:

I decided to expand on some scripts to look at how queries use CPU and perform reads, and found some really interesting stuff. I’ll talk through some results and how I’d approach tuning them afterwards.

Interestingly, I just dealt with a mini-consulting engagement in which I saw the opposite: CPU sitting there twiddling its thumbs because of I/O insanity—and not even slow disks. In that case, the advice generally was “add this obviously missing index from this rather large table and stop scanning when you get 1 row on these really busy queries.” There was a little more nuance than that—and in fairness, physical reads were bad as well—but that’s why we investigate systematically.

Also, I generally accede to Erik’s point: for most busy environments, logical reads are unlikely to be the constraining factor and there are plenty of times where I choose the query form with more logical reads because it reduces CPU and memory requirements.

Comments closed

Comparing Properties between SQL Server Instances

Eitan Blumin plays duck-duck-goose:

A few years ago, I created a couple of T-SQL scripts that can be used for comparing instance-level and database-level properties between two HA/DR replicas. Originally, this supported comparing only two servers. But recently, following a fan request, I upgraded the script to support an unlimited number of servers that you can compare to each other.

So, I figured, if one person found this useful, there must be more out there that would need this, right?

Read on for the script itself, how to use it, and some limitations you’ll need to know about.

Comments closed

Knowledge Graphs, Data Fabrics, and Data Meshes

Alan Morrison describes the history of three concepts:

By 2014, SAP was using “in-memory data fabric” to describe a virtual data warehouse, a key element of its HANA “360-degree customer view” product line. Gartner for its part uses the term “data fabric” to this day as an all-encompassing means of heterogeneous data integration. From a 2021 post on data fabric architecture: 

Read on for a high-level discussion of what each is and how it fits into the context of data warehouses and data lakes.

Comments closed

Processing Security Logs in Databricks with Delta Live Tables

Silvio Fiortio ingests some data:

Databricks recently introduced Workflows to enable data engineers, data scientists, and analysts to build reliable data, analytics, and ML workflows on any cloud without needing to manage complex infrastructure. Workflows allows users to build ETL pipelines that are automatically managed, including ingestion, and lineage, using Delta Live Tables. The benefits of Workflows and Delta Live Tables easily apply to security data sources, allowing us to scale to any volume or latency required for our operational needs.

In this article we’ll demonstrate some of the key benefits of Delta Live Tables for ingesting and processing security logs, with a few examples of common data sources we’ve seen our customers load into their cyber Lakehouse.

Click through to learn more.

Comments closed

Ingesting Event Hub Telemetry Data with PySpark Streaming

Charles Chukwudozie shows how to read from Event Hubs in Databricks with Python:

Ingesting, storing, and processing millions of telemetry data from a plethora of remote IoT devices and Sensors has become common place. One of the primary Cloud services used to process streaming telemetry events at scale is Azure Event Hub.

Most documented implementations of Azure Databricks Ingestion from Azure Event Hub Data are based on Scala.

So, in this post, I outline how to use PySpark on Azure Databricks to ingest and process telemetry data from an Azure Event Hub instance configured without Event Capture.

Click through for the process.

Comments closed

Making the Case for Azure Analysis Services

Teo Lachev makes the case:

Microsoft BI practitioners have three options for hosting semantic models: SSAS (on prem), Azure Analysis Services (cloud), and Power BI (cloud). AAS is somewhat caught between a rock and a hard place. Given that Power BI gets the most attention for cloud deployment, why would you consider AAS at all? There are two main reasons:

Read on for the reasons. Knowing how much it does cost, it almost feels like trying to thread a needle: if you don’t spent enough or have enough data, Power BI is typically much more efficient; if you have sufficient data, I’d want to do a proper cost analysis between on-prem (or IaaS) Analysis Services and Azure Analysis Services.

Comments closed