Author: Kevin Feasel

The Most Important Tool for a Data Detective

Andy Yun wants you to use your earholes:

The All Powerful…

… Question. That is what I now believe is the most important tool for a Data Detective.

Asking Questions Effectively

This nuance involves HOW you ask a question. Some of this involves knowing your audience. Is this the right place or the right time? Sometimes there comes a point where asking questions is just counter-productive because your audience has no interest in answering. And it also means you need to make sure you’re asking the correct audience in the first place.

Asking questions is difficult, so instead, I just strawman my way to success.

Comments closed

Data Types and Arithmetical Calculations in DAX

Marco Russo and Alberto Ferrari do some counting:

The VertiPaq engine is basically data type-independent. This means that it does not matter whether a column is a string, a floating point, or a date: because of the dictionary encoding happening inside VertiPaq, all these data types use around the same amount of memory and perform at nearly the same speed.

However, when mixing different data types in the same expression, DAX will likely need to perform conversions between data types. Some of these conversions are nearly free, whereas others require the intervention of the formula engine, with a related performance impact.

We have already written about possible errors occurring during data type conversion here: Understanding numeric data type conversions in DAX and here: Rounding errors with different data types in DAX. The issue with conversion errors is mostly due to the fact that the precisions of fixed decimals (also known as Currency) and decimals (also known as floating point) are different. This article starts with a focus on performance.

Read on to see what Marco and Alberto have for us this time.
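
To make the mixing problem concrete, here is a hedged sketch in DAX; the table and column names are hypothetical, not from the article:

-- Hypothetical model: Sales[Quantity] is a whole number and
-- Sales[Net Price] is a fixed decimal (Currency), so this stays
-- on the cheap path with no formula engine conversion:
Sales Amount :=
SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )

-- Sales[Exchange Rate] is a decimal (floating point). Mixing
-- Currency with floating point forces a data type conversion,
-- with the potential performance and rounding costs noted above:
Adjusted Amount :=
SUMX ( Sales, Sales[Net Price] * Sales[Exchange Rate] )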

Comments closed

The Power of Computed Columns

Andy Brownsword speeds up a query:

Bad code smells can run through a system, and one instance can quickly spread when code is recycled. Simon recently looked at a non-sargable example and was able to fix it by using an index and refactoring the query.

I wanted to consider an alternative approach if we saw the same issue repeated consistently. We don’t want to fix each instance; we want a single fix. We’ll solve this with an indexed computed column.

We can index computed columns to help resolve deterministic clauses (i.e., expressions whose values won’t change, so no GETDATE() allowed). Let’s get started.

Read on to learn more. This is a powerful approach to the problem of needing to perform some sort of data transformation before filtering your data.
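
As a rough sketch of what this looks like in T-SQL (the table and column names here are hypothetical, not from the post):

ALTER TABLE dbo.Orders
    ADD OrderYear AS YEAR(OrderDate);  -- deterministic, so it can be indexed

CREATE INDEX IX_Orders_OrderYear
    ON dbo.Orders (OrderYear);

-- The filter can now seek the index instead of scanning:
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderYear = 2024;

The optimizer can often match the original YEAR(OrderDate) = 2024 predicate to the indexed computed column as well, which is what makes this a single fix rather than a query-by-query rewrite.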

Comments closed

Building a Data Detective Toolkit

Deb Melkin talks tools:

Happy T-SQL Tuesday! I wasn’t really sure I’d be able to crank something out for this one but somehow I managed to squeeze it in. Tim Mitchell (b) is hosting and he has a great topic for us: What’s in our Data Detective toolkit?

I love this topic for so many reasons. Partly because I feel like I’m asked to look at so many projects where I’m dropped in and asked to figure things out, usually performance related but occasionally new functionality or features. But as I’m asked to do this fairly often, I may have to see if Data Detective can be my new title… hmm…

Being a Data Detective in a film noir: on the one hand, that sounds like a really neat idea. On the other hand, things usually don’t turn out so well for the detective.

Comments closed

Grouping Rows in R

Steven Sanderson needs a GROUP BY clause:

Combining rows with the same column values is a fundamental task in data analysis and manipulation, especially when handling large datasets. This guide is tailored for beginner R programmers looking to efficiently merge rows using Base R, the dplyr package, and the data.table package. By the end of this guide, you will be able to seamlessly aggregate data in R, enhancing your data analysis capabilities.

Click through for several code examples.
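
To give a flavor of the three approaches the guide covers (with made-up data):

# Sample data
df <- data.frame(
  region = c("East", "West", "East", "West"),
  sales  = c(100, 150, 200, 250)
)

# Base R
aggregate(sales ~ region, data = df, FUN = sum)

# dplyr
library(dplyr)
df %>% group_by(region) %>% summarise(total_sales = sum(sales))

# data.table
library(data.table)
as.data.table(df)[, .(total_sales = sum(sales)), by = region]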

Comments closed

Referencing a Microsoft Fabric ML Model from another Workspace

Sandeep Pawar crosses workspaces:

I have written a couple of blogs about working with ML models in Microsoft Fabric. Creating experiments and logging and scoring models in Fabric is very easy, thanks to the built-in MLflow integration. However, the Fabric Data Science experience has one limitation. There are no model endpoints yet, and you cannot load a model from another workspace because the model URI, unlike in Databricks, does not reference a workspace. If you use MLFlowTransformer as shown in this blog, only the model from the workspace where the notebook is hosted is loaded. However, there is a workaround.

Read on for that workaround, as well as the core limitation associated with it.
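
For context, here is a sketch of the standard loading path that runs into the limitation, not the workaround itself; the model name and version are hypothetical:

import mlflow

# In a Fabric notebook, a registered model URI has no workspace
# component, so this resolves only against the workspace that
# hosts the notebook:
model = mlflow.pyfunc.load_model("models:/MyModel/1")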

Comments closed

Fixing Missing SQL Agent Jobs Post-Migration

Lee Markum is looking for that lost shaker of SQL Agent jobs:

I’ve been doing migrations fairly continuously for the past 18 months. PowerShell has been my primary mechanism for many parts of the process, including copying jobs from the source SQL Server to the target. That has worked almost without incident each time. However, recently, an app team noticed that there were SQL Server Agent jobs missing on their new 2022 SQL Servers.  Because the first couple of missing jobs also existed on their Development environment, they were able to recreate those jobs in production. They naturally expressed concern that other jobs may be missing.

Read on for Lee’s process, including the solution.
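
One hedged way to spot the gap with dbatools (instance names are hypothetical, and this is not necessarily Lee’s exact process):

# Compare job names between the source and target instances
$sourceJobs = (Get-DbaAgentJob -SqlInstance 'SourceSql').Name
$targetJobs = (Get-DbaAgentJob -SqlInstance 'TargetSql2022').Name
$missing = $sourceJobs | Where-Object { $_ -notin $targetJobs }

# Copy only the jobs that are missing on the target
Copy-DbaAgentJob -Source 'SourceSql' -Destination 'TargetSql2022' -Job $missing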

Comments closed

Reduced Auto-Pause Delay for Azure SQL DB Serverless

Morgan Oslake goes to sleep sooner:

Azure SQL Database serverless automatically scales compute based on workload demand and bills for compute used per second.  In the General Purpose tier, serverless also provides an option to automatically pause the database during idle usage periods when only storage related costs are billed.  When workload activity returns, the database is automatically resumed.

Customers choosing to enable auto-pausing can specify the auto-pause delay as part of the serverless configuration.  The auto-pause delay is the length of time the database must be idle before auto-pausing.  The lower the auto-pause delay and the more frequently auto-pausing occurs, the greater the potential compute cost savings. 

Read on for the update in minimum auto-pause time.
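
For reference, the auto-pause delay is set as part of the serverless compute configuration; a hedged Azure CLI example with hypothetical resource names:

az sql db update \
    --resource-group rg-demo \
    --server sqlsrv-demo \
    --name db-demo \
    --compute-model Serverless \
    --auto-pause-delay 60   # minutes idle before pausing; -1 disables auto-pause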

Comments closed

Announcements from the European Fabric Community Conference

James Serra brings tidings:

A TON of new feature announcements came out of the European Microsoft Fabric Community Conference held last week. The full list is here, and I wanted to list my favorite announcements from that list:

  • Access Databricks Unity Catalog tables from Fabric (public preview): You can now access Databricks Unity Catalog tables directly from Fabric. In Fabric, you can create a new data item called “Mirrored Azure Databricks Catalog”. When creating this item, you simply provide your Azure Databricks workspace URL and select the catalog you want to make available in Fabric. Rather than making a copy of the data, Fabric creates a shortcut for every table in the selected catalog. It also keeps the Fabric data item in sync. So, if a table is added or removed from UC, the change is automatically reflected in Fabric. Once your Azure Databricks Catalog item is created, it behaves the same as any other item in Fabric. Seamlessly access tables through the SQL endpoint, utilize Spark with Fabric notebooks and take full advantage of Direct Lake mode.

Read on for the rest of what James found exciting.

Comments closed