Press "Enter" to skip to content

Author: Kevin Feasel

Role-Playing Dimensions in Direct Lake

Chris Webb puts on a mustache and changes his shirt really quickly:

Note that the Sales fact table has two date columns, OrderDate and ShipDate.

If you create a DirectLake semantic model using the Web Editor and add these two tables you could rename the Date table to Order Date and build a relationship between it and the OrderDate column on the Sales table:

What about analysing by Ship Date though? You could create a physical copy of the Date table in your Lakehouse and add that to the model, but there’s another option.

Read on for that answer. Interesting that, as of right now, the primary way to do this is with third-party software.

Comments closed

Boosting versus Bagging in Tree Models

Vinod Chugani compares two techniques for working with trees:

Ensemble learning techniques primarily fall into two categories: bagging and boosting. Bagging improves stability and accuracy by aggregating independent predictions, whereas boosting sequentially corrects the errors of prior models, improving their performance with each iteration. This post begins our deep dive into boosting, starting with the Gradient Boosting Regressor. Through its application on the Ames Housing Dataset, we will demonstrate how boosting uniquely enhances models, setting the stage for exploring various boosting techniques in upcoming posts.

Read on for more information. The neat part about the “boosting versus bagging” debate is that both techniques are quite useful. Although boosting (via algorithms like XGBoost or LightGBM) is the more popular technique, bagging (random forest) is extremely powerful in its own right.

Comments closed

Rebuilding a Transaction Log

David Fowler fixes a large-scale oopsie:

“Could you help me, we deleted the database’s transaction log file and now that database is stuck in ‘Recovery Pending’?”

This was a panicked call that I received a few weeks ago.

“Sure, no problem” said I, “we’ll have to restore back to your last backup”

And then things went silent for a while before the inevitable, “it’s only a development database, we don’t take backups”.

I can feel the face-palm from here. Read on to learn what you can do if you’re in that situation, as well as David’s important note about taking backups so that you don’t end up in this situation to begin with.

Comments closed

Connecting to Azure Storage from SSIS

Andy Brownsword makes a connection:

Migrating to the cloud can be disruptive to existing processes. Moving storage to Azure isn’t a simple configuration change for SSIS packages.

SSIS doesn’t have native connections for Azure. That doesn’t mean we need to completely re-engineer the process or change technology though.

How can we take the simple package below and move to using Azure storage?

Read on for the answer. Also, I am 100% on Team SAS Token. They are easy to create and give you a lot of control over who gets access to what.

Comments closed

Kafka Internals: Handling a Producer Request

Danica Fine continues a series on Kafka internals:

Welcome to the second installment of our blog series to understand the inner workings of the beautiful black box that is Apache Kafka®. 

We’re diving headfirst into Kafka to see how we actually interact with the cluster through producers and consumers. Along the way, we explore the configurations that affect each step of this epic journey and the metrics that we can use to more effectively monitor the process. 

In the last blog, we explored what the Kafka producer client does behind the scenes each time we call producer.send() (or similar, depending on your language of choice). In this post, we follow our brave hero, a well-formed produce request, that’s on its way to the broker to be processed and have its data stored on the cluster.

Click through to learn more about how it all works.

Comments closed

Copying Content between Databricks Unity Catalogs

Meagan Longoria busts out the photocopier:

I had a couple of clients who were moving content from development catalogs to production catalogs for the first time. They wanted to copy the schema and data from tables, views, and volumes.

So I wrote a python notebook to handle this task. It creates the objects in the new catalog and then changes the owner to a selected user/group. You can find the full notebook on GitHub.

You can check out the notebook, and Meagan also has explanatory notes covering the process.

Comments closed

Cardinality Estimation Changes Post-SQL Server 2014

Brent Ozar reminds us that small changes happen:

About 10 years ago, Microsoft made changes to the Cardinality Estimator (CE) which caused some problems for SQL Server upgrades. When folks upgraded to SQL Server 2014, they also casually switched their databases’ compatibility level to the latest version, because for years that hadn’t really affected query plans. They just figured they wanted the “latest and greatest” compat level, without regard to the effects. That backfired badly when they suddenly got 2014’s Cardinality Estimation changes.

So for several years, whenever someone upgraded from older versions, and they complained about performance, the stock community answer was, “Change your compatibility level back to what it used to be.” In many cases, that just solved the problems outright, leading to blog posts like this and this.

Even today on SQL Server 2019 & 2022, this advice is still relevant! If you mess around with compatibility levels, you can absolutely change cardinality estimations in ways you didn’t expect.

Read on for more information, including examples where cardinality estimation has improved with SQL Server 2022 and where it has gotten worse.

Comments closed