
Curated SQL Posts

Writing Data to an Unattached Lakehouse via Fabric Notebook

Prathy Kamasani does a bit of movement:

Regardless of which architecture we follow, during stages of data integration and transformation there’s always a step to move data from one location to another. And we work with multiple tables, schemas, and even lakehouses. The same goes with Fabric Notebooks. I often find myself in scenarios where I don’t want to attach a Lakehouse to my notebook, but I do want to read or write data from various lakehouses.

I recently blogged about a way to achieve this as part of documenting your workspaces. In that post, I described how to write data to a workspace that was not attached to the notebook. I used MSSparkUtils (renamed to NotebookUtils) to mount and then write data in the Lakehouse as Delta tables.

Read on for the answer.

Converting Audit Files to CSV via Read-SqlXEvent

Patrick Keisler finds a work-around:

Reading the contents of an audit event file can be accomplished using either Management Studio or the T-SQL function, sys.fn_get_audit_file. However, sometimes a customer may want to use a third-party tool to read and aggregate audit records, and some of those tools do not have the ability to read the binary audit file. In that case, the customer will just use sys.fn_get_audit_file to write the audit records to a database and then use the third-party tool to ingest those records from the database.

What if that third-party tool cannot even read from SQL Server? That happened to me recently where the only option was to read from a text-based file.

Read on for Patrick’s solution to the problem and a real pain point you’ll find along the way.
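For reference, the built-in route the excerpt mentions looks something like this; the file path and target table here are hypothetical stand-ins, not Patrick’s actual setup:

-- Read binary audit files and persist the records to a table
-- so a downstream tool can pick them up from the database.
SELECT event_time,
       action_id,
       session_server_principal_name,
       database_name,
       object_name,
       statement
INTO   dbo.AuditRecords
FROM   sys.fn_get_audit_file(N'C:\SQLAudit\*.sqlaudit', DEFAULT, DEFAULT);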

Thoughts on Combining UPDATE Operations

Brent Ozar thinks about consolidation:

I’ve got a BEGIN TRAN in there before the updates just so I can test the same queries repeatedly, and roll them back each time. The execution plan for the updates is quite nice: SQL Server divebombs into the supporting indexes:

Relatively few rows match, so our query does less than 1,000 logical reads – way less than there are pages in the table. In this case, separate UPDATE statements make sense.

Brent then continues with the follow-up question: what do we do when we need to perform multiple independent scans of the same table? Read on for Brent’s answer, but definitely check out Thomas Franz’s comments, which point out a potential complicating factor.
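If you want to picture the setup before clicking through, the pattern Brent is testing looks roughly like this; the table, columns, and predicates below are stand-ins, not his exact demo:

BEGIN TRAN;

-- Each UPDATE has its own selective predicate, so each statement
-- can seek a supporting nonclustered index on its own.
UPDATE dbo.Users
SET    Reputation = Reputation + 100
WHERE  DisplayName = N'Brent Ozar';

UPDATE dbo.Users
SET    Reputation = Reputation + 100
WHERE  Location = N'Iceland';

-- Roll back so the same test can be rerun against identical data.
ROLLBACK;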

Ingesting Blob Storage Data into SQL Server

Andy Brownsword brings in some data:

We may associate consuming data from Azure Storage with tools like Data Factory or even SSIS as we saw recently. We don’t always need the middleman, though.

Here we’ll demonstrate how to use an External Data Source to perform the ingestion directly into SQL Server.

Click through for the solution. As a quick note, the TYPE attribute that Andy uses in CREATE EXTERNAL DATA SOURCE was necessary from SQL Server 2016 through SQL Server 2019, but no longer exists for SQL Server 2022. Instead, for SQL Server 2022, you’d switch the LOCATION to start with abs:// for Azure Blob Storage and PolyBase would infer the type from the protocol.
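As a rough sketch of that difference (the storage account, container, and credential names are made up, and a database-scoped credential is assumed to already exist):

-- SQL Server 2016 through 2019: TYPE is required.
CREATE EXTERNAL DATA SOURCE BlobSource
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageaccount.blob.core.windows.net/mycontainer',
    CREDENTIAL = BlobSasCredential
);

-- SQL Server 2022: no TYPE attribute; the abs:// prefix tells PolyBase
-- the target is Azure Blob Storage.
CREATE EXTERNAL DATA SOURCE BlobSource2022
WITH (
    LOCATION = 'abs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobSasCredential
);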

Functions in JOIN and Performance Problems

Steve Stedman lays out a warning:

When writing SQL queries, it’s easy to focus on getting the right results without thinking too much about performance. One common mistake that can lead to significant slowdowns is using functions in the JOIN ON clause of a SELECT statement.

While SQL Server supports a wide range of built-in functions, using them incorrectly—especially in the join conditions—can severely impact performance. Let’s break down why this happens, how it can affect your query speed, and what you can do to avoid it.

Read on for the answer.
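As a generic illustration of the problem (the tables and columns here are hypothetical, and the rewrite shown is just one possible fix, not necessarily the one Steve demonstrates):

-- Wrapping the joined column in a function makes the predicate
-- non-sargable, so SQL Server scans instead of seeking an index.
SELECT o.OrderID, d.CalendarYear
FROM   dbo.Orders o
JOIN   dbo.DateDim d
       ON d.DateKey = CONVERT(date, o.OrderDate);

-- One workaround: persist the converted value, index it,
-- and join on the bare column instead.
ALTER TABLE dbo.Orders
    ADD OrderDateKey AS CONVERT(date, OrderDate) PERSISTED;

CREATE INDEX IX_Orders_OrderDateKey ON dbo.Orders (OrderDateKey);

SELECT o.OrderID, d.CalendarYear
FROM   dbo.Orders o
JOIN   dbo.DateDim d
       ON d.DateKey = o.OrderDateKey;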

Restarting Failed Control Flows in Azure Data Factory

Meagan Longoria doesn’t want to repeat good work:

I presented at SQL Saturday Pittsburgh this past weekend about populating your data warehouse with a metadata-driven, pattern-based approach. One of the benefits I mentioned is that it’s easy to employ this pattern for restartability.

For instance, let’s say I am loading data from 30 tables and 5 files into the staging area of my data mart or data warehouse, and one of the table loads fails. I don’t want to reload the other tables I just loaded. I want to load the ones that have not been recently loaded. Or let’s say I have 5 dimensions and 4 facts, and I had a failure loading a fact table. I don’t want to reload my dimensions, and I only want to reload the failed facts. How do we accomplish this?

Read on to learn how.
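One common way to back that kind of pattern is a control table that the pipeline’s Lookup activity queries on each run; the sketch below is a generic version with hypothetical names, not necessarily Meagan’s design:

-- Hypothetical control table: one row per table or file to load.
CREATE TABLE etl.LoadControl
(
    ObjectName     sysname      NOT NULL PRIMARY KEY,
    LastLoadStatus varchar(20)  NOT NULL DEFAULT ('NotRun'),  -- Succeeded / Failed / NotRun
    LastLoadEnd    datetime2(0) NULL
);

-- On a restart, the pipeline only picks up objects that have not
-- succeeded in the current load window, so finished loads are skipped.
SELECT ObjectName
FROM   etl.LoadControl
WHERE  LastLoadStatus <> 'Succeeded'
   OR  LastLoadEnd < CAST(CONVERT(date, SYSUTCDATETIME()) AS datetime2(0));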

Configuration Is Code

Steve Jones has a public service announcement:

I posted a note on Twitter/X with this quote: “The content updates had not previously been treated as code because they were strictly configuration information.” This is from testimony given by CrowdStrike to a US Congressional committee in trying to explain how they grounded much of the airline industry a few months ago. That was a mess of a situation, and apparently, the vendor didn’t think their configuration was part of their code.

That’s an amazing viewpoint to me. The fact that any developer or manager thinks that their configuration data isn’t a part of their code is worth testing. Yet, I see this attitude all the time, where developers, QA, managers, and more think that the code is the only thing that changes or doesn’t change, ignoring the fact that there are configuration items that affect the code and need to be managed appropriately. Certainly, if the config data were in enums rather than in a file or database, they’d feel differently.

Read on for Steve’s extended thoughts. I can understand the urge to call something “just a configuration file” so that you don’t have to do as much work. But that can lead to disaster.

Tips for Optimizing Power BI Semantic Models

Koen Verbeeck shares some tips:

Power BI is designed to be user-friendly. With just a few clicks, you can import data from various sources, combine them together in one data model and start analyzing it using powerful data visualizations. This sometimes leads to a scenario where people are just importing data into the tool without giving it too much thought. When you’re working on a solo project on a small dataset, there probably won’t be too many issues. But what if your report is successful and you want to share it with your colleagues and maybe other departments? Or more data is loaded into the model, but refreshes are taking more and more time? Eventually, other data sources are added into your model, but writing DAX formulas has become hard, and reports are slowing down.

In this article, we’ll cover a couple of tricks that will help you make your Power BI models smaller, faster and easier to maintain. In the immortal words of Daft Punk: “Harder. Better. Faster. Stronger”.

Click through for those tricks and tips.
