Press "Enter" to skip to content

Curated SQL Posts

Number of Fabric Workspaces and the Medallion Architecture

Kevin Chant opens a can of worms:

I got asked about it this week during the Learn Together session I did alongside Shabnam Watson. Plus, it is a highly debated topic in our community, and I wanted to share my thoughts about it.

My personal opinion is that it depends. The number you choose depends on a variety of factors, which I intend to cover in this post.

By the end of this post, you will know my personal opinions as to why. Plus, plenty of things to consider when deciding on the number of workspaces to implement.

Read on for Kevin’s thoughts. My quick opinion is: one workspace per layer. Just from a logistical standpoint, keeping several layers separate in a single workspace is an immense challenge, and it typically requires exposing data engineering details (like what “gold”/”silver” or “curated”/”refined” actually means) to end users.


Max Offline Semantic Model Size Error

Chris Webb continues a series on Power BI semantic model memory errors:

In the Power BI Service, Import mode models are stored offline and paged into memory only when they are needed – for example, because someone runs a report that uses the model as its source. As discussed in my last post, though, there’s a limit on the amount of memory that a model can use, which varies by the size and type of capacity you’re using. There’s also an extra complication: Import mode models are stored offline in a compressed format, which means the Power BI Service doesn’t know exactly how much memory will be needed if the entire model has to be held in memory. As a result, there is an additional limit enforced on the size of the offline copy of Import mode models to ensure they don’t use too much memory when they are eventually paged in.

Read on to learn more about this setting and what it looks like if you trigger the error.
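If you’re curious how much memory a model will actually want once it’s paged in, one way to get a ballpark figure is to query the storage DMVs over the workspace’s XMLA endpoint (SSMS and DAX Studio both work). This is my own sketch rather than anything from Chris’s post, and the sizes come back in bytes:

    -- Run against the semantic model over the XMLA endpoint.
    -- USED_SIZE is in bytes; total it up for a rough in-memory size.
    SELECT DIMENSION_NAME, COLUMN_ID, USED_SIZE
    FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
    ORDER BY USED_SIZE DESC;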


SQL Server FAST N Query Hint

Chad Callihan hits the Turbo button:

How familiar are you with the OPTION (FAST N) query hint? It’s not one that I had ever used, so I decided to experiment with it a bit. Let’s look at what it does and how to apply it to a query.

My recollection is that this query hint heavily prioritizes “streaming” operators and de-prioritizes operators that consume the whole dataset as a block, such as sorts. This means, for example, that you’ll get more nested loops joins and natural merge joins, but fewer sort-then-merge or hash match joins. It may also make some queries take considerably longer as a result.
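If you’ve never seen the hint in action, the syntax is just an OPTION clause at the end of the query. A minimal sketch (the table and column names here are hypothetical, not from Chad’s post):

    -- Ask the optimizer for a plan that returns the first 50 rows quickly,
    -- even if producing the full result set ends up costing more.
    SELECT OrderID, CustomerID, OrderDate
    FROM Sales.Orders
    ORDER BY OrderDate DESC
    OPTION (FAST 50);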


Checking for Duplicate Rows with TidyDensity

Steven Sanderson looks for dupes:

Today, we’re diving into a useful new function from the TidyDensity R package: check_duplicate_rows(). This function is designed to efficiently identify duplicate rows within a data frame, providing a logical vector that flags each row as either a duplicate or unique. Let’s explore how this function works and see it in action with some illustrative examples.

Read on to see how it works. I am curious, though, whether there’s an option to ignore certain columns, such as row IDs or other “non-essential” columns you don’t want included in the comparison. It would also be interesting to check how the function handles NA or NULL values.
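Since this is a SQL-focused site, here is roughly the same idea expressed in T-SQL rather than R (my own sketch against a hypothetical dbo.Readings table, not the package’s implementation):

    -- Flag a row as a duplicate when an identical combination of values
    -- has already appeared; swap in whichever columns you want compared.
    SELECT ColA, ColB,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY ColA, ColB
                                        ORDER BY (SELECT NULL)) > 1
                THEN 1 ELSE 0
           END AS is_duplicate
    FROM dbo.Readings;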


Comparing SQL Server to Databricks

Paul Andrew makes a comparison:

Over the many years I’ve been working in the data/IT industry, Microsoft SQL Server and Azure Databricks have easily become my two favourite data processing tools. When Databricks became a first-class resource in Microsoft Azure, it was a big moment for the evolution of the data platform architectures I’ve designed and built (but architecture isn’t the focus of this blog post). That said, rather than considering the tooling and technology as an evolution, I find a lot of people drawing comparisons between the products. This often leads to confusion and friction, as they ultimately offer a lot of different capabilities, with only some common areas where comparisons can be made.

Read on for Paul’s thoughts. Spoilers: I agree with pretty much all of it.


Getting Row Counts of All Tables in a Microsoft Fabric Warehouse

Koen Verbeeck busts out the tally counter:

It says the data is 352MB in size, but after loading the data I was curious about how many rows were actually in that sample data set. Unfortunately, it’s not as straightforward as with a “normal” SQL Server database to get the row counts. First of all, when you connect to the database with SSMS, there’s sadly no option to get the row counts report:

The post is a little depressing, really. But still worth the read.
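For contrast, this is the sort of metadata query you would run against a normal SQL Server database to get row counts without scanning every table; per Koen’s post, don’t assume a Fabric warehouse supports the same approach:

    -- Approximate row counts from partition metadata (heaps and clustered indexes).
    SELECT s.name AS schema_name,
           t.name AS table_name,
           SUM(p.rows) AS row_count
    FROM sys.tables t
        INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
        INNER JOIN sys.partitions p ON t.object_id = p.object_id
    WHERE p.index_id IN (0, 1)  -- 0 = heap, 1 = clustered index
    GROUP BY s.name, t.name
    ORDER BY row_count DESC;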


Building an Elastic Job with Bicep

Josephine Bush flexes some muscles:

Bicep is an open-source Domain-Specific Language (DSL) that simplifies the process of deploying Azure resources. It is an abstraction layer on top of Azure Resource Manager (ARM) templates, making it easier to write and understand infrastructure code. Bicep lets you describe your Azure infrastructure using a cleaner and more concise syntax than traditional ARM templates.

It’s definitely easier to read and work with Bicep than with raw ARM template JSON. Larger Bicep scripts can still be pretty confusing, but they’re still easier to write and maintain.
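As a refresher on what actually gets deployed: however you provision the job agent, the jobs themselves end up defined via stored procedures in the job database. A hedged sketch with invented names, not taken from Josephine’s post:

    -- Run inside the elastic job database once the agent exists.
    EXEC jobs.sp_add_target_group @target_group_name = N'AllServerDatabases';

    -- Depending on your authentication setup, a @refresh_credential_name
    -- may also be required when targeting a whole server.
    EXEC jobs.sp_add_target_group_member
        @target_group_name = N'AllServerDatabases',
        @target_type = N'SqlServer',
        @server_name = N'myserver.database.windows.net';

    EXEC jobs.sp_add_job @job_name = N'NightlyMaintenance';

    EXEC jobs.sp_add_jobstep
        @job_name = N'NightlyMaintenance',
        @command = N'EXEC dbo.PerformMaintenance;';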


Third-Party Applications and Poor Database Design

Kevin Hill talks about a bugbear of mine:

As a SQL DBA, what do you do when a vendor application has performance problems that are code related?

Server settings don’t generally seem to be an issue.

Queries and vendor code…total hands off. I just point at code and say “There’s a great choice for optimizing in your next update!”

Indexes are the “Sticky Bits” in between client data and vendor code.

To an extent, I do feel for software vendors, who have to write software that works in a variety of environments with a variety of workloads. That can be a real challenge, especially because the people developing those databases don’t get to monitor them in real time and observe what’s going on until someone reports an issue.

That’s the empathy part. The other side of the coin is that there are a bunch of vendors with garbage-tier database designs and awful queries. As a user of such a third-party application, or a consultant trying to help users of the application, that can be frustrating. The reason is, as Kevin mentions, you really don’t want to go mucking with the queries or database design, because with my luck, the change I make to improve a query’s performance will affect a trigger nested three levels deep in some totally unrelated process that somehow manages the data integrity of the entire application.


Fabric Workload Items in the Scanner API

Gilbert Quevauvilliers checks out the latest changes to the Scanner API:

All Fabric Workload Items are now available from the Scanner API

I was working with a customer and was looking for some information in the Scanner API.

For a change, I went into Power Query and expanded the workspaces item.

Read on for more information. And if you’d like to learn more about the Scanner API, here’s a sample application showing how to use it.


Central Management Servers and SSMS 20

Greg Low works around an issue:

I’ve recently been doing work with a site that makes extensive use of Central Management Servers. And that’s an issue if you upgrade past v19.3 of SSMS.

Here’s my counter-argument: how common is it to find organizations that have enough SQL Server instances to make a Central Management Server worthwhile but don’t have any sort of certificate management process?

And more importantly, why don’t they have certificate management processes in place for SQL Server? This isn’t 2008 anymore—everybody (for some slight exaggeration of the term “everybody”) has certificate management in place for websites. It’s incredibly rare to find websites without TLS certificates, so somebody in your organization is managing certificates somehow. Why are these people not also managing certificates for SQL Server? Because once you have proper certificates in place rather than self-signed certs, there is no SSMS problem.

And if money is the issue, money is not the issue. Note that Daniel’s post is over 6 years old (and here’s me self-linking for street cred), meaning any company without the budget for proper certificates could have put this into place anytime over the past 6 years.

Self-signed certificates are okay for debugging purposes on personal machines. But they should not be acceptable for connecting to SQL Server in any environment. Certificate-driven encryption is a critical part of securing data movement over the wire, and a trusted certificate chain is critical for ensuring attackers cannot sit in the middle of that connection and read the data.
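As a quick aside, if you want to verify whether a given connection is actually encrypted, the check is a standard DMV query:

    -- Shows the encryption status and auth scheme of the current connection.
    SELECT session_id, encrypt_option, auth_scheme
    FROM sys.dm_exec_connections
    WHERE session_id = @@SPID;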
