Press "Enter" to skip to content

Month: May 2024

The Joy of Partitioned Views

Rod Edwards talks partitioned views:

This post came around when I was at a loose end one evening, and just started poking at a local sandpit database, and it got me reminiscing and revisiting / testing a few things. The devil makes work for idle thumbs and all that…

Partitioned Views…do they have a place in society anymore?

Rod does a great job of following Betteridge’s Law of Headlines, as well as saving the ‘Yes’ answer for the post itself. Partitioned views come with their own pains, though one use case Rod did not include is using PolyBase and partitioned views to move “cold” data to slower external storage.

Comments closed

Job Threading and Thread Partitioning in SQL Server

Aaron Bertrand continues a series on threading:

In part 2 of this series, I showed an example implementation of distributing a long-running workload in parallel, in order to finish faster. In reality, though, this involves more than just restoring databases. And I have significant skew to deal with: one database that is many times larger than all the rest and has a higher growth rate. So, even though I had spread out my 9-hour job with 400 databases to run faster by having four threads with 100 databases each, one of the threads still took 5 hours, while the others all finished within 1.5 hours.

Read on to learn what Aaron did to make things move faster.

Comments closed

MFA Requirement for Azure Users

Erin Chapple opens a can of worms:

This July, Azure teams will begin rolling out additional tenant-level security measures to require multi-factor authentication (MFA). Establishing this security baseline at the tenant level puts in place additional security to protect your cloud investments and company. 

MFA is a security method commonly required among cloud service providers and requires users to provide two or more pieces of evidence to verify their identity before accessing a service or a resource. It adds an extra layer of protection to the standard username and password authentication.

The problem is, there are a lot of good questions people are asking in the comments and currently, there are no answers.

Comments closed

Actual Execution Plans and Lock Waits

Erik Darling notices me in a leg cast staring through his window with my telescope:

A long time ago, I complained that wait stats logged by actual execution plans don’t show lock waits. That seemed like a pretty big deal, because if you’re running a query and wondering why sometimes it’s fast and sometimes it’s slow, that could be a pretty huge hint.

Click through for the full story. Getting actual waits is indeed a big deal, and way easier than any of the alternatives like spinning up a special extended events session or yelling at everyone not to use the server for a few minutes while you ran your query.

Comments closed

Using F-SKU Power BI Capacity and Microsoft Fabric

Chris Webb has a public service announcement:

Since the announcement in March that Power BI Premium P-SKUs are being retired and that customers will need to migrate to F-SKU capacities intead I have been asked the same question several times:

Why are you forcing me to migrate to Fabric???

This thread on Reddit is a great example. What I want to make clear in this post is the following:

Moving from P-SKU capacities to F-SKU capacities is not the same thing as enabling Fabric in your tenant

Click through for Chris’s explanation. Also check out the comments section for this one, as there are plenty of questions and responses in there.

Comments closed

Using the pg_repack Extension

Muhammad Ali tries out an extension:

Regular updates and deletions within PostgreSQL tables can lead to various issues such as bloat, fragmentation, and a decline in performance over time. These challenges can significantly impact the efficiency and reliability of the database, potentially affecting critical operations.

To address these concerns, PostgreSQL introduced the pg_repack extension, which provides a robust solution for managing table maintenance without disrupting the production environment. By allowing tables to be rebuilt online, pg_repack tackles bloat and fragmentation issues, ensuring that database storage remains optimized and performance is consistently maintained.

Read on to see why vacuuming might not be enough and what pg_repack does.

Comments closed

Gradient Boosting for Classification

I have a new video:

In this video, I take a look at an alternative to bootstrap aggregation & random forest: boosting. We cover a brief history of boosting and see how it works in action with XGBoost and LightGBM.

This is probably the video with the single largest number of links in my show notes. It’s also one of the shortest in the series; it’s funny how things work out sometimes.

Comments closed

Authenticating to Fabric APIs via Sempy and Service Principals

Gilbert Quevauvilliers links everything together:

I have been doing a fair amount of work lately with Fabric Notebooks.

I am always conscious to ensure that when I am authenticating using a Service Principal, I can make sure it is as secure as possible. To do this I have found that I can use the Azure Key Vault and Azure identity to successfully authenticate.

Read on for some of the advantages of using Azure Key Vault for this sort of credential management, as well as how to get it all working.

Comments closed

Creating Orchestrators in Azure Data Factory

Martin Schoombee continues a series on building an orchestration framework in Azure Data Factory:

The orchestration layer of the framework is where all the magic happens. It facilitates the execution of processes and/or tasks as defined in the metadata, and needs to do it both seamlessly and efficiently. Ideally you would want to deploy this layer only once, and never have to touch it again. And it is really with that in mind that I designed this layer…to function independently and with minimal dependencies in both directions.

I would have loved for this layer to consist of only one pipeline but there are some nuances in Data Factory that make it impossible, the primary nuance being that you cannot nest ForEach activities. As a result, this layer contains three pipelines that will be covered by the sections below in more detail.

Read on to see what those three pipelines are.

Comments closed

Refreshable Excel Files in the Power BI Service

Kristyna Ferris shows how to refresh a Power BI data source from Excel files in Sharepoint:

Ever since Excel made its debut in the 1980’s, it has been used as a quick way for end users to input and manipulate data on their own without going through the extensive data engineering and data ingestion processes. With Power BI coming on to the scene in 2015, it quickly became the go-to visualization tool for various data sources. These two powerful tools can be used together to drive customized insights for your organization. By uploading your Excel file into SharePoint/OneDrive, you can easily connect and set up a refresh to a Power BI report in the Power BI Service without an on-premises gateway.

Read on to see how it all works.

Comments closed