Press "Enter" to skip to content

Day: October 16, 2024

A Primer on Pandas

Rajendra Gupta talks about Pandas:

Have you heard about Pandas in Python? It is widely used open-source library for analyzing and manipulating data in the Python programming language. Let’s explore it with use cases and examples.

Click through for an overview of the library. Pandas isn’t the quickest performer as your data sets get large, but for ease of use on moderately-sized datasets (up to hundreds of thousands of rows, or maybe millions if you manage things well), it does a good job.

Comments closed

Simplifying Azure Authorization with Managed Identities

MIka Sutinen doesn’t need a password:


Managed Identities
 in Azure provide an efficient and secure way for managing credentials when accessing databases, or other Azure resources. In this post, we build on my recent post about using Service Principal for database authentication, further exploring how we can leverage Managed Identities to simplify the process.

While Service Principals can make some parts of the access management simpler in Azure, you’re still left with several responsibilities in their management.

These responsibilities include rotating secrets and managing the lifecycle of the Service Principal, to name maybe the two most crucial ones. And if you have multiple applications, each with a separate Service Principal, this can be complex to manage.

Click through to learn more about managed identities and service principals, and how they work to link together Azure resources behind the scenes.

Comments closed

Backup Types and Recovery Models in SQL Server

I have a new video:

In this video, I cover the three recovery models available to SQL Server, describe (most of) the types of backups you can take, and spend a bit of time covering Recovery Point Objective & Recovery Time Objective.

The advice for video length that people tend to give is 8-10 minutes. I typically average closer to 15 minutes per video. This one is nearly 30 minutes long because there’s just so much information to cover, even with me repeatedly saying “Don’t worry, I’ll cover this bit in a future video.”

Comments closed

Writing Data to an Unattached Lakehouse via Fabric Notebook

Prathy Kamasani does a bit of movement:

Regardless of which architecture we follow, during stages of data integration and transformation there’s always a step to move data from one location to another. And, we work with multiple tables, schemas, and even lake houses.Same goes with Fabric Notebooks. I often find myself in scenarios where I don’t want to attach Lakehouse to my notebook, but I do want to read or write data from various bakehouses.

I recently blogged about a way to achieve this as part of documenting your workspaces. In that post, I described how to write data to a workspace that was not attached to the notebook. I used MsSparkUtil(renamed to NotebookUtils) to mount and then write data in the Lakehouse as Delta tables.

Read on for the answer.

Comments closed

Converting Audit Files to CSV via Read-SqlXEvent

Patrick Keisler finds a work-around:

Reading the contents of an audit event file can be accomplished using either Management Studio or the T-SQL function, sys.fn_get_audit_file. However, sometimes a customer may want to use a third-party tool to read and aggregate audit records, and some of those tools do not have the ability to read the binary audit file. In that case, the customer will just use sys.fn_get_audit_file to write the audit records to a database and then use the third-party tool to ingest those records from the database.

What if that third-party tool cannot even read from SQL Server? That happened to me recently where the only option was to read from a text-based file.

Read on for Patrick’s solution to the problem and a real pain point you’ll find along the way.

Comments closed

Thoughts on Combining UPDATE Operations

Brent Ozar thinks about consolidation:

I’ve got a BEGIN TRAN in there before the updates just so I can test the same queries repeatedly, and roll them back each time. The execution plan for the updates is quite nice: SQL Server divebombs into the supporting indexes:

Relatively few rows match, so our query does less than 1,000 logical reads – way less than there are pages in the table. In this case, separate UPDATE statements make sense.

Brent then continues with, but what do we do when we need to perform multiple independent scans of the same table? Read on for Brent’s answer, but definitely check out Thomas Franz’s comments, which indicate a potential complicating factor.

Comments closed

Ingesting Blob Storage Data into SQL Server

Andy Brownsword brings in some data:

We may associate consuming data from Azure Storage with tools like Data Factory or even SSIS as we saw recently. We don’t always need the middle man though.

Here we’ll demonstrate how to use an External Data Source to perform the ingestion directly into SQL Server.

Click through for the solution. As a quick note, the TYPE attribute that Andy uses in CREATE EXTERNAL DATA SOURCE was necessary from SQL Server 2016 through SQL Server 2019, but no longer exists for SQL Server 2022. Instead, for SQL Server 2022, you’d switch the LOCATION to start with abs:// for Azure Blob Storage and PolyBase would infer the type from the protocol.

Comments closed