Press "Enter" to skip to content

Curated SQL Posts

Wrapping up the Advent of Microsoft Fabric

Tomaz Kastrun gets to day 25. Day 24 covers OneLake in Fabric:

OneLake comes automatically with every Microsoft Fabric tenant and represents a single, logical data lake. Its main features are unification and a single copy of data shared across the organization and across multiple analytical engines.

And Day 25 provides some additional references Tomaz has found useful along the way:

To wrap up the series, let's check the material available online for you to continue learning, exploring, and enjoying Microsoft Fabric.

All in all, this has been a really good series and well worth going through if you are learning Microsoft Fabric.


Installing SQL Server on Ubuntu 22.04

I have a new video:

In this video, we will show how to install SQL Server on a machine running Linux, specifically SQL Server 2022 on Ubuntu 22.04 LTS.

This ties back to one of my first videos, which covered the installation of SQL Server 2022 on Ubuntu 20.04. At that time, there was no support for Ubuntu 22.04, and some of the libraries Microsoft was counting on had changed, so you couldn't trick the installation into working by pointing at the 20.04 repository.
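
Once the install completes, a quick way to confirm that the build and platform are what you expect (a generic post-install check, not something specific to the video) is from T-SQL:

    -- Confirm the build you are running
    SELECT @@VERSION AS SqlServerVersion;

    -- host_distribution and host_release should report Ubuntu and 22.04
    SELECT host_platform, host_distribution, host_release
    FROM sys.dm_os_host_info;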


Using KQL in Azure SQL DB Audits

Josephine Bush tracks what’s happening on that Azure SQL Database:

According to Microsoft, “Kusto Query Language (KQL) is a powerful tool to explore your data and discover patterns, identify anomalies and outliers, create statistical modeling, and more. The query uses schema entities that are organized in a hierarchy similar to SQL's: databases, tables, and columns.”

Note: KQL is case-sensitive for everything. Also, just as with SQL, refrain from querying everything: don't do the equivalent of SELECT * from gianttable.

Microsoft also has a lot of documentation with best practices and a quick reference guide to the Kusto commands. This blog post covers the ones I use the most.

Read on for a primer on the language, specifically some of the things you can do when reading Azure SQL Database audit information.
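
As an aside, KQL comes into play when your audit destination is a Log Analytics workspace. If you write audit logs to a storage account instead, the T-SQL counterpart for ad hoc reads is sys.fn_get_audit_file. A minimal sketch, with a placeholder storage path:

    SELECT event_time, action_id, database_name, statement
    FROM sys.fn_get_audit_file(
        'https://<storageaccount>.blob.core.windows.net/sqldbauditlogs/<server>/<database>/',
        DEFAULT, DEFAULT);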


Advent of Code in T-SQL Day 7

Kevin Wilkie continues the 2023 Advent of Code. Part 1 of day 7 covers card hands:

I have my version of the Day 7 data in my handy dandy table called AOCDay7. This time, we're being asked to figure out what kinds of Camel Cards hands we've been given, and then to rank all of them.

Let’s start by aggregating and analyzing the data a little bit.

Part 2 makes jokers wild:

This time, our wonderful friends the elves have decided that they don’t like Jacks but they do like Jokers. So, there are a few changes to our code that will need to be made:

Click through for Kevin’s solutions to these challenges.
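
To get a feel for the kind of aggregation involved, here is a hypothetical starting point, assuming AOCDay7 has a Hand column of five-character strings; this is a sketch, not Kevin's solution. The number of distinct cards plus the size of the largest duplicate group is enough to classify every Camel Cards hand type:

    WITH Cards AS (
        -- Unpivot each five-character hand into one row per card
        SELECT a.Hand, SUBSTRING(a.Hand, n.n, 1) AS Card
        FROM dbo.AOCDay7 a
        CROSS JOIN (VALUES (1), (2), (3), (4), (5)) n(n)
    ), CardCounts AS (
        SELECT Hand, Card, COUNT(*) AS CardCount
        FROM Cards
        GROUP BY Hand, Card
    )
    -- 1 distinct card = five of a kind; 2 distinct with a group of 4 =
    -- four of a kind; 2 with a group of 3 = full house; and so on down
    -- to 5 distinct cards = high card
    SELECT Hand,
           COUNT(*) AS DistinctCards,
           MAX(CardCount) AS LargestGroup
    FROM CardCounts
    GROUP BY Hand;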


Tracking Progress on Deletion

Kenneth Fisher has a way:

Alternate title: How do I tell how far I am on that command?

This little command (frequently with a WHERE clause) is a great way to tell how far along you are on any given command. Specifically, I’ve been running a bunch of DELETEs recently. I’ve got something like 5-50 million rows of data to delete and I’d like to know how far along I am. A common way to do this would be just a simple count.

Click through for Kenneth’s technique, as well as good information from Jeff Moden in the comments.

Not mentioned in this post is that hopefully, your massive delete operations are running in batches, as trying to delete 5 million or more rows in a single DELETE command is…resource-intensive.
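
For reference, a minimal batched-delete sketch looks something like this (the table name and filter are placeholders):

    DECLARE @rows int = 1;

    WHILE @rows > 0
    BEGIN
        -- Keep each transaction small so the log can clear between batches
        DELETE TOP (100000) FROM dbo.BigTable
        WHERE CreatedDate < '2020-01-01';

        SET @rows = @@ROWCOUNT;
    END;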


After Login Triggers in Oracle

David Fitzjarrell performs some auditing:

In a world of ever-increasing access, account security has become a major concern. More often than not, accounts used for automated processes gather more and more privileges, on the assumption that they are used only by managed, automated processes and execute vetted code. The argument is that these automated processes can't go rogue and do damage with these elevated privileges, which is true, sort of. In and of itself, the database does not police where these account connections originate, which can pose a significant security risk should this vetted code, or the passwords for these privileged accounts, become known.

Oracle has, for decades, provided after logon triggers, which can be utilized to ‘police’ accounts by reporting where they originate and, if necessary, disallowing such logons. Of course it will be necessary to record such connection ‘violations’ in a table so that they may be investigated. A “complete” script to do this is shown below:

Click through for that script, as well as additional thoughts from David.
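
To give a sense of the shape of such a trigger (a bare-bones sketch, not David's script; logon_audit is a hypothetical table you would create first):

    CREATE OR REPLACE TRIGGER logon_audit_trg
    AFTER LOGON ON DATABASE
    DECLARE
        PRAGMA AUTONOMOUS_TRANSACTION;
    BEGIN
        -- Record who connected and from where
        INSERT INTO logon_audit (username, host_name, logon_time)
        VALUES (SYS_CONTEXT('USERENV', 'SESSION_USER'),
                SYS_CONTEXT('USERENV', 'HOST'),
                SYSDATE);
        COMMIT;

        -- To disallow a logon outright, you would raise an error instead:
        -- RAISE_APPLICATION_ERROR(-20001, 'Logon not permitted from this host');
    END;
    /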


Local Variables in Stored Procedures

Erik Darling does not approve:

Like many other things we’ve discussed thus far, local variables are a convenience to you that have behavior many people are still shocked by.

You, my dear and constant reader, may not be shocked, but the nice people who pay me money to fix things seem quite astounded by what happens when you invoke local variables.

So I find myself in a difficult position: do I dredge up more red meat for the millions of die-hard SQL Server performance nuts who come here for the strange and outlandish, or produce evergreen content for people who pay my substantial bar tabs?

You have at least a 50% chance to guess what Erik does next.
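
The short version, as a contrived sketch (table and value are placeholders): the optimizer cannot see a local variable's runtime value when it compiles the statement, so it estimates from the column's average density rather than the histogram.

    DECLARE @OwnerUserId int = 22656;

    -- Estimate comes from overall density, not the histogram step
    -- for this specific value
    SELECT COUNT(*)
    FROM dbo.Posts
    WHERE OwnerUserId = @OwnerUserId;

    -- A literal (or a sniffed stored procedure parameter) gets its
    -- estimate from the histogram instead
    SELECT COUNT(*)
    FROM dbo.Posts
    WHERE OwnerUserId = 22656;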


Exploratory Data Analysis with F# and Plotly

Matt Eland is speaking my language (F#):

One of the most common tasks in data roles is the need to perform exploratory data analysis (EDA).

With EDA, a data scientist, data analyst, or other data-oriented programmer can:

  • Understand the value distributions of their data
  • Identify outliers and data anomalies
  • Visualize correlations, trends, and relationships between multiple variables

Exploratory data analysis usually involves:

  1. Loading the data into a DataFrame
  2. Performing descriptive statistics to identify the raw shape of the data
  3. Visualizing variables of interest on their own or with other variables

In this article I’ll walk you through the process of loading data from a sample dataset into a Microsoft.Data.Analysis DataFrame (the kind featured in ML.NET). Next, we’ll look at the descriptive statistics the DataFrame class provides and then explore the process of creating some simple visualizations with Plotly.NET.

Read on for the scenario and analysis.


Continuing the Advent of Fabric

Tomaz Kastrun has me playing catch-up. First up, monitoring workspaces in Fabric:

An easy way to check, view, and track your activities: executions and runs of notebooks, data pipelines, Data Factory executions, dataset refreshes, and many others.

Next, third-party applications:

Apps are collections of dashboards and reports in one easy-to-find place. Go to Apps and click on “Get Apps”.

And finally, the admin portal:

The admin portal serves the purpose of governing and configuring Microsoft Fabric: here you can manage tenant settings, access the Microsoft 365 admin portal, and control how users interact with Microsoft Fabric.

To access the admin portal, you need not only a Fabric license but also admin rights in one of the following roles (if you are not in one of these roles, you can only see the Capacity settings in the admin portal):

  • Global administrator
  • Power Platform administrator
  • Fabric administrator

If you’re just getting started with Microsoft Fabric, you could do a lot worse than going through Tomaz’s series.


Fabric F2 Performance

Teo Lachev has started a new series. We begin with warehouse ETL:

Inspired by Amir Netz's encouragement to partners to test the Fabric F2 capacity performance, I got on a quest to test what it would do to ETL loads for Fabric Warehouse. I must admit that I was skeptical that a quarter of a core would get a warehouse off the ground, but as usual, life proved me wrong, and "wrong" is a big understatement of what happened.

After provisioning a Fabric F2 capacity and a warehouse, I settled on the Retail Data Model for World Wide Importers sample star schema dataset, consisting of five dimension tables and one fact table. In terms of performance, I was mostly interested in how long it would take the ADF copy activity to insert all the data (50 million rows) into the fact table. Granted, it's a limited test, but enough to rule out the technology for real-life projects. Then, I compared the performance against Azure SQL Database Serverless running on up to 2 cores, provisioned through the free trial offer that Microsoft has on Azure. To exclude the impact of data transfer between regions, both technologies were provisioned in the East US 2 region, which is where my Power BI tenant is hosted.

Then we have report load time:

What better way to spend a lazy holiday afternoon than to do more Fabric performance testing? In my previous post, I shared my results from a single-threaded ETL load test to gauge the F2 ingest performance, and F2 did pretty well (or at least outperformed Azure SQL DB). Will F2 hold up as parallelism increases? Throughput testing is especially important for report loads because parallel tasks can run within a report, such as visuals executing DAX queries in parallel, and across reports, such as when concurrent report requests overlap.

I’m legitimately surprised at the results. I expected F2 to be barely sufficient for testing purposes. Read both posts to see how it performs and some caveats around performance.
