Press "Enter" to skip to content

Author: Kevin Feasel

Advent of Code in T-SQL Day 7

Kevin Wilkie continues the 2023 advent of code. Part 1 of day 7 covers card hands:

I have my version of the Day 7 data in my handy dandy table called AOCDay7. This time we’re being asked to figure out what kinds of Camel Cards hands we’ve been given and then to rank all of them.

Let’s start by aggregating and analyzing the data a little bit.
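
Kevin’s actual solution is behind the link; as a rough illustration of the idea, here is a minimal sketch that classifies each hand by its card-count signature, assuming AOCDay7 has a five-character Hand column (the column name is my guess, not necessarily Kevin’s schema):

    -- Hypothetical sketch: split each five-card hand into cards,
    -- count duplicates, and classify by the count signature.
    WITH Cards AS
    (
        SELECT d.Hand,
               SUBSTRING(d.Hand, n.n, 1) AS Card
        FROM dbo.AOCDay7 AS d
            CROSS APPLY (VALUES (1), (2), (3), (4), (5)) AS n(n)
    ),
    Counts AS
    (
        SELECT Hand, Card, COUNT(*) AS Cnt
        FROM Cards
        GROUP BY Hand, Card
    )
    SELECT Hand,
           CASE MAX(Cnt)
                WHEN 5 THEN 'Five of a kind'
                WHEN 4 THEN 'Four of a kind'
                WHEN 3 THEN CASE COUNT(*) WHEN 2 THEN 'Full house' ELSE 'Three of a kind' END
                WHEN 2 THEN CASE COUNT(*) WHEN 3 THEN 'Two pair' ELSE 'One pair' END
                ELSE 'High card'
           END AS HandType
    FROM Counts
    GROUP BY Hand;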

Part 2 makes jokers wild:

This time, our wonderful friends the elves have decided that they don’t like Jacks but they do like Jokers. So, there are a few changes to our code that will need to be made:
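
Again, Kevin’s code is in the linked post; one hedged way to handle the change is to count Jokers separately and add them to the largest non-Joker group before classifying (same assumed AOCDay7 layout as above):

    -- Hypothetical tweak for Part 2: Jokers join the biggest group.
    WITH Cards AS
    (
        SELECT d.Hand,
               SUBSTRING(d.Hand, n.n, 1) AS Card
        FROM dbo.AOCDay7 AS d
            CROSS APPLY (VALUES (1), (2), (3), (4), (5)) AS n(n)
    ),
    NonJokers AS
    (
        SELECT Hand, Card, COUNT(*) AS Cnt
        FROM Cards
        WHERE Card <> 'J'
        GROUP BY Hand, Card
    )
    SELECT Hand,
           MAX(Cnt) + (5 - SUM(Cnt)) AS BestGroupSize  -- 5 - SUM(Cnt) = Joker count
    FROM NonJokers
    GROUP BY Hand;
    -- Feed BestGroupSize (plus the distinct-group count) into the same CASE
    -- logic as Part 1; a hand of five Jokers needs special handling.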

Click through for Kevin’s solutions to these challenges.

Comments closed

Tracking Progress on Deletion

Kenneth Fisher has a way:

Alternate title: How do I tell how far I am on that command?

This little command (frequently with a WHERE clause) is a great way to tell how far along you are on any given command. Specifically, I’ve been running a bunch of DELETEs recently. I’ve got something like 5-50 million rows of data to delete and I’d like to know how far along I am. A common way to do this would be just a simple count.

Click through for Kenneth’s technique, as well as good information from Jeff Moden in the comments.

Not mentioned in this post is that hopefully, your massive delete operations are running in batches, as trying to delete 5 million or more rows in a single DELETE command is…resource-intensive.
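
For reference, a minimal batched-delete loop along those lines might look like this (table, column, and predicate are hypothetical, not from Kenneth’s post):

    DECLARE @BatchSize INT = 10000,
            @RowsDeleted INT = 1;

    WHILE @RowsDeleted > 0
    BEGIN
        DELETE TOP (@BatchSize)
        FROM dbo.BigTable
        WHERE CreatedDate < '20200101';

        SET @RowsDeleted = @@ROWCOUNT;
    END;

    -- Progress check from another session: an approximate count of what's left.
    -- NOLOCK keeps the check from blocking the delete, at the cost of accuracy.
    SELECT COUNT(*) AS RowsRemaining
    FROM dbo.BigTable WITH (NOLOCK)
    WHERE CreatedDate < '20200101';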

Comments closed

After Login Triggers in Oracle

David Fitzjarrell performs some auditing:

In a world of ever-increasing access, account security has become a major concern. More often than not, accounts used for automated processes gather more and more privileges, on the assumption that they are used only by managed, automated processes and execute vetted code. The argument is that these automated processes can’t go rogue and do damage with these elevated privileges, which is true, sort of. In and of itself, the database does not police where these account connections originate, which can pose a significant security risk should this vetted code, or passwords for these privileged accounts, become known.

Oracle has, for decades, provided after logon triggers, which can be utilized to ‘police’ accounts by reporting where they originate and, if necessary, disallowing such logons. Of course it will be necessary to record such connection ‘violations’ in a table so that they may be investigated. A “complete” script to do this is shown below:

Click through for that script, as well as additional thoughts from David.
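
David’s script is Oracle, but for comparison’s sake, SQL Server has an analogous mechanism. This is not David’s code, just a rough T-SQL sketch of the same idea, with a hypothetical audit table:

    -- Deny logins for a privileged account coming from unexpected hosts
    -- and record the attempt. LogonViolations is a hypothetical table,
    -- and the trigger needs rights to write to it (often via EXECUTE AS).
    CREATE TRIGGER LogonAudit
    ON ALL SERVER
    FOR LOGON
    AS
    BEGIN
        IF ORIGINAL_LOGIN() = N'AutomatedProcess'
           AND HOST_NAME() NOT IN (N'AppServer01', N'AppServer02')
        BEGIN
            ROLLBACK;  -- rolling back in a logon trigger denies the login

            -- Statements after the ROLLBACK still run, so the audit row persists.
            INSERT INTO master.dbo.LogonViolations (LoginName, HostName, LogonTime)
            VALUES (ORIGINAL_LOGIN(), HOST_NAME(), SYSDATETIME());
        END;
    END;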

Comments closed

Local Variables in Stored Procedures

Erik Darling does not approve:

Like many other things we’ve discussed thus far, local variables are a convenience to you that have behavior many people are still shocked by.

You, my dear and constant reader, may not be shocked, but the nice people who pay me money to fix things seem quite astounded by what happens when you invoke local variables.

So I find myself in a difficult position: do I dredge up more red meat for the millions of die-hard SQL Server performance nuts who come here for the strange and outlandish, or produce evergreen content for people who pay my substantial bar tabs?

You have at least a 50% chance to guess what Erik does next.
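
If you’d like to see the behavior for yourself, here’s a minimal sketch (not Erik’s demo; dbo.Posts and its column are stand-ins):

    DECLARE @PostTypeId INT = 2;

    -- The optimizer cannot sniff a local variable at compile time, so the
    -- row estimate comes from the column's overall density, not the
    -- histogram step for the value 2.
    SELECT COUNT(*)
    FROM dbo.Posts
    WHERE PostTypeId = @PostTypeId;

    -- With a literal (or OPTION (RECOMPILE) on the variable version),
    -- the histogram is used and the estimate is value-specific.
    SELECT COUNT(*)
    FROM dbo.Posts
    WHERE PostTypeId = 2;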

Comments closed

Exploratory Data Analysis with F# and Plotly

Matt Eland is speaking my language (F#):

One of the most common tasks with data roles is the need to perform exploratory data analysis (EDA).

With EDA a data scientist, data analyst, or other data-oriented programmer can:

  • Understand the value distributions of their data
  • Identify outliers and data anomalies
  • Visualize correlations, trends, and relationships between multiple variables

Exploratory data analysis usually involves:

  1. Loading the data into a DataFrame
  2. Performing descriptive statistics to identify the raw shape of the data
  3. Visualizing variables of interest on their own or with other variables

In this article I’ll walk you through the process of loading data from a sample dataset into a Microsoft.Data.Analysis DataFrame (the kind featured in ML.NET). Next, we’ll look at the descriptive statistics the DataFrame class provides and then explore the process of creating some simple visualizations with Plotly.NET.

Read on for the scenario and analysis.

Comments closed

Fabric F2 Performance

Teo Lachev has started a new series. We begin with warehouse ETL:

As inspired by Amir Netz’s encouragement to partners to test the Fabric F2 capacity performance, I got on a quest to test what it would do to ETL loads for Fabric Warehouse. I must admit that I was skeptical that a quarter of a core would take a warehouse off the ground, but as usual, life proved me wrong and “wrong” is a big understatement of what happened.

After provisioning a Fabric F2 capacity and a warehouse, I settled on the Retail Data Model for World Wide Importers sample star schema dataset, consisting of five dimension tables and one fact table. In terms of performance, I was mostly interested in how long it would take for the ADF copy activity to insert all the data (50 million rows) in the fact table. Granted, it’s a limited test but enough to rule out the technology for real-life projects. Then, I compared the performance against Azure SQL Database Serverless running on up to 2 cores and provisioned by the free trial offer that Microsoft has on Azure. To exclude impact on data transfer between regions, both technologies were provisioned in the East US 2 region, which is the region where my Power BI tenant is hosted.

Then we have report load time:

What better way to spend a lazy holiday afternoon than to do more Fabric performance testing? In my previous post, I shared my results from a single-threaded ETL load test to gauge the F2 ingest performance and F2 did pretty well (or at least outperformed Azure SQL DB). Will F2 hold as parallelism increases? Throughput testing is especially important for report loads because parallel tasks can run within a report, such as visuals executing DAX queries in parallel, and across reports, such as when concurrent report requests overlap.

I’m legitimately surprised at the results. I expected F2 to be barely sufficient for testing purposes. Read both posts to see how it performs and some caveats around performance.

Comments closed

Continuing the Advent of Fabric

Tomaz Kastrun has me playing catch-up. First up, monitoring workspaces in Fabric:

An easy way to check, view, and track your activities: executions and runs of notebooks, data pipelines, Data Factory executions, dataset refreshes, and many others.

Next, third-party applications:

Apps are collections of dashboards and reports in one easy-to-find place. Go to Apps and click on “Get Apps”.

And finally, the admin portal:

The admin portal serves the purpose of governing and configuring Microsoft Fabric: there you can manage tenant settings, access the Microsoft 365 admin portal, and control how users interact with Microsoft Fabric.

To access the admin portal, you need not only a Fabric license but also admin rights in one of the following roles (if you are not in one of these roles, you can only see the Capacity setting in the admin portal):

  • Global administrator
  • Power Platform administrator
  • Fabric administrator

If you’re just getting started with Microsoft Fabric, you could do a lot worse than going through Tomaz’s series.

Comments closed

Building Nested Data Types in Excel

Chris Webb shows off a feature:

A year ago support for nested data types in Excel was announced on the Excel blog, but the announcement didn’t have much detail about what nested data types are and the docs are quite vague too. I was recently asked how to create a nested data type and while it turns out to be quite easy, I thought it would be good to write a short post showing how to do it.

I came into this prepared to cringe, but it’s actually pretty cool inasmuch as you’re using Excel as a UI for business work. And, let’s be honest, Excel is still the most common UI for business work out there.

Comments closed

Conditional Logic in Stored Procedures

Erik Darling has a warning:

There are two forms of conditional logic that I often have to fix in stored procedures:

  1. Branching to run different queries at different times
  2. Complicated join and where clause logic

The problems with both are similar in terms of performance. You see, when smart people tell you that SQL is a declarative language, and not a procedural language, they’re usually trying to get you to stop using cursors.

And that’s not always wrong or bad advice, trust me. But it also applies here.

Read on for exceptions to the rule and how you can make your life a bit easier if you do have this in place.
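
As a rough illustration of the first pattern and one common fix (procedure and table names are hypothetical, not Erik’s):

    -- The branching version compiles based on whichever parameter value
    -- arrives first, which can leave the other branch with a poor plan.
    CREATE OR ALTER PROCEDURE dbo.GetOrders_Branching
        @RecentOnly BIT
    AS
    BEGIN
        IF @RecentOnly = 1
            SELECT OrderId, OrderDate
            FROM dbo.Orders
            WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());
        ELSE
            SELECT OrderId, OrderDate
            FROM dbo.Orders;
    END;
    GO

    -- One common fix: split each branch into its own procedure so that
    -- each query gets its own compiled plan.
    CREATE OR ALTER PROCEDURE dbo.GetOrders_Recent
    AS
        SELECT OrderId, OrderDate
        FROM dbo.Orders
        WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());
    GO

    CREATE OR ALTER PROCEDURE dbo.GetOrders_All
    AS
        SELECT OrderId, OrderDate
        FROM dbo.Orders;
    GO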

Comments closed

TRY: CAST, CONVERT, and PARSE

Andy Brownsword tries and tries and tries again:

In the previous post we looked at how ISNUMERIC and TRY_CAST work and why we may want to utilise the latter when building validation for our data. When SQL Server 2012 rolled around, it wasn’t only TRY_CAST which was added; we also had TRY_CONVERT and TRY_PARSE introduced.

Here we’re going to look at how those function and the differences in the outputs which they can provide. We’ll start with the sample data below as the basis for these demonstrations:

Andy focuses mostly on the behavioral differences. I like TRY_PARSE() a lot, especially because you can control the locale for dates and times. When it comes to performance, I’ve found TRY_CAST() and TRY_CONVERT() to be essentially the same. Maybe there’s a tiny difference between the two, but there’s no guarantee of it.

TRY_PARSE(), however, calls into the .NET CLR for each execution and will be considerably slower. If you’re parsing a few or a few dozen values, you probably won’t notice the difference. If you’re parsing tens of thousands of values (or more), I assure you that you’ll notice the difference.
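
As a quick illustration of that locale point (sample values are mine, not Andy’s):

    SELECT
        TRY_CAST('13/01/2024' AS DATE)                AS CastResult,    -- NULL under us_english settings
        TRY_CONVERT(DATE, '13/01/2024', 103)          AS ConvertResult, -- style 103 = dd/mm/yyyy
        TRY_PARSE('13/01/2024' AS DATE USING 'en-GB') AS ParseResult;   -- culture-aware, uses the CLR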

Comments closed