Press "Enter" to skip to content

Author: Kevin Feasel

Advent of Code Day 4

Kevin Wilkie continues a Quixotic quest. Part 1 sets up the story:

Now we get to work with a set of numbers that our elf played and a set of winning numbers for each card. This sounds like something that SQL is meant for!

As always, we load the fun input data provided by the AoC group into our database. I actually loaded it in as 2 columns – Card and String. I’m just not a fan of throwing all of our data into one column of a table and letting it all get sorted out later.
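
To make the appeal concrete, here is a hypothetical sketch of the part 1 matching logic (my own illustration, not Kevin's actual schema), assuming the winning and played numbers landed in two space-delimited columns:

    -- Hypothetical dbo.Cards table: one row per card, with winning and
    -- played numbers as space-delimited strings. Cards with zero matches
    -- simply will not appear in the output.
    SELECT
        c.Card,
        COUNT(*) AS Matches,
        POWER(2, COUNT(*) - 1) AS Points  -- part 1 scoring: doubles per extra match
    FROM dbo.Cards AS c
        CROSS APPLY STRING_SPLIT(c.WinningNumbers, ' ') AS w
        CROSS APPLY STRING_SPLIT(c.PlayedNumbers, ' ') AS p
    WHERE w.[value] = p.[value]
        AND w.[value] <> ''  -- STRING_SPLIT emits empty rows where spaces double up
    GROUP BY c.Card;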

Part 2 wraps it up:

With part 1, we just had to figure out how many times each of the winners showed up. With part 2 though, we have to jump through several hoops since we now have to determine how many cards we will end up with if we win the next series of cards for every match.

I slimmed down the table that I’m working with to only 2 columns – yes, in the real world, I would have used a view, but today was not that day.

By the way, as soon as I saw OverlyLongNamesThatNoOneCanTypeWithoutUsingIntellisense I wondered when Kevin got access to my code base. I have, on a few occasions, created punishment names, names so long that they punished the people who had to type or track them. In fairness, it wasn’t just a fit of pique, though pique was a common factor in all of those situations.

Batch Changes in T-SQL

Erik Darling doesn’t update all the rows at once like some barbarian:

The goal of batching modifications is to accomplish large numbers of row modifications without exacerbating locking problems, while being kinder to your server’s transaction log.

There is generally an appreciable difference in transaction time between modifying 1,000 rows and modifying 10,000,000 rows. Go try it, if you don’t believe me.

Click through for more information on good ways to set up batching, including the use of the highly underrated OUTPUT ... INTO ... clause on DML statements.
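
To make the pattern concrete, here is a minimal batched-delete sketch of my own (not Erik's exact code), assuming hypothetical dbo.AuditTrail and dbo.AuditTrailArchive tables:

    DECLARE @BatchSize int = 5000;
    DECLARE @RowsAffected int = 1;

    WHILE @RowsAffected > 0
    BEGIN
        -- Each pass removes at most @BatchSize rows in a short transaction,
        -- keeping locks narrow and letting the log clear between batches.
        DELETE TOP (@BatchSize) t
        OUTPUT Deleted.Id, Deleted.CreateDate
            INTO dbo.AuditTrailArchive (Id, CreateDate)
        FROM dbo.AuditTrail AS t
        WHERE t.CreateDate < DATEADD(YEAR, -2, GETDATE());

        -- Capture the count immediately; any other statement resets @@ROWCOUNT.
        SET @RowsAffected = @@ROWCOUNT;
    END;

A brief pause between batches (see the WAITFOR entry below) can also give concurrent sessions room to work.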

Fun with WAITFOR

Aaron Bertrand plays red light, green light:

WAITFOR is a very useful command to prevent further operations from occurring until a certain amount of time has passed (WAITFOR DELAY) or until the clock strikes a specific time (WAITFOR TIME). These commands are covered quite well (including pros and cons) in a previous tip, “SQL WAITFOR Command to Delay SQL Code Execution.”

WAITFOR DELAY is also commonly used to “let SQL Server breathe.” For example, when performing some operations in a loop, on a resource-constrained system, an artificial delay can potentially allow for more concurrency, even if it makes the intensive operation take longer.

But these commands can also be used to synchronize multiple query windows for local testing and debugging, and I thought I would share a recent example.
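
As a quick illustration of the two forms (my example rather than Aaron's):

    -- Pause the current session for five seconds:
    WAITFOR DELAY '00:00:05';
    PRINT 'Five seconds have elapsed.';

    -- Pause until the server's clock reaches a specific time:
    WAITFOR TIME '23:30:00';
    PRINT 'It is now 23:30 on the server.';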

Click through for some of the ways you can use WAITFOR in your scripts.

Creating Charts in Microsoft Fabric Notebooks using Vega

Phil Seamark tries out Vega in a Microsoft Fabric notebook:

I recently needed to generate a quick visual inside a Microsoft Fabric notebook. After a little internet searching, I found there are many good quality charting libraries in Python; however, it was going to take too long to figure out how to create a very specific type of chart.

This is where Vega came to the rescue. The purpose of this short article is to share a very simple implementation of generating a Vega chart using a Microsoft Fabric notebook.

Click through for the example code.

Backup and Recovery Options for Relational Databases

Adron Hall keeps a copy:

In the realm of data management, ensuring the safety and recoverability of data is paramount. Relational databases, being at the core of many business operations, require robust backup and recovery strategies. This article delves into the general concepts of backup and recovery in relational databases, followed by specific strategies for SQL Server, Oracle, MariaDB/MySQL, and PostgreSQL.
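
For SQL Server, the classic baseline is a full backup plus frequent log backups for point-in-time recovery. A minimal sketch, with an illustrative database name and paths:

    -- Full backup as the recovery baseline:
    BACKUP DATABASE SalesDb
        TO DISK = N'D:\Backup\SalesDb_Full.bak'
        WITH COMPRESSION, CHECKSUM;

    -- Log backups enable point-in-time recovery (requires the FULL recovery model):
    BACKUP LOG SalesDb
        TO DISK = N'D:\Backup\SalesDb_Log.trn'
        WITH CHECKSUM;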

Click through for a high-level overview of general database backup concepts and a variety of options available in the major relational database platforms.

Analyzing Shiny App Startup Times

Osheen MacOscar wants to know how long it takes to start up that Shiny app:

In the last blog I spoke about using Google Lighthouse to test the speed of web pages. I wanted to build upon that and use Lighthouse to test some Shiny apps.

To get a feel for Shiny’s performance in a Lighthouse analysis, I needed a lot of Shiny apps that I could test and create a dataset from, so I used the entries to the 2021 Shiny app contest, which is a competition where people enter Shiny apps to be judged on technical merit and artistic achievement. I used the 2021 apps as there has unfortunately not been a competition since. A full list of the submissions can be found on the Posit Community website.

Read on to see what you can do with Lighthouse, as well as a few pain points around it.

Testing the Reproducibility of Random Numbers in Stan

Sebastian Sauer performs some tests:

Bayes models (using MCMC) build on drawing random numbers. By their very nature, random numbers are random. Unless they are not. As you may know, the random number functions in computers are purely deterministic.

However, in practice, some unpredictable behavior may still show up. The reason is simply that two computational environments must – in theory – be exactly identical in order to reproduce the same results. At least identical in every bit that influences random number generation.

Click through for the testbed and code.

Troubleshooting an Azure ML Deployment Locally

I have a new video:

In this video, I take us through the process of creating a local deployment of an Azure ML managed endpoint. We will cover requirements, why you might want to do this, and common problems you may run into along the way.

This was a fun video to make, especially in anticipating the sorts of problems that come up along the way. I won’t pretend that it’s comprehensive but it does hit several of the most common problems I see (or cause).

Continuing the Advent of Fabric

Tomaz Kastrun has been busy. On day 9, we build a custom environment:

Microsoft Fabric provides you with the capability to create a new environment, where you can select different Spark runtimes, configure your compute resources, and create a list of Python libraries (public or custom; from Conda or PyPI) to be installed. Custom environments behave the same way as any other environment and can be attached to your notebook or used on a workspace. Custom environments can also be attached to Spark job definitions.

On day 10, we have Spark job definitions:

An Apache Spark job definition is a single computational action that is normally scheduled and triggered. In Microsoft Fabric (as in Synapse), you can submit batch/streaming jobs to Spark clusters.

By uploading a binary file, or libraries in any of the supported languages (Java/Scala, R, Python), you can run any kind of logic (transformation, cleaning, ingest, ingress, …) against the data that is hosted and served to your lakehouse.

Day 11 introduces us to data science in Fabric:

We have looked into creating the lakehouse, checked Delta Lake and delta tables, got some data into the lakehouse, and created a custom environment and Spark job definition. And now we need to see how to start working with the data.

Day 12 builds an experiment:

We have started working with the data, and now we would like to create and submit the experiment. In this case, MLflow will be used.

Create a new experiment and give it a name. I have named mine “Advent2023_Experiment_v3”.

Click through to catch up with Tomaz.
