Curated SQL Posts

The Library of Congress Control Number (LCCN)

Robert Cain continues a series on book archival:

This is part of my ongoing series on my ArcaneBooks project. The goal is to provide a module to retrieve book data via provided web APIs. In the SEE ALSO section later in this post I’ll provide links to previous posts which cover the background of the project, as well as how to use the OpenLibrary APIs to get data based on the ISBN.

In this post I will provide an overview of using the Library of Congress API to get data based on the LCCN, short for Library of Congress Control Number.

This has been an interesting series to watch, as it’s a practical, non-work application of a range of development skills.
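As a small companion to the LCCN lookup, here is a minimal Python sketch of normalizing an LCCN before using it in a query. It follows the Library of Congress normalization rules as I understand them (strip blanks, drop anything from a forward slash onward, zero-fill the part after a hyphen to six digits). The function name `normalize_lccn` is my own and is not taken from the ArcaneBooks module, which is written in PowerShell.

```python
def normalize_lccn(raw: str) -> str:
    """Return a normalized LCCN (hypothetical helper, not part of ArcaneBooks)."""
    # Step 1: remove all blanks.
    lccn = raw.replace(" ", "")

    # Step 2: if there is a forward slash, drop it and everything after it.
    lccn = lccn.split("/", 1)[0]

    # Step 3: if there is a hyphen, remove it and left-fill the serial
    # part (the digits after the hyphen) with zeros to six digits.
    if "-" in lccn:
        prefix, serial = lccn.split("-", 1)
        if not serial.isdigit() or len(serial) > 6:
            raise ValueError(f"unexpected serial part in LCCN: {raw!r}")
        lccn = prefix + serial.zfill(6)

    return lccn
```

With this sketch, `normalize_lccn("85-2")` gives `"85000002"` and `normalize_lccn("n78-890351")` gives `"n78890351"`.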

Putting tempdb on an Azure VM Temp Disk

Daniel Hutmacher uses a temp disk for a temp database:

Almost all Azure virtual machine sizes come with a temporary disk. The temporary disk is a locally attached SSD drive that comes with a couple of desirable features if you’re installing a SQL Server on your VM:

  • Because it is locally attached, it has lower latency than regular disks.
  • IO and storage are not billed like regular storage.

As the name implies, the temporary disk is not persistent, meaning that it will be wiped if you shut down your VM or if the VM moves to another VM host (as part of maintenance or troubleshooting). For that reason, we never want to put anything on the temporary disk that we need to keep.

I’d say this was a lot more popular several years ago, back when spinning disk was the default for Azure storage. There can still be benefits from doing this, though if you’re using Premium storage with high IOPS, the biggest remaining benefit is around latency.

Data-Level Security in Power BI

Reza Rad explains different ways to secure data in Power BI:

Power BI supports the security of the data at the dataset level. This security means everyone can see the data they are authorized to see. There are different levels of that in Power BI, including Row-Level Security, Column-Level Security, and Object-Level Security. All these help Power BI Developers create one dataset but give users different views of the data from the same report. In this article, I’ll explain each of those methods and give some guidance on how to use them.

This serves as the opener to a series of articles on Power BI data security.

T-SQL and Fun Puzzles

Rob Farley puzzles it out:

Back in my uni days I remember a Prolog assignment to solve “each letter represents a number” puzzles, and my solution being slow. Years later I tried it again and it worked out just fine, but by then the due date was in the past and they weren’t prepared to change my grade.

While these kinds of things can be fun (more so when there aren’t uni grades dependent on the solution), there are also times that it can be fun to rewrite some code in a way that is more intuitive, or that feels clever in a profoundly simple way.

Rob shares links to a few examples along those lines.
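For anyone who has not met an “each letter represents a number” puzzle, here is a minimal brute-force sketch in Python. It is not Rob's Prolog or T-SQL approach, just the simplest thing that works: try every assignment of distinct digits to letters and keep the ones where the sum holds. The function name `solve_cryptarithm` is my own.

```python
from itertools import permutations


def word_value(word, mapping):
    """Turn a word into an integer using the letter-to-digit mapping."""
    number = 0
    for letter in word:
        number = number * 10 + mapping[letter]
    return number


def solve_cryptarithm(addends, total):
    """Return every letter-to-digit mapping where sum(addends) == total."""
    words = list(addends) + [total]
    letters = sorted(set("".join(words)))
    if len(letters) > 10:
        raise ValueError("more than ten distinct letters")

    # Leading letters may not be zero, as is conventional for these puzzles.
    leading = {word[0] for word in words}

    solutions = []
    for digits in permutations(range(10), len(letters)):
        mapping = dict(zip(letters, digits))
        if any(mapping[letter] == 0 for letter in leading):
            continue
        if sum(word_value(w, mapping) for w in addends) == word_value(total, mapping):
            solutions.append(mapping)
    return solutions
```

Calling `solve_cryptarithm(["TO", "GO"], "OUT")` finds the single answer 21 + 81 = 102. Larger puzzles work the same way but take longer, since the search space grows with the number of distinct letters, which is likely the kind of slowness Rob ran into.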

Fixing the Parallelism Documentation

Erik Darling shreds the docs:

The section with the weirdest errors and omissions is right up at the top. I’m going to post a screenshot of it, because I don’t want the text to appear here in a searchable format.

That might lead people not reading thoroughly to think that I condone any of it, when I don’t.

Erik pulls no punches on this post. Hopefully the end result is that this part of the documentation improves.

Changes to the IaaS Agent for SQL Server on Azure VMs

Aditya Badramraju has an announcement:

SQL Server on Azure Virtual Machines is powered by the SQL IaaS Agent extension which provides many features that make managing your SQL Server easy. This blog will discuss new features and changes we’ve recently released in this extension. 

Click through for those changes. I was prepared, upon seeing the “Retiring Modes” section, to have a cynical response about forcing everyone into what was effectively Full mode, but that proto-take ended up being way off base and the truth is much nicer.

Reading Multi-Sheet Excel Files in R

Steven Sanderson does a bit of Excel file reading:

Reading in an Excel file with multiple sheets can be a daunting task, especially for users who are not familiar with the process. In this blog post, we will walk through a sample function that can be used to read in an Excel file with multiple sheets using the R programming language.

Click through for the process, which makes use of the lapply() function and the readxl package.

An Overview of the Kappa Architecture

Amian Patnaik provides an overview:

The Kappa Architecture, introduced by Jay Kreps, co-founder of Confluent, is designed to handle real-time data processing in a scalable and efficient manner. Unlike the traditional Lambda Architecture, which separates data processing into batch and stream processing, the Kappa Architecture promotes a single pipeline for both batch and stream processing, eliminating the need for maintaining separate processing pipelines.

What’s interesting to me is that Lambda, an architecture which was an explicit product of its time (in the sense that it was a compromise architecture trying to do two things, the combination of which limited hardware and tooling didn’t allow), is still thriving today. Kappa, meanwhile, isn’t an architectural style that people throw around a lot anymore, at least in the circles I run around in.
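To make the “single pipeline” idea concrete, here is a toy Python sketch under some loud assumptions: `EventLog` is an in-memory stand-in for a durable, replayable stream (something like a Kafka topic), and `count_by_key` is a made-up processing function. The point is only that the same function serves live processing and reprocessing, with “batch” being a replay of the log from offset zero.

```python
from collections import defaultdict


class EventLog:
    """Append-only log; an in-memory stand-in for a durable stream."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset=0):
        """Iterate over events starting at the given offset."""
        return iter(self._events[offset:])


def count_by_key(events, state=None):
    """The one processing function: sum amounts per key.

    Pass an existing state to continue live processing, or no state
    to rebuild everything from a replay of the log.
    """
    state = defaultdict(int) if state is None else state
    for key, amount in events:
        state[key] += amount
    return state
```

Live processing calls `count_by_key(log.read_from(offset), state)` as events arrive; reprocessing after a logic change calls `count_by_key(log.read_from(0))` and swaps in the new result. A Lambda design would instead maintain a separate batch job alongside the streaming one.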

Spark ELT in Synapse Notebooks

Liliam Leme performs some data movement:

I often receive various requests from customers while working on FastTrack projects, and I have compiled some examples to help you build your solution on top of a data lake using useful tips. Most of the examples in this post use pandas, and I hope they will be helpful for you as they were for me.

Please note that all examples in this post use pyspark.

In my scenario, I exported multiple tables from SQLDB to a folder using a notebook and ran the requests in parallel.

Read on for the examples and some of the things you can do with Spark notebooks in Azure Synapse Analytics.
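The “ran the requests in parallel” part can be sketched in plain Python with a thread pool. This is not Liliam's code: `export_tables_in_parallel` is a name I made up, and `export_table` is a placeholder for whatever does the real work for one table (in a Synapse notebook, presumably a Spark read from the database followed by a write to the lake).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def export_tables_in_parallel(tables, export_table, max_workers=4):
    """Run export_table(name) for each table concurrently.

    Returns a dict mapping each table name to either the value the
    export returned or the exception it raised, so one failing table
    does not hide the results of the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to its table name.
        futures = {pool.submit(export_table, name): name for name in tables}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going; record the failure
                results[name] = exc
    return results
```

Threads are a reasonable fit here on the assumption that each export spends most of its time waiting on the database and storage rather than on Python itself; `max_workers` is worth tuning against what the source database can tolerate.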
