Press "Enter" to skip to content

Curated SQL Posts

Parquet File Performance in Power Query

Chris Webb troubleshoots a performance issue:

There has been a lot of excitement around the newly-added support for reading from Parquet files in Power BI. However I have to admit that I was disappointed not to see any big improvements in performance when reading data from Parquet compared to reading data from CSV (for example, see here) when I first started testing it. So, is Power Query able to take advantage of Parquet’s columnar storage when reading data?

The answer is yes, but you may need to make some changes to your Power Query queries to ensure you get the best possible performance. Using the same data that I have been using in my recent series of posts on importing data from ADLSgen2, I took a single 10.1MB Parquet file and downloaded it to my PC.

It seem like an area of future growth for Power Query, but Chris does show how to eke out some gains right now.

Comments closed

Actions with Edge and Node Tables in SQL Server

Louis Davidson is a man of action:

One of the interesting things about working with many-to-many relationships in SQL Server with graph tables instead of a relational table is that unlike a relational many-to-many table, by default an edge may can implement relationships from lots of different tables (nodes). You can also limit what nodes can be related using which edges.

For example, say you have 4 nodes and 2 edges, both of the edges, by default, each edge would allow relationships from each node to itself, or each node to each other node. It can all get a bit complicated to figure out if you have a lot of objects (and to be fair, you probably also want to be able to check to make sure your objects are configured as you expect.

In this blog, I will demonstrate how to determine, given a given edge or node, what operations are possible. 

Click through to learn more.

Comments closed

Building a Friendship Lamp

Drew Furgiuele is looking for mood lighting tips:

It did get me thinking, though: what if I could take this idea and change it up a bit to where people could send me messages WITHOUT the need for them to have a lamp (and thereby give them plausible deniability of being, in fact, my friend). How would that work? In absence of a lamp, would a web application work? And what if we could let people pick a color in lieu of an actual message? You could send a whole mood!

And just like that, my motivation was restored. Time to get to work.

Click through for the build process, which includes 3D printing components, wiring and soldering to circuit boards, writing software for the IoT device, building the front-end web app, and more. Also, I sent red but now I’m not sure if I regret that color choice based on re-reading the first paragraph above.

Comments closed

Restoring a Database Formerly in an Availability Group

Jack Vamvas has a process for us:

Steps to restore a database from a backup device that was part of an Always On Availability Group, and now needs to be restored 

Recovery Scenario : Requesting an older database copy previously backed up 

Name of Always on Availability Group = MyAG1

Name of Always On Availability Group db = MyAGDB1

Note: this is a workflow – and there may be some slight variations depending your Availability Group set up 

Read on for rest of the workflow.

Comments closed

Disabling the Powershell Update Nag

Constantine Kokkinos hides an annoyance:

To be clear – I think you should be updating your PowerShell regularly, however the HUGE WHITE BLOCK ACROSS MY ENTIRE SCREEN EVERY TIME I LAUNCH VISUAL STUDIO CODE ISN’T GREAT.

Hated that caps? Yeah, that’s basically my eyes every time I see this nag window inverting the colors across my ennntiiirrreee screen.

Read on for the one-liner which gets rid of this message.

Comments closed

An Important Vocabulary Lesson

Taiob Ali shares some commonly mistaken terms:

A list of items that people often get wrong. If you have a suggestion, let me know, and I will add them to this page.

On-premises and braces are the two I hear people get wrong the most. I had been afraid that the recognition of “braces” as the proper term had been so lost that I was like a monk in 9th century Ireland transcribing illuminations of worn tomes here.

Comments closed

Kafka Replication with MIrrorMaker

Paul Brebner starts a new series:

In this new two part blog series we’ll turn our gaze to the newest version of MirrorMaker 2 (MM2), the Apache Kafka cross-cluster mirroring, or replication, technology. MirrorMaker 2 is built on top of the Kafka Connect framework for increased reliability and scalability, and is suitable for more demanding geo-replication use cases including migration, backup, disaster recovery and fail-over.  In part one we’ll focus on MirrorMaker 2 theory (Kafka replication, architecture, components and terminology) and invent some MirrorMaker 2 rules. Part two will be more practical, and we’ll try out Instaclustr’s managed MirrorMaker 2 service and test the rules out with some experiments. 

Go check out part 1.

Comments closed

Spark on Windows Subsystem for Linux 2

Gavin Campbell tries out Spark on Linux on Windows:

I’m not a frequent user of Windows, but I understand getting dependencies installed for local development can sometimes be a bit of a pain. I’m using an Azure VM, but these instructions should work on a regular Windows 10 installation. Since I’m not a “Windows Insider”, I followed the manual steps here to get WSL installed, then upgrade to WSL2. The steps are reproduced here for convenience:

Click through for the installation steps and the process.

Comments closed

Loading Azure Synapse Analytics using PolyBase

Gauri Mahajan needs to load some data:

Azure Synapse Analytics is Microsoft’s data warehousing offering on Azure Cloud. It supports three types of runtimes – SQL Serverless Pool, SQL Dedicated Pool, and Spark Pools. As there are a variety of data sources on Azure, it’s very obvious that there can be varying types and volumes of data that would have to be loaded into Azure Synapse pools. There are three major types of data ingestion approaches that can be used to load data into Synapse. The COPY command is the most flexible and elaborate mechanism, where someone can execute this command from a SQL pool to load data from supported data repositories. This command is convenient to load ad-hoc and small to medium-sized data loads into Synapse. The second method of loading data is the Bulk Insert, where the method name is self-relevant regarding the approach functionality. To ingest the data from supported repositories into dedicated SQL pools, PolyBase is as efficient and at times it’s even more efficient than the COPY command. This article will help you understand the process to ingest data into Azure Synapse Analytics using PolyBase to load the data.

Click through for the process.

Comments closed

Calculating Pagination Metadata in SQL

Lukas Eder has a single query which includes pagination data:

This is rather straightforward. It will give us page 2 out of N pages, with a page size of 10. But how do we calculate these values? How do we know we’re on page 2? How do we know the number of pages N? Can we calculate this without an extra round-trip e.g. to calculate the total number of actors:

-- Yuck, a second round-trip!
SELECT COUNT(*)
FROM actor

We can do it with a single SQL query and window functions, but before I explain how to do this, please consider reading this article on why OFFSET pagination is a bad thing for your performance

If you’re still convinced OFFSET pagination is what you need, as opposed to keyset pagination, let’s look at how to calculate the above meta data with SQL.

Click through for the query, as well as Lukas’s explanation of how it works. But also heed that warning about keyset pagination, as it’s usually a lot better.

Comments closed