Press "Enter" to skip to content

Curated SQL Posts

Generating Nested Time Series Models

Steven Sanderson can’t stop at just one time series:

There are many approaches to modeling time series data in R. One of the types of data that we might come across is a nested time series. This means the data is grouped simply by one or more keys. There are many methods in which to accomplish this task. This will be a quick post, but if you want a longer more detailed and quite frankly well written out one, then this is a really good article

The quick post doesn’t include a lot of commentary but does show the code you’d use for the operation.

Comments closed

Brief Code File Analysis with Python

Matt Eland reviews the code:

Last year I devised some ways of analyzing the history and structure of code in a visual way, but I didn’t document much of that to share with the community. This article explores the process I used to build a CSV file containing the path, extension, project, and line of code count in all source files in a project.

Click through for the Python code and an explanation of what it’s doing.

Comments closed

The Most Common SQL Server Engine Errors Generating Support Tickets

Joseph Pilov collects a list:

About 6 months ago we decided to look at what SQL Server engine error messages are most commonly generating support cases to Microsoft. The end goal was to update the documentation for those error messages to allow our customers to find answers for themselves before they have to call Microsoft for technical assistance.

The task, as we suspected from previous experience, was not easy because we had to mine cases for error numbers and a relatively small number of support cases get reported with error messages when they are opened. Please report full error messages when you open support cases with Microsoft – it would help us get you answers faster. Still, we were able to find the trends even from the small percentages because were looking for the top 20 or so most common errors, based on case count, and we needed relative information – which error is reported more than another.

Click through for the list. A benefit from going through this exercise is that Microsoft has provided more information on each of those error IDs, hopefully making it easier for people to diagnose and resolve problems without needing to reach out to support.

Comments closed

Peeking into Azure SQL DB via Extended Events

Grant Fritchey observes the observers:

Last week I posted the results from using Extended Events to snoop on what happens inside an AWS RDS database. This week, I’m taking a look at what happens on Azure SQL Database. I’m using the same toolset again, if for no other reason that I’m consistent in my approach. So it’s basically just rpc_completed & sql_batch_completed on the database in question. Let’s check out the results.

Here’s the prior post, in case you missed it like I did.

Comments closed

Options to Export Power BI to Tables

Gilbert Quevauvilliers counts the ways:

I was recently helping out a customer and they contacted me asking why was the export option not in the format that they expected.

I had a look and now because there are so many options to export data, each one exports the data differently.

My goal for this blog post is to show you what each export type looks like, so when a user is exporting data, they can export in the format they expect.

It turns out that there are several such ways and Gilbert describes each.

Comments closed

Speeding Up Queries via IF EXISTS

Chad Callihan doesn’t need to wait for the query’s end credits sequence:

When checking for the existence of a value or values in a table, typically, the whole table does not need to be read. The goal is to obtain more of a true or false answer, whether a criteria is met or not. If the criteria is met with the first few records then there’s no need to read on. Nonetheless, you may come across scripts written to unnecessarily retrieve the complete count of a value in a table. Let’s compare the performance difference between using COUNT(*) and using “IF EXISTS” when checking a table for values.

One’s going to give you a full scan and the other will give you a semi-join. Read on to see what the practical effect of this is.

Comments closed

Fun with Implicit Conversions to DateTime

Andrea Allred gets tested:

I have been teaching a T-SQL 101 class and for the homework, we asked the students to get all the records where our heroes had a birthdate between 1995 through 1999. I expected something like this:

[…]

Imagine my surprise when one of the students turned in this:

SELECT FirstName, LastName, Birthdate
FROM Heroes
WHERE Birthdate BETWEEN '1995' AND '1999'


When I first saw the query I thought, “There is no way they ran that and it worked.” So I wrote it up and ran it on my data. Guess what? IT RUNS AND RETURNS DATA! I was shocked.

Click through to see what it returns and how that’s not quite right.

Comments closed

An Intro to R for the Excel User

Amieroh Abrahams explains some of the benefits of R:

The era of data manipulation and analysis using programming languages has arrived. But it can be tough to find the time and the right resources to fully switch over from more manual, time-consuming solutions, such as Excel. In this blog we will show a comparison between Excel and R to get you started!

When choosing between R and Excel, it is important to understand how both solutions can get you the results you need. However, one can make it an easy, reputable, convenient process, whereas the other can make it an extremely frustrating, time-consuming process prone to human errors.

I like this post as a way of showing current Excel users how R can perform a variety of tasks programmatically which they might do manually, though the it probably beats up on Excel too much. There’s a good reason why Excel is the single most important business tool out there and people who are deep into Excel can always break out DAX or M to perform operations.

Comments closed

Combining On-Demand and Spot VMs in AKS

Prakash P covers a topic near and dear to my heart—saving money by using spot instances:

While it’s possible to run the Kubernetes nodes either in on-demand or spot node pools separately, we can optimize the application cost without compromising the reliability by placing the pods unevenly on spot and OnDemand VMs using the topology spread constraints. With baseline amount of pods deployed in OnDemand node pool offering reliability, we can scale on spot node pool based on the load at a lower cost.

I like this idea a lot, as spot instances trade off saving a lot of money (up to 90%) for unreliability: you lose the spot instance as soon as someone else comes in willing to pay more. This gives you the best of both worlds with AKS: emphasize spot instances for the money savings but include the ability to use on-demand pricing for VMs when spot isn’t available. If I’m understanding the post correctly, this also reduces the downside risk of service instability that you get when spot instances are bought out from under you, as Kubernetes will automatically spin up and down services within a pod to keep a consistent number of instances available across the nodes to users.

Comments closed

PolyBase, JRE7, and TLS Support

Nathan Schoenack explains an error:

At end of October 2022 we saw an issue where a customer using PolyBase external query to Azure Storage started seeing queries fail with the following error:

Msg 7320, Level 16, State 110, Line 2

Cannot execute the query “Remote Query” against OLE DB provider “SQLNCLI11” for linked server “(null)”. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_IsDirExist: Error [com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: ]occurred while accessing external file.’

Prior to this, everything was working fine; the customer made no changes to SQL Server or Azure Storage.

I guess it doesn’t matter so much unless you’re interested in getting support, but Java SE 7 is no longer supported. Java SE 8 is still in support and JRE 8 remains the best version for PolyBase integration in my experience.

Comments closed