Press "Enter" to skip to content

Month: February 2023

An Intro to R for the Excel User

Amieroh Abrahams explains some of the benefits of R:

The era of data manipulation and analysis using programming languages has arrived. But it can be tough to find the time and the right resources to fully switch over from more manual, time-consuming solutions, such as Excel. In this blog we will show a comparison between Excel and R to get you started!

When choosing between R and Excel, it is important to understand how both solutions can get you the results you need. However, one can make it an easy, reputable, convenient process, whereas the other can make it an extremely frustrating, time-consuming process prone to human errors.

I like this post as a way of showing current Excel users how R can perform a variety of tasks programmatically which they might do manually, though the it probably beats up on Excel too much. There’s a good reason why Excel is the single most important business tool out there and people who are deep into Excel can always break out DAX or M to perform operations.

Comments closed

Combining On-Demand and Spot VMs in AKS

Prakash P covers a topic near and dear to my heart—saving money by using spot instances:

While it’s possible to run the Kubernetes nodes either in on-demand or spot node pools separately, we can optimize the application cost without compromising the reliability by placing the pods unevenly on spot and OnDemand VMs using the topology spread constraints. With baseline amount of pods deployed in OnDemand node pool offering reliability, we can scale on spot node pool based on the load at a lower cost.

I like this idea a lot, as spot instances trade off saving a lot of money (up to 90%) for unreliability: you lose the spot instance as soon as someone else comes in willing to pay more. This gives you the best of both worlds with AKS: emphasize spot instances for the money savings but include the ability to use on-demand pricing for VMs when spot isn’t available. If I’m understanding the post correctly, this also reduces the downside risk of service instability that you get when spot instances are bought out from under you, as Kubernetes will automatically spin up and down services within a pod to keep a consistent number of instances available across the nodes to users.

Comments closed

PolyBase, JRE7, and TLS Support

Nathan Schoenack explains an error:

At end of October 2022 we saw an issue where a customer using PolyBase external query to Azure Storage started seeing queries fail with the following error:

Msg 7320, Level 16, State 110, Line 2

Cannot execute the query “Remote Query” against OLE DB provider “SQLNCLI11” for linked server “(null)”. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_IsDirExist: Error [com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: ]occurred while accessing external file.’

Prior to this, everything was working fine; the customer made no changes to SQL Server or Azure Storage.

I guess it doesn’t matter so much unless you’re interested in getting support, but Java SE 7 is no longer supported. Java SE 8 is still in support and JRE 8 remains the best version for PolyBase integration in my experience.

Comments closed

Maximum Number of Actions on an Extended Event

Jonathan Kehayias hits the limit:

Did you know that there is a limit to the number of actions you can add to a single event in Extended Events? While I was playing around with Trace Flag 9708 for my previous blog post, one of the things I wanted to try to see was whether it would be easy to determine the impact of adding too many actions to a single event. I picked a single event, in this case wait_info, and I checked all of the actions for the event in the UI, and tried to create the event session. I was surprised when I got a validation error back so I clicked on the details link and got the following:

Session validation for the alter operation failed. (Microsoft.SqlServer.Management.XEvent)

Read on for more details and to see the limits. Applying Swart’s 10% Rule to this, you’d say 3 actions for any given event, which is probably a little bit low but in the neighborhood of what I’d call reasonable for a given XE.

Comments closed

Rolling Your Own Serverless SQL Pool Database Project

Kevin Chant doesn’t let the lack of support for a product limit him:

In this post I want to share how I created a homemade serverless SQL Pool database project.

Because I know people are keen to work this way right now. Mostly due to the comments I received when I covered how to deploy a dacpac to a serverless SQL pool.

By the end of this post you will know how I created a database project for it. Plus, how you can deploy the contents of the database project with Azure DevOps. I also share plenty of links along the way.

Though Kevin did run into some challenges trying to hack in a solution, so it’s not quite as useful as you’d first hope.

Comments closed

Removing Diacritics with Power Query

Chris Webb gets rid of them scribbles what you sometimes find on perfectly good letters:

…then the output is “un garon trs g Nol”. As you can see, removing all the characters leads to unreadable text. Instead, what you have to do is find all the letters with diacritics (accents and other glyphs that can be added to characters) and remove the diacritics. Doing this may be ungrammatical and make it harder to understand the meaning of the text but in most cases the text will still be readable.

The bad news is that there is no straightforward way to do this in Power Query, and indeed there is no straightforward way to do this at all because there are no hard-and-fast rules about what to replace a letter with a diacritic with: should “ö” become “o” or “oe” for example? My first thought was to create a big lookup table with all the rules of what to replace each character with in, similar to the approach taken here for solving this problem in Google Sheets. Building a comprehensive lookup table would be gigantic task though.

Chris does take a different approach, though do read the comments because there are scenarios in which a simple removal of the diacritic can lead to a not-so-subtle alteration of the phrase.

Comments closed

Route Planning in Postgres

Mark Litwintschik plans a journey:

I recently came across a transit route feed aggregator called Transitland. They list feeds from 2,500 operators in 55+ countries around the world. Among these feeds is one for FlixBus, a 12-year-old coach service provider. Below is a route map of their European destinations.

In this post, I’ll import their feed into PostgreSQL, build visualisations of their routes and plan a bus trip from Vienna to Oslo.

Read on for the process.

Comments closed

Scripting Drop Statements for Redundant Indexes

Eitan Blumin deals with a clone problem:

This article published by Brent Ozar is very informative about redundant/duplicate indexes, what they mean, why they’re bad, and what should be done with them.

Also, a few years ago, Guy Glantser published a post about dropping redundant indexes. It’s very useful for finding all redundant indexes within all tables in a specific database.

But what both of these articles are missing – is the ability to easily generate Drop/Disable commands for these redundant indexes.

Additionally, what if there are “similar” indexes that are only “partially” redundant, and therefore it’s not enough to simply drop one of them? Otherwise, some queries may suffer a negative performance impact.

Click through for the article and be sure to pay close attention to the important note, which I’ll summarize: “kind of redundant” doesn’t always mean redundant.

Comments closed

Diagnosing a Resource Semaphore Wait Issue

Jose Manuel Jurado Diaz finds excessive resource semaphore waits:

Today, we got a service request that our customer reported that they query are taking too much for their execution. The main wait stats found was RESOURCE_SEMAPHORE and I would like to share with you my lessons learned here. 

We executed this query to find out the queries and check the resource semaphore wait type. 

Click through for the queries and diagnosis.

Comments closed