Press "Enter" to skip to content

Author: Kevin Feasel

Scripting Drop Statements for Redundant Indexes

Eitan Blumin deals with a clone problem:

This article published by Brent Ozar is very informative about redundant/duplicate indexes, what they mean, why they’re bad, and what should be done with them.

Also, a few years ago, Guy Glantser published a post about dropping redundant indexes. It’s very useful for finding all redundant indexes within all tables in a specific database.

But what both of these articles are missing is the ability to easily generate Drop/Disable commands for these redundant indexes.

Additionally, what if there are “similar” indexes that are only “partially” redundant, so it’s not enough to simply drop one of them? Doing that could leave some queries with a negative performance impact.

Click through for the article and be sure to pay close attention to the important note, which I’ll summarize: “kind of redundant” doesn’t always mean redundant.
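
To give a flavor of the idea, here is a minimal sketch (not Eitan’s script, which also handles disable commands and the partially redundant cases) that looks for nonclustered indexes with identical key column lists and prints drop statements. It assumes pyodbc and a placeholder connection string, and it deliberately ignores included columns, filtered indexes, and uniqueness, all of which matter before you actually drop anything:

```python
# Sketch: find nonclustered indexes with identical key column lists and
# script DROP statements. Connection string is a placeholder, and this
# ignores included columns, filters, and uniqueness on purpose.
from collections import defaultdict

import pyodbc

QUERY = """
SELECT s.name AS schema_name, t.name AS table_name, i.name AS index_name,
       c.name AS column_name, ic.key_ordinal
FROM sys.indexes AS i
JOIN sys.tables AS t ON t.object_id = i.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.type = 2                      -- nonclustered only
  AND i.is_primary_key = 0
  AND i.is_unique_constraint = 0
  AND ic.is_included_column = 0
ORDER BY s.name, t.name, i.name, ic.key_ordinal;
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=...;DATABASE=...;Trusted_Connection=yes"
)

# Collect each index's key columns in key order.
index_cols = defaultdict(list)
for schema, table, index, column, _ in conn.cursor().execute(QUERY):
    index_cols[(schema, table, index)].append(column)

# Group indexes that share the exact same key column list.
by_key = defaultdict(list)
for (schema, table, index), cols in index_cols.items():
    by_key[(schema, table, tuple(cols))].append(index)

for (schema, table, _), indexes in by_key.items():
    for redundant in indexes[1:]:  # keep the first, script drops for the rest
        print(f"DROP INDEX [{redundant}] ON [{schema}].[{table}];")

conn.close()
```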


Route Planning in Postgres

Mark Litwintschik plans a journey:

I recently came across a transit route feed aggregator called Transitland. They list feeds from 2,500 operators in 55+ countries around the world. Among these feeds is one for FlixBus, a 12-year-old coach service provider. Below is a route map of their European destinations.

In this post, I’ll import their feed into PostgreSQL, build visualisations of their routes and plan a bus trip from Vienna to Oslo.

Read on for the process.
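
Transitland feeds are GTFS data, so once the feed is imported, the standard GTFS tables are queryable. As a rough sketch of the kind of query involved (table and column names follow the GTFS spec; connection details and the route filter are placeholders, and your import layout may differ):

```python
# Sketch: list the stops served on a GTFS route, assuming the feed was
# imported into PostgreSQL tables named per the GTFS spec (routes, trips,
# stop_times, stops). Connection details and route filter are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=gtfs user=postgres host=localhost")
cur = conn.cursor()

cur.execute(
    """
    SELECT DISTINCT s.stop_name
    FROM routes AS r
    JOIN trips AS t       ON t.route_id = r.route_id
    JOIN stop_times AS st ON st.trip_id = t.trip_id
    JOIN stops AS s       ON s.stop_id = st.stop_id
    WHERE r.route_long_name ILIKE %s
    ORDER BY s.stop_name;
    """,
    ("%Vienna%",),
)

for (stop_name,) in cur.fetchall():
    print(stop_name)

cur.close()
conn.close()
```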


Diagnosing a Resource Semaphore Wait Issue

Jose Manuel Jurado Diaz finds excessive resource semaphore waits:

Today, we got a service request in which our customer reported that their queries were taking too long to execute. The main wait stat found was RESOURCE_SEMAPHORE, and I would like to share with you my lessons learned here.

We executed this query to find the affected queries and check the resource semaphore wait type.

Click through for the queries and diagnosis.
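
The post’s actual queries are on the linked page; to illustrate the kind of check involved, sys.dm_exec_query_memory_grants shows sessions queued behind memory grants, which is what RESOURCE_SEMAPHORE waits indicate (a NULL grant_time means the request is still waiting). A sketch, assuming pyodbc and placeholder connection details:

```python
# Sketch: spot sessions still queued for a memory grant, the situation
# behind RESOURCE_SEMAPHORE waits. Connection string is a placeholder.
import pyodbc

QUERY = """
SELECT session_id, requested_memory_kb, wait_time_ms, queue_id, wait_order
FROM sys.dm_exec_query_memory_grants
WHERE grant_time IS NULL              -- still waiting for a grant
ORDER BY wait_time_ms DESC;
"""

with pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=...;DATABASE=...;Trusted_Connection=yes"
) as conn:
    for row in conn.cursor().execute(QUERY):
        print(
            f"session {row.session_id}: requested {row.requested_memory_kb} KB, "
            f"waiting {row.wait_time_ms} ms "
            f"(queue {row.queue_id}, order {row.wait_order})"
        )
```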


Thoughts on the New Power BI Accessible Themes

Meagan Longoria is moderately pleased:

Everyone’s vision is a little different. It is rare (impossible?) that a color theme is accessible for everyone. For instance, while many people with color vision deficiency have trouble distinguishing red and green hues, others have trouble distinguishing blue hues. So when we optimize to accommodate one condition, we may make things more difficult for another condition. This happened with the change in accent color in Power BI Desktop from yellow to teal. Changing to teal increased color contrast, which was great for people with low vision, but it caused new issues for some people with color vision deficiency.

While I am very happy to see these new color themes, I hope everyone understands that they aren’t just generally accessible for all uses. As mentioned in the blog post, they specifically have better color contrast to achieve a contrast of at least 3:1, which is the contrast recommended by WCAG for non-text content.

Read the whole thing. There’s a delicate balancing act between having a complete color scheme and satisfying a variety of needs. It sounds like this theme doesn’t quite cut it, though hopefully there will be some improvements in the future.
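
For reference, that 3:1 figure is the WCAG contrast ratio for non-text content, computed from the relative luminance of the two colors. A quick sketch of the calculation (the hex values below are arbitrary examples, not Microsoft’s theme colors):

```python
# Sketch: WCAG 2.x contrast ratio between two sRGB colors.
# The hex values in the example are arbitrary, not Power BI theme colors.
def relative_luminance(hex_color: str) -> float:
    """Relative luminance per WCAG, from an 'RRGGBB' hex string."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), per WCAG."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


ratio = contrast_ratio("3366CC", "FFFFFF")  # a blue on white
verdict = "passes" if ratio >= 3.0 else "fails"
print(f"{ratio:.2f}:1 -> {verdict} the WCAG 3:1 minimum for non-text content")
```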


Performing a Detailed Code Review of a Stored Procedure

Aaron Bertrand has 99 problems and this stored procedure is 40 of them:

I’ve been at this for a while now, and have a very particular set of rules and coding conventions that I follow when writing and, more importantly, reviewing T-SQL code. I often perform code reviews and thought it would be fun to frame this exercise: a completely fictitious stored procedure hits my desk, I’ll reject it of course, and enumerate the reasons why. The procedure might pass review in a lot of shops, but not in mine.

Click through and give it a try. Aaron has outdone himself with this and I got angry with him about 2/3 of the way through the procedure. That’s how you know it’s a good example.
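
In that spirit, a toy linter gives a flavor of how a few of the most common review flags can even be automated. The patterns below are my own illustrative picks, emphatically not Aaron’s checklist, and real T-SQL review needs a parser (and a human), not regular expressions:

```python
# Toy sketch: flag a few common T-SQL review findings with regexes.
# These checks are illustrative picks, not Aaron Bertrand's review list.
import re

CHECKS = [
    (r"SELECT\s+\*", "SELECT * instead of an explicit column list"),
    (r"\bNOLOCK\b", "NOLOCK hint (dirty reads)"),
    (r"\bsp_\w+", "sp_ prefix on a user procedure"),
    (r"@@IDENTITY", "@@IDENTITY instead of SCOPE_IDENTITY()"),
]


def review(sql: str) -> list[str]:
    findings = []
    for line_no, line in enumerate(sql.splitlines(), start=1):
        for pattern, message in CHECKS:
            if re.search(pattern, line, flags=re.IGNORECASE):
                findings.append(f"line {line_no}: {message}")
    return findings


sample = """CREATE PROCEDURE dbo.sp_GetOrders AS
SELECT * FROM dbo.Orders WITH (NOLOCK);"""

for finding in review(sample):
    print(finding)
```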


Calculating Log Likelihood Ratios with jeva

Peter M.B. Cahusac takes us through a jamovi package:

Ever wanted to try doing an evidential analysis? You may have found it difficult to find a statistical platform to do it. Now there is the jamovi module jeva which can provide log likelihood ratios for a range of common statistical tests.

Imagine for a moment that we wish to carry out a statistical test on our sample of data. We do not want to know whether the procedure we routinely use gives us the correct answer with a specified error rate (such as the Type I error) – the frequentist approach. Nor do we want to concern ourselves with possible a priori probabilities of hypotheses being true – the Bayesian approach. We need to know whether a statistic from this particular set of data is consistent with one or more hypothetical values. Also, let’s say that we weren’t happy with how much data we had collected (a familiar problem?), and just added more when convenient. Welcome to the likelihood (or evidential) approach!

Read on for an explanation and how to try jeva out.
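
jeva wraps this for a range of standard tests; to make the underlying quantity concrete, here is the log likelihood ratio for a simple binomial case, comparing two hypothesized success probabilities on the same data. The counts and hypotheses are made up for illustration:

```python
# Sketch: log likelihood ratio comparing two hypothesized binomial
# proportions on the same data. Counts and hypotheses are made up;
# jeva computes analogous quantities for a range of standard tests.
from math import log


def binomial_log_lr(k: int, n: int, p1: float, p2: float) -> float:
    """ln[ L(p1 | k of n) / L(p2 | k of n) ]; the binomial coefficient cancels."""
    return k * log(p1 / p2) + (n - k) * log((1 - p1) / (1 - p2))


# 34 successes in 50 trials: is p = 0.7 or p = 0.5 better supported?
llr = binomial_log_lr(k=34, n=50, p1=0.7, p2=0.5)
print(f"ln LR = {llr:.2f}")  # positive: the data support p1 over p2
```

One appeal the excerpt alludes to: the ratio depends only on the data actually observed, so collecting more data when convenient and recomputing is legitimate under this approach.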


Calibrating and Plotting a Time Series with healthyR.ts

Steven Sanderson builds a plot:

In time series analysis, it is common to split the data into training and testing sets to evaluate the accuracy of a model. However, it is important to ensure that the model is calibrated on the training set before evaluating its performance on the testing set. The {healthyR.ts} library provides a function called calibrate_and_plot() that simplifies this process.

Click through for the function’s input parameters and an example of how to use it.
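
healthyR.ts is an R package; as a language-agnostic sketch of the calibrate-on-training, assess-on-testing idea that calibrate_and_plot() wraps (the series and the deliberately naive model here are hypothetical stand-ins, not the package’s internals):

```python
# Sketch of the idea behind calibrate_and_plot(): fit on a training split,
# then measure error on the held-out testing split. The series and the
# seasonal-naive model are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(42)
months = np.arange(120)
series = 10 + 3 * np.sin(months * 2 * np.pi / 12) + rng.normal(0, 0.5, 120)

split = 96                         # hold out the last 24 points
train, test = series[:split], series[split:]

# Seasonal-naive forecast: repeat the last observed seasonal cycle.
season = 12
forecast = np.tile(train[-season:], len(test) // season + 1)[: len(test)]

rmse = np.sqrt(np.mean((test - forecast) ** 2))
print(f"holdout RMSE: {rmse:.3f}")
```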


Tips for Kafka Streams Developers

Ludovic Dehon shares some advice:

We built Kestra, an open-source data orchestration and scheduling platform, and we decided to use Kafka as the central datastore to build a scalable architecture. We rely heavily on Kafka Streams for most of our services (the executor and the scheduler) and have made some assumptions on how it handles the workload.

However, Kafka has some restrictions since it is not a database, so we need to deal with the constraints and adapt the code to make it work with Kafka. We will cover topics such as using the same Kafka topic for source and destination, and creating a custom joiner for Kafka Streams, to ensure high throughput and low latency while adapting to the constraints of Kafka and making it work with Kestra.

Click through for several tips.
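
Kestra’s examples are Kafka Streams in Java; as a rough Python analogue of the first tip (one topic as both source and destination), the trick is to tag the records you produce so the application does not reprocess its own output. The topic name, broker address, header key, and transformation below are all hypothetical:

```python
# Rough sketch: read from and write back to the same Kafka topic, using a
# marker header so the app skips records it produced itself. Topic name,
# broker address, and header key are hypothetical.
from confluent_kafka import Consumer, Producer

TOPIC = "executions"
MARKER = ("processed-by", b"my-app")

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "executor",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe([TOPIC])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue
    headers = dict(msg.headers() or [])
    if headers.get("processed-by") == b"my-app":
        continue  # our own output; don't loop forever
    new_value = msg.value().upper()  # stand-in transformation
    producer.produce(TOPIC, key=msg.key(), value=new_value, headers=[MARKER])
    producer.flush()
```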


Synapse and Azure ML Pipelines

Santosh Thomas integrates two Azure products:

As more customers standardize on the Synapse data platform, enabling machine learning workflows through Azure Machine Learning (Azure ML) becomes particularly interesting. This is especially true as more customers look to bring their data engineering and data science practices together and mature capabilities on both sides.

The goal of this blog post is to highlight how Synapse and Azure ML can work well together to deliver key insights. This is motivated by a scenario where a customer modernized their data platform on Azure Synapse but was looking to improve their data science practices through Azure ML. The focus of this blog is to expose existing functionality, and it is not a “hardened” solution with security or other cloud best practice implementations. The workflow steps also assume some level of comfort with Python and working with the Azure Python SDKs.

There was a time in which Microsoft wanted us to remain in Synapse for machine learning tasks, but that time is gone: the emphasis is definitely on doing machine learning tasks in Azure ML, regardless of where the data lives…unless there’s a Spark job involved, in which case things get all weird again.
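
For a sense of the Azure ML side, a minimal job submission with the v2 Python SDK looks roughly like this. The subscription, workspace, compute, environment, and data path are all placeholders; in the scenario above, the input would point at data Synapse has landed in the lake:

```python
# Sketch: submit a training job to Azure ML with the v2 Python SDK.
# Subscription, workspace, compute, environment, and data path are
# placeholders; the input would point at data Synapse has produced.
from azure.ai.ml import Input, MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

job = command(
    code="./src",  # folder containing train.py
    command="python train.py --training-data ${{inputs.training_data}}",
    inputs={
        "training_data": Input(
            type="uri_folder",
            path="azureml://datastores/<datastore>/paths/curated/sales/",
        )
    },
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    compute="cpu-cluster",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)  # follow the run in Azure ML studio
```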
