Press "Enter" to skip to content

Curated SQL Posts

Perspective on Spinlocks

Erik Darling speaks with wisdom:

The more people want to avoid fixing what’s really wrong with their server, the more they go out and find all the weird stuff that they can blame on something else (usually the product), or keep doing the same things that aren’t fixing the problem.

Spinlocks are one of those things. People will measure them, stare them, Google them, and have no idea what to make of them, all while being sure there’s something going on with them.

I don’t want to discount when spinlocks can actually cause a problem, but you shouldn’t treat every performance problem like it’s a bridge too far from what you can solve.

I have seen performance problems which actually did come down to spinlock issues. For every one of those, I’ve seen, oh, about 95-100 or so which came down to inefficient code.

Comments closed

How It Works: Power BI Field Parameters Edition

Gilbert Quevauvilliers figures out how field parameters work:

In this blog post I want to give a visual representation as to how field parameters works and what the current limitations are.

It is important to be aware of the limitations so that you do not get caught out later or you are trying to figure out why it is not working.

I do hope my descriptions and pictures below help you understand how it works and when it does not work!

Click through for some detailed graphics and explanation.

Comments closed

Monitoring Streaming Queries in PySpark

Hyukjin Kwon, et al, lay out some monitoring advice:

Streaming is one of the most important data processing techniques for ingestion and analysis. It provides users and developers with low latency and real-time data processing capabilities for analytics and triggering actions. However, monitoring streaming data workloads is challenging because the data is continuously processed as it arrives. Because of this always-on nature of stream processing, it is harder to troubleshoot problems during development and production without real-time metrics, alerting and dashboarding.

Read on to see how you can use the Observable API for alerting in PySpark—previously, it had been a Scala-only API.

Comments closed

Projecting (Selecting) Results with KQL

Robert Cain continues a series on the Kusto Query Language:

So far in my Fun With KQL series, we have used the column tool, found on the right side of the output pane and described in my original post Fun With KQL – The Kusto Query Language, to arrange and reduce the number of columns in the output.

We can actually limit the number of columns, as well as set their order, right within our KQL query. To accomplish this we use the project operator.

Read on for several good uses of the project operator.

Comments closed

Distributed Transactions in T-SQL

Kevin Wilkie explains what distributed transactions are and why you probably don’t want to use them:

In the version of transactions that we going to discuss today, we’re going to discuss doing transactions on multiple servers!

A Distributed transaction is defined by HazelSet to be “a set of operations on data that is performed across two or more data repositories”. In even simpler terms, it’s a command run against data on more than one server.

Click through for the warnings about what might possibly go wrong.

Comments closed

Fun with Nested Loops

Jared Poche explains my favorite type of join:

Nested loops joins are the join operator you are likely to see the most often. It tends to operate best on smaller data sets, especially when the first of the two tables being joined has a small data set.

In row mode, the first table returns rows one at a time to the join operator. The join operator then performs a seek\scan against the second table for each row passed in from the first table. It searches that table based on the data provided by the first table, and the columns defined in our ON or WHERE clauses.

Read on for more information about nested loop joins.

Comments closed

Finding Duplicates in Type 2 SCDs

Dinesh Asanka wants to verify some Type 2 slowly changing dimension results:

As we discussed in a previous article, Implementing Slowly Changing Dimensions (SCDs) in Data Warehouses, there are three main types of slowly changing dimensions, such as Type 1, Type 2, and Type 3. Out of these Type 1 is the simple dimension where you will simply maintain only the latest version of the attribute. For example, if the employee got promoted to Senior Software Engineer from Software Engineer, you will simply overwrite the existing value to the new value so that the historical aspect is lost.

Type 2 Slowly Changing Dimensions are used to track historical data in a data warehouse. This is the most common approach in dimension. This article uses a sample database of AdventureworksDW which is the sample database for the data warehouse.

Click through for one way to compare, one which you could build using dynamic SQL.

Comments closed

When to Use a Map Visual

Mick Cisneros explains when to use map visuals:

That ubiquity has given all of us an increased familiarity with maps, as well as a deeper affinity for them. (Probably a dependence as well!) It’s natural, then, to want to use a map to visualize data that has a geographic dimension. Why not, right? There is an obvious upside: audiences are drawn to the way they look, as it’s a more memorable image than the same old bar chart or line graph. Not to mention: it’s fun to make maps!

The problem is that maps look interesting, but their very nature limits our options for visualizing data within them. Per a recent paper by Franconeri, Padilla, Shaw, et. al., here are a couple of the comparisons that people are very good at making, perceptually:

Read on for a comparison of good map versus bad map. Just because something has a geographical component doesn’t mean you should map it.

Comments closed