Press "Enter" to skip to content

Curated SQL Posts

Freshness Labels on Content

Steve Jones does some noodling:

I chose the title slightly to poke at Stack Overflow (SO), but the same take expressed in this tweet could be said about SQL Server Central. It’s not quite the same as anyone can answer questions on SQL Server Central.

The tweet is a (long) hot take from Jerry Nixon, a C# developer and MS evangelist in Denver. Essentially he says that a lot of the SO answers are wrong, especially as the software and languages change. Old answers are upvoted, and remain at the top of the list, even as newer answers might be better. People don’t like the behavior on SO of moderators and people who post, which is something we’ve tried to avoid or limit here at SQL Server Central. We want there to be professional discussions. SO also doesn’t allow much discussion or nuance in the questions or answers.

This isn’t just a SO problem or am SSC one. 

Read the whole thing. This is a huge problem with search engines today and there’s a hacky solution for it. Going back to the original PageRank algorithm that Google used, your rank on the search results list was heavily tied to how many individuals linked back to you. Older pages tend to have more linkbacks because they’ve been around longer, and so there’s a built-in bias toward older content. Google, in particular, has done a lot to work around this problem, but there’s a real issue with timeliness in articles: sometimes, you want the brand new information (like say, product recommendations); other times, you want older or even the original information (such as if you’re researching historical activities). The problem is that there’s no good way to indicate this to the search engines we have, so the hacky solution is for content creators to create sites like “The May 2023 Guide to Blahblahblah” and for search engine users to look for terms like “2023 blahblahblah” so they can avoid all of the outdated 2022 and 2021 blahblahblah discussions.

There’s also a story in here around keeping things up to date. Some people are good about that—they’ll go back and update years-old blog posts based on what’s new and happening. I am not one of those people.

Comments closed

Adding Text to a Plot in R

Steven Sanderson texts up a plot:

As a programmer, you’re well aware of the importance of data visualization. A well-crafted plot can convey complex information with clarity and impact. In R, creating stunning plots is a breeze, especially when you’re armed with the versatile text() function. This little gem allows you to add custom text to your plots, enabling you to annotate and highlight essential details. Let’s dive into the world of text() and uncover its syntax and potential through some hands-on examples.

I’m also a big fan of geom_text_repel() in ggplot2’s ggrepel library. It is by no means perfect but it does do a good job of not overlapping important visual features like plotted lines.

Comments closed

GTIDs for Replication in MySQL

Aisha Bukar continues a series on replication in MySQL:

Welcome back to another replication series! As a quick reminder, we explored various methods of using MySQL’s replication capabilities in our previous discussions. Initially, we employed the traditional binary-log-based replication approach to set up our replication servers. This involved tracking the binary log file and its positions to facilitate replication.

In this article, we will dive into a more recent and acceptable approach to creating replication – using the Global Transaction Identifiers (GTID) based replication.

Click through to understand how they work and the trade-offs you’ll need to make regarding them.

Comments closed

Refreshing Excel in SharePoint without Power BI Gateways

Gilbert Quevauvilliers needs a refresh:

I have been answering some questions on the Power BI (Or is it now Fabric?) community forum, and I have found recently that there are some people asking how to get data from Excel that is stored in SharePoint Online.

In this blog post I will show you how to connect to a single Excel file using Power BI desktop, and then configure it in the Power BI Service where it does not require a gateway.

Read on to see how.

Comments closed

Delayed Transaction Durability in tempdb

Bob Ward does some digging:

I have found almost every day of my career at Microsoft, I learn something new about Microsoft SQL. It is one of the reasons I enjoy my job. In April of 2023, I was speaking at the MVP Summit in Redmond, Washington. One day I was spending time in a side conversation with MVP Rob Farley discussing some of his “wish list” items for SQL. One of these wish list items was delayed transaction durability for tempdb.

Good news for Rob on that front.

Comments closed

Recommendations for Snapshotting HDFS

Tsz Sze provides some recommendations:

The snapshots feature of the Apache Hadoop Distributed Filesystem (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption, user-, or application errors.  This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP). Regardless of whether you’ve been using snapshots for a while or contemplating their use, this blog gives you the insights and techniques to make them look their best.  

Read on for those tips and tricks.

Comments closed

Key Metrics for Kafka Clusters

Paul Brebner tracks some measurements:

Like a steam locomotive, Apache Kafka also has lots of “moving parts”. A Kafka cluster is actually a distributed system, a cluster, which consists of multiple brokers (servers)—messages are sent to Kafka with producers, and consumers receive messages. Topics are used to direct messages between producers and consumers—producers write to selected topics, and consumers subscribe and read from selected topics. Topics are divided up into partitions, and the partitions are distributed over the available brokers for high availability and concurrency. Partitions are replicated to other partitions (followers) from the leader partition (3 is common for production clusters).  

Click through for the description of metrics on brokers, topics, consumer groups, and more, as well as an extended analogy to steam locomotives which works surprisingly well.

Comments closed

Row Rank Min in KQL

Robert Cain continues a series on KQL:

In this post we’ll cover the next in our series on KQL Windowing Functions, row_rank_min. The row_rank_min function will assign rank numbers to an incoming dataset, with the rank of 1 being assigned to the minimal value in the dataset and increasing as the values increase.

First though, if you’ve not read the introductory post on Windowing Functions, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these Windowing Functions work.

Read on for Robert’s explanation.

Comments closed