Curated SQL – Page 438 – A Fine Slice Of SQL Server

Key Metrics for Kafka Clusters

Published 2023-08-08 by Kevin Feasel

Like a steam locomotive, Apache Kafka also has lots of “moving parts”. A Kafka cluster is actually a distributed system, a cluster, which consists of multiple brokers (servers)—messages are sent to Kafka with producers, and consumers receive messages. Topics are used to direct messages between producers and consumers—producers write to selected topics, and consumers subscribe and read from selected topics. Topics are divided up into partitions, and the partitions are distributed over the available brokers for high availability and concurrency. Partitions are replicated to other partitions (followers) from the leader partition (3 is common for production clusters).

Click through for the description of metrics on brokers, topics, consumer groups, and more, as well as an extended analogy to steam locomotives which works surprisingly well.

Comments closed

Row Rank Min in KQL

Published 2023-08-08 by Kevin Feasel

Robert Cain continues a series on KQL:

In this post we’ll cover the next in our series on KQL Windowing Functions, row_rank_min. The row_rank_min function will assign rank numbers to an incoming dataset, with the rank of 1 being assigned to the minimal value in the dataset and increasing as the values increase.

First though, if you’ve not read the introductory post on Windowing Functions, Fun With KQL Windowing Functions – Serialize and Row_Number, you should do so now as it introduced several important concepts needed to understand how these Windowing Functions work.

Read on for Robert’s explanation.

Comments closed

Capturing Data in Microsoft Fabric Eventstream from Event Hub

Published 2023-08-08 by Kevin Feasel

Brian Bønk does some data shuffling:

The Fabric servive Eventstream can read data from an Event Hub service to help collect data from IoT devices and other streaming services. But how do you configure the Event Hub and Eventstream to work together?

I’ll try to help with that in this blogpost.

Click through for the article, which provides a how-to guide on the topic.

Comments closed

Azure Data Studio 1.45 Released

Published 2023-08-08 by Kevin Feasel

Erin Stellato gives us an update:

In this release we updated to VS Code version 1.79.2, from version 1.70, bringing ADS nearly current with VS Code. Keeping up with VS Code is an artifact of Azure Data Studio being a fork of VS Code, and the team accomplished a major feat with this latest merge. Many of the changes are not immediately obvious to users, but there are improvements across security, performance, and the application interface.

Read on for the highlights.

Comments closed

Execution Plans in Postgres

Published 2023-08-08 by Kevin Feasel

Henrietta Dombrovskaya shows us an execution plan in Postgres:

In the last blog (When PostgreSQL Parameter Tuning is not the Answer), we compared several execution plans for a SQL statement as we made changes to parameters and indexes. Still, there was no mention of what an execution plan is, how one can obtain an execution plan for a query, and how to interpret the result. In this blog, we will take a deep dive into this topic.

Read on for the full article.

Comments closed

Drawing Tubular Paths with Julia and R

Published 2023-08-07 by Kevin Feasel

Stephane Laurent writes some code:

I implemented the framed closed curves exposed in this blog post, in Julia and R. In fact it is useless with R, because the rgl function cylinder3d is faster and better.

Here is the Julia implementation:

Click through for that code, as well as equivalent R code. H/T R-Bloggers

Comments closed

Learning about Data in R with str()

Published 2023-08-07 by Kevin Feasel

Steven Sanderson explains the value of the str() function:

In a nutshell, str() stands for “structure” and offers a concise summary of the structure of an R object. It presents essential details about the object, including its data type, dimensions, and the first few values. By providing an overview of your data, str() allows you to grasp the fundamentals at a glance and proceed with a clearer understanding of what you’re working with.

str() is a really useful function and people who develop objects in R thoughtfully can pack a lot of useful data into the one call.

Comments closed

Thoughts on Partitioning in Postgres

Published 2023-08-07 by Kevin Feasel

Ryan Booz splits things out:

For 20+ years of database and application development, time-series data has always been at the heart of the products I work with. Inevitably, as the quantity of data grew over time, management became more difficult and query performance suffered. Over the years, the primary method for managing this growth in data effectively would be to partition it. The problem is, until recently, partitioning wasn’t easy to setup in most OLTP databases like PostgreSQL or SQL Server.

Fortunately, PostgreSQL has significantly improved its ability to partition large data tables over the last 6 years, starting with PostgreSQL 10.

Read on for Ryan’s recommendations around partitioning and a few thoughts on sharding.

Comments closed

Cross-Database Query Folding in Power Query

Published 2023-08-07 by Kevin Feasel

Chris Webb doesn’t want query folding limits:

A few months ago a new option was added to the Sql.Database and Sql.Databases functions in Power Query in Power BI and Excel which allows Power Query queries that combine data from different SQL Server databases to fold. Here’s a simple example showing how to use it.

Read on for the setup and the demonstration. This is pretty nice for cases where you have user data in one database and reference data in another, or you’re trying to join together data from two separate systems.

Comments closed

An Overview of Partitioning and Sharding in Postgres

Published 2023-08-07 by Kevin Feasel

Michael Christofides defines terms:

It has been possible to do partitioning in PostgreSQL for quite a while — splitting what is logically one large table into smaller physical tables. Since version 10, a huge leap was made with the introduction of declarative partitioning, and more improvements have come every year since.

Sharding is a different story — splitting what is logically one large database into smaller physical databases. The primary tool for this in the PostgreSQL ecosystem is the Citus extension. But you can also handle the sharding logic at the application level, as recent posts from the likes of Notion and Figma have described. Somewhat confusingly, some forms of sharding are sometimes referred to as vertical partitioning, including by the team at Figma.

Read on for a few thoughts on when to perform each action and what the costs and benefits are.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Curated SQL Posts