2023-09-22 – Curated SQL

Topic partitions are the main “unit of parallelism” in Kafka. What’s a unit of parallelism? It’s like having multiple cashiers in the same store instead of one. Multiple purchases can be made at once, which increases the overall amount of purchases made in the same amount of time (this is like throughput). In this case, the cashier is the unit of parallelism.

In Kafka, each partition leader can live on a different broker in a cluster, and a producer can send multiple messages, each with a different destination topic partition; that is, a producer can send them in parallel. While this is the main reason Kafka enables high throughput, compression can also be a tool to help improve throughput and efficiency by reducing network traffic due to smaller messages. A well-executed compression strategy also means better disk utilization in Kafka, since stored messages on disk are smaller.

Click through for the various options and some guidance on using each.
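
As a rough illustration of the compression point in the excerpt above, here is a minimal Python sketch that compares the size of a batch of similar messages before and after gzip. It uses only the standard library and does not talk to Kafka; the message shape and the batch_sizes helper are hypothetical. In Kafka itself, producer-side compression is selected with the compression.type setting (gzip is one of the supported codecs).

```python
import gzip
import json


def batch_sizes(messages):
    """Return (raw_bytes, gzip_bytes) for a batch of JSON-encoded messages."""
    # Serialize each message and join them, roughly like one producer batch.
    raw = b"\n".join(
        json.dumps(m, sort_keys=True).encode("utf-8") for m in messages
    )
    compressed = gzip.compress(raw)
    return len(raw), len(compressed)


# Hypothetical batch: 1,000 similar purchase events (made-up fields).
batch = [
    {"store": "store-17", "cashier": i % 4, "item": "coffee", "price_cents": 350}
    for i in range(1000)
]

raw_size, gz_size = batch_sizes(batch)
print(f"raw={raw_size} bytes, gzip={gz_size} bytes")
```

Repetitive payloads like these compress well, which is where the network and disk savings come from; the actual ratio depends on the data and the codec chosen.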

Spark Defaults for Core Count and Memory

Published 2023-09-22 by Kevin Feasel

The Big Data in Real World team gives us the defaults:

spark.executor.cores controls the number of cores available for the executors.

[…]

spark.executor.memory controls the amount of memory allocated for each executor.

I did helpfully take out the first answer, so you’ll have to click through to the post to see the answers, as well as how cluster mode vs client mode can change things.
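
If you would rather set these two properties explicitly than rely on the defaults, a minimal sketch of building a spark-submit command with them might look like this. The values 2 and 4g and the file name app.py are placeholders chosen for illustration, not the defaults discussed in the post; spark_submit_args is a hypothetical helper.

```python
def spark_submit_args(app, executor_cores, executor_memory):
    """Build a spark-submit argument list that sets both executor properties."""
    return [
        "spark-submit",
        "--conf", f"spark.executor.cores={executor_cores}",
        "--conf", f"spark.executor.memory={executor_memory}",
        app,
    ]


# Placeholder values for illustration only; they are not Spark's defaults.
args = spark_submit_args("app.py", 2, "4g")
print(" ".join(args))
```

The resulting list can be passed to subprocess.run on a machine where spark-submit is on the PATH.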

Oracle: RMAN and Non-Synchronizing Standby Database

Published 2023-09-22 by Kevin Feasel

David Fitzjarrell proffers advice on recovering from a non-synchronizing standby database:

Occasionally the unthinkable can occur and the DBA can be left with a standby database that is no longer synchronizing with the primary. A plethora of “advice” will soon follow that discovery, most of it much like this:

“Well, ya gotta rebuild it.”

Of course the question to ask is “how far out of synch is the standby?” That question is key in determining how to attack this situation. Let’s go through the two most common occurrences of this and see how to address them.

Read on to see David’s advice.

Deployment Pipelines for Microsoft Fabric

Published 2023-09-22 by Kevin Feasel

Reitse Eskens crosses a line:

It’s a bit of a challenge to keep up with all the changes, updates and all the new stuff coming out for Fabric. As I’m not really invested in the PowerBI part of the data platform (yay pie charts ;)), some things that are very common for the PowerBI community are very new to me. I have it on good authority that this blog covers a feature that is well known within PowerBI but quite new in the data engineering part. When I say that, I need to add that at the time of writing, only the PowerBI side of things is fully supported, but I have very good hopes that pipelines and notebooks will be supported as well.

Supporting pie charts are fightin’ words here. Nonetheless, read on to see how deployment pipelines work in Microsoft Fabric.

Finding SSAS Tabular Dimensions in Excel

Published 2023-09-22 by Kevin Feasel

Olivier Van Steenlandt has lost a few dimensions in the couch cushions:

A colleague reached out last week while connecting to one of our SQL Server Analysis Services models in Excel. He couldn’t find the expected Attribute folders in the model. He was looking for the following dimensions:

Of particular interest was that this colleague could not see them but Olivier could. The answer ends up being a bit surprising.

Query Execution Concepts and SQL Server

Published 2023-09-22 by Kevin Feasel

Erik Darling answers the question, why is it so hard to figure out why my query sometimes sucks:

Sometimes people will ask me penetrating questions like “why does SQL Server choose a bad execution plan?” or “why is this query sometimes slow?”

Like many things in databases, it’s an endless spiral of multiverses (and turtles) in which many choose-your-own-adventure games are played and, well, sometimes you get eaten by a Grue.

In this post, I’m going to talk at a high level about potential reasons for both.

Read on for a smorgasbord of factors to consider based on the steps SQL Server takes.

Day: September 22, 2023

Kafka Message Compression

Spark Defaults for Core Count and Memory

Oracle: RMAN and Non-Synchronizing Standby Database

Deployment Pipelines for Microsoft Fabric

Finding SSAS Tabular Dimensions in Excel

Query Execution Concepts and SQL Server