Press "Enter" to skip to content

Curated SQL Posts

Apache Kafka 4.0 Now Available

David Jacot announces a milestone release for Apache Kafka:

Apache Kafka 4.0 is a significant milestone, marking the first major release to operate entirely without Apache ZooKeeper™️. By running in KRaft mode by default, Kafka simplifies deployment and management, eliminating the complexity of maintaining a separate ZooKeeper ensemble. This change significantly reduces operational overhead, enhances scalability, and streamlines administrative tasks. We want to take this as an opportunity to express our gratitude to the ZooKeeper community and say thank you! ZooKeeper was the backbone of Kafka for more than 10 years, and it did serve Kafka very well. Kafka would most likely not be what it is today without it. We don’t take this for granted, and highly appreciate all of the hard work the community invested to build ZooKeeper. Thank you!

There are some other big items in Kafka 4.0 and you can see more in the post’s changelog.

Comments closed

Building a Package in R

Ken Koon Wong builds a package:

Made a hilariously redundant R package for a simple OpenAI calls, but the real win was finally learning how to build an R package! Is it efficient? Absolutely not! Was it worth the time and experience? Yes! Will I do it again? Yes! Will it break? Yes! 

Package management in R is a bit of a challenge, but well worth it, even if the only consumers of your code will be in your own company. H/T R-Bloggers.

Comments closed

Generative AI Assistance in Building Power BI Custom Visuals

Kurt Buhler discusses a process:

In Power BI, advanced report creators often need to use custom visuals to fulfill their requirements or create certain designs. In previous articles that we published at SQLBI, we discussed the options available to make custom visuals, such as SVG visuals that you can make by using DAX. We also gave examples of when you might choose one approach over another, for example, if you want to make a bullet chart. However, creating custom visuals in Power BI is complex, and requires technical skills that most Power BI report creators do not have. In this article, we examine how you can use AI assistance to help you plan and create custom visuals.

A high-level overview of the process we will take and the desired result is below. It is important to emphasize that this article focuses on the general process, and not specific steps to obtain the result.

Click through for a long-form article on the subject. I’m generally fairly sour on relying too much on generative AI solutions for, well, much of anything. That’s part of why you see so few posts on the topic here. My main problem is that it works best in situations where you already know enough to separate wheat from chaff, or good code from broken/insecure/buggy code. I think Kurt strikes a good tone in this article and it’s well worth the read.

Comments closed

Finding Why a SQL Server Plan Failed to Go Parallel

Grant Fritchey looks for an answer:

Let’s face it, most of the time, you probably don’t want your SQL Server execution plans to go parallel. After all, that’s why, instead of adjusting the Cost Threshold for Parallelism (as you should have), you’ve set the Max Degree of Parallelism to 1 (I see you out there). However, some of you recognize that, in fact, some queries need to run in Parallel.

Yet, sometimes, a query you think should run in parallel doesn’t. Can you tell why a plan didn’t go parallel?

Read on to learn how.

Comments closed

Low-Code Tradeoffs and an Example of SFTP

Meagan Longoria talks trade-offs:

Low-code solutions often accelerate development and make tasks accessible to people who can’t or don’t want to write their own code. But it’s important to remember that it’s a trade-off. You are often trading decreased development and maintenance time for limited configuration options and minimal monitoring capabilities. Low-code solutions are great…until they aren’t.

This kind of thing can pop up in code-based libraries or APIs as well, though there’s typically another viable solution. With low-code solutions, veering from the happy path often lands you in a world of frustration.

Comments closed

Performance Profiles of SUM() and SUMX() in DAX

Eugene Meidinger blows the doors off of this conspiracy:

For years, I told people to avoid iterators. I compared them to cursors in SQL, which are really bad, or for loops in C# which are normally fine. I knew that DAX was column based and that it often broke down when doing row-based operations, but I couldn’t tell you why.

Advice to avoid iterators is often based on a misunderstanding and a misapprehension of how the Vertipaq engine works. If you are blindly giving this advice out, like I was, you are promoting a fundamental misunderstanding of how DAX works.

Eugene has since opened his eyes and can see through the lies.

Comments closed

The Design of DataDiluvium

Adron Hall wraps up a series on DataDiluvium. First up is some commentary on data generation:

In Parts 1 and 2, I set up the development environment and implemented the schema parsing functionality. Now, I’ll explore the data generation system and final implementation details that make DataDiluvium a complete solution.

And finally we have some additional notes and wrapup:

In my previous three posts, I covered the core functionality of DataDiluvium. In this follow-up post, I’ll explore the additional features, utilities, and implementation details that I’ve added to enhance the application’s functionality and developer experience.

Check out both posts. I do enjoy seeing people walk through and explain some of the key concepts and decisions they’ve made when developing solutions.

Comments closed

Speeding up Dataflow Validation and Publish Times

Chris Webb doesn’t want to wait:

If you’re working with slow data sources in Power BI/Fabric dataflows then you’re probably aware that validation (for Gen1 dataflows) or publishing (for Gen2 dataflows) them can sometimes take a long time. If you’re working with very slow data sources then you may run into the 10 minute timeout on validation/publishing that is documented here. For a Gen1 dataflow you’ll see the following error message if you try to save your dataflow and validation takes more than 10 minutes:

Click through for that common error message, as well as some tips to avoid this issue. There was also an interesting approach in the comments section that circumvented the problem as well.

Comments closed