Curated SQL – Page 610 – A Fine Slice Of SQL Server

Replacing ZooKeeper with Kafka Raft

Published 2022-09-02 by Kevin Feasel

Dave Shook shows off the Kafka Raft protocol:

Apache Kafka^® Raft (KRaft) is the consensus protocol that was introduced to remove Apache Kafka’s dependency on ZooKeeper™ for metadata management. This greatly simplifies Kafka’s architecture by consolidating responsibility for metadata into Kafka itself, rather than splitting it between two different systems: ZooKeeper and Kafka. KRaft mode is available in the Apache Kafka 3.1 release, though it is not yet ready for use in production environments. Refer to KIP-833 to learn more about when KRaft will be marked as production ready.
Below are several key resources to help you learn everything you need to know about the ins and outs of KRaft, its role in the Kafka architecture, and how you can get started in trying it out. These resources are followed by two others that let you “do stuff” with Kraft, i.e., run a KRaft mode cluster and observe how it works when various cluster controller related operations take place.

Eliminating the ZooKeeper dependency was a big goal for the Kafka team for several years. It’s not quite out yet but I’ll be interested to see how that migration works for companies.

Comments closed

Running Postman Tests in GitLab

Published 2022-09-02 by Kevin Feasel

Rahul Kumar automates Postman tests:

Hi folks, In this brief blog post, we’ll learn more about Gitlab CI and Postman, the API testing tool we use the most frequently. This article’s goal is to provide a quick process for automatically testing the service API response. The solution makes use of the capabilities provided by the Gitlab-integrated Continuous Integration tool.

Click through for the tutorial.

Comments closed

Creating a Parameterized Table in Tableau

Published 2022-09-02 by Kevin Feasel

Jordan Witcombe does some slicing and dicing:

Let’s say you’re planning to present a large table of information and you want the user to be able to filter based on the top or bottom X number of records. Due to the size of the dataset we would like the customer to have the ability to filter based on more than one column for their comparison. Making a large dataset much easier to digest.

Read on to see how you can accept user input in Tableau and simplify the viewable data.

Comments closed

PolyBase and Windows Authentication in SQL Server 2022

Published 2022-09-02 by Kevin Feasel

I have some bad news:

Back in the SQL Server 2019 days, I laid out a post which covered a really annoying bug around Windows authentication and PolyBase. The upshot is that SQL Server 2019 CU2 broke Windows authentication, meaning that you could not create external objects on a SQL Server instance while using a Windows authenticated session. In CU2, if you tried, you would get a Level 20 error which would kill your session.
Over the next several CUs, the SQL Server team improved things incrementally: first, the error that you got was a Level 16 error, so your statement would fail but the session would remain active. Then, in CU6, they made it so that if your SQL Server instance is on a server running on a domain, then you would be okay—you could create external objects while connected with a domain-authenticated Windows account.
I decided to give this a try in SQL Server 2022 RC0 and ran into some interesting results.

Click through for those results.

Comments closed

Deploying SQL Server via AKS

Published 2022-09-02 by Kevin Feasel

Rajendra Gupta needs to deploy a SQL Server container:

This article uses Azure Kubernetes Service (AKS) to deploy and manage the Kubernetes cluster. It is a fully managed service that offers serverless Kubernetes with integrated CI/CD solutions, enterprise-grade security, and governance.
You can navigate to https://azure.microsoft.com/en-in/services/kubernetes-service/#overview and try Azure Kubernetes Service (AKS).

Read on for an overview of Azure Kubernetes Service and how you can get a SQL Server on Linux container running atop it.

Comments closed

Plotting an ASCII Bar Chart using SQL

Published 2022-09-02 by Kevin Feasel

Lukas Eder is speaking my language:

No need for expensive Tableau subscriptions. Ditch Microsoft Excel. Just use native PostgreSQL to quickly visualise your data!
Here’s an idea I had for a while. As you may know, jOOQ can produce fancy charts from your jOOQ results. But that requires you use jOOQ, and you may not be using jOOQ, because you’re not coding in Java/Kotlin/Scala (otherwise, you’d be using jOOQ). That’s OK. I thought, why not do it with SQL (PostgreSQL, to be specific) directly, then?

This one’s getting my most coveted category: Wacky Ideas.

Comments closed

SSRS 2022 Updates

Published 2022-09-02 by Kevin Feasel

Cookie McCray shares some updates:

Back in 2020, we announced the deprecation of Report Server features Pin to Power BI, Mobile Reports, and Mobile Report Publisher. These features will be removed from versions of SQL Server starting with SQL Server 2022 and will no longer be supported. Only the last three releases, SQL Server 2016, SQL Server 2017, and SQL Server 2019, will be supported in maintenance mode until EOL (End of Life) for existing customers.

There are some interesting things in here but nothing that makes me say Reporting Services has a real future.

Comments closed

Working with List Data in R using rrapply

Published 2022-09-01 by Kevin Feasel

Joris Chau pivots, unpivots, and unnests data in R:

The previous post showcases the rrapply() function in the minimal rrapply-package as a revised and extended version of base rapply() in the context of nested list recursion in R. For quick data exploration of a nested list it can make sense to keep the list in its original nested format to reduce the number of processing steps and minimize code complexity. As part of a more elaborate data analysis, if there is no specific reason to keep the nested data structure, it is often more practical to transform the nested list into a more convenient rectangular format and work with the unnested object (e.g. a data.frame) instead. In this follow-up post, we review the available (how) options in rrapply() to unnest or melt nested lists into a rectangular format in more detail and highlight the similarities and differences with respect to several common alternatives in R.

There’s a lot going on in this demo-packed post. H/T R-Bloggers.

Comments closed

Creating Multiple Output Files per Spark Task

Published 2022-09-01 by Kevin Feasel

Dmitry Tolpeko has a quick but helpful post:

It is highly recommended that you try to evenly distribute the work among multiple tasks so every task produces a single output file and job is completed in parallel.
But sometimes it still may be useful when a task generates multiple output files with the limited number of records in each file […]

I had to cut it off right there to keep from spilling the beans here. Click through for Dmitry’s post to see what setting controls records per file, allowing you to keep opening those Spark output files in Excel.

Comments closed

The Importance of Data Shaping

Published 2022-09-01 by Kevin Feasel

Paul Turley shapes the youth of data:

Power BI is a new tool and dimensional modeling is an old idea. One of the challenges is that, like other modern self-service analytics products on the market, Power BI doesn’t force self-service data jockeys to transform their data before reporting with it. If you want to import a big, wide spreadsheet full of numbers and create charts in a Power BI report, knock yourself out. But, the solution won’t scale and you will inevitably run into walls when you try to make future enhancements. Similar problems arise from importing many tables from different sources and transactional systems. Several tables all chained together with creative mashups and relationships present their own set of problems. The first iteration of such an effort is usually a valuable discovery method and learning experience. Great… treat it as such; take notes, make note of the good parts and then throw it away and start over! In Fredrick Brooks’ “The Mythical Man Month“, he cites that for most engineering projects, the first six attempts should be abandoned before the team will be prepared to start over and complete the work successfully. He was a chemical engineer before working for IBM; and hopefully, our methods in the data engineering business are more effective then his 6-to-1 rule. But, this makes the case the prototypes and proof-of-concept projects are a critical part of the learning path.

The tools don’t make the rules.

Unless you’re talking about the lambda architecture, in which case that’s kind of accurate. But we’re not talking about that here.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Curated SQL Posts