Kevin Feasel – Page 835

Kafka is just the broker, the stage in which all the action takes place. The producers send messages to the world while the consumers read specific chunks of data. How do you differentiate one specific portion of data from the others? How do consumers know what data to consume? To understand this, you need a new actor in the play: the topics.
Kafka topics are the channels, the carriage that transport messages around. Kafka records produced by producers are organized and stored into topics.

This is a nice overview of Kafka followed by the basics of building a consumer and a producer in C#. I just wish that there was more community usage of Kafka so that the Confluent .NET driver would include some of the really cool stuff they’ve added to Kafka over the past couple of years.

Comments closed

T-SQL Tuesday 135 Roundup

Published 2021-02-18 by Kevin Feasel

Mikey Bronowski had a lot of work to do:

Two weeks ago I have sent out the invitation for February 2021 #tsql2sday and last week almost 30 people responded with their contributions.
I would like to thank you all of the contributors (see the posts’ list below) for their time.

Click through for the full list of contributors and links.

Comments closed

How to Update Statistics Manually

Published 2021-02-18 by Kevin Feasel

Matthew McGiffen takes us through the process of updating statistics:

At the heart of all the methods we’ll look at is the UPDATE STATISTICS command. There are a lot of options for using this command, but we’ll just focus on the ones you’re most likely to use. For full documentation here is the official reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql

Even if you have an automated system, knowing how to update statistics is a great thing because you might need to run a one-off update to help a poorly-performing query. Or you’re using PolyBase, which doesn’t have the capability to perform statistics updates automatically because the data isn’t actually in SQL Server.

Comments closed

Building a Function to Get the Next Date by Date Name or Offset

Published 2021-02-18 by Kevin Feasel

Louis Davidson has a function for us:

As I have been building my Twitter management software, I have been doing a lot more ad-hoc, repetitive coding using T-SQL directly. When I was generating new tweets for upcoming days, a bit of the process that got old quick was getting the date for an upcoming day (the primary key for my tweet table is the date, the type of the tweet, and a sequence number). After having to pick the date of next Tuesday… I had to write some more code (because a true programmer doesn’t do repetitive work when code can be written… even if sometimes the code doesn’t save you time for days or weeks.
So this following function was born from that need, and it is something I could imagine most anyone using semi-regularly, especially when testing software.

This is definitely fancy. My inclination would be to create a calendar table, as that’ll solve this particular issue as well as other complex variants (like, I want the next Tuesday which doesn’t fall on a holiday).

Comments closed

Azure Data Factory Pipeline Outcomes

Published 2021-02-18 by Kevin Feasel

We have proof that Meagan Longoria is a consultant:

Question: When an activity in a Data Factory pipeline fails, does the entire pipeline fail?
Answer: It depends

Read on to understand the circumstances of when a pipeline activity failure will doom an entire pipeline run and when recovery might be possible.

Comments closed

Index Creation with DROP_EXISTING

Published 2021-02-18 by Kevin Feasel

Monica Rathbun takes us through the DROP_EXISTING option when modifying an index:

When you are making changes to an existing Non-Clustered index SQL Server provides a wide variety of options. One of the more frequently used methods is DROP EXISTING; in this post you will learn all about that option. This option automatically drops an existing index after recreating it, without the index being explicitly dropped. Let us take a moment understand the behavior of this choice.

What I really want is DROP_IF_EXISTS. I want idempotent commands: if I run it once or a thousand times, I end up in the same state whether there was an index there at the start or not (or if attempt #793 failed due to running out of sort space in tempdb or something, leaving me with no index). DROP_EXISTING is only idempotent if the index already existed, but then you have to ask, why is it important if an index of that name is already there? The important part of the statement is that I want an end state which includes this index in this form.

Comments closed

Searching Text which Begins with a Wildcard

Published 2021-02-18 by Kevin Feasel

Chad Callihan is looking for some data:

Searching for a value or group of values with a wildcard is more than just putting a % on both sides of a text string. If you know you’re looking for all strings in a name field that start with the name “Chad” then you are you really shooting yourself in the foot by using ‘%Chad%’ instead of ‘Chad%’ in your query. SQL Server is going to be scanning the table instead of being able to use an index to seek to the data. While that may work to get your result, it’s likely going to take longer and be more invasive than needed. I want to go through an example of how using a reversed column can improve SQL Server’s ability to build a better execution plan.

Read on to see how a computed reversal column can help. For more complex scenarios, an n-gram table (for example, a trigram) might help, though there’s a lot of setup involved there.

Comments closed

So You’ve Run Out of Memory

Published 2021-02-18 by Kevin Feasel

Randolph West explains how the buffer pool handles low-memory situations:

One of the bigger clichés in the data professional vocabulary (behind “it depends”) is that you always give SQL Server as much RAM as you can afford, because it’s going to use it. But what happens when SQL Server runs out of memory?
Recently a question appeared on my post about how the buffer pool works, asking the following (paraphrased):
What happens if a data page doesn’t exist in the buffer pool, and the buffer pool doesn’t have enough free space? Does the buffer pool use TempDB, [and] does TempDB put its dirty pages into the buffer pool?
This is an excellent question (thank you for asking!). I spent 30 minutes writing my reply and then figured it would make a good blog post this week if I fleshed it out a little.

Read the whole thing.

Comments closed

RBAC in Hadoop with Kudu and Ranger

Published 2021-02-17 by Kevin Feasel

Attila Bukor takes us through the process of setting up role-based access controls on Impala tables:

After setting up the integration it’s time to create some policies, as now only trusted users are allowed to perform any action; everyone else is locked out. Resource-based access control (RBAC) policies can be set up for Kudu in Ranger, but Kudu currently doesn’t support tag-based policies, row-level filtering or column masking.

Click through for the process, as well as current limitations.

Comments closed

Join Algorithm Selection in Spark

Published 2021-02-17 by Kevin Feasel

The Hadoop in Real World team takes us through the selection criteria for join types:

There are several factors Spark takes into account before deciding on the type of join algorithm to use to join datasets at runtime.
Spark has the following 5 algorithms to choose from –
1. Broadcast Hash Join
2. Shuffle Hash Join
3. Shuffle Sort Merge Join
4. Broadcast Nested Loop Join
5. Cartesian Product Join (a.k.a Shuffle-and-Replicate Nested Loop Join)

Read on to learn which join types are supported in which circumstances, as well as rules of precedence.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Author: Kevin Feasel

Kafka in .NET

T-SQL Tuesday 135 Roundup

How to Update Statistics Manually

Building a Function to Get the Next Date by Date Name or Offset

Azure Data Factory Pipeline Outcomes

Index Creation with DROP_EXISTING

Searching Text which Begins with a Wildcard

So You’ve Run Out of Memory

RBAC in Hadoop with Kudu and Ranger

Join Algorithm Selection in Spark