2020-10-26 – Curated SQL

Intrusion Detection using ksqldb

Apache Kafka® is a distributed real-time processing platform that allows for the ingestion of huge volumes of data. ksqlDB is part of the Kafka ecosystem and offers a SQL-like language to query and process large-scale, real-time data. This blog post demonstrates how to quickly process network activity for intrusion detection using both Kafka and ksqlDB.
For testing purposes (and to avoid being banned from the enterprise network), a virtualized environment through Vagrant is used.

Click through for the scenario.

Adaptive Query Execution in Databricks

Published 2020-10-26 by Kevin Feasel

MaryAnn Xue and Allison Wang explain how Adaptive Query Execution works with Databricks:

One of the most important cost-based decisions made in the Spark optimizer is the selection of join strategies, which is based on the size estimation of the join relations. But since this estimation can go wrong in both directions, it can either result in a less efficient join strategy because of overestimation, or even worse, out-of-memory errors because of underestimation.
AQE offers a trouble-free solution here by switching to the faster broadcast hash join during execution time.

This is pretty similar to Adaptive Query Processing in SQL Server.
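
As a rough illustration of the idea in the excerpt, here is a toy Python sketch, not Spark's actual implementation. It assumes a planner that picks a join strategy from an estimated size, then reconsiders once the real size is known at run time. The function names, the strategy labels, and the 10 MB threshold are all assumptions made for the example.

```python
# Toy model only: names and threshold are hypothetical, not Spark internals.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024  # assumed cutoff for broadcasting


def plan_join(estimated_bytes, threshold=BROADCAST_THRESHOLD_BYTES):
    """Pick a strategy up front, using only the size estimate."""
    if estimated_bytes <= threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"


def adapt_join(planned, actual_bytes, threshold=BROADCAST_THRESHOLD_BYTES):
    """Switch to a broadcast join at run time if the real size is small enough."""
    if actual_bytes <= threshold:
        return "broadcast_hash_join"
    return planned


# The estimate was far too high, so the static plan chose the slower join.
planned = plan_join(estimated_bytes=500 * 1024 * 1024)
final = adapt_join(planned, actual_bytes=2 * 1024 * 1024)
print(planned, "->", final)
```

The sketch only covers the overestimation case the excerpt mentions; it makes no attempt to model how a real engine measures sizes between stages.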

The Sequence Project Operator

Published 2020-10-26 by Kevin Feasel

Hugo Kornelis continues on a quest:

The Sequence Project operator computes values for the “ranking functions”: functions where the results depend on other rows in the result set, such as ROW_NUMBER, RANK, DENSE_RANK, and NTILE.
A Sequence Project can be considered somewhat similar in function to Compute Scalar: both operators add new columns to the data based on expressions. But Compute Scalar works on expressions that take other columns from the same row and constant values as input. Sequence Project computes expressions that are based on preceding rows in the data stream as their input.

Read on to learn more about what this operator does and how it works.
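
To make "depends on preceding rows" concrete, here is a small Python sketch that computes ROW_NUMBER, RANK, and DENSE_RANK style values in a single pass over already-sorted input. It is an illustration of the ranking semantics under that assumption, not a description of how SQL Server implements the operator, and the function name is hypothetical.

```python
def ranking_columns(sorted_values):
    """Return (value, row_number, rank, dense_rank) tuples for sorted input.

    Each output row only needs the current value plus state carried over
    from the rows before it.
    """
    results = []
    previous = object()  # sentinel that never equals a real value
    rank = 0
    dense_rank = 0
    for row_number, value in enumerate(sorted_values, start=1):
        if value != previous:
            rank = row_number   # rank jumps to the current row position
            dense_rank += 1     # dense rank increases by one, leaving no gaps
        results.append((value, row_number, rank, dense_rank))
        previous = value
    return results


for row in ranking_columns([10, 20, 20, 30]):
    print(row)
```

Ties share a rank and a dense rank, while the row number keeps increasing; after a tie, rank skips ahead and dense rank does not.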

Finding Unique Words between Texts in Python

Published 2020-10-26 by Kevin Feasel

Pawan Khowal walks us through a Python puzzle:

In this puzzle we have to find uncommon words between two strings.
Sample Input List
a = 'Hi I am Pawan here only'
b = 'here only'

Click through for the solution.
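
For readers who want to try it first, here is one possible approach, which is not necessarily the one in the linked post. It assumes "uncommon" means words that appear in exactly one of the two strings, splits on whitespace, and compares case-sensitively; the function name is made up for the example.

```python
def uncommon_words(a, b):
    """Return words found in exactly one of the two strings, in first-seen order."""
    words_a = a.split()
    words_b = b.split()
    set_a = set(words_a)
    set_b = set(words_b)

    result = []
    seen = set()
    for word in words_a + words_b:
        in_exactly_one = (word in set_a) != (word in set_b)
        if in_exactly_one and word not in seen:
            seen.add(word)
            result.append(word)
    return result


a = 'Hi I am Pawan here only'
b = 'here only'
print(uncommon_words(a, b))  # ['Hi', 'I', 'am', 'Pawan']
```

If order does not matter, the symmetric difference of the two word sets (`set_a ^ set_b`) gives the same words more briefly.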

SQL Server Replication Requires Actual Server Names

Published 2020-10-26 by Kevin Feasel

Steve Stedman walks us through a pain point when using replication:

SQL Server replication requires the actual server name to make a connection to the server. Specify the actual server name. (Replication.Utilities).
You might be thinking to yourself that you had a typo in the server name, but no, after checking the server name it matches what you can connect with.

When I’ve seen this error, often it will even tell me the server name it’s expecting, which then makes me ask why I have to type it in if it knows already.

Connection Failed: ADF with SSIS Integration Runtime

Published 2020-10-26 by Kevin Feasel

Andy Leonard diagnoses a problem:

I clicked the “More” link to display details in the blue box
To provision new Integration Runtime, Azure SQL DB server should not have existing SSISDB. If this SSISDB was once associated with an SSIS IR, you can re-associated it with another SSIS IR (for more info see here).

Click through for Andy’s solution to the problem.

SolarWinds Acquires SentryOne

Published 2020-10-26 by Kevin Feasel

Greg Gonzalez announces the news:

SolarWinds recently announced the intention to acquire SentryOne, an event that promises to make life even better for the IT and data professionals who use our products. Both companies have historically focused—in their own ways—on solving real-life problems for customers by providing high-quality solutions.

I do wonder what will happen given that SolarWinds already has a database monitoring tool: will they keep the two separate, combine them, or do something else?

Memory-Optimized Table Types to Avoid tempdb Contention

Published 2020-10-26 by Kevin Feasel

Michael J. Swart uses In-Memory OLTP:

At D2L, we’re the perfect candidate customer for In Memory OLTP features, but we’ve held off adopting those features for years. Our servers handle tons of super quick but super frequent queries and so we find ourselves trying to address the same scaling challenges we read about in Microsoft’s customer case studies.
But there’s only one In Memory feature in particular that I care about. It’s the Memory Optimized Table Types. Specifically, I’ve always wanted to use that feature to avoid tempdb object allocation contention. Recently I finally got my chance with a lot of success. So even though I could say I’m happy with In Memory features, I think it’s more accurate to say that I feel relieved at having finally squashed my tempdb issues.

We’ve used memory-optimized table types for a couple of years to solve exactly this problem and the plan was pretty much the same as what Michael put into action.

Day: October 26, 2020

Intrusion Detection using ksqldb

Adaptive Query Execution in Databricks

The Sequence Project Operator

Finding Unique Words between Texts in Python

SQL Server Replication Requires Actual Server Names

Connection Failed: ADF with SSIS Integration Runtime

SolarWinds Acquires SentryOne

Memory-Optimized Table Types to Avoid tempdb Contention