Author: Kevin Feasel

Batch Mode with Temp Tables

Erik Darling continues receiving big paydays from Big Temp Table:

When you have queries that need to process a lot of data, and probably do some aggregations over that lot-of-data, batch mode is usually the thing you want.

Originally introduced to accompany column store indexes, it works by allowing CPUs to apply instructions to up to 900 rows at a time.

It’s a great thing to have in your corner when you’re tuning queries that do a lot of work, especially if you find yourself dealing with pesky parallel exchanges.

Read on to see how you can create a temp table which triggers batch mode processing fairly easily.

Creating an Azure Purview Catalog Instance

Wolfgang Strasser wants to try out Azure Purview:

Basics – Resource group, purview account name (this cannot be changed afterwards) and the location.

As of today (2020-12-06), there are only 5 Azure regions you can choose from to store the Purview metadata. But – in-region scanning from 16 other Azure regions is available in the preview (source)

This is part one of a multi-part series, so stay tuned for more.

BULK INSERT and Advent of Code

Thomas Rushton performs BULK INSERT the way I imagine most of us do, through trial and lots of error:

Step one in solving these problems in SQL Server is loading the data into a database so you can do something with it.

I chose to use BULK INSERT, because, according to the documentation, it:

Imports a data file into a database table or view in a user-specified format in SQL Server

Ideal, right?

Click through for several tips around BULK INSERT.

Joining Data Streams in Flink

Kundan Kumarr crosses the streams:

Apache Flink offers rich sources of API and operators which makes Flink application developers productive in terms of dealing with the multiple data streams. Flink provides many multi streams operations like Union, Join, and so on. In this blog, we will explore the Window Join operator in Flink with an example. It joins two data streams on a given key and a common window.

Click through for an example of the fluent API approach. It’s not as nice as proper SQL, but it does the job.
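
To make the window join idea concrete, here is a minimal sketch in plain Python rather than Flink's API. It assumes tumbling (fixed-size, non-overlapping) windows and events shaped as `(timestamp, key, value)` tuples; the function name and the sample `orders` and `payments` streams are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def tumbling_window_join(left, right, window_size):
    """Join two event lists on key, pairing only events in the same tumbling window.

    Each event is a (timestamp, key, value) tuple. An event belongs to the
    window whose index is timestamp // window_size.
    """
    # Bucket the right-hand events by (window index, key) for quick lookup.
    buckets = defaultdict(list)
    for ts, key, value in right:
        buckets[(ts // window_size, key)].append(value)

    joined = []
    for ts, key, value in left:
        window = ts // window_size
        # Emit one output row per matching right-hand event in this window.
        for other in buckets.get((window, key), []):
            joined.append((window * window_size, key, value, other))
    return joined


# Hypothetical sample streams.
orders = [(1, "a", "order-1"), (4, "b", "order-2"), (11, "a", "order-3")]
payments = [(2, "a", "pay-1"), (7, "b", "pay-2"), (12, "a", "pay-3")]

result = tumbling_window_join(orders, payments, window_size=5)
for row in result:
    print(row)
```

Here `order-2` and `pay-2` share a key but land in different windows, so they do not join. A real streaming engine also has to deal with unbounded input, event time, and late data, none of which this sketch attempts.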

Spark Starter Guide: Data Standardization

Ladon Robinson continues the Spark Starter Guide:

Standardization is the practice of analyzing columns of data and identifying synonyms or like names for the same item. Similar to how a cat can also be identified as a kitty, kitty cat, kitten or feline, we might want to standardize all of those entries into simply “cat” so our data is less messy and more organized. This can make future processing of the data more streamlined and less complicated. It can also reduce skew, which we address in Addressing Data Cardinality and Skew.

We will learn how to standardize data in the following exercises.

Check it out. I’m excited to see the Spark Starter Guide get fleshed out and written.
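
As a rough illustration of the cat example in the excerpt, the sketch below standardizes values with a simple lookup table in plain Python. It is not the guide's Spark code: the `SYNONYMS` mapping and the `standardize` helper are assumptions made for this example, and it additionally assumes that trimming whitespace and lowercasing are acceptable for values that have no mapping.

```python
# Hypothetical synonym table: every variant maps to one canonical label.
SYNONYMS = {
    "kitty": "cat",
    "kitty cat": "cat",
    "kitten": "cat",
    "feline": "cat",
}


def standardize(value, synonyms=SYNONYMS):
    """Return the canonical label for a value, or the cleaned value if unmapped."""
    cleaned = value.strip().lower()
    return synonyms.get(cleaned, cleaned)


animals = ["Cat", "kitty", " Kitten ", "feline", "dog", "Kitty Cat"]
standardized = [standardize(a) for a in animals]
print(standardized)
```

Collapsing variants this way reduces the number of distinct values in the column, which keeps later grouping and joining simpler.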

Azure Synapse Analytics Goes GA

Sacha Tomey recaps some announcements:

After much anticipation, today, Microsoft have announced the general availability of Azure Synapse Analytics! Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing and Big Data analytics all into a single service, accelerating time to insights, enabling organisations to become data-driven. Azure Synapse combines capabilities spanning the needs of data engineering, machine learning, and BI without creating silos in processes and tools.

Read on for more info on this as well as info on Azure Purview.

Power BI Premium Per User

Adam Saxton is excited:

Are you curious what Power BI Premium Per User is all about? Adam walks you through how to get it and what it means from a user experience. Take advantage of Power BI Premium features without the Premium capacity price!

Click through for the video as well as a few links for more info.

Changing a Kubernetes Cluster to containerd

Andrew Pruski wants to get ahead of the game:

DISCLAIMER – You’d never do this for a production cluster. For those clusters, you’d simply get rid of the existing nodes and bring new ones in on a rolling basis. This blog is just me mucking about with my Raspberry Pi cluster to see if the update can be done in-place without having to rebuild the nodes (as I really didn’t want to have to do that).

Check it out. In addition to the Twitter thread Andrew mentions, the Kubernetes group has a full blog post with more details.

Use a Separate Deadlock Extended Events Trace

Kendra Little explains why it makes sense to have an extended events trace specifically for deadlocks:

We recently had a customer ask why SQL Monitor creates an Extended Events session to capture deadlock graphs, when SQL Server has a built-in system_health Extended Events trace which also captures deadlock information?

There are a couple of reasons why a dedicated trace is desirable for capturing deadlock graphs, whether you are rolling your own monitoring scripts or building a monitoring application. I like this question a lot because I feel it gets at an interesting tension/balance at the heart of monitoring itself.

Click through for the answer.

Aggregate Functions in SQL Server

Hugo Kornelis takes us through the concept of aggregate functions:

SQL Server currently supports three operators that can compute aggregations: Hash Match, Stream Aggregate, and Window Aggregate. These operators all use the same basic principle of maintaining internal counters as rows are processed, so that the final value of those internal counters is the expected value.

Read on to see the full list, as well as how they operate.
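
The "internal counters" principle can be sketched in a few lines of plain Python. This is an illustration of the idea only, not SQL Server's implementation: the `AvgAggregate` class and `stream_aggregate` function are hypothetical names, and the example assumes the input is already sorted by the grouping key, which is the property a stream-style aggregate relies on.

```python
class AvgAggregate:
    """Tracks a running count and sum so the average is available at the end."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def add(self, value):
        # Mirror SQL semantics: NULL (None) inputs do not affect the aggregate.
        if value is None:
            return
        self.count += 1
        self.total += value

    def result(self):
        # With no non-NULL inputs the average is NULL (None).
        return self.total / self.count if self.count else None


def stream_aggregate(sorted_rows):
    """Compute AVG per key over (key, value) rows sorted by key."""
    results = []
    current_key = None
    agg = None
    for key, value in sorted_rows:
        if agg is None or key != current_key:
            # The key changed: emit the finished group and reset the counters.
            if agg is not None:
                results.append((current_key, agg.result()))
            current_key, agg = key, AvgAggregate()
        agg.add(value)
    if agg is not None:
        results.append((current_key, agg.result()))
    return results


rows = [("a", 10), ("a", 20), ("b", 5), ("b", None), ("c", None)]
averages = stream_aggregate(rows)
print(averages)
```

Because the rows arrive in key order, only one set of counters is alive at a time. A hash-based aggregate would instead keep one set of counters per key and accept rows in any order.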
