Day: December 4, 2020

Joining Data Streams in Flink

Kundan Kumarr crosses the streams:

Apache Flink offers rich sources of API and operators which makes Flink application developers productive in terms of dealing with the multiple data streams. Flink provides many multi streams operations like Union, Join, and so on. In this blog, we will explore the Window Join operator in Flink with an example. It joins two data streams on a given key and a common window.

Click through for an example of the fluent API approach. It’s not as nice as proper SQL, but it does the job.
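
As a rough illustration of what "joins two data streams on a given key and a common window" means, here is a minimal sketch in plain Python. It is not Flink's API: the function name `window_join`, the `(key, timestamp, value)` tuple layout, and the sample data are all assumptions made for the example, and it models only a tumbling window with inner-join behavior over data already held in memory.

```python
from collections import defaultdict


def window_join(left, right, window_size):
    """Inner-join two lists of (key, timestamp, value) events.

    Two events pair up only when they share a key AND fall into the
    same tumbling window of `window_size` time units.
    """
    # Each bucket holds ([left values], [right values]) for one (key, window).
    buckets = defaultdict(lambda: ([], []))
    for key, ts, value in left:
        buckets[(key, ts // window_size)][0].append(value)
    for key, ts, value in right:
        buckets[(key, ts // window_size)][1].append(value)

    joined = []
    for (key, window), (lefts, rights) in sorted(buckets.items()):
        # Emit every left/right combination; a bucket with one side
        # empty produces no output.
        for left_value in lefts:
            for right_value in rights:
                joined.append((key, window * window_size, left_value, right_value))
    return joined


# Hypothetical sample streams: (key, event time, payload).
orders = [("a", 1, "order-1"), ("b", 2, "order-2"), ("a", 7, "order-3")]
payments = [("a", 3, "pay-1"), ("a", 12, "pay-2"), ("b", 4, "pay-3")]

result = window_join(orders, payments, window_size=5)
for row in result:
    print(row)
```

With a window of 5 time units, `order-3` and `pay-2` share the key `a` but land in different windows, so neither shows up in the output. A real streaming engine also has to deal with unbounded input, late events, and state cleanup, none of which this sketch attempts.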

Spark Starter Guide: Data Standardization

Ladon Robinson continues the Spark Starter Guide:

Standardization is the practice of analyzing columns of data and identifying synonyms or like names for the same item. Similar to how a cat can also be identified as a kitty, kitty cat, kitten or feline, we might want to standardize all of those entries into simply “cat” so our data is less messy and more organized. This can make future processing of the data more streamlined and less complicated. It can also reduce skew, which we address in Addressing Data Cardinality and Skew.

We will learn how to standardize data in the following exercises.

Check it out. I’m excited to see the Spark Starter Guide get fleshed out and written.
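
To make the cat example concrete, here is a minimal sketch of standardization as a lookup table in plain Python. It does not use Spark and is not the approach from the linked exercises; the `SYNONYMS` table and the `standardize` helper are assumptions made for illustration.

```python
# Hypothetical lookup table mapping each synonym to its canonical name.
SYNONYMS = {
    "kitty": "cat",
    "kitty cat": "cat",
    "kitten": "cat",
    "feline": "cat",
}


def standardize(value, synonyms=SYNONYMS):
    """Normalize case and whitespace, then map known synonyms to one name."""
    cleaned = " ".join(value.strip().lower().split())
    # Values not in the table pass through in their cleaned form.
    return synonyms.get(cleaned, cleaned)


raw = ["Kitty", "feline", " kitty  cat ", "dog", "Cat"]
standardized = [standardize(v) for v in raw]

print(standardized)
print(len(set(raw)), "distinct values before,", len(set(standardized)), "after")
```

The drop in distinct values is the practical payoff: fewer keys means simpler grouping later on.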

Azure Synapse Analytics Goes GA

Sacha Tomey recaps some announcements:

After much anticipation, today, Microsoft have announced the general availability of Azure Synapse Analytics! Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing and Big Data analytics all into a single service, accelerating time to insights, enabling organisations to become data-driven. Azure Synapse combines capabilities spanning the needs of data engineering, machine learning, and BI without creating silos in processes and tools.

Read on for more info on this as well as info on Azure Purview.

Power BI Premium Per User

Adam Saxton is excited:

Are you curious what Power BI Premium Per User is all about? Adam walks you through how to get it and what it means from a user experience. Take advantage of Power BI Premium features without the Premium capacity price!

Click through for the video as well as a few links for more info.

Changing a Kubernetes Cluster to containerd

Andrew Pruski wants to get ahead of the game:

DISCLAIMER – You’d never do this for a production cluster. For those clusters, you’d simply get rid of the existing nodes and bring new ones in on a rolling basis. This blog is just me mucking about with my Raspberry Pi cluster to see if the update can be done in-place without having to rebuild the nodes (as I really didn’t want to have to do that).

Check it out. In addition to the Twitter thread Andrew mentions, the Kubernetes group has a full blog post with more details.

Use a Separate Deadlock Extended Events Trace

Kendra Little explains why it makes sense to have an extended events trace specifically for deadlocks:

We recently had a customer ask why SQL Monitor creates an Extended Events session to capture deadlock graphs, when SQL Server has a built-in system_health Extended Events trace which also captures deadlock information?

There are a couple of reasons why a dedicated trace is desirable for capturing deadlock graphs, whether you are rolling your own monitoring scripts or building a monitoring application. I like this question a lot because I feel it gets at an interesting tension/balance at the heart of monitoring itself.

Click through for the answer.

Aggregate Functions in SQL Server

Hugo Kornelis takes us through the concept of aggregate functions:

SQL Server currently supports three operators that can compute aggregations: Hash Match, Stream Aggregate, and Window Aggregate. These operators all use the same basic principle of maintaining internal counters as rows are processed, so that the final value of those internal counters is the expected value.

Read on to see the full list, as well as how they operate.
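
The "internal counters" principle from the quote can be sketched in a few lines of plain Python. This illustrates the general idea only and makes no claim about SQL Server's actual implementation; the `RunningAggregate` class and its fields are assumptions made for the example.

```python
class RunningAggregate:
    """Keep a few counters up to date as rows arrive, one row at a time."""

    def __init__(self):
        self.count = 0       # number of non-NULL values seen
        self.total = 0       # running sum
        self.minimum = None  # smallest value seen so far

    def add(self, value):
        # Mirror the SQL convention that aggregates skip NULLs.
        if value is None:
            return
        self.count += 1
        self.total += value
        if self.minimum is None or value < self.minimum:
            self.minimum = value

    def result(self):
        # AVG is derived from the two counters it depends on.
        average = self.total / self.count if self.count else None
        return {
            "count": self.count,
            "sum": self.total,
            "min": self.minimum,
            "avg": average,
        }


agg = RunningAggregate()
for row_value in [4, None, 10, 1]:
    agg.add(row_value)

summary = agg.result()
print(summary)
```

No row is stored after it has been processed; once the last row has gone through, the counters already hold the answer.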

Introducing Azure Purview

Wolfgang Strasser gives us a once-over on a new service:

Today, at the Azure Data and Analytics event, a new Azure data governance service called Azure Purview (https://aka.ms/AzurePurview) was presented and made available in a public preview.

I have not had a chance to try the actual service, but I found a very interesting video (Microsoft mechanics video) where I took the following screenshots from.

Read on for Wolfgang’s thoughts. It’s definitely a step up from Azure Data Catalog.