2022-05-24 – Curated SQL

Low-Latency Flink

Published 2022-05-24 by Kevin Feasel

Jun Qin and Nico Kruber have started a series on low-latency streaming in Apache Flink. The first two posts of the series are up, starting with the overview:

Latency can refer to different things. LatencyMarkers in Flink measure the time it takes for the markers to travel from each source operator to each downstream operator. As LatencyMarkers bypass user functions in operators, the measured latencies do not reflect the entire end-to-end latency but only a part of it. Flink also supports tracking the state access latency, which measures the response latency when state is read/written. One can also manually measure the time taken by some operators, or get this data with profilers. However, what users usually care about is the end-to-end latency, including the time spent in user-defined functions, in the stream processing framework, and when state is accessed. End-to-end latency is what we will focus on.

Part 2 discusses direct latency optimization techniques:

When interacting with external systems (e.g., RDBMS, object stores, web services) in a Flink job for data enrichment, the latency in getting responses from external systems often dominates the overall latency of the job. With Flink’s Async I/O API (e.g., AsyncDataStream.unorderedWait() or AsyncDataStream.orderedWait()), a single parallel function instance can handle many requests concurrently and receive responses asynchronously. This reduces latencies because the waiting time for responses is amortized over multiple requests.

Stay tuned for more posts in the series.

Comments closed

The Future Object in Scala

Published 2022-05-24 by Kevin Feasel

Gulshan Singh visits from the future:

You have units of work that you want to run asynchronously, so you don’t block while they’re running. A future gives you a simple way to run an algorithm concurrently. A future starts running concurrently when you create it and returns a result at some point, well, in the future. In Scala, we call that a future returns eventually.
The Future instance is a handle to an eventually available result. You can continue doing other work until the future completes, either successfully or unsuccessfully.

You may also know of Futures as Promises. It’s quite similar to async calls in .NET as well.

Comments closed

Minimum Viable Data Mesh in Azure

Published 2022-05-24 by Kevin Feasel

Paul Andrew was on a podcast:

For Paul, delivering a single data mesh data product on its own is not all that valuable – if you are going to go to the expense of implementing data mesh, you need to be able to satisfy use cases that cross domains. And the greater value is in cross-domain interoperability, getting to a data product that wasn’t possible before. And, you need to deliver the data platform alongside those first 2-3 data products, otherwise you create a very hard to support data asset, not really a data product.
When thinking about minimum viable data mesh, Paul views an approach leveraging DevOps and generally CI/CD – or Continuous Integration/Continuous Deliver – as very crucial. You need repeatability/reproducibility to really call something a data product.

Click through for the interview as well as Scott Hirleman’s summary.

Comments closed

Allowing for Cross-Database Access via Module Signing

Published 2022-05-24 by Kevin Feasel

Tom Zika enables cross-database access:

I’ve recently had to revisit this topic and spent a lot of time recalling the details. So I’m writing this blog post mainly as a reminder for myself.
The most helpful part will be the diagram detailing all the components and their relation and a comprehensive example anyone can follow.
I’m not going to cover Module Signing in general (I’ll leave that to Solomon Rutzky).
Nor will I cover other ways to achieve Cross DB access (like Cross DB Ownership chaining) because this is superior from the security standpoint.

Click through to see a good way of providing access to cross-database resources without explicitly granting rights to users.

Comments closed

Power BI Fields Parameters

Published 2022-05-24 by Kevin Feasel

Marco Russo and Alberto Ferrari explain the concept of fields parameters:

Fields parameters is a feature that allows users to choose which column to use to slice and dice values in a Power BI visual. By creating a fields parameter you can very easily build a report where the user can slice by Brand and Category, as in the following figure.

Click through to learn more.

Comments closed

Counting Employees by Period with DAX

Published 2022-05-24 by Kevin Feasel

Matt Allington solves a common problem:

I’m calling this article, “How many employees by period”. Staff come and go for different reasons. In some companies, the number of staff can change over time. The principles used in this article can also be used in other instances. There can be staff moving in and out of departments, on and off of projects, etc. The technique can also be used to work out how many staff were on leave, how many off sick, how many tickets were open in a support queue, or any other concept that has a start and end date in a transactional table.

Read on for Matt’s answer but be sure to check out the comments as there are some other good solutions in there.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Day: May 24, 2022

Low-Latency Flink

The Future Object in Scala

Minimum Viable Data Mesh in Azure

Allowing for Cross-Database Access via Module Signing

Power BI Fields Parameters

Counting Employees by Period with DAX