Author: Kevin Feasel

Understanding Write-Ahead Logging

Kevin Sookocheff explains how write-ahead logging protects data in databases:

A central tenet of databases is that any committed data survives a crash or a failure. Write-ahead logging is a fundamental primitive that ensures all changes to data are first written safely to stable storage before being applied. Couple that with some careful use of sequence numbers and we can guarantee that changes made to a database can survive system crashes.

This is a core feature in pretty much every relational database and Kevin dives into how one of the key algorithms behind it works.
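To make the idea concrete, here is a minimal Python sketch of the log-first, apply-second rule with log sequence numbers (LSNs). It assumes a toy key-value store whose only durable state is a JSON-lines log file. It is not the algorithm from Kevin's post, and real engines also flush data pages and track a per-page LSN, which this sketch skips.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Toy WAL: each change is appended and fsync'd before it is applied."""

    def __init__(self, path):
        self.path = path
        self.next_lsn = 1

    def append(self, key, value):
        lsn = self.next_lsn
        record = json.dumps({"lsn": lsn, "key": key, "value": value})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # record is on stable storage before we apply it
        self.next_lsn += 1
        return lsn

    def records(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


class Store:
    """In-memory state that is lost on a crash and rebuilt from the log."""

    def __init__(self, wal):
        self.wal = wal
        self.data = {}
        self.applied_lsn = 0  # highest LSN already reflected in self.data

    def put(self, key, value):
        lsn = self.wal.append(key, value)  # 1. write to the log
        self._apply(lsn, key, value)       # 2. then change the data

    def _apply(self, lsn, key, value):
        if lsn <= self.applied_lsn:
            return  # already applied, so replaying the log is idempotent
        self.data[key] = value
        self.applied_lsn = lsn

    def recover(self):
        for rec in self.wal.records():
            self._apply(rec["lsn"], rec["key"], rec["value"])
        self.wal.next_lsn = self.applied_lsn + 1  # resume numbering after the log


path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = Store(WriteAheadLog(path))
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)

# Simulate a crash: the in-memory store is gone, but the log survived.
recovered = Store(WriteAheadLog(path))
recovered.recover()
print(recovered.data)  # {'a': 3, 'b': 2}
```

The sequence number check in _apply is what makes recovery safe to run more than once: any record at or below the highest applied LSN is skipped.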

RELATED() and RELATEDTABLE() in DAX

Alberto Ferrari and Marco Russo add some context:

RELATED is one of the most commonly used DAX functions. You use RELATED when you are scanning a table, and within that row context you want to access rows in related tables. RELATEDTABLE is the companion of RELATED, and it is used to traverse relationships in the opposite direction. When learning DAX, it is easy to get confused and use RELATED when it is not necessary, or to forget about RELATEDTABLE. In this article, we describe the most common uses of the two functions, along with common misperceptions.

Click through to learn more about these two functions.
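As a rough mental model of the direction of travel, here is a Python sketch using hypothetical Product and Sales tables joined on product_key. It only models which side of a one-to-many relationship each function walks toward; it ignores filter context and context transition (RELATEDTABLE behaves like CALCULATETABLE), so treat it as an analogy rather than DAX semantics.

```python
from dataclasses import dataclass


@dataclass
class Product:  # hypothetical "one" side of the relationship
    product_key: int
    name: str
    category: str


@dataclass
class Sale:  # hypothetical "many" side of the relationship
    product_key: int
    quantity: int


products = [Product(1, "Bike", "Bikes"), Product(2, "Helmet", "Accessories")]
sales = [Sale(1, 2), Sale(1, 1), Sale(2, 5)]


def related(sale, products):
    """Many side -> one side: at most one row (DAX RELATED returns blank if none)."""
    return next((p for p in products if p.product_key == sale.product_key), None)


def related_table(product, sales):
    """One side -> many side: a (possibly empty) set of rows, like RELATEDTABLE."""
    return [s for s in sales if s.product_key == product.product_key]


# Iterating Sales (row context on the many side): fetch a column from Product.
sale_categories = [related(s, products).category for s in sales]

# Iterating Product (row context on the one side): aggregate matching Sales rows.
units_per_product = {
    p.name: sum(s.quantity for s in related_table(p, sales)) for p in products
}
print(sale_categories)    # ['Bikes', 'Bikes', 'Accessories']
print(units_per_product)  # {'Bike': 3, 'Helmet': 5}
```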

KQL BETWEEN

Robert Cain proves it’s not the end of the line in his KQL series:

It’s not uncommon to want to use a range of values when creating a Kusto query. This might be a range of numeric values, or perhaps a range of dates.

Kusto provides this ability using the between operator. In this post we’ll see how to use it when authoring your Kusto queries.

Click through to see how you can use between as well as logical alterations such as not between.
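KQL's between filters on an inclusive range, so both endpoints match, and !between is its complement. The Python sketch below mirrors that logic on numbers and dates; the function names are ours, not Kusto's.

```python
from datetime import date


def between(value, low, high):
    """Inclusive on both ends, like KQL's `where x between (low .. high)`."""
    return low <= value <= high


def not_between(value, low, high):
    """The complement, like KQL's `where x !between (low .. high)`."""
    return not between(value, low, high)


counts = [5, 10, 15, 20, 25]
inside = [c for c in counts if between(c, 10, 20)]
outside = [c for c in counts if not_between(c, 10, 20)]
print(inside)   # [10, 15, 20]
print(outside)  # [5, 25]

days = [date(2022, 8, 31), date(2022, 9, 1), date(2022, 9, 30), date(2022, 10, 1)]
september = [d for d in days if between(d, date(2022, 9, 1), date(2022, 9, 30))]
```

Because the upper bound is inclusive, keep an eye on date ranges built from timestamps: an upper bound of midnight on the last day excludes anything later that day.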

NESTING_TRANSACTION_FULL Latches

Paul White dives into latch contention:

This design has its roots in SQL Server 7, where read-only query parallelism was introduced. SQL Server 2000 built on this with parallel index builds, which for the first time allowed multiple threads to cooperate to change a persistent database structure. Many improvements have followed since then, but the fundamental parent-child transaction design remains today.

Though lightweight, a latch can become a point of contention when requested sufficiently frequently in incompatible modes by many different threads. Some contention on shared resources is to be expected; it becomes a problem when latch waits start to affect CPU utilisation and throughput.

Read the whole thing, as Paul dives into the latch design, provides an alternative design, and tests the alternative.

DAX EVALUATEANDLOG() Function Outputs

Jeffrey Wang continues a series on EVALUATEANDLOG():

Last week, we learned how to interpret the output of the EvaluateAndLog function of scalar expressions. Although interesting, the new function would be much less useful if it could only return scalar values. After all, Power BI users can always define a scalar expression as a measure and then study its values in various contexts. The real power of EvaluateAndLog is that it can also wrap around any table expression, the output of which was much harder to visualize until now.

This function exposes a lot of information, as you can see in the post.

Type 1 SCDs in Power BI

Soheil Bakhshi grabs some Excel data from SharePoint:

We have a retail company selling products. The company releases the list of products in Excel format, including list price and dealer price, every year. The product list is released on the first day of July when the financial year starts. We have to implement a Power BI solution that keeps the latest product data to analyse the sales transactions. The following image shows the Product list for 2013:

So each year, we receive a similar Excel file to the above image. The files are stored on a SharePoint Online site.

Read on to see how it works. Of course, the data source itself doesn't affect how you implement slowly-changing dimensions, so the technique Soheil shows applies to a broad range of use cases.
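Stripped of Power Query and SharePoint, a Type 1 slowly-changing dimension just overwrites attributes with the latest values and keeps no history. Here is a minimal Python sketch of that rule, assuming a hypothetical product_code business key and made-up yearly price lists; it is not Soheil's implementation.

```python
def type1_merge(dimension, incoming):
    """Type 1 SCD: new or changed rows overwrite old values; no history is kept."""
    for row in incoming:
        dimension[row["product_code"]] = dict(row)
    return dimension


# Hypothetical yearly product lists (one per Excel file).
list_2012 = [
    {"product_code": "BK-001", "name": "Road Bike", "list_price": 1000.0, "dealer_price": 600.0},
    {"product_code": "HL-001", "name": "Helmet", "list_price": 35.0, "dealer_price": 21.0},
]
list_2013 = [
    {"product_code": "BK-001", "name": "Road Bike", "list_price": 1100.0, "dealer_price": 660.0},
    {"product_code": "GL-001", "name": "Gloves", "list_price": 25.0, "dealer_price": 15.0},
]

products = {}
for yearly_file in (list_2012, list_2013):  # process files oldest to newest
    type1_merge(products, yearly_file)

# BK-001 now carries the 2013 prices. HL-001 is kept even though it is missing
# from the 2013 file, so older sales rows still find a matching product.
```

File order matters here: processing the lists from oldest to newest is what makes the latest release win.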

Getting Status of Power BI Enhanced Refreshes

Chris Webb wants to know the situation, STAT:

So far in this series (see part 1, part 2 and part 3) I've looked at how you can create a Power Automate custom connector that uses the Power BI Enhanced Refresh API to kick off a dataset refresh. That's only half the story though: once the refresh has been started you need to know if it has finished and, if so, whether it finished successfully or not. In this post I'll show how to do this.

Read on to see how.
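The general pattern is a poll-until-done loop. Here is a minimal Python sketch of that logic rather than Chris's Power Automate flow. The fetch_status callable is hypothetical (it would wrap whatever call returns the status of one refresh), and the status strings are assumptions to check against the Power BI REST API documentation.

```python
import time
from typing import Callable

# Assumed status strings; verify them against the Power BI REST API docs.
SUCCESS = {"Completed"}
FAILURE = {"Failed", "TimedOut", "Cancelled", "Disabled"}


def wait_for_refresh(
    fetch_status: Callable[[], str],  # hypothetical: returns one refresh's status
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the refresh ends. Return True on success, False on failure."""
    waited = 0.0
    while waited <= timeout_seconds:
        status = fetch_status()
        if status in SUCCESS:
            return True
        if status in FAILURE:
            return False
        sleep(poll_seconds)  # anything else counts as "still running"
        waited += poll_seconds
    raise TimeoutError("refresh still running after timeout")


# Usage with canned statuses and a no-op sleep, standing in for real API calls.
statuses = iter(["NotStarted", "InProgress", "InProgress", "Completed"])
ok = wait_for_refresh(lambda: next(statuses), sleep=lambda s: None)
print(ok)  # True
```

Injecting sleep keeps the loop testable, and treating unknown statuses as "still running" means a new intermediate status will not end the wait early.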

Apache Flink Updates

Danny Cranmer announces Flink 1.15.2:

The Apache Flink Community is pleased to announce the second bug fix release of the Flink 1.15 series.

This release includes 30 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.

We highly recommend all users upgrade to Flink 1.15.2.

In addition to that, Jingsong Lee announces Flink Table Store 0.2.0:

Flink Table Store is data lake storage for streaming update/delete changelog ingestion and high-performance queries in real time.

As a new type of updatable data lake, Flink Table Store has the following features:

– Large throughput data ingestion while offering good query performance.

– High performance query with primary key filters, as fast as 100ms.

– Streaming reads are available on Lake Storage; lake storage can also be integrated with Kafka to provide second-level streaming reads.

Read on for the changes in both platforms.

Ingestion from S3 into Azure Data Explorer

Anshul Sharma announces another source for Azure Data Explorer:

Today we are excited to launch the ability to ingest data from Amazon Simple Storage Service (S3) into Azure Data Explorer (ADX) natively.

Amazon S3 is one of the most popular object storage services. AWS Customers use Amazon S3 to store data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, applications, IoT devices, log analytics and big data analytics.

Azure Data Explorer (ADX) is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real time. ADX supports ingesting data from a wide variety of sources such as Azure Blob, ADLS gen2, Azure Event Hub, Azure IoT Hub, and with popular open-source technologies such as Kafka, Logstash, Telegraf. With the new S3 support, customers can bring data from S3 natively without relying on complex ETL pipelines.

Between this, ADF/Synapse pipelines, and SQL Server 2022, it seems that Microsoft got the message that people do use multiple clouds and do want to read AWS data in Azure. Which is good because that directly benefits me…