Press "Enter" to skip to content

Day: February 11, 2022

Apache Flink and Delta Lake

Max Fisher and Dylan Gessner use Flink to load data in Delta Lake:

As with all parts of our platform, we are constantly raising the bar and adding new features to enhance developers’ abilities to build the applications that will make their Lakehouse a reality. Building real-time applications on Databricks is no exception. Features like asynchronous checkpointingsession windows, and Delta Live Tables allow organizations to build even more powerful, real-time pipelines on Databricks using Delta Lake as the foundation for all the data that flows through the Lakehouse.

However, for organizations that leverage Flink for real-time transformations, it might appear that they are unable to take advantage of some of the great Delta Lake and Databricks features, but that is not the case. In this blog we will explore how Flink developers can build pipelines to integrate their Flink applications into the broader Lakehouse architecture.

Click through for two methods of doing so.

Comments closed

Troubleshooting Networking Issues with SNITrace

Bob Dorr digs into SNITrace:

For my specific issue I was attempting to debug why the SQLDriverConnectW errored with TCP 10054 because it was failing the pre-login handshake.  For that I first needed to understand and capture the flow of the handshake activities and this is where SNITrace is helpful.

Read on to see what it does and how it works, as well as some nice Wireshark screen shots.

Comments closed

Incremental Refresh in Power BI Desktop

Soheil Bakhshi starts off a series on incremental refresh in Power BI:

Incremental refresh, or in short, IR, refers to loading the data incrementally, which has been around in the world of ETL for data warehousing for a long time. Let us discuss incremental refresh (or incremental data loading) in a simple language to better understand how it works.

Read on for the explanation as well as how you can implement it in Power BI Desktop. There are a lot of instructions here but they include publication and testing as well as development.

Comments closed

Solutions for Matching Supply with Demand

Itzik Ben-Gan continues reviewing solutions to a tricky problem:

Last month I covered a solution based on interval intersections, using a classic predicate-based interval intersection test. I’ll refer to that solution as classic intersections. The classic interval intersections approach results in a plan with quadratic scaling (N^2). I demonstrated its poor performance against sample inputs ranging from 100K to 400K rows. It took the solution 931 seconds to complete against the 400K-row input! This month I’ll start by briefly reminding you of last month’s solution and why it scales and performs so badly. I’ll then introduce an approach based on a revision to the interval intersection test. This approach was used by Luca, Kamil, and possibly also Daniel, and it enables a solution with much better performance and scaling. I’ll refer to that solution as revised intersections.

Read on for one class of solution which performed quite well.

Comments closed

Database Schema Types

Steve Jones explains schema types:

OLTP/Relational

The type of schema that many of us work with is the standard OLTP or relational model. We have lots of transaction tables, most should have a PK, some of which have PKs. The schema expands to meet different needs and can have lots of entities.

It may just be the time of morning but “Galaxy schema” sounds dumb specifically because the Kimball style of star schema implicitly includes what the galaxy schema shows. Dimensions are conformed, which means they apply across facts, which implies that there may be multiple facts in the schema design. This means that galaxy schemas necessarily star schemas. For the sake of education, we tend to focus on one fact table but a star schema with two fact tables is still a star schema.

Anyhow, that’s my minor rant of the day. It’s not Steve’s fault somebody misunderstood the concept of star schemas and began promulgating this unnecessary term.

Comments closed

Dynamic SQL No-Go

Kenneth Fisher can’t go in dynamic SQL and neither can you:

This is one of those things that when I look back on it seems really obvious. Note: If at the end of this it isn’t overly obvious to you that’s ok too. I do a lot of dynamic SQL and GO is one of my favorite commands.

Read on to understand why. I was going to “One minor clarification…” Kenneth about it being an SSMS command (implying that it’s not available elsewhere) but he successfully parried the attack en passant.

1 Comment