Press "Enter" to skip to content

Curated SQL Posts

Documenting a Tabular Model

Olivier Van Steenlandt builds the docs:

A few months ago, I chatted with colleagues about our Tabular Model. More specifically the lack of Tabular Model documentation. Since we were thinking about replacing our current model, I started to think about how to integrate documentation easily.

Having documentation is 1 thing, making sure it’s used is something completely different. And then we’re not even talking about keeping it up to date. My initial idea was to include the documentation task during the development phase. That said, time to get the thoughts into practice.

Read on to see what Olivier did.

Comments closed

Joining on Overlapping Date Ranges in T-SQL

Daniel Hutmacher crosses the streams:

You can get into a situation where you have two tables with values associated with date ranges. What’s worse, those date ranges don’t necessarily have to align, which can make joining them a seemingly complex task, but it is surprisingly simple when you learn how to think of overlapping date ranges, along with this relatively simple T-SQL join pattern.

This problem gets even more challenging if you have the possibility of multiple overlaps and you want to find the combination with the biggest overlap for each individual item.

Comments closed

All about Lakehouses in Microsoft Fabric

Tomaz Kastrun gives us the skinny with multiple posts in his Advent of Microsoft Fabric. Day 3 introduces the lakehouse:

Lakehouse is cost-effective and optimised storage, supporting all types of data and file formats, structured and unstructured data, and helps you govern the data, giving you better data governance. With optimised and concurrent reads and writes, it gives outstanding performance by also reducing data movement and minimising redundant copy operations. Furthermore, it gives you a user-friendly multitasking experience in UI with retaining your context, not losing your running operations and working on multiple things, without accidentally stopping others.

Day 4 covers Delta format:

Yesterday we looked into lakehouse and learned that Delta tables are the storing format. So, let’s explore what and how we can go around understanding and working with delta tables. But first we must understand delta lake.

Day 5 covers data ingest:

We have learned about delta lake and delta tables. But since we have uploaded the file directly, let’s explore, how we can also get the data into lakehouse.

Click through for all three posts.

Comments closed

Producing Messages with librdkafka

Jakub Korab dives into a Kafka library:

In a previous blog post (How To Survive an Apache Kafka® Outage) I outlined the effects on applications during partial or total Kafka cluster outages and proposed some architectural strategies to handle these types of service interruptions. The applications most heavily impacted by this type of outage are external interfaces that receive data, do not control request flow, and possibly perform some form of business transaction with the outside world before producing to Kafka. These applications are most commonly found in finance and written in languages other than Java—mostly C and C++. 

librdkafka is the main underlying client library used in non-JVM environments and has wrapper libraries for Python, .Net, Go, and an ever-expanding list of clients. It has not been written about to the same extent as the Java client, and it is worth examining as its interface and underlying mechanics are fundamentally different. 

This library is quite useful and versatile.

Comments closed

Integrating Azure ML and Power BI

I have a new video:

In this video, I show off how easy it is to integrate Azure ML and Power BI, at least once you get past all of the trouble trying to integrate them.

I expected this to be easy. It turns out that the “make it look easy” depends on having several things in place already and using the correct (by which I mean “old”) deployment type.

Comments closed

Optimizing Sort Operators in Window Functions

Andy Brownsword talks about window function query tuning:

We’re on quite a roll with window functions these past few weeks. Last week we looked at the operators we’d see in execution plans when using a window function. This week I wanted to tackle one of the more troublesome ones specifically: the Sort operator.

We know that sort operators are expensive in our queries. To use a window function our data needs to be sorted. How about if we need multiple functions? What if we’d like the output sorted too? Can we optimise any of those out of the execution plan?

Read on for several tips.

Comments closed

Finding the Local Port Number for Power BI Desktop

Soheil Bakhshi updates an older post:

In March 2018, I wrote a blogpost called Four Different Ways to Find Your Power BI Desktop Local Port Number. Last week, Zoe Doughlas from Microsoft left a comment reminding me of a fifth method to get the port which encouraged me to write this quick tip. Thanks to Zoe!

As the name suggests, the blog was about finding Power BI Desktop’s local port number. If you do not have any clue what I mean by local port number, I strongly suggest reading that blog.

Read on to see what that fifth method is.

Comments closed

Accessing PostgreSQL from Python

Semab Tariq connects to Postgres:

Psycopg2, a PostgreSQL adapter for Python, implements the Python Database API Specification v2.0, acting as a bridge between Python applications and PostgreSQL databases. It leverages the libpq library, the official PostgreSQL C interface, to facilitate efficient communication. Psycopg2 provides a robust set of features, including transaction management, and support for PostgreSQL-specific data types. Its implementation of the Python DB API ensures seamless integration, enabling developers to execute SQL queries and transactions with precision in Python applications.

Read on to see how it works using a variety of examples.

Comments closed

ANSI and SET Options in Stored Procedures

Erik Darling starts a new series:

This aligns my stored procedures with the necessary settings to accomplish a couple things:

  • Allow the optimizer to use indexed views, filtered indexes, and computed columns
  • Avoid errors when modifying tables involved with indexed views, filtered indexes, and computed columns

Click through to see what Erik sets by default. It’s a good list, though a bit more than I think to do. Probably because I haven’t been burned enough yet.

Comments closed