Curated SQL Posts

Showing KQL Queries

Dany Hoter looks at some KQL query plans:

Each visual on the page is going to summarize data from one or more queries, adding the summarize part of the query.

If your model contains multiple tables in direct query with relations between them, the connector will generate joins between the tables.

Selecting values in filters will create multiple where conditions.

In order to see the final query and understand the performance implications of each query and the total query load created by a report, you need to use the command “.show queries” in the context of the database.
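The connector's exact output isn't shown here, but the layering the quote describes can be sketched. The snippet below is a minimal Python illustration that assembles KQL text in that order: the fact table, a join for a related DirectQuery table, one where clause per filter, and a final summarize contributed by the visual. The helper `build_visual_query` and the table and column names (Sales, Customers, CustomerId, Region, Category, Amount) are hypothetical. The real Power BI connector's text will differ, and `.show queries` is where you would see what actually ran.

```python
from typing import Dict, List, Optional, Tuple


def kql_string(value: str) -> str:
    """Quote a value as a double-quoted KQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_visual_query(fact_table: str,
                       filters: Dict[str, List[str]],
                       group_by: List[str],
                       aggregate: str,
                       join: Optional[Tuple[str, str]] = None) -> str:
    """Hypothetical sketch of how one visual's query is layered together."""
    lines = [fact_table]
    # Related tables in DirectQuery show up as joins.
    if join is not None:
        dim_table, key = join
        lines.append(f"| join kind=inner ({dim_table}) on {key}")
    # Each filter with selected values becomes its own where clause.
    for column, values in filters.items():
        in_list = ", ".join(kql_string(v) for v in values)
        lines.append(f"| where {column} in ({in_list})")
    # The visual contributes the final summarize.
    lines.append(f"| summarize {aggregate} by {', '.join(group_by)}")
    return "\n".join(lines)


query = build_visual_query(
    "Sales",
    filters={"Region": ["East", "West"], "Category": ["Bikes"]},
    group_by=["Region"],
    aggregate="TotalAmount = sum(Amount)",
    join=("Customers", "CustomerId"),
)
print(query)
```

With two regions and one category selected, this produces two where lines ahead of a single summarize. Multiply that by every visual on a page and you get the kind of total query load that `.show queries` lets you inspect for real.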

Click through for Dany’s notes on the topic, including a few tips on what to look for.

Listing Topics in Kafka without Zookeeper

The Big Data in Real World team has a quick one for us:

Kafka uses Zookeeper to manage its internal state. So it is not possible to run Kafka without Zookeeper. Even if you don’t have access to Zookeeper in your organization, there is a Zookeeper cluster running which your Kafka cluster connects to.

So, how do we list topics and execute other commands if we don’t have access to Zookeeper?

Eventually, this won’t even be a question, as Kafka already has production versions using KRaft, and by Kafka 4.0, there won’t be a Zookeeper to kick around anymore.
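One common way around this is to talk to the brokers directly. The kafka-topics tool accepts `--bootstrap-server`, which needs no Zookeeper address, and since Kafka 3.0 that tool no longer accepts `--zookeeper` at all. The Python sketch below just shells out to the tool. It assumes `kafka-topics.sh` is on the PATH (Confluent packages ship it as `kafka-topics`) and uses `broker1:9092` as a hypothetical broker address.

```python
import shutil
import subprocess
from typing import List


def list_topics_command(bootstrap_servers: str,
                        script: str = "kafka-topics.sh") -> List[str]:
    """Build the argv for listing topics via the brokers, not Zookeeper."""
    # --bootstrap-server takes a comma-separated host:port list of brokers.
    return [script, "--bootstrap-server", bootstrap_servers, "--list"]


def list_topics(bootstrap_servers: str) -> List[str]:
    """Run kafka-topics and return one topic name per non-empty output line."""
    cmd = list_topics_command(bootstrap_servers)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    # check=True raises CalledProcessError if the tool exits non-zero.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Against a reachable cluster, `list_topics("broker1:9092")` returns the topic names. Other admin operations such as `--describe` and `--create` take the same `--bootstrap-server` flag.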

Data Inconsistency in Postgres HA Clusters

Umair Shahid gives us an overview:

While PostgreSQL is known for its robustness, scalability, and reliability, data inconsistency can occur in PostgreSQL clusters, which can cause issues and impact the overall performance of the system. In this blog, we’ll define data inconsistency in PostgreSQL clusters, discuss the challenges it poses, its causes, and provide some tips on how to prevent and resolve it if it occurs.

Click through for the article.

Building a Data Warehouse in Microsoft Fabric

Reza Rad continues a video series on Microsoft Fabric:

Microsoft Fabric Data Warehouse is a database system that stores data in OneLake and provides a medium to interact with the database using SQL commands. The Fabric Data Warehouse, which is also called Data Warehouse, or in short, Warehouse, also provides a powerful computing engine behind the scenes to handle large volumes of data and support a fast-performing database system. The term Data Warehouse comes from the fact that this is not usually a place to store transactional data for an operational system (for that, you can use Azure SQL Database). A Data Warehouse, in generic Business Intelligence terminology, is a place where you would store the data that needs to be analyzed.

Reza also explains how the warehouse differs from a lakehouse.

Microsoft Fabric and Process Unification

Paul Andrew gets to the heart of things:

Moving on and assuming you have seen the event sessions, I want to give you my point of view to help explain what Microsoft Fabric is. Firstly, let’s clear up and call out the terminology to support this understanding. Is this software offering a resource, service, platform, or solution? To answer this question, perspective is key, perspective with a timeline (2018 to 2023). We could simply say that Microsoft Fabric is all these things: all things to all data professionals and beyond. But, to understand this, let’s consider the journey Microsoft has been on and how this technology has evolved. I believe this journey is the best way to help explain what Microsoft Fabric is, rather than focusing on all the new and shiny bits.

Click through for Paul’s take on the matter and how this whole area of “modern data warehousing” has evolved over the past several years in Azure.

Cosmos DB Serverless Scaling to 1TB

Hasan Savran shares the news:

Azure Cosmos DB’s Serverless option is a great way to save money if your application expects intermittent and unpredictable traffic with long idle times. I use serverless in developing, prototyping, and integrating with computing services such as Azure Functions.

The limitations of Azure Cosmos DB serverless were a show-stopper if your solution needed scalability or large storage. At Build 2023, Cosmos DB announced that many of the limitations of the serverless option of Azure Cosmos DB are lifted.

Read on for the gist of these updates.

MVCC and Vacuuming in Postgres

Ryan Booz explains one area where Postgres’s implementation differs from most other vendors:

All relational databases handle transaction isolation in some way, typically with an implementation of Multi-version Concurrency Control (MVCC). Plain ‘ol, mainline SQL Server uses a form of MVCC, but all older rows (currently retained for ongoing transactions) are stored in TempDB. Oracle and MySQL also do something similar, storing (essentially) diffs of the modified data outside of the table, which are merged at runtime for ongoing transactions that still need to see the older data.

Among these databases, PostgreSQL stands alone in the specific way MVCC is implemented. Rather than storing some form of the older data outside of the current table for transactions to query/merge/etc. at runtime, PostgreSQL always creates the newly modified row in-table alongside the existing, older versions that are still needed for running transactions. Yes, every UPDATE creates a new row of data in the table, even if you just change one column.
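A toy model makes the consequence easier to see. The Python sketch below keeps every row version in one heap tagged with `xmin`/`xmax` transaction IDs, which loosely mirrors the in-table approach the quote describes. An UPDATE stamps the old version and appends a new one, each snapshot decides which version it sees, and a vacuum pass drops versions no running snapshot can see. It is a deliberate simplification: every transaction ID below a snapshot is treated as committed, and there are no aborts, HOT updates, or free space map. The class and method names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    key: int
    value: str
    xmin: int                   # transaction ID that created this version
    xmax: Optional[int] = None  # transaction ID that superseded it, if any


class Heap:
    """Toy in-table MVCC: updates append versions instead of overwriting."""

    def __init__(self) -> None:
        self.versions: List[RowVersion] = []

    def insert(self, xid: int, key: int, value: str) -> None:
        self.versions.append(RowVersion(key, value, xmin=xid))

    def update(self, xid: int, key: int, value: str) -> None:
        # Stamp the current live version as expired by this transaction...
        for row in self.versions:
            if row.key == key and row.xmax is None:
                row.xmax = xid
        # ...and append a complete new copy of the row, however small the change.
        self.versions.append(RowVersion(key, value, xmin=xid))

    def visible(self, snapshot_xid: int) -> List[RowVersion]:
        # Simplification: every xid below snapshot_xid counts as committed.
        return [r for r in self.versions
                if r.xmin < snapshot_xid
                and (r.xmax is None or r.xmax >= snapshot_xid)]

    def vacuum(self, oldest_snapshot_xid: int) -> int:
        # A version expired before the oldest running snapshot is invisible to
        # everyone. Real VACUUM marks that space reusable; here we just drop it.
        before = len(self.versions)
        self.versions = [r for r in self.versions
                         if r.xmax is None or r.xmax >= oldest_snapshot_xid]
        return before - len(self.versions)


heap = Heap()
heap.insert(xid=1, key=42, value="a")
heap.update(xid=2, key=42, value="b")
heap.update(xid=3, key=42, value="c")

stored_versions = len(heap.versions)                # 3 copies of one logical row
old_snapshot = [r.value for r in heap.visible(3)]   # ["b"]
new_snapshot = [r.value for r in heap.visible(4)]   # ["c"]
reclaimed = heap.vacuum(oldest_snapshot_xid=3)      # 1: only "a" is dead to all
```

Note that the vacuum call cannot reclaim version "b" while a snapshot from before transaction 3 is still running. That is why long-running transactions let dead rows pile up in the table.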

Read on to understand some of the implications of this and how it affects the way we manage databases.

Sketching before Charting

Alex Velez tries a few ideas:

It’s important to note that there isn’t a particular finding or insight that needs to be emphasized here. Instead, the goal for this visual is to provide the data in a digestible format, which will be part of a regularly updated report. That way, physicians and researchers can easily monitor any changes in the observations. 

I was unsure of the best way to approach this task, so I started sketching.

Click through to get Alex’s thought process while building a chart in Excel.

Data Governance and Microsoft Fabric

Matthew Roche digs deeper into data governance in Microsoft Fabric:

One of the most underappreciated benefits of Power BI as a managed SaaS data platform has been the “managed” part. When you create a report, dataset, dataflow, or other item in Power BI, the Power BI service knows everything about it. Power BI is the authoritative system for all items it contains, which means that Power BI can answer questions related to lineage (where does the data used by this report come from?) and impact analysis (where is the data in this dataset used?) and compliance (who has permissions to access this report?) and more.

If you’ve ever tried to authoritatively answer questions like these for a system of any non-trivial scope, you know how hard it is. Power BI has made this information increasingly available to administrators, through logs and APIs, and the community has built a wide range of free and paid solutions to help admins turn this information into insights and action. Even more excitingly, Power BI keeps getting better and better even as the newer parts of Fabric seem to be getting all of the attention.
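Lineage and impact analysis become straightforward once one system owns the full dependency graph. Lineage is a walk upstream from an item, and impact analysis is a walk downstream. The Python sketch below shows that idea with a breadth-first search over a small graph. The item names are hypothetical, and this is not the Power BI scanner API or any Fabric API. It only illustrates why an authoritative catalog makes these questions cheap to answer.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

# Hypothetical items: each edge points from an upstream item to its consumer.
EDGES = [
    ("SQL Server: Sales", "Dataflow: Clean Sales"),
    ("Dataflow: Clean Sales", "Dataset: Sales Model"),
    ("Excel: Targets", "Dataset: Sales Model"),
    ("Dataset: Sales Model", "Report: Regional Sales"),
    ("Dataset: Sales Model", "Report: Executive Summary"),
]


def build_graph(edges: Iterable[Tuple[str, str]]):
    """Index the edges in both directions."""
    downstream: Dict[str, Set[str]] = defaultdict(set)
    upstream: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first search: every item reachable from start (excluding it)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream, upstream = build_graph(EDGES)
# Lineage: where does the data used by this report come from?
lineage = reachable(upstream, "Report: Regional Sales")
# Impact analysis: what is affected if this dataflow changes?
impact = reachable(downstream, "Dataflow: Clean Sales")
```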
