Press "Enter" to skip to content

Curated SQL Posts

Building a Data Warehouse in Microsoft Fabric

Reza Rad continues a video series on Microsoft Fabric:

Microsoft Fabric Data Warehouse is a database system that stores data in OneLake and provides a medium to interact with the database using SQL commands. The Fabric Data Warehouse, which is also called Data Warehouse, or in short, Warehouse, also provides a powerful computing engine behind the scene to account for large volumes of data and support a fast-performing database system. The term Data Warehouse comes from the fact that this is not usually a place to store transactional data for an operational system (for that, you can use Azure SQL Database). A Data Warehouse, in generic Business Intelligence terminology, is a place where you would store the data that needs to be analyzed.

Reza also explains how the warehouse differs from a lakehouse.

Leave a Comment

Microsoft Fabric and Process Unification

Paul Andrew gets to the heart of things:

Moving on and assuming you have seen the event sessions, I want to give you my point of view to help explain what Microsoft Fabric is. Firstly, lets clear up call out was terminology to support this understanding. Is this software offering a resource, service, platform, or solution? To answer this question, perspective is key, perspective with a timeline (2018 to 2023). We could simply say that Microsoft Fabric is all these things. All things to all data professionals and beyond. But, to understand this, let’s consider the journey Microsoft has been on and how this technology has evolved. I believe this journey is the best way to help explain what Microsoft Fabric is, rather than focusing on all the new and shiny bits.

Click through for Paul’s take on the matter and how this whole area of “modern data warehousing” has evolved over the past several years in Azure.

Leave a Comment

Cosmos DB Serverless Scaling to 1TB

Hasan Savran shares the news:

Azure Cosmos DB’s Serverless option is a great way to save money if your application expects intermittent and unpredictable traffic with long idle times. I use serverless in developing, prototyping, and integrating with computing services such as Azure Functions.

     The limitation of Azure Cosmos DB serverless was a show-stopper if your solution needed scalability or a large storage. Cosmos DB announced that many of the limitations of the serverless option of Azure Cosmos DB are lifted in Build 2023.

Read on for the gist of these updates.

Leave a Comment

MVCC and Vacuuming in Postgres

Ryan Booz explains one area where Postgres’s implementation differs from most other vendors:

All relational databases handle transaction isolation in some way, typically with an implementation of Multi-version Concurrency Control (MVCC). Plain ‘ol, mainline SQL Server uses a form of MVCC, but all older rows (currently retained for ongoing transactions) are stored in TempDB. Oracle and MySQL also do something similar, storing (essentially) diffs of the modified data outside of the table that is merged at runtime for ongoing transactions that still need to see the older data.

Among these databases, PostgreSQL stands alone in the specific way MVCC is implemented. Rather than storing some form of the older data outside of the current table for transactions to query/merge/etc. at runtime, PostgreSQL always creates the newly modified row in-table alongside the existing, older versions that are still needed for running transactions. Yes, every UPDATE creates a new row of data in the table, even if you just change one column.

Read on to understand some of the implications of this and how it affects the way we manage databases.

Leave a Comment

Sketching before Charting

Alex Velez tries a few ideas:

It’s important to note that there isn’t a particular finding or insight that needs to be emphasized here. Instead, the goal for this visual is to provide the data in a digestible format, which will be part of a regularly updated report. That way, physicians and researchers can easily monitor any changes in the observations. 

I was unsure of the best way to approach this task, so I started sketching.

Click through to get Alex’s thought process while building a chart in Excel.

Leave a Comment

Data Governance and Microsoft Fabric

Matthew Roche digs deeper into data governance in Microsoft Fabric:

One of the most underappreciated benefits of Power BI as a managed SaaS data platform has been the “managed” part. When you create a report, dataset, dataflow, or other item in Power BI, the Power BI service knows everything about it. Power BI is the authoritative system for all items it contains, which means that Power BI can answer questions related to lineage (where does the data used by this report come from?) and impact analysis (where is the data in this dataset used?) and compliance (who has permissions to access this report?) and more.

If you’ve ever tried to authoritatively answer questions like these for a system of any non-trivial scope, you know how hard it is. Power BI has made this information increasingly available to administrators, through logs and APIs, and the community has built a wide range of free and paid solutions to help admins turn this information into insights and action. Even more excitingly, Power BI keeps getting better and better even as the newer parts of Fabric seem to be getting all of the attention.

Leave a Comment

Troubleshooting an Occasionally Poorly Performing Query in Oracle

Jura Bratina looks at the numbers:

A client asked us to verify why an INSERT AS SELECT statement, which was part of a scheduled ETL job executing on a single instance V11.2 database running as a data warehouse, suddenly started to take longer than usual. The database doesn’t have the Diagnostic and Tuning license, so Statspack and an external 3rd party application are being used to monitor the performance.

Okay, so I like making fun of Oracle licensing as much as (well, more than) the next guy, but seriously, a Diagnostic and Tuning license? Anyhow, the post isn’t about Oracle’s quest to find a more efficient way to vacuum money out of your bank account, but instead a performance troubleshooting approach when performance is intermittently bad. Spoilers: noisy neighbors. It’s usually noisy neighbors.

Leave a Comment

SSMS 19.1 Startup Performance Improvements

Glenn Berry breaks out the stopwatch:

SSMS 18.X cold startup performance is a big issue for me, since I regularly have to start SSMS 18.X on various small VMs running on older, slower server-class processors. BTW, “cold startup” is the first time that a program is loaded into a process since the last reboot of the operating system, so nothing from that program is already loaded in memory.

The hero image alone convinced me to install 19.1.

Leave a Comment

The Value of QUOTENAME

Quoth Chad Callihan, “Occasionally more”:

QUOTENAME can be used to make sure database objects are valid in your query. Most of the time, objects like table names only contain valid characters, so there’s nothing to worry about. But nobody’s perfect. Let’s look at an example of what can happen when somebody creates a table with a forward slash in the name and see how QUOTENAME can be used to query against it.

QUOTENAME is also a good way of preventing SQL injection, though I still prefer appropriate use of exec sp_executesql in any case in which it’s possible to use.

Leave a Comment

Databricks SQL Performance Tuning

Katie Cummiskey provides some tips for us:

We previously discussed how to use Power BI on top of Databricks Lakehouse efficiently. However, the well-designed and efficient Lakehouse itself is the basement for overall performance and good user experience.  We will discuss recommendations for physical layout of Delta tables, data modeling, as well as recommendations for Databricks SQL Warehouses.

These tips and techniques proved to be efficient based on our field experience. We hope you will find them relevant for your Lakehouse implementations too.

Read on for these tips.

Leave a Comment