
Curated SQL Posts

Dealing with the Lack of Identity Columns in Microsoft Fabric

Nikola Ilic forges a new identity:

If you’ve ever worked with traditional relational database management systems (RDBMS) and/or data warehouses, and you’re now trying to be a “modern data platform professional” and apply your skills in Microsoft Fabric, you may find yourself in uncharted territory. Not only because of the SaaS-ification of the environment, but also due to many puzzling “solutions”, or maybe it’s better to say – lack of the features that we were taking for granted in the “previous” (pre-Fabric) life.

The goal of this article is to introduce you to different approaches for overcoming the lack of identity columns in Microsoft Fabric. Please keep in mind that all of these approaches are considered workarounds, and Microsoft may provide an out-of-the-box solution in the future.

Missing the identity column attribute can be a bit annoying when building out dimensions, so Nikola provides a few tips on how to emulate this functionality.
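One widely used way to emulate an identity column is to take the current maximum key and add a row number to each incoming row. The sketch below shows that idea in plain Python. The function name, the `customer_key` column, and the sample rows are all hypothetical, and this is not necessarily one of the approaches Nikola covers.

```python
def assign_surrogate_keys(existing_max_key, new_rows, key_name="customer_key"):
    """Return copies of new_rows with sequential keys after existing_max_key."""
    return [
        # enumerate(start=1) plays the role of ROW_NUMBER() over the new rows.
        {key_name: existing_max_key + offset, **row}
        for offset, row in enumerate(new_rows, start=1)
    ]


# Hypothetical dimension rows that already carry keys.
existing_dimension = [
    {"customer_key": 1, "customer_code": "A100"},
    {"customer_key": 2, "customer_code": "B200"},
]

# Hypothetical incoming rows that still need keys.
incoming = [{"customer_code": "C300"}, {"customer_code": "D400"}]

# default=0 covers the first load, when the dimension is still empty.
current_max = max((row["customer_key"] for row in existing_dimension), default=0)
keyed = assign_surrogate_keys(current_max, incoming)
```

Note that this pattern is only safe when one load runs at a time: two concurrent loads could read the same maximum and hand out overlapping keys.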


Tips for Orchestrating Fabric Notebooks

Stepan Resl talks orchestration:

Let’s start by introducing what orchestration is and why it’s important to talk about shared resources. Orchestration is a discipline focused on managing and coordinating individual items or control elements to collectively manage the flow of our data operations. In the context of Fabric, this involves managing notebooks, dataflows, pipelines, stored procedures, semantic model updates, and many other items, activities, and services that may even be outside of Fabric.

Read on for some of the options, how they work in Microsoft Fabric, and tips for success.


Invoking a Fabric Data Factory Pipeline from a Parent Pipeline

Andy Leonard takes us through a design pattern:

In an earlier post, I demonstrated one way to build a basic parent-child design pattern in Fabric Data Factory by calling one pipeline (child) from another (parent). In this post, I modify the parent and child pipelines to demonstrate calling a child pipeline that contains a parameter. In this post, we will:

  • Clone and edit the child pipeline
  • Clone and edit the parent pipeline
  • Test

Read on to see how it works.


Smoothing Functions in R

Ivan Svetunkov puts on the forecasting hat:

I was recently asked by a colleague of mine how to extract the variance from a model estimated using the adam() function from the smooth package in R. The problem was that that person started reading the source code of forecast.adam() and got lost between the lines (this happens to me as well sometimes). Well, there is an easier solution, and in this post I want to summarise several methods that I have implemented in the smooth package for forecasting functions. In this post I will focus on the adam() function, although all of them work for es() and msarima() as well, and some of them work for other functions (at least for now, as of smooth v4.1.0). Also, some of them are mentioned in the Cheat sheet for adam() function of my monograph (available online).

Read on to learn more. H/T R-Bloggers.


Combining Data Frames with Differing Columns in R

Steven Sanderson does a bit of merging:

Combining data frames is a fundamental task in data analysis, especially when dealing with datasets that have different structures. In R, there are several ways to achieve this, using base R functions, the dplyr package, and the data.table package. This guide will walk you through each method, providing examples and explanations suitable for beginner R programmers. This article will explore three primary methods in R: base R functions, dplyr, and data.table. Each method has its advantages, and understanding them will enhance your data manipulation skills.

There are quite a few examples here, depending on whether you intend to join the datasets or perform a set operation such as union or intersect.
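To illustrate the row-binding case in a language-neutral way, here is a minimal Python sketch that stacks two tables with differing columns and fills the gaps with `None`, much as `dplyr::bind_rows()` fills missing columns with `NA`. The `bind_rows` helper and the sample data are hypothetical and are not taken from Steven's post.

```python
def bind_rows(*tables):
    """Stack lists of dict rows, filling columns a row lacks with None."""
    # Collect the union of column names in first-seen order.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Rebuild every row against the full column list.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Hypothetical inputs: the two tables share "id" but differ otherwise.
sales_2023 = [{"id": 1, "region": "East"}]
sales_2024 = [{"id": 2, "channel": "Web"}]

combined = bind_rows(sales_2023, sales_2024)
```

Each row in `combined` carries all three columns, with `None` wherever the source table had no value for that column.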


Parameterized Queries with dbatools

Thom Andrews builds a query:

Many of us are likely aware of PowerShell, even if we don’t use it too frequently, and I suspect that if you’re reading this post you’re also familiar with things like sqlcmd. Hopefully, you have also heard of DbaTools, a module for PowerShell (and if you haven’t, hopefully that’s why you’re here). Today, I wanted to discuss running parametrised queries (including table type parameters) from PowerShell, which is notoriously hard/impossible with sqlcmd (or invoke-SqlCmd), using the DbaTools module.

Click through for examples building up from zero parameters up to a table of parameters.
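For readers who want the general idea behind parameterized queries without a SQL Server instance, the sketch below uses Python's built-in `sqlite3` module rather than dbatools. The table, the data, and the `find_by_name` helper are hypothetical. The point is only that a bound parameter is treated as a value and never as SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("Alice",), ("Bob",)],
)


def find_by_name(connection, name):
    """Look up customers by name using a bound parameter (the ? placeholder)."""
    return connection.execute(
        "SELECT id, name FROM customers WHERE name = ?",
        (name,),
    ).fetchall()


# An ordinary lookup.
matches = find_by_name(conn, "Bob")

# An injection-style input is compared as a literal string, so it matches nothing.
hostile = find_by_name(conn, "Bob' OR '1'='1")

conn.close()
```

Had the query been built by string concatenation, the second input would have changed the meaning of the WHERE clause instead of simply failing to match.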


Reading Parquet Files in R with nanoparquet

Stephen Turner reads some data:

In these slides I also learned about the nanoparquet package — a zero dependency package for reading and writing parquet files in R. Besides all the benefits noted above, parquet is much faster to read and write. And, as opposed to saving as .rds, parquet can easily be passed back and forth between R, Python, and other frameworks.

Let’s take a look at how reading and writing parquet files compares with CSV, either with base R or readr.

Stephen shows one of the best-case scenarios for Parquet: lots of data (100 million rows), relatively few columns, no long strings, etc. That leads to a massive improvement over using CSVs, even if you ignore the metadata and formatting benefits. I wouldn’t expect the benefits to be nearly as significant with wide text columns and very little value overlap, but that’s also pretty uncommon for the type of dataset we’re analyzing in R.


Adding Row Numbers to a SQL Query

Steve Jones enumerates a result set:

I’m going to use some fun data for me. I’ve been tracking my travels, since I’m on the road a lot. I’m a data person and part of tracking is trying to ensure I’m not doing too much. Just looking at the data helps me keep perspective and sometimes cancel (or decline) a trip.

In any case, you don’t care, but I essentially have this data in a table. As you can see, I have the date of travel, the city, area, etc. I also have a few flags as to whether I was traveling that day, if I spent a night away from home, and how far I was.

Read on for a few trials with ROW_NUMBER().
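As a quick, self-contained way to try the same window function, the sketch below runs `ROW_NUMBER()` against an in-memory SQLite database from Python. It assumes the SQLite library bundled with your Python is version 3.25 or later, which is when window functions were added. The `travel` table and its rows are made up for illustration and are not Steve's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE travel (travel_date TEXT, city TEXT)")

# Hypothetical rows, deliberately inserted out of date order.
conn.executemany(
    "INSERT INTO travel VALUES (?, ?)",
    [
        ("2024-03-02", "Denver"),
        ("2024-01-15", "London"),
        ("2024-02-10", "Austin"),
    ],
)

# ROW_NUMBER() numbers the rows according to the ORDER BY inside OVER().
rows = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY travel_date) AS rn,
           travel_date,
           city
    FROM travel
    ORDER BY rn
    """
).fetchall()

conn.close()
```

The numbering follows `travel_date` rather than insertion order, so the London trip in January receives row number 1.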


Cloud Connections in Microsoft Fabric

Dennes Torres makes a connection:

I wrote about cloud connections when they were in a very early stage.

Cloud connections have evolved and are now sharable. We call the “regular” connection a “personal connection”.

The problem with “personal connections” is that they make teamwork difficult. Personal connections belong to you, and other developers can’t use them. When a different developer needs to work with the same objects, they are required to create their own connection.

Using cloud connections, we can create a single, reusable connection to the data source and share it with all the developers in the team.

Read on to learn more about how they work now that the feature is a bit more mature.
