Press "Enter" to skip to content

Author: Kevin Feasel

OPENROWSET and External Tables in Fabric SQL Databases

Hugo Queiroz makes a connection:

Data Virtualization brings to Fabric SQL Database the same set of capabilities already available on Azure SQL Database, Azure SQL Managed Instance and SQL Server, customers can now use OPENROWSET and External Tables, with complete parity across SQL flavors, develop once deploy anywhere. Data Virtualization for Fabric SQL Databases directly supports Parquet and delimited text (CSV), but JSON files can also be read using functions like JSON_VALUE and OPENJSON.

This is currently in preview. Read on to see what’s in the preview.

Leave a Comment

The Basics of Framing in Window Functions

Jared Westover wants a range:

In this article, we’ll explore the concept of framing in window functions. We’ll compare the differences between the ROWS and RANGE clauses and discuss when to choose one over the other. We’ll also highlight common pitfalls of framing and whether it applies to all types of window functions. By the end, you’ll better understand how framing works with window functions, making it seem less complex.

Click through for a primer on frames in window functions. Admittedly, if I were writing this article, I’d toss out most of the “pitfalls” section, as pitfalls 2 and 3 aren’t particularly relevant or pitfall-y (because SQL Server always defines a frame on a window function if you don’t). Instead, I’d add that there are some annoying limitations on RANGE frames, where the ANSI SQL standard allows you to use intervals like date or time when defining frames, so you can get records ranging from three hours ago to right now, for example.

But that said, it’s a good overview if you’re fairly new to window functions.

Leave a Comment

Log Truncation in Distributed Availability Groups

Paul Randal phones a friend:

This is a question that came in through email from a prior student, which I’ll summarize as: with a distributed availability group, what are the semantics of log truncation?

As I’m not an AG expert (I just know enough to be dangerous ha-ha), I asked Jonathan to jump in, as he is most definitely an AG expert! I’ll use AG for availability group and DAG for distributed availability group in the explanation below. Whether the AGs in question have synchronous or asynchronous replicas is irrelevant for this discussion.

Read on for the answer.

Leave a Comment

Common Star Schema Mistakes

Ben Richardson gets back to basics:

Sometimes the culprit isn’t actually your DAX, it’s your data model.

Star schema mistakes are incredibly common in Power BI, and really hard to track down.

When your data model isn’t a clean star schema, you end up with broken filters, confusing relationships and slow visuals.

It’s important to get it right from the start! So we broke down the top 10 most common mistakes people make, how to identify them and how to fix them!

This is where reviewing (or reading) Ralph Kimball’s Data Warehouse Toolkit can save you a lot of time and stress. The Microsoft data analytics world is predicated so heavily on Kimball-style dimensional modeling that the choices tend to be building a proper star schema up-front or spend processing and developer time trying to fix it in post-production using DAX or trickery.

Leave a Comment

Orchestration Options in Microsoft Fabric

Reitse Eskens moves some data:

Well, unless you enjoy waking up every night to start your Extract-Transform-Load (ETL) process and manually running each process to do some work, it’s a smart move to automate this. Also, make sure everything always runs in the correct order. Additionally, there are situations where processes need to run in different configurations.

All these things can be done with what we call orchestration. It may sound a bit vague now, but we’ll get to the different moving parts of this, like parameterisation and pipelines.

Read on for a primer on the topic.

Leave a Comment

Idempotence and Durable Execution

Jack Vanlightly does some thinking:

Determinism is a key concept to understand when writing code using durable execution frameworks such as Temporal, Restate, DBOS, and Resonate. If you read the docs you see that some parts of your code must be deterministic while other parts do not have to be.  This can be confusing to a developer new to these frameworks. 

This post explains why determinism is important and where it is needed and where it is not. Hopefully, you’ll have a better mental model that makes things less confusing.

Some of the examples Jack includes are pretty tricky, showing just how difficult it can be to ensure that multiple, independent systems are all on the same page.

Leave a Comment

Generating Exponential Random Numbers in T-SQL

Sebastiao Pereira generates more artificial data:

Generating random numbers from an exponential distribution is essential for queuing theory, reliability engineering, physics, finance modeling, failure analysis, Poisson process, simulation and Monte Carlo methods, computer graphics, and games. Is it possible to have a Random Exponential Gaussian Numbers function in SQL Server without use of external tools?

As always, I love this series because these examples are complex enough not to be trivial, yet perform well enough to work in real-world environments.

Leave a Comment

Tips for the Import Data Option in SQL Server

Andy Brownsword doesn’t trust wizards, with their pointy caps and long beards:

If you need to create a copy of a table in another database, the ‘Import Data’ option may seem convenient. If you’ve used this method to copy to your dev environment and found things break, this post is for you.

Click through for some solid advice on how to import that data. Another thing I would sometimes do is coerce all of the input columns to long strings and load it into a staging table. Then, I could use T-SQL to re-shape the data however I needed it rather than trying to get a finicky SSIS flow to translate this date and time combination (or whatever) appropriately.

Leave a Comment

Copy Job in Fabric Data Factory Pipelines now GA

Jianlei Shen makes an announcement:

Copy Job Activity allows you to run Copy jobs as native activities inside Data Factory pipelines.

Copy jobs are created and managed independently in Data Factory for quick data movement between supported sources and destinations. With Copy job Activity, that same fast, lightweight experience is now embedded within pipelines, making it easier to automate, schedule, and chain Copy jobs as part of broader data workflows.

Read on for an overview of what’s in the activity and a few links on how to get started with it.

Leave a Comment