Press "Enter" to skip to content

Author: Kevin Feasel

Log Truncation in Distributed Availability Groups

Paul Randal phones a friend:

This is a question that came in through email from a prior student, which I’ll summarize as: with a distributed availability group, what are the semantics of log truncation?

As I’m not an AG expert (I just know enough to be dangerous ha-ha), I asked Jonathan to jump in, as he is most definitely an AG expert! I’ll use AG for availability group and DAG for distributed availability group in the explanation below. Whether the AGs in question have synchronous or asynchronous replicas is irrelevant for this discussion.

Read on for the answer.

Comments closed

Common Star Schema Mistakes

Ben Richardson gets back to basics:

Sometimes the culprit isn’t actually your DAX, it’s your data model.

Star schema mistakes are incredibly common in Power BI, and really hard to track down.

When your data model isn’t a clean star schema, you end up with broken filters, confusing relationships and slow visuals.

It’s important to get it right from the start! So we broke down the top 10 most common mistakes people make, how to identify them and how to fix them!

This is where reviewing (or reading) Ralph Kimball’s Data Warehouse Toolkit can save you a lot of time and stress. The Microsoft data analytics world is predicated so heavily on Kimball-style dimensional modeling that the choices tend to be building a proper star schema up-front or spend processing and developer time trying to fix it in post-production using DAX or trickery.

Comments closed

Orchestration Options in Microsoft Fabric

Reitse Eskens moves some data:

Well, unless you enjoy waking up every night to start your Extract-Transform-Load (ETL) process and manually running each process to do some work, it’s a smart move to automate this. Also, make sure everything always runs in the correct order. Additionally, there are situations where processes need to run in different configurations.

All these things can be done with what we call orchestration. It may sound a bit vague now, but we’ll get to the different moving parts of this, like parameterisation and pipelines.

Read on for a primer on the topic.

Comments closed

Idempotence and Durable Execution

Jack Vanlightly does some thinking:

Determinism is a key concept to understand when writing code using durable execution frameworks such as Temporal, Restate, DBOS, and Resonate. If you read the docs you see that some parts of your code must be deterministic while other parts do not have to be.  This can be confusing to a developer new to these frameworks. 

This post explains why determinism is important and where it is needed and where it is not. Hopefully, you’ll have a better mental model that makes things less confusing.

Some of the examples Jack includes are pretty tricky, showing just how difficult it can be to ensure that multiple, independent systems are all on the same page.

Comments closed

Generating Exponential Random Numbers in T-SQL

Sebastiao Pereira generates more artificial data:

Generating random numbers from an exponential distribution is essential for queuing theory, reliability engineering, physics, finance modeling, failure analysis, Poisson process, simulation and Monte Carlo methods, computer graphics, and games. Is it possible to have a Random Exponential Gaussian Numbers function in SQL Server without use of external tools?

As always, I love this series because these examples are complex enough not to be trivial, yet perform well enough to work in real-world environments.

Comments closed

Tips for the Import Data Option in SQL Server

Andy Brownsword doesn’t trust wizards, with their pointy caps and long beards:

If you need to create a copy of a table in another database, the ‘Import Data’ option may seem convenient. If you’ve used this method to copy to your dev environment and found things break, this post is for you.

Click through for some solid advice on how to import that data. Another thing I would sometimes do is coerce all of the input columns to long strings and load it into a staging table. Then, I could use T-SQL to re-shape the data however I needed it rather than trying to get a finicky SSIS flow to translate this date and time combination (or whatever) appropriately.

Comments closed

Thoughts on Data Modeling

Steve Jones has a two-fer. First up, he asks an opinion question about data modeling:

Recently, I had a few questions on database modeling. One was posted in the SQL Server Central forums, and a customer asked about ERD tooling on the same day. This came shortly after Redgate acquired Vertabelo (now Redgate Data Modeler). This stood out to me as very rarely in the last few years have I found people consulting and updating a diagram while performing database development.

Second, he takes a peek at a tool Redgate purchased:

Redgate acquired a data modeling tool from Vertabelo recently and I wanted to explore how it works. This is a short look at this tool and how it might be useful in working with databases.

My experience with data modeling has been that only the really large companies did a lot of work with upfront data modeling and keeping logical models up to date. It’s still quite useful for data warehouses, and that’s where the people I know who do a lot of data modeling make their living. But I find it’s too much of a hassle in fast-paced environments, especially when I can keep most or all of the data model in my head and I’m the person managing it all.

Essentially, data models are useful to the extent that they’re approximately true. But because they quickly get out of sync with reality, they quickly go from “quite useful” to “dirty lies.”

Comments closed

Copy Job in Fabric Data Factory Pipelines now GA

Jianlei Shen makes an announcement:

Copy Job Activity allows you to run Copy jobs as native activities inside Data Factory pipelines.

Copy jobs are created and managed independently in Data Factory for quick data movement between supported sources and destinations. With Copy job Activity, that same fast, lightweight experience is now embedded within pipelines, making it easier to automate, schedule, and chain Copy jobs as part of broader data workflows.

Read on for an overview of what’s in the activity and a few links on how to get started with it.

Comments closed

Row Counts and Execution Time for Active SQL Server Queries

Kendra Little wants to know what’s happening right now with this query:

I frequently need to see rowcounts and execution time for queries while they’re running. Maybe I’m troubleshooting a slow query that’s still executing, or I want to understand which operators are causing the slowdown before the query completes.

Last week at the PASS Summit I learned some little nuances about how this works that I’d missed.

Click through to learn what Kendra learned (and now what I learned).

Comments closed