Press "Enter" to skip to content

Author: Kevin Feasel

Prod Data in Dev

Brent Ozar looks at survey results:

No matter which way you slice it, about half are letting developers work with data straight outta production. We’re not masking personally identifiable data before the developers get access to it.

It was the same story about 5 years ago when I asked the same question, and back then, about 2/3 of the time, developers were using production data as-is:

Brent covers some of the challenges involved, and I can add one more: the idea of environments gets really squishy when talking about data science. My development model still needs production data (unless the dev data has the same structural attributes and data distributions as prod), and I don’t really want to train different models in dev/test/prod because, even with the same default data, many algorithms are stochastic in nature: if I run it multiple times, I can end up with different results. And even if I can get the same results by re-running and using a consistent seed, that also introduces a structural instability because I’m relying on a specific seed.

In short, I agree with Brent: this is a tough nut to crack.

Comments closed

A Primer on Medallion Architecture in Microsoft Fabric

Kenneth Omorodion builds a warehouse:

Data warehouses are essential components of modern analytics systems, offering optimized storage and processing capabilities for large volumes of data. When integrated with a Lakehouse architecture, you can combine the best of both worlds—structured, schema-enforced data storage with the flexibility and scalability of data lakes. Microsoft Fabric provides an excellent environment for implementing the Medallion Architecture, a design pattern for building efficient data processing pipelines by layering data into bronze, silver, and gold zones.

Click through for the process.

Comments closed

The Power of Pre-Attentive Attributes

Elena Drakulevska is seeing pink elephants:

In a world packed with data, how do you make sure your key points don’t get lost in the noise?

Enter the Pink Elephant Principle—a concept that makes sure your most important elements stand out, like a big pink elephant in the middle of a room. It’s impossible to ignore, and that’s exactly what you want for the critical parts of your report!

The irony of this is that the historical term of seeing pink elephants is a person so drunk that he’s hallucinating. Humor of the term aside, Elena drives home a very important principle around ensuring you take advantage of pre-attentive attributes to ensure users see what’s important with the least cognitive effort.

Comments closed

Backing up SQL Server via T-SQL

I have a new video:

In this video, I show how to perform a variety of database backup operations via T-SQL, as well as how (and why) to back up to NUL and how to back up a database to a network share.

This one is not quite as lengthy as the prior video in the series: just 20 minutes instead of 30. That said, I do cover quite a bit of content around taking backups, something that every infrastructure DBA should be familiar doing.

Comments closed

Configuring Database Mail in Azure SQL MI

Andy Brownsword sends an e-mail:

SQL Agent jobs allow us to schedule and automate tasks on a SQL Server instance. Crucially, when things go wrong we need to know about them. That’s why we use notifications.

Setting up Operators and job Notifications is as expected on a Managed Instance. However, when it comes to sending the notifications we may have a challenge, as shown in the SQL Agent Error Logs:

Read on for the solution.

Comments closed

Value Filter Behavior in Power BI

Jeffrey Wang digs into a new feature:

The October 2024 Power BI update introduces an inconspicuous yet significant preview feature: Value Filter Behavior. This feature is activated by setting a new model-level property, ValueFilterBehavior, to Independent. The default setting of Automatic preserves the existing behavior, at least during the public preview period. This property controls how the DAX SUMMARIZECOLUMNS function behaves, which is central to most DAX queries generated by Power BI visuals.

Don’t just take my world for it — create any Power BI visual by adding columns, filters, and measures. If you are familiar with the Performance Analyzer or other tools that capture the DAX query issued by the visual, you will see something like this:

Read on for Jeffrey’s example and a dive into what’s going on.

Comments closed

Viewing Total Storage Consumption in Microsoft Fabric

Gilbert Quevauvilliers builds a report:

One of the things I have found when working with my customers in Microsoft Fabric is that there is currently no way to easily view the total storage for the entire tenant.

Not only that, but it would also be time consuming and quite a challenge to then find out what is consuming the storage. Could it be large files or tables or warehouse tables?

In this blog post I will show you how using a Notebook you can get details of the storage across your Microsoft Fabric Tenant.

Click through for an image of the Power BI report and how you can get there.

Comments closed

The Importance of Planning before Power BI Data Modeling

Kelly Broekstra recommends against jumping right in:

Who has been told by a manager or business person to just connect to the source data and start creating a new report? Here is my tip:

DON’T DO IT

All Power BI and Fabric reports must have a semantic model, which Microsoft describes as “a logical description of an analytical domain, with metrics, business-friendly terminology, and representation, to enable deeper analysis.” – Source

Read on to learn why and what you should instead do if you want to have a better long-term experience with Power BI.

Comments closed

Supply Chain Analysis in R via planr

Matt Dancho shows off an R package:

Supply chain management is all about balancing supply and demand to ensure that inventory levels are optimized. Overestimating demand leads to excess stock, while underestimating it causes shortages. Accurate inventory projections allow businesses to plan ahead, make data-driven decisions, and avoid costly errors like over-buying inventory or getting into a stock-outage and having no inventory to meet demand.

Read on to learn more about the package and how it works. H/T R-Bloggers.

Comments closed