Press "Enter" to skip to content


Client Diagnostics in Cosmos DB

Arthur Daniels digs into diagnostic data:

This topic keeps coming up with my customers, so the purpose of this blog post is to gather all of the useful information in one place. Think of this post as a consolidation of different pieces of information. If you’re stuck on what “pipelined”, “created”, or “transit time” means, you’re in the right spot.

We are not talking about Diagnostic Settings/Log Analytics in this post; I’d describe those as server-side rather than client-side diagnostics. They are useful in different ways. Our client diagnostics will help us understand the path that a request takes from the application to Cosmos DB and back, along with any latency on that path.

Note: before we get started, check to see whether you’re using Direct or Gateway mode. Sending your requests via Direct mode in the .NET or Java SDKs will usually result in faster requests.

Read on to see how to retrieve and interpret this diagnostic data.
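As a quick taste of what's involved, here is a minimal sketch of pulling the diagnostics string via the Cosmos DB Java SDK (v4), one of the two SDKs the post mentions. The endpoint, key, database, container, and the Order class are all placeholders.

```java
// Minimal sketch: retrieving client diagnostics with the Cosmos DB Java SDK v4.
// Endpoint, key, database, container, and the Order POJO are placeholders.
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.CosmosDiagnostics;
import com.azure.cosmos.models.CosmosItemResponse;
import com.azure.cosmos.models.PartitionKey;

public class DiagnosticsDemo {
    public static class Order {
        public String id;
    }

    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("https://<your-account>.documents.azure.com:443/")
                .key("<your-key>")
                .directMode() // Direct mode, per the note above
                .buildClient();

        CosmosContainer container = client
                .getDatabase("SalesDb")
                .getContainer("Orders");

        CosmosItemResponse<Order> response = container.readItem(
                "order-001", new PartitionKey("customer-42"), Order.class);

        // The diagnostics string includes the request pipeline stages
        // ("created", "pipelined", "transit time", and so on).
        CosmosDiagnostics diagnostics = response.getDiagnostics();
        System.out.println(diagnostics);

        client.close();
    }
}
```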


Exploring Semantic Model Relationships with Sempy

Prathy Kamasani builds a graph:

Understanding the relationships between datasets is crucial in data analytics, especially in the world of self-service BI. Sempy, a Python library unique to Microsoft Fabric, allows users to visualise these relationships seamlessly. This post explores using Sempy to visualise semantic model relationships and view them in a Power BI Report. Viewing them in a notebook is easy and has been documented on MS Docs.

Click through for a notebook and explanation of the underlying code.
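For a sense of how little code the notebook-only version takes, here is a minimal sketch of the Sempy calls involved, to run inside a Microsoft Fabric notebook (Sempy is Fabric-only); the semantic model name is a placeholder.

```python
# Minimal sketch, for use inside a Microsoft Fabric notebook.
# "Sales Model" is a placeholder semantic model name.
import sempy.fabric as fabric
from sempy.relationships import plot_relationship_metadata

# Pull the relationship metadata for a semantic model as a DataFrame.
relationships = fabric.list_relationships("Sales Model")

# Render the relationships as a graph directly in the notebook.
plot_relationship_metadata(relationships)
```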


Why Take Backups in SQL Server?

I’ve started a new video series:

In this video, I explain why it is so important to back up your databases. From there, we see one technique for taking these backups in SQL Server: using SQL Server Management Studio. We review some of the backup metadata you can collect and also see how to compress and encrypt backups.

The first few videos in the series have all gone a bit longer than I expected, and that’s even with me pointedly ignoring most everything to do with tape drives.
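If you prefer T-SQL to the Management Studio dialogs, the equivalent operation is a short script. A minimal sketch, with a placeholder database name and file path:

```sql
-- Minimal sketch: a compressed full backup with a checksum,
-- plus a peek at the backup metadata that msdb records.
-- Database name and file path are placeholders.
BACKUP DATABASE [SalesDb]
TO DISK = N'D:\Backups\SalesDb_Full.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;

-- Review recent backup history for the database.
SELECT TOP (5)
    bs.database_name,
    bs.backup_start_date,
    bs.backup_finish_date,
    bs.backup_size,
    bs.compressed_backup_size
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'SalesDb'
ORDER BY bs.backup_finish_date DESC;
```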


Data Masking in Snowflake

Kevin Wilkie puts on the mask:

The premise behind data masking is fairly simple – make it so that only certain people can see the data in a certain column. For example, social security numbers or credit card numbers probably shouldn’t be seen by everyone who has access to your database.

I like to set it up as follows (obviously, this can change depending on your business’s policies):

  1. Admins have full access.
  2. Call Center Representatives have access to the last 4 digits of the social.
  3. All others see nothing useful.

Read on to see how. I’ll even save my normal “data masking isn’t really a security feature” rant. This time.
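As a sketch of what that three-tier setup can look like in Snowflake, here is a masking policy along those lines; the role names, table, and column are placeholders, and your role hierarchy may differ.

```sql
-- Minimal sketch of a Snowflake masking policy for the three tiers above.
-- Role names, table, and column are placeholders.
CREATE OR REPLACE MASKING POLICY ssn_mask AS (val STRING)
RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() = 'ADMIN_ROLE' THEN val
        WHEN CURRENT_ROLE() = 'CALL_CENTER_ROLE'
            THEN 'XXX-XX-' || RIGHT(val, 4)
        ELSE '***MASKED***'
    END;

-- Bind the policy to the column holding social security numbers.
ALTER TABLE customers MODIFY COLUMN ssn
    SET MASKING POLICY ssn_mask;
```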


Data Types and Arithmetical Calculations in DAX

Marco Russo and Alberto Ferrari do some counting:

The VertiPaq engine is basically data type-independent. This means that it does not matter whether a column is a string, a floating point, or a date: because of the dictionary encoding happening inside VertiPaq, all these data types use around the same amount of memory and perform at nearly the same speed.

However, when mixing different data types in the same expression, DAX will likely need to perform conversions between data types. Some of these conversions are nearly free, whereas others require the intervention of the formula engine, with a related performance impact.

We have already written about possible errors occurring during data type conversion here: Understanding numeric data type conversions in DAX and here: Rounding errors with different data types in DAX. The issue with conversion errors is mostly due to the fact that the precisions of fixed decimals (also known as Currency) and decimals (also known as floating point) are different. This article starts with a focus on performance.

Read on to see what Marco and Alberto have for us this time.
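If you want to experiment with this yourself, one way to make a conversion explicit, rather than leaving it to an implicit step in the expression, is DAX’s CONVERT function. A hypothetical example, assuming a Sales[Amount] column stored as fixed decimal (Currency):

```dax
-- Hypothetical measures, assuming Sales[Amount] is a fixed decimal
-- (Currency) column. CONVERT makes the type change explicit instead
-- of leaving it to an implicit conversion in the expression.
Total Amount (Currency) :=
SUM ( Sales[Amount] )

Total Amount (Float) :=
CONVERT ( SUM ( Sales[Amount] ), DOUBLE )
```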


The Most Important Tool for a Data Detective

Andy Yun wants you to use your earholes:

The All Powerful…

… Question. That is what I now believe is the most important tool for a Data Detective.

Asking Questions Effectively

This nuance involves HOW you ask a question. Some of this involves knowing your audience. Is this the right place or the right time? Sometimes there comes a point where asking questions is just counter-productive because your audience has no interest in answering. And it also means you need to make sure you’re asking the correct audience in the first place.

Asking questions is difficult, so instead, I just strawman my way to success.


Building a Data Detective Toolkit

Deb Melkin talks tools:

Happy T-SQL Tuesday! I wasn’t really sure I’d be able to crank something out for this one but somehow I managed to squeeze it in. Tim Mitchell is hosting and he has a great topic for us: What’s in our Data Detective toolkit?

I love this topic for so many reasons. Partly because I feel like I’m asked to look at so many projects where I’m dropped in and asked to figure things out, usually performance related but occasionally new functionality or features. But as I’m asked to do this fairly often, I may have to see if Data Detective can be my new title… hmm…

Being a Data Detective in a film noir. On the one hand, that sounds like a really neat idea. On the other hand, things usually don’t turn out so well for the detective.


The Power of Computed Columns

Andy Brownsword speeds up a query:

Bad code smells can run through a system, and one instance can quickly spread when code is recycled. Simon recently looked at a non-sargable example and was able to fix it by using an index and refactoring the query.

I wanted to consider an alternative approach if we saw the same issue repeated consistently. We don’t want to fix each instance; we want a single fix. We’ll solve this with an indexed computed column.

We can index computed columns to help resolve deterministic clauses (i.e., expressions whose results won’t change, so no GETDATE() allowed). Let’s get started.

Read on to learn more. This is a powerful approach to the problem of needing to perform some sort of data transformation before filtering your data.
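To make the pattern concrete, here is a minimal sketch (not Andy's exact example; table and column names are made up): a deterministic expression gets its own computed column, which we then index so the filter no longer forces a scan.

```sql
-- Minimal sketch of the pattern; table and column names are made up.
-- YEAR() is deterministic, so the computed column is indexable.
ALTER TABLE dbo.Orders
    ADD OrderYear AS YEAR(OrderDate);

CREATE NONCLUSTERED INDEX IX_Orders_OrderYear
    ON dbo.Orders (OrderYear);

-- The optimizer can match this predicate to the indexed computed
-- column, even when the query repeats the original expression.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2024;
```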


Grouping Rows in R

Steven Sanderson needs a GROUP BY clause:

Combining rows with the same column values is a fundamental task in data analysis and manipulation, especially when handling large datasets. This guide is tailored for beginner R programmers looking to efficiently merge rows using Base R, the dplyr package, and the data.table package. By the end of this guide, you will be able to seamlessly aggregate data in R, enhancing your data analysis capabilities.

Click through for several code examples.
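As a taste of what the guide covers, here are minimal sketches of the three approaches against a made-up data frame:

```r
# Made-up data frame for illustration.
df <- data.frame(
  region = c("East", "West", "East", "West"),
  sales  = c(100, 150, 200, 250)
)

# Base R: aggregate with a formula.
aggregate(sales ~ region, data = df, FUN = sum)

# dplyr: group_by() + summarise().
library(dplyr)
df |>
  group_by(region) |>
  summarise(total_sales = sum(sales))

# data.table: grouping via the by clause.
library(data.table)
dt <- as.data.table(df)
dt[, .(total_sales = sum(sales)), by = region]
```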


Referencing a Microsoft Fabric ML Model from another Workspace

Sandeep Pawar crosses workspaces:

I have written a couple of blogs about working with ML models in Microsoft Fabric. Creating experiments and logging and scoring models in Fabric is very easy, thanks to the built-in MLflow integration. However, the Fabric Data Science experience has one limitation. There are no model endpoints yet, and you cannot load a model from another workspace because the model URI, unlike in Databricks, does not reference a workspace. If you use MLFlowTransformer as shown in this blog, only the model from the workspace where the notebook is hosted is loaded. However, there is a workaround.

Read on for that workaround, as well as the core limitation associated with it.
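For reference, this is the within-workspace pattern the excerpt alludes to; Sandeep's post covers how to get around its workspace restriction. The model name, version, and column names here are placeholders.

```python
# The standard within-workspace pattern: MLFlowTransformer loads a
# model registered in the same workspace as the notebook.
# Model name, version, and column names are placeholders.
from synapse.ml.predict import MLFlowTransformer

model = MLFlowTransformer(
    inputCols=["feature1", "feature2"],  # feature columns in the input DataFrame
    outputCol="prediction",
    modelName="sales-forecast-model",
    modelVersion=1,
)

# df is a Spark DataFrame containing the feature columns.
scored_df = model.transform(df)
display(scored_df)
```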
