Press "Enter" to skip to content

Month: August 2023

ML Model Interactions and hstats

Michael Mayer has a new R package for us:

This post is mainly about the third approach. Its beauty is that we get information about all interactions. The downside: it is as good/bad as partial dependence functions. And: the statistics are computationally very expensive to compute (of order n^2).

Different R packages offer some of these H-statistics, including {iml}, {gbm}, {flashlight}, and {vivid}. They all have their limitations. This is why I wrote the new R package {hstats}:

Click through for an overview of the package and an example of how it works.

Comments closed

Migrating Cosmos DB Tables API

Eitan Blumin handles a migration:

A few months ago, I was involved in an interesting project where a large customer (not to be named due to NDA) needed to migrate their entire Azure cloud subscription to another subscription. This was a difficult and arduous process that involved several PaaS technologies, besides SQL Server, that I didn’t have experience with before.

But it presented very interesting challenges and opportunities to learn new things.

One of these was the need to migrate an entire Azure Cosmos DB with Table Storage API account from one subscription to another.

Read on for the challenge, the intermediate solution using the Cosmos DB Data Migration Tool, and Eitan’s Powershell script to automate the process. I know and work with most of the people working on the DMT and they’re good folks.

Comments closed

The Hunt for Red Logtober

Erik Darling has a new stored procedure and a fancy Scottish accent:

SQL Server has incredibly verbose error logs, and sifting through them for the highlight reel can be a pain. They’re often full of messages that you aren’t actionable.

  • Failed logins
  • Successful logins
  • Successful backups

Making matters worse is that when you open large log files, the response time for loading all those messages can be painful, and even time out.

But Erik has a nice stored procedure to filter out the chaff. Read on to learn more about it.

Comments closed

Creating a Power BI VNet Data Gateway

Meagan Longoria rolls up her sleeves:

If you are using Power BI to connect to a PaaS resource on a virtual network in Azure (including private endpoints), you need a data gateway. While you can use an on-premises data gateway (the type of Power BI gateway we have had for years), there is an offering called a virtual network data gateway that is currently in preview.

The VNet data gateway securely communicates with the data source, executes queries, and transmits results back to the service, just like the on-premises data gateway. But it doesn’t require us to provision a virtual machine in the same network (or a peered network) of our Azure data source.

Read on to see some important caveats, as well as a step-by-step guide.

Comments closed

Row Compression and Datatypes

Chad Callihan notes a property of row compression:

Things don’t always go as planned. This quote can apply to many topics, including sizing up your data types. When you started out with a new table, there were no concerns with SMALLINT—until you realized you needed INT. Or maybe you were rolling along fine with INT until you needed to up your game to BIGINT.

If you don’t have much data in your tables and you need to change data types, you may not have much to be concerned about when it comes to the impact of such a change. But problems can arise when you have a table full of data and want to shave off some time to make a data type change as smooth as possible.

The secret here is that, when you’re using row compression, SQL Server maintains the smallest number of bytes for the numeric data. For example, if you have three bytes of 0 and one byte of actual data in your int (i.e., the number is between 0-255), row compression will store one byte. If you subsequently bump this up to a bigint, row compression stores one byte, whereas default non-compressed pages need to be re-written to store seven bytes of 0 and one byte of your number.

This can be a really smart idea if you’re planning a data type change on a large table, as there are extremely few cases in which row compression is bad (“extremely few” as in “I’ve never seen one myself but I have to hold out the possibility that it could happen”) and if you have Enterprise Edition, you can rebuild all of those indexes online and ensure that you aren’t blocking the world for very long.

Comments closed

Decoding URIs in Power Query

Imke Feldmann decodes a URI:

When working with cloud data sources you might encounter links where spaces show up as “%20” instead for example. This is called URI escaping or encoding. In Power Query there is a native function that does this kind of encoding: Uri.EscapeDataString – PowerQuery M | Microsoft Learn
However, a native function that does just the opposite is missing. Fortunately, this can be achieved with a custom function that I have found in the Power BI forum:

Click through for that function.

Comments closed

Secrets Management in Docker Swarm

Andrew Pruski has a secret:

In a previous post I went through how to run SQL Server in Docker Swarm and in this post I want to continue delving into the Docker Swarm platform by looking at secrets.

Like Kubernetes, Docker Swarm has functionality to store sensitive data in secrets. When running SQL Server in containers we usually have to manually type out the environment variable for the MSSQL_SA_PASSWORD which isn’t great if we want to store that code in a public repository so being able to obfuscate that password is great!

Read on for a failed first attempt, followed by analysis and then success.

Comments closed

Set Intersection of Vectors in R

Steven Sanderson performs set intersection of vectors:

Welcome to another exciting blog post where we delve into the world of R programming. Today, we’ll be discussing the intersect() function, a handy tool that helps us find the common elements shared between two or more vectors in R. Whether you’re a seasoned R programmer or just starting your journey, this function is sure to become a valuable addition to your toolkit.

Click through to see how the function works.

Comments closed

Cumulative Sums in KQL

Robert Cain continues a series on KQL:

Seeing cumulative totals on a row by row basis is a common need. Think of your bank account, when you look at your ledger it shows each individual transaction. Beside each individual transaction is the current amount in your account, as offset by the amount of the current transaction.

This technique is known as cumulative sums. The Kusto Query Language provides the ability to create cumulative sums using the Windowing Function, row_cumsum. In this post we’ll see how to implement it.

Read on to see how it works and what the results look like.

Comments closed

Comparing Non-Standard Time Periods in Power BI

Marco Russo and Alberto Ferrari are back in school:

A requirement to apply the following technique is that every day can belong to a term or not, but there are no overlaps between terms. Indeed, if we had overlapping periods, we should create a different solution based on the Comparing different time periods pattern. In the case of school terms, we consider the case of three terms per year, where the first term starts in September of one year, and the last term ends in July of the following year. Therefore, the academic year is identified by two consecutive numbers, such as 2016-2017 (often shortened to 2016-17).

The business requirement is to compare one term with the previous term (within the same academic year or the previous one when we compare the first term of a year) and one term with the same term in the previous year. The goal is to obtain a result similar to the following one.

This turns out to be a little bit of a challenge, though Marco and Alberto have the solution for us.

Comments closed