Press "Enter" to skip to content

Day: August 2, 2023

ML Model Interactions and hstats

Michael Mayer has a new R package for us:

This post is mainly about the third approach. Its beauty is that we get information about all interactions. The downside: it is as good/bad as partial dependence functions. And: the statistics are computationally very expensive to compute (of order n^2).

Different R packages offer some of these H-statistics, including {iml}, {gbm}, {flashlight}, and {vivid}. They all have their limitations. This is why I wrote the new R package {hstats}:

Click through for an overview of the package and an example of how it works.

Comments closed

Migrating Cosmos DB Tables API

Eitan Blumin handles a migration:

A few months ago, I was involved in an interesting project where a large customer (not to be named due to NDA) needed to migrate their entire Azure cloud subscription to another subscription. This was a difficult and arduous process that involved several PaaS technologies, besides SQL Server, that I didn’t have experience with before.

But it presented very interesting challenges and opportunities to learn new things.

One of these was the need to migrate an entire Azure Cosmos DB with Table Storage API account from one subscription to another.

Read on for the challenge, the intermediate solution using the Cosmos DB Data Migration Tool, and Eitan’s Powershell script to automate the process. I know and work with most of the people working on the DMT and they’re good folks.

Comments closed

The Hunt for Red Logtober

Erik Darling has a new stored procedure and a fancy Scottish accent:

SQL Server has incredibly verbose error logs, and sifting through them for the highlight reel can be a pain. They’re often full of messages that you aren’t actionable.

  • Failed logins
  • Successful logins
  • Successful backups

Making matters worse is that when you open large log files, the response time for loading all those messages can be painful, and even time out.

But Erik has a nice stored procedure to filter out the chaff. Read on to learn more about it.

Comments closed

Creating a Power BI VNet Data Gateway

Meagan Longoria rolls up her sleeves:

If you are using Power BI to connect to a PaaS resource on a virtual network in Azure (including private endpoints), you need a data gateway. While you can use an on-premises data gateway (the type of Power BI gateway we have had for years), there is an offering called a virtual network data gateway that is currently in preview.

The VNet data gateway securely communicates with the data source, executes queries, and transmits results back to the service, just like the on-premises data gateway. But it doesn’t require us to provision a virtual machine in the same network (or a peered network) of our Azure data source.

Read on to see some important caveats, as well as a step-by-step guide.

Comments closed

Row Compression and Datatypes

Chad Callihan notes a property of row compression:

Things don’t always go as planned. This quote can apply to many topics, including sizing up your data types. When you started out with a new table, there were no concerns with SMALLINT—until you realized you needed INT. Or maybe you were rolling along fine with INT until you needed to up your game to BIGINT.

If you don’t have much data in your tables and you need to change data types, you may not have much to be concerned about when it comes to the impact of such a change. But problems can arise when you have a table full of data and want to shave off some time to make a data type change as smooth as possible.

The secret here is that, when you’re using row compression, SQL Server maintains the smallest number of bytes for the numeric data. For example, if you have three bytes of 0 and one byte of actual data in your int (i.e., the number is between 0-255), row compression will store one byte. If you subsequently bump this up to a bigint, row compression stores one byte, whereas default non-compressed pages need to be re-written to store seven bytes of 0 and one byte of your number.

This can be a really smart idea if you’re planning a data type change on a large table, as there are extremely few cases in which row compression is bad (“extremely few” as in “I’ve never seen one myself but I have to hold out the possibility that it could happen”) and if you have Enterprise Edition, you can rebuild all of those indexes online and ensure that you aren’t blocking the world for very long.

Comments closed

Decoding URIs in Power Query

Imke Feldmann decodes a URI:

When working with cloud data sources you might encounter links where spaces show up as “%20” instead for example. This is called URI escaping or encoding. In Power Query there is a native function that does this kind of encoding: Uri.EscapeDataString – PowerQuery M | Microsoft Learn
However, a native function that does just the opposite is missing. Fortunately, this can be achieved with a custom function that I have found in the Power BI forum:

Click through for that function.

Comments closed

Secrets Management in Docker Swarm

Andrew Pruski has a secret:

In a previous post I went through how to run SQL Server in Docker Swarm and in this post I want to continue delving into the Docker Swarm platform by looking at secrets.

Like Kubernetes, Docker Swarm has functionality to store sensitive data in secrets. When running SQL Server in containers we usually have to manually type out the environment variable for the MSSQL_SA_PASSWORD which isn’t great if we want to store that code in a public repository so being able to obfuscate that password is great!

Read on for a failed first attempt, followed by analysis and then success.

Comments closed