Day: May 7, 2024

Model Selection with AIC

Steven Sanderson talks about the Akaike Information Criterion:

In the world of data analysis and statistics, one of the key challenges is selecting the best model to describe and analyze your data. This decision is crucial because it impacts the accuracy and reliability of your results. Among the many tools available, the Akaike Information Criterion (AIC) stands out as a powerful method for comparing different models and choosing the most suitable one.

Today we will go through an example of model selection using the AIC, specifically focusing on its application to various statistical distributions available in the TidyDensity package. TidyDensity, a part of the healthyverse ecosystem, offers a comprehensive suite of tools for data analysis in R, including functions to compute AIC scores for different probability distributions.

Read on for a quick primer on the AIC itself and how you can use it in TidyDensity.
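As a rough illustration of the idea (this is not Steven's code and not the TidyDensity API), the AIC is defined as 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximized likelihood; lower scores are better. The sketch below fits a normal and an exponential distribution to a small made-up sample by maximum likelihood and compares their scores. The sample values and function names are assumptions for demonstration only.

```python
import math
import statistics


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def normal_log_likelihood(data: list[float]) -> float:
    """Log-likelihood of a normal distribution at its maximum likelihood fit."""
    n = len(data)
    mu = statistics.fmean(data)
    # The maximum likelihood variance divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def exponential_log_likelihood(data: list[float]) -> float:
    """Log-likelihood of an exponential distribution at its maximum likelihood fit."""
    n = len(data)
    rate = 1 / statistics.fmean(data)  # maximum likelihood estimate of the rate
    return n * math.log(rate) - rate * sum(data)


# Hypothetical sample, tightly clustered around 5.
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4]

aic_normal = aic(normal_log_likelihood(sample), n_params=2)  # mean and variance
aic_exponential = aic(exponential_log_likelihood(sample), n_params=1)  # rate only

best = "normal" if aic_normal < aic_exponential else "exponential"
print(f"normal: {aic_normal:.2f}, exponential: {aic_exponential:.2f}, best: {best}")
```

The extra parameter costs the normal fit two points, but on data this tightly clustered its much higher likelihood more than pays for it.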

Dealing with Page Latch Waits

Jared Poche talks latches:

Reducing waits is a great way to improve the performance of your SQL Servers. Minimizing PAGELATCH_EX and PAGELATCH_SH wait types is more involved than most. There are generally two causes: one that is largely solved in recent versions, and one that requires real thought and planning to resolve.

Read on for both of those causes, as well as several tips you can use to minimize the risk of waiting.

Using KEEPFILTERS in DAX

Marco Russo and Alberto Ferrari share some advice:

In the article Using KEEPFILTERS in DAX, we described how to use KEEPFILTERS, which is a function that preserves the existing filter on columns affected by a new filter applied by CALCULATE or CALCULATETABLE. We suggest reading that article if you are not familiar with KEEPFILTERS. However, we wanted to clarify a rule of thumb you can apply to quickly decide when to use KEEPFILTERS or not in most cases. This will allow you to only invest more time when you are dealing with specific requirements.

Read on for common use cases of KEEPFILTERS, as well as recommendations on when and how you should use the function.

Building Metadata-Driven Pipelines in Microsoft Fabric

Koen Verbeeck lays out a process:

The goal of metadata-driven code is that you build something only once. You need to extract from relational databases? You build one pipeline that can connect to a relational source, and you parameterize everything (server name, database name, source schema, source table, destination server name, destination table, et cetera). Once this parameterized piece of code is ready, all you must do is enter metadata about the sources you want to extract. If at a later point an additional relational source needs to be extracted, you don’t need to create a brand-new pipeline. All you need to do is enter a new line of data in your metadata repository.

Another benefit, aside from speeding up development once you’ve made the initial effort of creating your metadata-driven pipeline, is that everything is consistent. You always tackle a given pattern the same way. If there’s a bug, you only need to fix it in one location.

Read on to see how this works. The idea is certainly not new, as Koen mentions, but there are some specific factors that come into play for Microsoft Fabric pipelines.
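To make the pattern concrete, here is a minimal sketch in plain Python. It is not Fabric pipeline code and not Koen's implementation; the metadata fields mirror the parameters listed in the quote, and every name and value in it is hypothetical. One parameterized function handles every source, and adding a source means adding a row to the metadata list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTable:
    """One row of the (hypothetical) metadata repository."""
    server: str
    database: str
    schema: str
    table: str
    destination_table: str


def plan_copy(src: SourceTable) -> dict[str, str]:
    """Build the parameters for one copy operation from one metadata row."""
    return {
        "source": f"{src.server}/{src.database}",
        "query": f"SELECT * FROM [{src.schema}].[{src.table}]",
        "destination": src.destination_table,
    }


# Hypothetical metadata. Extracting a new table means adding a row here,
# not writing a new pipeline.
METADATA = [
    SourceTable("sql01", "Shop", "Sales", "Orders", "stg_orders"),
    SourceTable("sql01", "Shop", "Sales", "Customers", "stg_customers"),
    SourceTable("sql02", "HR", "dbo", "Employees", "stg_employees"),
]

# The same logic runs for every row, so a bug fix in plan_copy applies everywhere.
plans = [plan_copy(row) for row in METADATA]
for plan in plans:
    print(plan)
```

In a real pipeline the metadata would live in a table or file rather than in code, and the loop body would invoke an actual copy step instead of building a dictionary.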

Missing Log Information after a VM Rollback

Jonathan Kehayias (via Paul Randal) advises caution:

Recently I received an email with a very vague description of a problem that went something along the lines of:

“After some issues with Windows patching and rolling back from snapshots, our cluster resources won’t come online. In the process of troubleshooting the issue and validating the fix, we rolled back a few times. We can’t find any details about the issue in the cluster logs, Windows event logs, or SQL Server error log to investigate further.”

Read on for more information about the immediate problem, the root cause, and the actual issue the customer ran into before compounding the problem.

Security and the Weakest Link

Brian Kelley explains:

When I look at a system and think about its security model, the first thing I start poking around at is where I think security is weakest. For instance, if my target is a Microsoft SQL Server box, I don’t generally look for a weakness in SQL Server itself. I start looking at the operating system, I look at accounts that may have access, and since I’m really worried about the data being taken, I look to see how backups are handled and where they are written to.

Read on for more information, as well as a link to Brian’s 2019 video on attacking SQL Server.

SQL Server Drivers and Access Tokens for Connection Pooling

David Engel gives us a deep dive:

We often get support escalations related to failures around expired access tokens when using Microsoft Entra ID (formerly Azure Active Directory) authentication. There is a lot of nuance in the various drivers about how and when access tokens are used and when they get renewed after expiring. I’m going to do a deep-dive and try to explain some of the details here.

This post is mainly targeted at people who have to troubleshoot issues in this area or those who are building large services and need to understand the impact of their authentication choices.

Read on to learn more about how the ODBC, JDBC, and .NET drivers differ.
