Day: September 10, 2024

Dealing with Collinearity using Lasso Regression

Vinod Chugani always moves in the same direction:

One of the significant challenges statisticians and data scientists face is multicollinearity, particularly its most severe form, perfect multicollinearity. This issue often lurks undetected in large datasets with many features, potentially disguising itself and skewing the results of statistical models.

In this post, we explore the methods for detecting, addressing, and refining models affected by perfect multicollinearity. Through practical analysis and examples, we aim to equip you with the tools necessary to enhance your models’ robustness and interpretability, ensuring that they deliver reliable insights and accurate predictions.

Read on to learn a bit more about how collinearity works and how you can use lasso regression (instead of ridge regression) to deal with the problem.
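
As a rough illustration of why lasso helps here, the sketch below fits a lasso model to a tiny made-up dataset in which one column is exactly twice another. It is not taken from Vinod's post: the data, the `alpha` value, and the helper names `soft_threshold` and `lasso_coordinate_descent` are all assumptions for illustration, and the solver is a bare-bones cyclic coordinate descent in pure Python. In practice you would more likely reach for a library implementation such as scikit-learn's `Lasso`.

```python
# Minimal lasso via cyclic coordinate descent, pure Python (no third-party packages).
# Objective: (1 / (2n)) * sum((y - X w)^2) + alpha * sum(|w|)
# Assumes centered columns and a centered target, so no intercept is fitted.

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=500):
    """Hypothetical helper: update one coefficient at a time, holding the others fixed."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared length of feature j (assumed non-zero)
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return w


# Made-up data: x2 is exactly 2 * x1 (perfect collinearity); x3 is unrelated to both.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [2.0 * v for v in x1]
x3 = [1.0, -1.0, 0.0, -1.0, 1.0]
X = [list(row) for row in zip(x1, x2, x3)]
y = [3.0 * a + 1.0 * c for a, c in zip(x1, x3)]  # true signal uses x1 and x3 only

weights = lasso_coordinate_descent(X, y, alpha=0.1)
print([round(w, 4) for w in weights])  # approximately [0.0, 1.4875, 0.875]
```

The L1 penalty pushes the coefficient of one of the two collinear columns to exactly zero and keeps the other. In this sketch it keeps `x2` because the features are not standardized, so the larger-scaled column needs a smaller (cheaper) coefficient to carry the same signal. Ridge regression would instead spread the weight across both columns without zeroing either.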

Point-in-Time Recovery with Postgres

Grant Fritchey restores some backups:

PostgreSQL has the capabilities to support backups as I already described in my first article on the topic. PostgreSQL also has the capability to restore to a point in time. However, that does require you to change the way you’re performing your backups. This article advances our understanding of how to better protect your PostgreSQL databases by expanding on the database backups and restores into a more full-blown disaster recovery process through point in time restores.

While the important part is the restore, in a classic chicken-or-egg conundrum, we can’t talk about restoring until we first have a backup, so I’ll start with how you need to back up your databases in preparation for a point-in-time restore.

Click through for the process and to see it in action.

Discerning a Star Schema from an Existing Report

Kelly Broekstra describes a common flow for business intelligence projects:

I have worked as a business intelligence developer for several years, and I’m always asked: “How do you convert user requirements to a functioning data model?”

I follow the Kimball methodology. For more information, check out the official pages.

But here are some specific tips on what works for me.

Click through for those tips.

Microsoft Fabric Direct Lake and Reframing Operations

Reza Rad changes the frame:

Power BI offers a new type of connection to Microsoft Fabric Lakehouse or Warehouse, called Direct Lake. The Direct Lake connection acts like DirectQuery and won’t need the data to be refreshed. However, the Power BI semantic model has refresh settings that can be turned on or off. In this article and video, you will learn about the Refresh settings for the Power BI semantic model that is connected using a Direct Lake connection, what that is, and why it is called Reframe.

Read on to learn more, or to check out the video.

mssql-tools 18 and Two Common Errors

Vlad Drumea covers a pair of errors you might run into with mssql-tools version 18:

In this post I cover the 0A000086 and “command not found” errors that you might encounter with the new version of SQL Server command-line tools, namely sqlcmd and bcp, for Linux.

While the latest version of SQL Server command-line tools, based on Microsoft ODBC 18, brings improvements, it also brings some gotchas that can break your automations.

Read on to learn more about each.

The Brokenness of TABLESAMPLE

Paul White walks us through some issues:

Initial testing went well, which was a pleasant surprise. Soon enough though, errors started to appear in the tool’s output. That’s not entirely unexpected since ensuring consistent results under high concurrency tends to expose all sorts of niggly edge cases. It’s still an annoyance because debugging edge cases in trigger code can be tricky and laborious.

What was a surprise though was the nature of the error messages.

Read on for the full story. Paul has also created a feedback issue covering a problem with the function.

Microsoft Fabric Capacities and Reserved Instances

Marc Lelijveld shares an experience:

Last week, I had a situation in which a client wanted to purchase a reserved instance Fabric capacity. Me being me, I assumed it would be super straightforward to purchase through Azure. However, at some point I was lost in the process, and the official documentation confused me even more. In the end, I figured it out and managed to get a capacity running based on the Reserved Instance pricing. I didn’t find any other blogs or articles describing this confusion or specific case. Therefore, I decided to write down my thoughts and findings in a blog.

This blog is not only relevant if you work with Microsoft Fabric, but also for anyone currently working with Power BI Premium. Given the deprecation of Power BI Premium capacities, you have to switch to Fabric capacities sooner or later.

Read on to learn more about the differences between pay-as-you-go and reserved instance capacities, the process to make a reservation, and what comes after that before you have a Microsoft Fabric capacity ready to go.