Press "Enter" to skip to content

Curated SQL Posts

Linear Regression with R in Excel

Adam Gladstone continues a series on working with R in Excel via the ExcelRAddIn component:

In the first part of this series, I looked at using R in Excel to obtain descriptive statistics. In this second part of the series, I am going to look at using R in Excel to perform linear regression, specifically using the lm() function. lm() is a real workhorse function. It can be used to carry out both single and multiple regression and different types of analysis of variance. For this demonstration I will only focus on single and multiple regression.

The workbook for this part of the series is: “Part II – R in Excel – Linear Regression.xlsx”. As before, the ‘References’ worksheet lists links to external references. The ‘Libraries’ worksheet loads additional (non-default) packages. In this demonstration, I use the datarium and broom packages. The ‘Datasets’ worksheet contains the data referenced in the worksheets.

Click through to see how you can perform ordinary least squares regression, multiple linear regression, and even logistic regression in Excel with a bit of R code. H/T R-Bloggers.
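As an aside for those who would rather stay on the database side: the same lm() call can also be hosted in SQL Server through Machine Learning Services rather than Excel. This is a minimal sketch, not the post's method, and it assumes ML Services with R is installed and 'external scripts enabled' is turned on; the mtcars formula is purely illustrative.

  -- Multiple regression with lm(), hosted in SQL Server rather than Excel.
  -- Assumes Machine Learning Services (R) and 'external scripts enabled' = 1.
  EXEC sp_execute_external_script
      @language = N'R',
      @script = N'
          model  <- lm(mpg ~ wt + hp, data = mtcars)   # two predictors = multiple regression
          result <- data.frame(term     = names(coef(model)),
                               estimate = unname(coef(model)))
      ',
      @output_data_1_name = N'result'
  WITH RESULT SETS ((term NVARCHAR(128), estimate FLOAT));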


Building a Vector Data Demo Database for SQL Server 2025

Andy Yun has a new demo database:

Today, I have the honor and pleasure of debuting a new presentation for MSSQLTips: A Practical Introduction to Vector Search in SQL Server 2025 (you can watch the recording here too). To accompany that new presentation, I opted to create a new demo database instead of retrofitting one of my existing demo databases. And I’m sharing it with you so you don’t have to go through the headache of taking an existing database and creating vector embeddings.

Click through for Andy’s demo database, which is approximately 16 GB in size, so not a tiny one.
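If you want a feel for what the demo exercises before committing to the download, here is a minimal sketch of vector storage and search in SQL Server 2025. It assumes the vector data type and the VECTOR_DISTANCE function; the table, the three-dimensional embeddings, and the values are made up purely for illustration.

  -- Toy table with a vector column (real embeddings run to hundreds of dimensions)
  CREATE TABLE dbo.Documents
  (
      DocumentID INT IDENTITY(1,1) PRIMARY KEY,
      Title      NVARCHAR(200) NOT NULL,
      Embedding  VECTOR(3)     NOT NULL
  );

  INSERT INTO dbo.Documents (Title, Embedding)
  VALUES (N'Sample document', CAST('[0.10, -0.52, 0.88]' AS VECTOR(3)));

  -- Nearest neighbors by cosine distance to a query embedding
  DECLARE @q VECTOR(3) = CAST('[0.09, -0.50, 0.90]' AS VECTOR(3));

  SELECT TOP (5)
      d.Title,
      VECTOR_DISTANCE('cosine', d.Embedding, @q) AS CosineDistance
  FROM dbo.Documents AS d
  ORDER BY CosineDistance;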


Fabric Studio 2.0 Released

Gerhard Brueckl has an update:

7 months after the first official release of Fabric Studio, I am very happy to share that I just released the next major version with a lot of new features that make working with Microsoft Fabric from VSCode better and more intuitive than ever! The release includes some new capabilities that I wanted to get into the tool since the very beginning, but back then the APIs just weren't there yet. Finally they are, and I integrated them into Fabric Studio v2.0!

Click through for a quick changelog, a link to the full changelog, and where you can grab a copy of the Visual Studio Code extension.


Updates to sp_CheckBackup

Jeff Iannucci announces some updates to a free tool:

Since we introduced sp_CheckBackup last year, we've gotten some great feedback on ways to improve this tool that helps you quickly and easily review the backup history for your SQL Server databases. We've even noticed a few things ourselves that we wanted to change as we used it more internally here at Straight Path Solutions.

Today we’re announcing a new version that includes some additions, corrections, and a few other adjustments that should be helpful. Here’s what new!

Read on for that changelog.
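For context on what the procedure saves you from writing by hand, the bare-bones manual version of a backup history check is a query against msdb, something like the sketch below; sp_CheckBackup layers its analysis and formatting on top of this kind of raw data.

  -- The sort of raw msdb digging sp_CheckBackup wraps up for you:
  -- most recent full backup per database.
  SELECT
      bs.database_name,
      MAX(bs.backup_finish_date) AS last_full_backup
  FROM msdb.dbo.backupset AS bs
  WHERE bs.type = 'D'            -- D = full database backup
  GROUP BY bs.database_name
  ORDER BY last_full_backup;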


Using R for Descriptive Statistics in Excel

Adam Gladstone shows off an Excel add-in:

The purpose of this series of posts is to demonstrate some use cases for R in Excel using the ExcelRAddIn component (disclaimer: I am the developer of this add-in: ExcelRAddIn). The fundamental rationale for the add-in is that it allows access to the extensive R ecosystem within an Excel worksheet. Excel provides many excellent facilities for data wrangling and analysis. However, for certain types of statistical data analysis, the built-in functions, even alongside the Analysis ToolPak, are not sufficient, and R provides superior facilities (for example, for performing LDA, PCA, forecasting, and time series analysis, to name a few).

Click through for examples of how it all works. H/T R-Bloggers.


Ownership Chaining in SQL Server

Jon Russell breaks the chain:

Designing a reporting layer that protects sensitive data takes more than hiding tables behind a view. You must understand how schema permissions and ownership chaining interact, or a well‑meant deny can suddenly block your users—or worse, let them see columns you thought were private. The walk‑through below shows the entire life‑cycle of a common scenario:

  1. Build an HR table that holds confidential columns.
  2. Expose a summary view in a separate schema.
  3. Grant a reporting role access to the view but explicitly deny access to the HR schema.
  4. Break the ownership chain, observe the failure, diagnose the cause, and repair it by realigning ownership.

Read on to learn more. Note that this is all within a single database, so we aren't talking about the cross-database ownership chaining setting. Having that setting turned on immediately raises red flags for me.
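If you want the moving parts on one screen before clicking through, here is a compressed sketch of the scenario; the schema, object, and user names are my own placeholders, not Jon's script.

  -- Confidential table in one schema, summary view in another, both owned by dbo
  CREATE SCHEMA HR;
  GO
  CREATE SCHEMA Reporting;
  GO
  CREATE TABLE HR.Employee (EmployeeID INT, EmployeeName NVARCHAR(100), Salary MONEY);
  GO
  CREATE VIEW Reporting.EmployeeSummary
  AS
  SELECT EmployeeID, EmployeeName FROM HR.Employee;   -- salary deliberately excluded
  GO
  CREATE ROLE ReportReader;
  GRANT SELECT ON Reporting.EmployeeSummary TO ReportReader;
  DENY SELECT ON SCHEMA::HR TO ReportReader;
  GO
  -- A ReportReader member can still SELECT from the view despite the DENY: dbo owns
  -- both the view and the table, so the unbroken ownership chain skips the check on HR.Employee.

  -- Break the chain by handing the HR schema to a different owner (placeholder user)...
  ALTER AUTHORIZATION ON SCHEMA::HR TO SomeOtherUser;
  -- ...and the same SELECT now fails, because SQL Server re-checks permissions
  -- on HR.Employee for the caller and finds the DENY.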


Measuring Delay in Availability Groups

Matt Gantz wants to see how far behind we are:

Availability Groups are useful for creating Highly Available (HA) database systems, but that doesn’t mean they are entirely immune to performance problems. In busy systems, limitations in the infrastructure can introduce replication lag that is severe enough to affect database performance in ways that aren’t immediately obvious.

In SQL Server Availability Groups, the relevant difference between synchronous and asynchronous replicas comes down to how and when a transaction on the primary server is considered "committed". Although it is easy to measure the lag between asynchronous replicas by using the dashboard in SSMS or by querying the DMVs (Dynamic Management Views), it takes more work to find the latency between synchronous replicas. This article explains how to measure that latency using internal performance counters, offering a simple technique for monitoring the cost of synchronous replication.

Click through to read the whole thing.
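For a taste of the counter-based technique, the usual starting point on the primary is the Database Replica counters: Transaction Delay divided by Mirrored Write Transactions gives the average synchronous-commit delay per transaction. A rough sketch, with the caveat that both values are cumulative, so in practice you sample twice and work with the deltas over the interval:

  -- Synchronous-commit cost per database, from sys.dm_os_performance_counters.
  -- Sample these values over an interval and divide the Transaction Delay delta
  -- by the Mirrored Write Transactions delta to get average delay per transaction.
  SELECT
      pc.instance_name AS database_name,
      MAX(CASE WHEN pc.counter_name = 'Transaction Delay'
               THEN pc.cntr_value END) AS transaction_delay_ms,
      MAX(CASE WHEN pc.counter_name = 'Mirrored Write Transactions/sec'
               THEN pc.cntr_value END) AS mirrored_write_transactions
  FROM sys.dm_os_performance_counters AS pc
  WHERE pc.object_name LIKE '%Database Replica%'
    AND pc.counter_name IN ('Transaction Delay', 'Mirrored Write Transactions/sec')
  GROUP BY pc.instance_name;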


Making PostgreSQL Slower

Jacob Jackson takes on a unique challenge:

Everyone is always wondering how to make Postgres faster, more efficient, etc., but nobody ever thinks about how to make Postgres slower. Now, of course, most of those people are being paid to focus on speed, but I am not (although, if you wanted to change that, let me know). As I was writing a slightly more useful guide, I decided someone needed to try to create a Postgres configuration optimized to process queries as slowly as possible. Why? I am not sure, but this is what came of that thought.

I spent a few moments thinking about an equivalent sort of torture test on SQL Server, doing things like forcing CPU affinity through one core, monkeying with cost threshold for parallelism, and using trace flags to turn off different join optimizations (like, say, hash matches and merge joins, forcing everything to be nested loops). It’s a fun thought experiment.
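Strictly as a thought experiment (please do not run this anywhere you care about), the SQL Server version might look something like the sketch below; I've used the documented join hints rather than trace flags to force everything into nested loops.

  -- Pessimizing SQL Server, for science only.
  -- Pin the engine to a single CPU
  ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0;

  -- Raise cost threshold for parallelism to its maximum so nothing goes parallel
  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;
  EXEC sp_configure 'cost threshold for parallelism', 32767;
  RECONFIGURE;

  -- Per query, forbid hash and merge joins so every join becomes nested loops
  SELECT o.name AS object_name, c.name AS column_name
  FROM sys.objects AS o
  JOIN sys.columns AS c
      ON c.object_id = o.object_id
  OPTION (LOOP JOIN, MAXDOP 1);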
