Press "Enter" to skip to content

Day: August 23, 2024

Interpreting Linear Regression Model Coefficients

Vinod Chugani looks at a linear regression:

Linear regression models are foundational in machine learning. Merely fitting a straight line and reading the coefficient tells a lot. But how do we extract and interpret the coefficients from these models to understand their impact on predicted outcomes? This post will demonstrate how one can interpret coefficients by exploring various scenarios. We’ll delve into the analysis of a single numerical feature, investigate the role of categorical variables, and unpack the complexities introduced when these features are combined. Through this exploration, we aim to equip you with the skills needed to leverage linear regression models effectively, enhancing your analytical capabilities across different data-driven domains.

Click through for details, with examples in Python.
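
If you want to play along before clicking through, here is a minimal scikit-learn sketch of the core idea. This is not Chugani's code; the feature names and data are invented for illustration:

    # Minimal sketch: fit a linear regression and read its coefficients.
    # Feature names and data are made up for illustration.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = pd.DataFrame({
        "square_feet": rng.uniform(500, 3500, 200),   # numerical feature
        "has_garage": rng.integers(0, 2, 200),        # categorical feature encoded as 0/1
    })
    # Synthetic target: ~150 per square foot, +20,000 for a garage, plus noise
    y = 150 * X["square_feet"] + 20_000 * X["has_garage"] + rng.normal(0, 10_000, 200)

    model = LinearRegression().fit(X, y)
    for name, coef in zip(X.columns, model.coef_):
        # Expected change in the target per one-unit change in the feature
        print(f"{name}: {coef:,.2f}")
    print(f"intercept: {model.intercept_:,.2f}")

Each coefficient reads as "holding the other features constant, a one-unit increase in this feature moves the prediction by this much," which is exactly the interpretation question the post works through.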


Time Series Anomaly Detection in Microsoft Fabric

Adi Eldar talks anomaly detection:

Anomaly Detector, one of the Azure AI services, enables you to monitor and detect anomalies in your time series data. This service is based on advanced algorithms: SR-CNN for univariate analysis and MTAD-GAT for multivariate analysis. This service is being retired by October 2026, and as part of the migration process:

  • The algorithms were open sourced and published as the new time-series-anomaly-detector package on PyPI.
  • We offer a time series anomaly detection workflow in the Microsoft Fabric data platform.

Read on to see what replacements exist and how you can use the time-series-anomaly-detector package in Microsoft Fabric.
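
If you just want to see what univariate anomaly detection looks like mechanically, here is a toy rolling z-score example in pandas. To be clear, this is a generic illustration, not the SR-CNN algorithm and not the API of the time-series-anomaly-detector package:

    # Toy univariate anomaly detection with a rolling z-score (illustration only).
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    ts = pd.Series(
        rng.normal(100, 5, 500),
        index=pd.date_range("2024-01-01", periods=500, freq="h"),
    )
    ts.iloc[250] += 60   # inject an obvious spike

    window = 48
    zscore = (ts - ts.rolling(window).mean()) / ts.rolling(window).std()
    anomalies = ts[zscore.abs() > 4]   # flag points far from the recent rolling mean
    print(anomalies)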


Reviewing the SQL Server Error Log

Jim Evans digs into the logs:

In SQL Server there are two primary sets of error logs: one for the database engine and a second for SQL Server Agent. Reviewing these logs is routine for Database Administrators and sometimes Developers when troubleshooting issues. What are the different ways to view these error logs? Are there different scenarios when you would use one view over another? Do any other error logs exist that SQL Server professionals should review?

Read on for three ways to do this, including one outside of SQL Server itself.
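
As a hedged example of one common in-engine approach, here is sp_readerrorlog called from Python with pyodbc; the driver, server, and search string below are placeholders for your own environment:

    # Sketch: reading the SQL Server error log via sp_readerrorlog from Python.
    # Connection details are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=localhost;DATABASE=master;"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cursor = conn.cursor()
    # Parameters: log number (0 = current), log type (1 = engine, 2 = SQL Server Agent),
    # optional search string.
    cursor.execute("EXEC sp_readerrorlog 0, 1, N'error'")
    for log_date, process_info, text in cursor.fetchall():
        print(log_date, process_info, text)
    conn.close()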


Storing and Parsing JSON in SQL Server

Ed Pollack talks JSON:

Like XML, JSON is an open standard storage format for data, metadata, parameters, or other unstructured or semi-structured data. Because of its heavy usage in applications today, it inevitably will make its way into databases where it will need to be stored, compressed, modified, searched, and retrieved.

Even though a relational database is not the ideal place to store and manage less structured data, application requirements can oftentimes override an “optimal” database design. There is a convenience in having JSON data close to related relational data, and architecting its storage effectively from the start can save significant time and resources in the future.

Read on for plenty of examples and tips. Ideologically, I have no problem parsing JSON to load data into SQL Server. I have no real problem storing data in JSON if the calling application takes that JSON as-is and does not expect the database to modify or shred that JSON. I have no problem taking relational data and creating JSON structures to send out to calling applications. My problem comes when you store the data as JSON but then expect the database to manage that data. Treat the JSON blob as atomic and we’re fine; otherwise, I want to make that data relational, as befits a relational database.
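
To make the "keep it relational" stance concrete, here is a sketch that shreds a JSON document into rows with OPENJSON (SQL Server 2016 and later), called from Python via pyodbc. The JSON shape and column names are invented for the example:

    # Sketch: shredding a JSON array into rows with OPENJSON via pyodbc.
    # The JSON structure and connection details are placeholders.
    import json
    import pyodbc

    orders_json = json.dumps([
        {"order_id": 1, "customer": "Alice", "total": 42.50},
        {"order_id": 2, "customer": "Bob", "total": 17.25},
    ])

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=localhost;DATABASE=Sandbox;"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cursor = conn.cursor()
    cursor.execute(
        """
        SELECT order_id, customer, total
        FROM OPENJSON(?)
        WITH (
            order_id INT           '$.order_id',
            customer NVARCHAR(100) '$.customer',
            total    DECIMAL(10,2) '$.total'
        );
        """,
        orders_json,
    )
    for row in cursor.fetchall():
        print(row.order_id, row.customer, row.total)
    conn.close()

From there, the shredded rows can land in ordinary relational columns, which keeps the database managing relational data rather than picking apart JSON blobs on every query.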


Uniquifiers Doing Heavy Lifting

Michael J. Swart is one of a kind:

If you define a clustered index that’s not unique, SQL Server will add a hidden 4-byte column called UNIQUIFIER. You can’t see it directly, but it’s there. When you add a row whose key is a duplicate of an existing row, the new row gets a new unique value for its uniquifier. If you add over 2.1 billion rows with the same key, the uniquifier value exceeds the limit and you will see error 666.

A while ago, we nearly got into trouble because of a bad choice of clustering key that went undetected for a long time.

Click through for a query to see how many clustered indexes need uniquifiers and which have the most duplication of key fields.
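
As a rough starting point (not Swart's query), this sketch lists the clustered indexes that are non-unique and therefore carry a hidden uniquifier; measuring how much key duplication each one has is where the linked post picks up:

    # Sketch: find non-unique clustered indexes, i.e. the ones that get a uniquifier.
    # Connection details are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=localhost;DATABASE=YourDatabase;"
        "Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cursor = conn.cursor()
    cursor.execute(
        """
        SELECT s.name AS schema_name, t.name AS table_name, i.name AS index_name
        FROM sys.indexes AS i
        JOIN sys.tables AS t ON t.object_id = i.object_id
        JOIN sys.schemas AS s ON s.schema_id = t.schema_id
        WHERE i.type_desc = 'CLUSTERED'   -- clustered indexes only
          AND i.is_unique = 0;            -- non-unique, so SQL Server adds a uniquifier
        """
    )
    for row in cursor.fetchall():
        print(f"{row.schema_name}.{row.table_name} -> {row.index_name}")
    conn.close()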
