Press "Enter" to skip to content

Month: May 2024

Cross-Correlation of Time Series to Identify Time Lags in SAS

Kevin Scott and David Frede notice the pattern:

Batch manufacturing involves producing goods in batches rather than in a continuous stream. This approach is common in industries such as pharmaceuticals, chemicals, and materials processing, where precise control over the production process is essential to ensure product quality and consistency. One critical aspect of batch manufacturing is the need to manage and understand inherent time delays that occur at various stages of the process.

In the glass manufacturing industry, which operates under the principles of batch manufacturing, precisely controlling the furnace temperature is essential for producing high-quality glass. The process involves melting raw materials like silica sand, soda ash, and limestone at high temperatures, where maintaining the correct temperature is crucial.

Read on to see an example of how you can automate the identification of a time lag using cross-correlation techniques.

Comments closed

Handling Imbalanced Data in Classification Algorithms

Matthew Mayo shares a few tips:

Imperfect data is the norm rather than the exception in machine learning. Comparably common is the binary class imbalance when the classes in a trained data remains majority/minority class, or is moderately skewed. Imbalanced data can undermine a machine learning model by producing model selection biases. Therefore in the interest of model performance and equitable representation, solving the problem of imbalanced data during training and evaluation is paramount.

This article will define imbalanced data, resampling strategies as solution, appropriate evaluation metrics, kinds of algorithmic approaches, and the utility of synthetic data and data augmentation to address this imbalance.

Read on for five recommendations, starting with what you should know and then offering up four options for what you can do.

Comments closed

Building Charts from Powershell

Chad Callihan builds a chart:

There are plenty of ways to visualize data. There’s PowerBI, Tableu, and a plethora of other options. What about taking the results of a SQL query and creating a graph in PowerShell? Probably not ideal, but is it possible? Let’s see what this might look like.

The thought occurred to me more out of curiosity than it being something I’d use. Admittedly, I’m not proficient enough in PowerShell to quickly build something from scratch. To get an idea of how it might look, I took this as an opportunity to outsource most of the work to Microsoft Copilot to see if I would get anything useful.

If you want to get fancy, I’d recommend Plotly, which has support for the best .NET language (F#) and you can also use it with those other .NET languages (C#, Powershell). There’s no explicit quickstart for Powershell but you can Powershell-itize the C# code pretty easily.

Comments closed

Trigger Order: Oracle vs Postgres

Deepak Mahto talks trigger ordering:

Database triggers allow encapsulation of multiple functionalities that are automatically invoked on specific events or DML operations like INSERT, UPDATE, or DELETE. The invocation of triggers can be controlled as BEFORE or AFTER the triggering events, either for each change or per statement. In migrating from Oracle to PostgreSQL, it is important to be aware of triggers conversion gotchas.

In this blog, we discuss the default order of processing when multiple triggers are defined on the same table for the same events with the same characteristics. We will explore how this works in Oracle, how to alter the order, and how it operates in PostgreSQL.

This is good to know, but I’d also consider it a major code smell to have multiple triggers on the same table.

Comments closed

Parallel SQL Agent Jobs: The Next Step

Aaron Bertrand wraps up a series on parallel SQL Agent jobs:

Imagine you’ve split up this chunk of work so that, collectively, it finishes much faster. When all of the threads are done, you want to kick off the next round of work. For example, if the threads are writing data to flat files, when they’re all done, you might want to then kick off a process that zips them, or moves them, or bulk loads them into a different system, or all of the above.

Read on for some of the challenges around this, as well as what you can do about them. Also, Aaron mentions SQL Agent tokens en passant, and that’s something that will probably be new to most people.

Comments closed

Concatenating Strings and (N)VARCHAR Truncation

Vlad Drumea troubleshoots a common problem:

The code in this case is the GetStatsInfoForWholeDB.sql script that’s part of PSBlitz’s resources.
This script is used for, you wouldn’t believe by the name alone, getting statistics information for a specific database.

Due to the fact that it might be ran on Azure or on older versions of SQL Server, as well as on databases with incremental statistics, the best option for it was to use dynamic SQL.

In this case it uses a variable @SQL defined as NVARCHAR(MAX) to store the query that’s built at runtime and execute it via EXEC.

Read on for one of the most common issues you may run into around generating dynamic SQL.

Comments closed

Module.Versions in Power Query

Chris Webb gives Internet sleuths something to chew on:

The ever-vigilant folks on the internet have spotted that there’s a new M function in the latest versions of Power BI and Excel: Module.Versions. This function, at the time of writing, returns a record with a single field in that contains the version number of the Power Query engine currently in use.

Click through for an example of calling the function and what it returns as of right now.

Comments closed

Building a Power Query Template

Nikola Ilic looks at Power Query templates:

In this “ocean” of innovations, there are certain features that don’t get the deserved limelight – as they somehow go under the radar. Some of them, I really consider “hidden gems” – you might not use them in each and every solution, but in some scenarios, they can be of immense help.

A hidden gem that I’m introducing today is called Power Query Template. As of today, this feature is still in preview (the same as many others in Microsoft Fabric), but this doesn’t minimize its potential.

Read on to see why Nikola likes capability this so much.

Comments closed

Splitting a Number into Component Digits in R

Steven Sanderson does a bit of splitting:

Splitting numbers into individual digits can be a handy trick in data analysis and manipulation. Today, we’ll explore how to achieve this using base R functions, specifically gsub() and strsplit(). Let’s walk through the process step by step, explain the syntax of each function, and provide some examples for clarity.

Click through for a pair of examples.

Comments closed

An Overview of Classification Algorithms

Matthew Mayo explains several algorithms:

Classification algorithms are at the heart of data science, helping us categorize and organize data into pre-defined classes. These algorithms are used in a wide array of applications, from spam detection and medical diagnosis to image recognition and customer profiling. It is for this reason that those new to data science must know about and understand these algorithms: they lay foundations for more advanced techniques and provide insight into how those data-driven decisions are made.

Let’s take a look at 5 essential classification algorithms, explained intuitively. We will include resources for each to learn more if interested.

Click through for five algorithms and a couple of paragraphs describing how the algorithm works. For a little bit of self-promotion on my end, I have a series on YouTube running right now on the topic of classification where I cover a variety of algorithms. As a spoiler, 4 of the 5 on Matthew’s list will have their own videos, and there are several other algorithms to boot.

Comments closed