2024-09-27 – Curated SQL

Boosting versus Bagging in Tree Models

Published 2024-09-27 by Kevin Feasel

Vinod Chugani compares two techniques for working with trees:

Ensemble learning techniques primarily fall into two categories: bagging and boosting. Bagging improves stability and accuracy by aggregating independent predictions, whereas boosting sequentially corrects the errors of prior models, improving their performance with each iteration. This post begins our deep dive into boosting, starting with the Gradient Boosting Regressor. Through its application on the Ames Housing Dataset, we will demonstrate how boosting uniquely enhances models, setting the stage for exploring various boosting techniques in upcoming posts.

Read on for more information. The neat part about the “boosting versus bagging” debate is that both techniques are quite useful. Although boosting (via algorithms like XGBoost or LightGBM) is the more popular technique, bagging (random forest) is extremely powerful in its own right.

Comments closed

Creating Horizontal Box Plots in R

Published 2024-09-27 by Kevin Feasel

Steven Sanderson tips the box plot over:

A boxplot, also known as a whisker plot, displays the distribution of data based on a five-number summary: minimum, first quartile, median, third quartile, and maximum. It highlights the data’s central tendency and variability, making it easier to identify outliers.

Read on for examples in base R and ggplot2.

Comments closed

Rebuilding a Transaction Log

Published 2024-09-27 by Kevin Feasel

David Fowler fixes a large-scale oopsie:

“Could you help me, we deleted the database’s transaction log file and now that database is stuck in ‘Recovery Pending’?”

This was a panicked call that I received a few weeks ago.

“Sure, no problem” said I, “we’ll have to restore back to your last backup”

And then things went silent for a while before the inevitable, “it’s only a development database, we don’t take backups”.

I can feel the face-palm from here. Read on to learn what you can do if you’re in that situation, as well as David’s important note about taking backups so that you don’t end up in this situation to begin with.

Comments closed

Creating Vector Indexes in Oracle

Published 2024-09-27 by Kevin Feasel

Brendan Tierney continues a series on vector storage:

Building upon the previous posts, this post will explore how to create Vector Indexes and the different types. In the next post I’ll demonstrate how to explore wine reviews using the Vector Embeddings and the Vector Indexes.

Read on for a bit of prep work, followed by a description of both vector index types in Oracle and how you can create them.

Comments closed

Connecting to Azure Storage from SSIS

Published 2024-09-27 by Kevin Feasel

Andy Brownsword makes a connection:

Migrating to the cloud can be disruptive to existing processes. Moving storage to Azure isn’t a simple configuration change for SSIS packages.

SSIS doesn’t have native connections for Azure. That doesn’t mean we need to completely re-engineer the process or change technology though.

How can we take the simple package below and move to using Azure storage?

Read on for the answer. Also, I am 100% on Team SAS Token. They are easy to create and give you a lot of control over who gets access to what.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Day: September 27, 2024

Boosting versus Bagging in Tree Models

Creating Horizontal Box Plots in R

Rebuilding a Transaction Log

Creating Vector Indexes in Oracle

Connecting to Azure Storage from SSIS