
Day: March 20, 2025

Time-Saving Features in Scikit-Learn

Cornelius Yudha Wijaya describes a half-dozen functions:

For many people studying data science, Scikit-Learn is often the first machine learning library they encounter. It’s because Scikit-Learn offers various APIs that are useful for model development while still being easy for beginners to use.

As helpful as they may be, many features from Scikit-Learn are rarely explored and have untapped potential. This article will explore six lesser-known features that will save you time.

The calibration curve function, in particular, drew my attention, especially as I had written that by hand in the past.
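
For anyone curious what the hand-written version looks like, here is a minimal sketch in plain Python. It assumes binary 0/1 labels, predicted probabilities in [0, 1], and equal-width bins. The name `calibration_curve_by_hand` is made up for illustration. scikit-learn's own `calibration_curve` in `sklearn.calibration` does this job for you, and its handling of values that land exactly on a bin edge may differ from this sketch.

```python
def calibration_curve_by_hand(y_true, y_prob, n_bins=5):
    """Illustrative helper (not a scikit-learn function).

    Splits [0, 1] into n_bins equal-width bins and returns, for each
    non-empty bin, the observed fraction of positives and the mean
    predicted probability.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")

    positives = [0] * n_bins    # count of label == 1 per bin
    prob_sums = [0.0] * n_bins  # sum of predicted probabilities per bin
    counts = [0] * n_bins       # number of samples per bin

    for label, prob in zip(y_true, y_prob):
        # Map the probability to a bin index; a probability of exactly 1.0
        # is clamped into the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        positives[idx] += label
        prob_sums[idx] += prob
        counts[idx] += 1

    # Skip empty bins so the two lists stay aligned.
    fraction_of_positives = [
        positives[i] / counts[i] for i in range(n_bins) if counts[i]
    ]
    mean_predicted = [
        prob_sums[i] / counts[i] for i in range(n_bins) if counts[i]
    ]
    return fraction_of_positives, mean_predicted


if __name__ == "__main__":
    frac, mean = calibration_curve_by_hand(
        [0, 0, 1, 1], [0.1, 0.3, 0.7, 0.9], n_bins=2
    )
    print(frac, mean)
```

Plotting the mean predicted probability against the fraction of positives gives the calibration curve. A well-calibrated model sits close to the diagonal.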


Common Power BI Mistakes

Koen Verbeeck lays out some common mistakes people make when developing Power BI reports:

What are some of the most common mistakes when working with Power BI? For example, when a junior colleague starts on a Power BI project for the first time, what are the pitfalls you try to warn them about? What advice would you give them?

The last one hurts me in particular because .pbip and TMDL aren’t compatible with Power BI Report Server.


RegEx Performance in Azure SQL DB

Brent Ozar breaks the bad news:

Regular expressions are a way of doing complex string searches. They can be really useful, but they have a reputation: they’re hard to write, hard to read, and they’re even harder to troubleshoot. Once you master ’em, though, they come in handy for very specific situations.

This post isn’t about their complexity, though. This post is about Azure SQL DB & SQL Server 2025’s regex performance.

Brent’s testing hurts, because I want to use regular expressions, and based on what he’s seen so far, we’re probably still better off using CLR-based regex in SQLSharp.


An Overview of DataDiluvium

Adron Hall has a new tool and a new blog series. The first post is a product overview:

DataDiluvium is a web-based tool available at datadiluvium.com that helps developers, database administrators, and data engineers generate realistic test data from SQL schema definitions. Whether you’re setting up a development environment, creating test scenarios, or preparing data for demonstrations, DataDiluvium streamlines the process of data generation.

The second covers some of the development precepts Adron used:

DataDiluvium is a web-based tool I’ve built designed to help developers, database administrators, and data engineers generate realistic test data based on SQL schema definitions. The tool takes SQL table definitions as input and produces sample data in various formats, making it easier to populate development and testing environments with meaningful data.

The tool is free, so if you’re looking for a sample data generator, check it out.


COPY and \COPY in PostgreSQL

Dave Stokes runs two commands:

PostgreSQL is equivalent to a Swiss Army Knife in the database world. There are things in PostgreSQL that are very simple to use, while in another database, they take many more steps to accomplish. But sometimes, the knife has too many blades, which can cause confusion. This is one of those cases.

Read on to understand what the difference is between these two commands.


T-SQL Tuesday 184 Round-Up

Deborah Melkin casts a wide net:

There were a lot of themes that I noticed throughout everyone’s posts. First was the number of people who mentioned that mentoring doesn’t have to be formal or even a 1:1 relationship. Mentoring isn’t just for adults and careers, but for the next generation too. Mentoring has helped their careers or become part of a core tenet in their company and how they run their business. It’s a place to grow our community, and not just for those who look like us. We all talked about how we have grown from mentoring, not just as mentees but as mentors.

Click through for a dozen-and-a-half responses to the T-SQL Tuesday call.
