2025-08-22 – Curated SQL

Tips for Working with Pandas

Published 2025-08-22 by Kevin Feasel

Matthew Mayo has a few tips when working with Pandas for data preparation:

If you’re reading this, it’s likely that you are already aware that the performance of a machine learning model is not just a function of the chosen algorithm. It is also highly influenced by the quality and representation of the data that said model has been trained on.

Data preprocessing and feature engineering are some of the most important steps in your machine learning workflow. In the Python ecosystem, Pandas is the go-to library for these types of data manipulation tasks, something you also likely know. Mastering a few select Pandas data transformation techniques can significantly streamline your workflow, make your code cleaner and more efficient, and ultimately lead to better performing models.

This tutorial will walk you through seven practical Pandas scenarios and the tricks that can enhance your data preparation and feature engineering process, setting you up for success in your next machine learning project.

Click through for those tips and tricks.


Handling Missing Data in R

Published 2025-08-22 by Kevin Feasel

M. Fatih Tüzen fills in the gaps:

Data preprocessing is a cornerstone of any data analysis or machine learning pipeline. Raw data rarely comes in a form ready for direct analysis — it often requires cleaning, transformation, normalization, and careful handling of anomalies. Among these preprocessing tasks, dealing with missing data stands out as one of the most critical and unavoidable challenges.

Missing values appear in virtually every domain: surveys may have skipped questions, administrative registers might contain incomplete records, and clinical trials can suffer from dropout patients. Ignoring these gaps or handling them naively does not just reduce the amount of usable information; it can also introduce bias, decrease statistical power, and ultimately compromise the validity of conclusions. In other words, missing data is not just an inconvenience — it is a methodological problem that demands rigorous attention.

Quite often, we gloss over what to do with missing data when explaining or working through the data science process, in part because it’s a hard problem. This post digs into the specifics of the matter, taking us through eight separate methods. H/T R-Bloggers.


Lessons Learned on Migrating to Apache Kafka

Published 2025-08-22 by Kevin Feasel

Ravi Teja Thutari shares some advice:

The legacy e-commerce platform was a PHP-based monolith handling catalog, orders, inventory, and customer data. With business growth, the monolith could not scale further. Maintaining feature velocity was hard because every change risked the entire system. We needed scalability, resilience, and faster releases. Shifting to event-driven microservices promised to address these issues. In practice we adopted Kafka on Kubernetes, similar to other online retailers.

Our priorities were (1) decoupling services so each team could deploy independently, (2) modeling business events consistently across domains, and (3) ensuring reliable delivery at scale (with retries and DLQs for failures). As a starting point, we documented key domain events (e.g. OrderCreated, PaymentProcessed, InventoryAllocated) and sketched a target architecture. Like other high-traffic systems, we planned horizontal scaling: adding Kafka brokers and topic partitions to match consumer parallelism. We also planned for observability from Day 1 (metrics, logs, traces) to monitor performance and troubleshoot issues.

Read on for more information about how that migration went.
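
To make the "retries and DLQs" idea from the excerpt concrete, here is a minimal in-memory sketch. It does not use a Kafka client at all, and it is not taken from Ravi's article: the `Event` and `RetryingConsumer` names, the `max_attempts` parameter, and the list standing in for a dead-letter topic are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Event:
    """A domain event such as OrderCreated (hypothetical shape)."""
    name: str
    payload: dict


@dataclass
class RetryingConsumer:
    """Tries a handler a bounded number of times, then dead-letters the event.

    The dead_letters list stands in for a dead-letter topic; a real
    deployment would publish failed events to a separate topic instead.
    """
    handler: Callable[[Event], None]
    max_attempts: int = 3
    dead_letters: List[Tuple[Event, str]] = field(default_factory=list)

    def consume(self, event: Event) -> bool:
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            try:
                self.handler(event)
                return True  # handled successfully, no dead-lettering
            except Exception as exc:  # broad on purpose: any failure counts
                last_error = exc
        # All attempts failed: keep the event and the last error for inspection.
        self.dead_letters.append((event, repr(last_error)))
        return False
```

A handler that fails transiently gets another chance on the next attempt, while one that fails every time ends up in `dead_letters` with the last error attached, so the main stream keeps moving. A production setup would likely also add a backoff between attempts.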


Configuring Alerts in Azure SQL Managed Instance

Published 2025-08-22 by Kevin Feasel

Aleksey Vitsko wants an alert:

You have an Azure SQL Managed Instance and you want to set up SQL Server alerts for errors with severity 17-25, similar to what you would do for an on-prem SQL Server. You go to the SQL Server Agent folder in Object Explorer, expand it, and whoops – there is no Alerts folder.

As of the time of writing this article (June 2025), Azure SQL Managed Instance doesn’t have this functionality, and we don’t have any ETA on when it will be implemented. So, how can we set up alerts in Azure SQL MI to notify us when there are issues?

Read on for a workaround and a warning.


Pattern Matching with REGEXP_LIKE() in SQL Server 2025

Published 2025-08-22 by Kevin Feasel

Koen Verbeeck writes a regular expression:

I need to do some data validation in our SQL Server database. However, the validation rules are too complex for the T-SQL LIKE function, and I can’t seem to get it done either with PATINDEX or something similar. I’d like to use regular expressions as they’re more powerful. SQL Server 2025 now has a regex function, REGEXP_LIKE, for working with regular expressions.

Read on for some examples, advice on validating e-mail addresses, and more.
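
As a rough illustration of the kind of rule that is awkward with LIKE but easy with a regular expression, here is a small Python sketch using the standard `re` module. The rule itself (two uppercase letters, a hyphen, four to six digits, and an optional one-letter suffix) and the `is_valid_code` name are made up for this example and do not come from Koen's post. Regex dialects also differ between engines, so a pattern that works here may need adjusting before it goes into REGEXP_LIKE.

```python
import re

# Hypothetical validation rule, for illustration only:
# two uppercase letters, a hyphen, 4 to 6 digits, optional one-letter suffix.
CODE_PATTERN = re.compile(r"[A-Z]{2}-[0-9]{4,6}[A-Z]?")


def is_valid_code(value: str) -> bool:
    """Return True only when the whole string matches the rule."""
    return CODE_PATTERN.fullmatch(value) is not None
```

The bounded repetition (`{4,6}`) and the optional suffix (`?`) are the parts that LIKE wildcards cannot express in a single pattern.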


Item History in the Microsoft Fabric Capacity Metrics App

Published 2025-08-22 by Kevin Feasel

Ope Aladekomo announces a new feature:

We’re thrilled to announce the Preview of the Item History page in the latest version of the Microsoft Fabric Capacity Metrics App. The Item History page provides a 30-day compute usage analysis through dynamic visuals and slicers, enabling users to explore both high-level consumption trends and granular item-level metrics. This page helps you understand how individual items and operations contribute to overall capacity usage.

Click through to see a picture of the page, as well as some of the information you can glean from it.


Sorting a Visual by a Field Not on the Visual

Published 2025-08-22 by Kevin Feasel

Nikola Ilic does a bit of sorting:

Recently, I was dealing with a Power BI report where the client had a very specific requirement – to sort the data in the visual based on a particular field from the semantic model. The only “issue” was that this particular field wasn’t part of the visual. So, while figuring out how this can be accomplished (because, yes, everything can be accomplished when the client needs it, hehe), I decided to write it down and share it with everyone who might find it useful.

Nikola successfully uses machine trickery to solve the problem.


