Press "Enter" to skip to content

Day: February 28, 2024

Melting Datasets in R

Steven Sanderson performs a melt():

The melt() function in the data.table package is an extremely useful tool for reshaping datasets in R. However, for beginners, understanding how to use melt() can be tricky. In this post, I’ll walk through several examples to demonstrate how to use melt() to move from wide to long data formats.

“Melting,” by the way, is the R term for unpivoting data.

Comments closed

Generators in Python

Jack Wallen generates some values:

Within the realm of programming, a generator is a routine that is used to control interaction within a loop. Generators are useful when you want to produce a large sequence of values without storing them in memory at once. The ability to create a large sequence without using up memory is important, especially when dealing with Python programs that generate a large amount of information, such as a long sequence of numbers.

Read on to see how generators work and how to build a custom generator.

Comments closed

Apache Kafka 3.7 Released

Stanislav Kozlovski makes an announcement:

We are proud to announce the release of Apache Kafka® 3.7.0. This release contains many new features and improvements. This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check the release notes.

See the Upgrading to 3.7.0 from any version 0.8.x through 3.6.x section in the documentation for the list of notable changes and detailed upgrade steps.

Read on to see what’s new. Looks like they’ve taken care of a couple dozen items in this release, so plenty to read there.

Comments closed

Classical Methods for Outlier Detection

Vinod Chugani is speaking my language:

Outliers are unique in that they often don’t play by the rules. These data points, which significantly differ from the rest, can skew your analyses and make your predictive models less accurate. Although detecting outliers is critical, there is no universally agreed-upon method for doing so. While some advanced techniques like machine learning offer solutions, in this post, we will focus on the foundational Data Science methods that have been in use for decades.

Vinod looks at a few techniques, including inter-quartile range and comparing results to an expected distribution. If you’re really excited about this topic, I know a guy who’s written a bit about it.

Comments closed

Row Constructors in Postgres

Deepak Mahto builds some rows:

A ROW expression allows you to construct ROW values, which can represent anonymous records, specific table row types, or custom composite types. Its uses include processing records within multiple expressions (using operators like =, <>, <, <=, >, or >=), evaluation with subqueries, and combining ROW values to build composite types. You can even export these to JSON format (using functions like rowtojson). Some key operations you can perform with a ROW constructor in Conversion or migration are :

Click through for that list and several examples.

Comments closed

A Cheat Sheet for Joins

Erik Darling does the academically correct thing and makes all of his joins CROSS JOINS and then performs all filters in the WHERE clause:

Think of your standard CRUD operations. Most don’t even require a join; they’re very straightforward. And hey, if you completely denormalize all your data to one huge table, you’ll never have to think about a lot of this stuff anyway.

It’s only when developers are forced to think about things that things start to go wrong. I don’t mean to pick on developers specifically. It’s the human condition. Thinking often leads to poor choices.

In this post, I’m going to give you some basic guidance on when to use various T-SQL facilities, based on years of finding, fixing, and writing queries.

You get a free second quotation because I love this line:

I’m not going to talk about right outer joins, because that’s the foolish domain of characterless buffoons who use Venn diagrams to explain join results.

If I were emotionally capable of crying, that would bring a tear to my eye because it’s savagery against two things I hate.

Comments closed

“License-Free” Managed Instance Requirements

Arun Sirpal reads the fine print:

This is the managed instance link feature; I really like this, if you know about Distributed AGs then you may know they are tricky to setup (well I found this) but Microsoft takes care of this out of the box.

The point of this quick blog is not how to set this up but the benefit of enabling the Managed Instance as “ license free “ via the hybrid failover rights option – do not forget about this.

Read on for the list of requirements.

Comments closed

Auto-Commit in Postgres vs Oracle

Akhil Reddy Banappagari explains one difference between Oracle and Postgres:

Oracle and PostgreSQL differ significantly in their transaction models, and AUTOCOMMIT is one of those differences. We see many of our customers successfully migrate code, but still encounter variations in behavior and even runtime errors related to transaction control. Many issues happen because AUTOCOMMIT settings differ between Oracle and PostgreSQL. It is definitely necessary to understand the distinctions in AUTOCOMMIT between Oracle and PostgreSQL to ensure a successful migration.

SQL Server also has auto-commit on by default for implicit transactions like in these examples.

Comments closed