Press "Enter" to skip to content

Author: Kevin Feasel

Avoid aggregate in R on Wide Matrices

Ali Oghabian shares some hard-earned advice:

The aggregate function can be very useful in R, allowing one to run a function (e.g. mean) within groups of rows, in each column in a matrix/data-frame and organize the results in an easy-to-read table. However, the function takes long to run for very wide matrices and data frames, where the number of the columns are large. I this post I demonstrate the issue and show a couple of nice solutions that at least for the example cuts down the time to 15% and even less, compared to the run-time of the aggregate function.

Click through for a demo. Granted, this is a matrix with 10,000 columns, so I’m not sure how this applies to narrower matrices. H/T R-Bloggers.

Leave a Comment

Locking Down a PostgreSQL Database

Thom Brown shares some advice:

As you may have heard, there are reportedly over 1,500 PostgreSQL servers that have been exploited to mine Bitcoin. And your server could be next if you haven’t taken precautions. Firstly, you need to update to the latest minor release, just so no known exploitable bugs exist on your system. But regardless of whether you update, your PostgreSQL instance could still be misconfigured in a way that would allow unwelcome visitors access, so you need to make sure you shore up your defenses. Here are some steps you should take.

Click through for some solid guidance.

Leave a Comment

FabCon Announcements for DAX and Semantic Models

Marco Russo summarizes the announcements:

I usually do not write about announcements and new features until we have had time to try and test them in the real world. However, there are always exceptions, and some of the announcements at the Microsoft Fabric Conference 2025 fall into this category because I have worked with them enough to provide hands-on feedback.

In short, these are the topics I am covering in this blog post:

  • Direct Lake and Import mode
  • Calendars in DAX
  • User-Defined Functions (UDF) in DAX

These weren’t the headline-grabbers of the conference, but Marco explains the importance behind each of them.

Leave a Comment

400 Bad Request when Debugging a Data Factory Pipeline

Koen Verbeeck runs into a problem:

I recently had a new pipeline fail. It was actually a copy of an old pipeline where I had made some adjustments into as part of a database migration. When triggered during an execution run, it failed saying some expression could not be parsed. When I went into the pipeline and triggered a debug, it immediately failed with the following helpful error message:

Click through for the error message and how Koen was able to fix the issue.

Leave a Comment

Dealing with Time in PostgreSQL

Radim Marek clocks in for the day:

The next possible source of confusion when working with time in PostgreSQL is presence of two distinct data types:

  • timestamp (or timestamp without time zone)
  • timestamptz (or timestamp with time zone) Despite what the names suggest, the key difference isn’t whether they store timezone information, but rather how they handle it during storage and retrieval.

Click through for quite a bit of detail and plenty of examples on how to handle date and time data in PostgreSQL. And I’m still jealous about support for intervals, especially in window functions.

Leave a Comment

Handling a Sort Operation in SQL Server Integration Services

Andy Brownsword knows that sometimes, the only winning move is not to play:

Last time out we discussed blocking transformations, what they are, the impact of them, and touched on how to deal with them. In this post we’re going a step further to tackle one of them head on.

Here we’ll demonstrate the impact of blocking caused by the Sort transformation, and look at two options for solving this and slashing execution time.

Sorts aren’t the only blocking transformation that you should push back down to your source (if possible), but it is the most common example.

Leave a Comment

Loading Data from Pandas into Snowflake

Anil Kumar Moka loads some data:

Loading data into Snowflake is a common need. Using Python and pandas is a common go-to solution for data professionals. Whether you’re pulling data from a relational database, wrangling a CSV file, or prototyping a new pipeline, this combination leverages pandas’ intuitive data manipulation and Snowflake’s cloud-native scalability. But let’s be real—data loading isn’t always a simple task.

Files go missing, connections drop, and type mismatches pop up when you least expect them. That’s why robust error handling isn’t just nice-to-have; it’s essential for anything you’d trust in production. In this guide, we’ll walk through the fundamentals of getting data into Snowflake, explore practical examples with pandas and SQLAlchemy, and equip you with the tools to build a dependable, real-world-ready pipeline. Let’s dive in and make your data loading process as smooth as possible!

Read on for a quick primer around data loading and some of the sanity checking we should be doing along the way.

Leave a Comment

What’s New with OneLake Shortcuts

Miquella de Boer gives us an update:

Microsoft Fabric shortcuts enable organizations to unify their data across various domains and clouds by creating a single virtual data lake. These shortcuts act as symbolic links to data in different storage locations, simplifying access and reducing the need for multiple copies.

OneLake serves as the central hub for all analytics data. By using OneLake shortcuts, organizations can connect to existing data sources like Azure and AWS through a unified namespace, streamlining workflows and enhancing collaboration.

Click through for several feature improvements for shortcuts.

Leave a Comment

Finding Oracle Blogs

Brendan Tierney theoretically makes my life easier:

A regular question I get asked is, “is there a list of Oracle-related Blogs?” and “what people/blogs should I follow to learn more about Oracle Database?” This typically gets asked by people in the early stages of their careers and even by those who have been around for a while.

You could ask these questions to twenty different (experienced) people, and you’d get largely the same answers with some variations. These variations would be down to their preferences on how certain people cover certain topics. This comes down to experience of following lots of people and learning over time.

Click through for a list of top Oracle blogs. I should probably check out some of them, though I don’t tend to do much with Oracle here. But if you do, there’s a starting point with 100 separate blogs to check out.

Leave a Comment