Author: Kevin Feasel

Removing Elements from a Vector in R

Steven Sanderson wants to leave one of these things out:

Working with vectors is one of the fundamental aspects of R programming. Sometimes, you need to remove specific elements from a vector to clean your data or prepare it for analysis. This post will guide you through several methods to achieve this, using base R, dplyr, and data.table. We’ll look at examples for both numeric and character vectors and explain the code in a straightforward manner. By the end, you’ll have a clear understanding of how to manipulate your vectors efficiently. Let’s dive in!

Read on for three pairs of examples, each pair showing one approach for numeric vectors and one for character vectors.

Accuracy is Not Enough for Classification

I have a new video:

In this video, I explain why accuracy is not the be-all, end-all measure for classification. After that, I introduce the confusion matrix, a mechanism for tracking predicted versus actual values. Then, I talk about a variety of measures and how we can derive them from the confusion matrix.

The trickiest part of the confusion matrix measures is simply remembering which measure corresponds to which combination of cells in the matrix. The second-trickiest part is that R and Python invert the matrix's orientation, so reading across the top row in R is equivalent to reading down the first column in Python.
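
To make the orientation point concrete, here is a minimal Python sketch (made-up labels, and assuming scikit-learn is available; this is not code from the video) showing how several of the standard measures fall out of the four cells of the confusion matrix:

```python
# A minimal sketch with made-up labels.
from sklearn.metrics import confusion_matrix

y_actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred   = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# scikit-learn puts actual values on the rows and predictions on the columns,
# which is the inverted orientation relative to R mentioned above.
cm = confusion_matrix(y_actual, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that were correct
precision   = tp / (tp + fp)                   # of predicted positives, how many were actually positive
recall      = tp / (tp + fn)                   # of actual positives, how many the model caught
specificity = tn / (tn + fp)                   # of actual negatives, how many the model caught

print(cm)
print(accuracy, precision, recall, specificity)
```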

Diagnosing DirectQuery Connection Limit Performance Problems

Chris Webb does a bit of sleuthing:

A few months ago I blogged about the new limits available for the Maximum Connections Per Data Source property in Premium and why the number of connections that a DirectQuery semantic model can open to a data source is so important for report performance. At that time, however, there was no way for you to know whether the performance of your reports was being affected by a lack of available connections. The good news is that, with the announcement this week of the new Execution Metrics event in Log Analytics and Profiler, you can now see when this is happening.

Read on for an illustration of the problem and how you can resolve it.

Window Functions and DAX

Marco Russo and Alberto Ferrari explain how window functions work in DAX:

Window functions like OFFSET and WINDOW return rows from a table based on the current row. For example, OFFSET (-1) returns the previous row. The main question is: how does OFFSET determine the current row? Intuitively, it searches in the row context and in the filter context for values of columns, and it determines the current row. Unfortunately, that intuitive understanding is missing many details. Although it is easy to intuit the current row in simple queries, things are sometimes more complex. To make sense of the current row in a non-trivial scenario, you need to understand apply semantics.

Read on to learn more, especially about what happens when there are partition groups (think the PARTITION BY portion of the OVER clause in a T-SQL window function).
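
The DAX mechanics, and the apply semantics that make them subtle, are the subject of the article. Purely as a conceptual analogy with made-up data (and not DAX), here is what “the previous row within a partition” looks like in pandas:

```python
# A rough pandas analogy, not DAX: each row fetches a value from the
# previous row within its own partition, similar in spirit to OFFSET(-1)
# combined with a partition definition.
import pandas as pd

sales = pd.DataFrame({
    "Product": ["A", "A", "A", "B", "B"],
    "Year":    [2021, 2022, 2023, 2022, 2023],
    "Amount":  [100, 120, 130, 80, 95],
})

# Sort so that "previous" is well defined, then shift within each partition.
sales = sales.sort_values(["Product", "Year"])
sales["PrevYearAmount"] = sales.groupby("Product")["Amount"].shift(1)

print(sales)
```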

Data Connections in Microsoft Fabric

Soheil Bakhshi takes us through the ins and outs of data connections:

Managing data connections in Microsoft Fabric can be challenging if you’re unsure where to start. This blog post and its detailed YouTube video will help you find, manage, and share the existing data connections, making your workflow more efficient and streamlined. A meaningful use case for this feature is reusing existing connections, leading to more controlled connections to the data sources. More on this later in this blog.

Click through for the article and link to the video.

An Introduction to pg_timeseries

Samay Sharma and Jason Petersen have an announcement:

We are excited to launch pg_timeseries: a PostgreSQL extension focused on creating a cohesive user experience around the creation, maintenance, and use of time-series tables. You can now use pg_timeseries to create time-series tables, configure the compression and retention of older data, monitor time-series partitions, and run complex time-series analytics functions with a user-friendly syntax. pg_timeseries is open-sourced under the PostgreSQL license and can be added to your existing PostgreSQL installation or tried as a part of the Timeseries Stack on Tembo Cloud.

Read on to learn more about how it works. The syntax and concepts do remind me a good bit of InfluxDB, as well.

The Power of the Check Constraint

Joe Celko defends the honor of the check constraint:

The problem is that newer SQL programmers (and even some more experienced ones) don’t take advantage of these constraints. Consider this slight rewrite of an actual posting on an SQL forum. In the original, the poster thought that identifiers should be integers and that ISO–11179 naming rules do not apply, so I cleaned it up a little, but this will make my point.

Click through to learn more about check constraints.
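
As a minimal sketch of the kind of rule Celko means, here is a hypothetical Orders table declared from Python via SQLAlchemy against in-memory SQLite, purely for illustration; the point is that the database itself rejects rows that violate the rule:

```python
# Hypothetical table and rules for illustration only.
from sqlalchemy import (CheckConstraint, Column, Integer, MetaData, Numeric,
                        String, Table, create_engine, insert)

metadata = MetaData()

orders = Table(
    "Orders", metadata,
    Column("order_nbr", String(20), primary_key=True),
    Column("order_qty", Integer, nullable=False),
    Column("unit_price", Numeric(10, 2), nullable=False),
    # The engine enforces these rules on every INSERT and UPDATE.
    CheckConstraint("order_qty > 0", name="ck_orders_order_qty_positive"),
    CheckConstraint("unit_price >= 0", name="ck_orders_unit_price_non_negative"),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(insert(orders).values(order_nbr="A100", order_qty=3, unit_price=9.99))
    # The next line would violate ck_orders_order_qty_positive and raise IntegrityError:
    # conn.execute(insert(orders).values(order_nbr="A101", order_qty=0, unit_price=9.99))
```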

Building a Table from a List in DAX

Marco Russo provides an option:

DAX is not like M when it comes to data manipulation, and it is not supposed to do that. However, if you need something in DAX similar to Table.FromList in M, this blog post is for you.

If you have a list of values in a string in DAX and you want to obtain a table with one row for each item in the list, you can do the following:

Definitely an example of “There’s a better way to do this than to use DAX,” but sometimes you’re stuck, I suppose.
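
Marco’s DAX pattern is in the post itself; as a rough analogy only (made-up delimiter and values, and not DAX), the goal is the same as this snippet in pandas: turn a delimited string into a one-column table with one row per item.

```python
# A rough analogy, not DAX: split a delimited string into a one-column
# table with one row per item.
import pandas as pd

value_list = "Red|Green|Blue"
items = pd.DataFrame({"Value": value_list.split("|")})

print(items)  # three rows: Red, Green, Blue
```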

The Postgres Portal: Executor and Process Utility

Cary Huang digs into Postgres:

I wrote another blog to briefly explain the responsibility of each query processing stage. You can find it here. There is also one that focuses primarily on the planner module, which you can find here. In this blog, we will focus on the executor part of query processing, which is the entry point to “communicate” with other PostgreSQL internal modules to collect the correct data for the query.

In fact, executor may not be the best term to describe this stage. In the PostgreSQL source code, it is actually “packaged” in another object called “portal”, which has two potential paths to take: the “executor” path or the “process utility” path. Very often, we just call this stage “executor” rather than “portal”, because the executor handles most of the DML query types that involve SELECT, INSERT, UPDATE, DELETE, etc., and it has to process them according to the query plan created by the “planner” module.

Read on to learn more about the portal and its two component parts.
