Day: January 12, 2024

An Overview of Clustering Techniques in R

Peter Laurinec gives us an overview:

Clustering is a very popular technique in data science because of its unsupervised characteristic – we don’t need true labels of groups in data. In this blog post, I will give you a “quick” survey of various clustering methods applied to both synthetic and real datasets.

Read on for a quick description of what clustering is and a few use cases. Then, Peter dives into a variety of techniques and important things you should know about them. H/T R-Bloggers.
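
Peter's post works in R and surveys many methods. To show what "no true labels" means in practice, here is a minimal sketch of one classic method, Lloyd's k-means, in plain Python. It is not taken from the post. The `kmeans` function and its parameters are hypothetical, and the sketch assumes 2-D points stored as tuples, with centroids initialized by sampling k points from the data.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids by sampling k distinct points from the data
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` should put the three points near the origin in one cluster and the other three in the second. k-means requires you to choose k up front and can settle in a local optimum. Those limitations are one reason a survey like this compares several methods.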

Benchmarking Cumulative Function Speed in TidyDensity

Steven Sanderson charts performance:

Statistical analysis often involves calculating various measures on large datasets. Speed and efficiency are crucial, especially when dealing with real-time analytics or massive data volumes. The TidyDensity package in R provides a set of fast cumulative functions for common statistical measures like mean, standard deviation, skewness, and kurtosis. But just how fast are these cumulative functions compared to doing the computations directly? In this post, I benchmark the cumulative functions against the base R implementations using the rbenchmark package.

Click through for the functions under test and how they fare.
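
Cumulative functions can be fast because each step updates a running statistic instead of recomputing it from scratch. The Python sketch below illustrates that idea for the mean only, and it uses the standard-library `timeit` module in place of rbenchmark. It is an analogy, not TidyDensity's implementation, and the function names are hypothetical. Running versions of standard deviation, skewness, and kurtosis would need additional running moments.

```python
import statistics
import timeit


def cumulative_mean(xs):
    """Mean of xs[:1], xs[:2], ..., computed with one running sum (O(n) overall)."""
    out, total = [], 0.0
    for i, x in enumerate(xs, start=1):
        total += x
        out.append(total / i)
    return out


def naive_cumulative_mean(xs):
    """Same result, but recomputes every prefix mean from scratch (O(n^2) overall)."""
    return [statistics.fmean(xs[:i]) for i in range(1, len(xs) + 1)]


if __name__ == "__main__":
    data = [float(i % 97) for i in range(1_000)]  # synthetic input, not from the post
    fast = timeit.timeit(lambda: cumulative_mean(data), number=3)
    slow = timeit.timeit(lambda: naive_cumulative_mean(data), number=3)
    print(f"running sum: {fast:.4f}s, naive: {slow:.4f}s")
```

The two functions agree up to floating-point rounding. Only the amount of work per element differs, and that difference is what a benchmark like Steven's measures.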

Canceling a Power BI Dataflow Gen2 Refresh

Sandeep Pawar has a script for us:

At the time of writing this blog, it is not possible to cancel a Dataflow Gen2 (DFg2) refresh using the UI. This is a temporary limitation that I expect will be resolved soon. DFg2 can be resource-intensive, and if the refresh takes longer than expected, it may consume a significant amount of CUs. Thankfully, you can use the Power BI REST API to cancel it. My friend Alex Powers already has a PowerShell script that you can use. You can also use the Power BI VS Code extension by Gerhard Brueckl.

But I would like to show you how you can do this using the PowerBIRestClient in the latest version of Semantic-Link (v0.5.0).

Read on to see what this Python script does and how you can use it.
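
The sketch below is not Sandeep's script. It only outlines the two Power BI REST API calls such a script relies on: listing a dataflow's refresh transactions, then posting a cancel request for one transaction ID. The helper names are hypothetical. The sketch also assumes a client object whose `get` and `post` methods accept a path relative to `https://api.powerbi.com/`. Check the paths and response fields against the REST API reference before relying on them.

```python
from typing import Any, Protocol


class RestClient(Protocol):
    """Anything with get/post methods that take a path relative to the Power BI API root."""

    def get(self, path: str) -> Any: ...

    def post(self, path: str) -> Any: ...


def transactions_path(group_id: str, dataflow_id: str) -> str:
    # Lists refresh transactions for one dataflow in a workspace (group)
    return f"v1.0/myorg/groups/{group_id}/dataflows/{dataflow_id}/transactions"


def cancel_path(group_id: str, transaction_id: str) -> str:
    # Cancels one in-flight dataflow refresh transaction
    return f"v1.0/myorg/groups/{group_id}/dataflows/transactions/{transaction_id}/cancel"


def list_transactions(client: RestClient, group_id: str, dataflow_id: str) -> Any:
    """GET the dataflow's refresh transactions; the caller reads IDs from the response."""
    return client.get(transactions_path(group_id, dataflow_id))


def cancel_refresh(client: RestClient, group_id: str, transaction_id: str) -> Any:
    """POST the cancel request and return whatever the client returns."""
    return client.post(cancel_path(group_id, transaction_id))
```

In a Fabric notebook, the `client` argument could be an instance of Semantic Link's PowerBIRestClient, with the transaction ID read from the transactions listing. Passing the client in keeps the helpers testable with a stand-in object.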

Create and Connect to a Fabric Data Warehouse

Olivier Van Steenlandt builds a warehouse:

In this data recipe series, Microsoft Fabric – Data Warehouse will be explored. As a starting point, a blank Fabric workspace is used. You can sign up for a free Fabric trial by using the following URL: Data Analytics | Microsoft Fabric

In this data recipe, we will create a brand-new Data Warehouse in Fabric. Once created, we will connect to our Data Warehouse using Azure Data Studio.

Click through for the step-by-step process.
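
The post connects with Azure Data Studio, which asks for the warehouse's server address (its SQL connection string) and the database name. If you would rather script the connection, here is a minimal sketch that assembles an ODBC connection string from those same two values. It assumes Microsoft ODBC Driver 18 for SQL Server and interactive Microsoft Entra ID sign-in. The function name and the example server name are hypothetical, and the post itself does not use ODBC.

```python
def fabric_warehouse_odbc(server: str, database: str) -> str:
    """Build an ODBC connection string for a Fabric warehouse SQL endpoint (sketch)."""
    settings = {
        "Driver": "{ODBC Driver 18 for SQL Server}",  # assumes this driver is installed
        "Server": f"{server},1433",  # standard SQL Server (TDS) port
        "Database": database,
        "Encrypt": "yes",
        "Authentication": "ActiveDirectoryInteractive",  # browser-based Entra ID sign-in
    }
    return ";".join(f"{key}={value}" for key, value in settings.items()) + ";"


if __name__ == "__main__":
    # Hypothetical server name; copy the real one from the warehouse's settings in Fabric.
    print(fabric_warehouse_odbc("example.datawarehouse.fabric.microsoft.com", "MyWarehouse"))
```

If pyodbc is installed, you could pass the resulting string to `pyodbc.connect`.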

Heaps, Clustered Indexes, and Non-Clustered Indexes

Erik Darling starts a new series:

Some of the best questions I get from clients, conference attendees, and random emails are about how to design indexes.

A lot of developers out there have a rather foggy picture of exactly how indexes work. They’ve all seen phone books and drawings of B-tree indexes, but some common things still escape them.

In this post, I’m going to talk about a few things like I’m speaking to someone who has never created a table before.

The problem with the phone book analogy is that there’s an entire generation of people who haven’t used phone books.

Also, Erik has his own spin on the classic NUSE guidance (narrow, unique, static, ever-increasing) for clustered index keys.
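
Here is a toy Python model that makes the three structures concrete without a phone book. It does not come from Erik's post, and real SQL Server indexes are B-trees of pages rather than sorted lists.

- **Heap:** rows are kept in no particular order, so finding a row means scanning the whole table.
- **Clustered index:** the rows themselves are stored in key order, so a binary search can find one.
- **Nonclustered index:** a separate sorted structure holding some columns plus a row locator. On a clustered table that locator is the clustering key; on a heap it is a row identifier. Fetching the remaining columns therefore costs an extra lookup.

The sample rows and function names below are hypothetical.

```python
import bisect

# Hypothetical rows: (customer_id, name, city), in arrival order
rows = [(42, "Ada", "Oslo"), (7, "Bo", "Lima"), (19, "Cy", "Oslo"), (3, "Di", "Rome")]

# Heap: no ordering, so a seek degenerates into a full scan
heap = list(rows)


def heap_seek(customer_id):
    return next((r for r in heap if r[0] == customer_id), None)


# Clustered index: the table itself is stored sorted by the key (customer_id)
clustered = sorted(rows)
clustered_keys = [r[0] for r in clustered]


def clustered_seek(customer_id):
    i = bisect.bisect_left(clustered_keys, customer_id)  # binary search on the key
    if i < len(clustered_keys) and clustered_keys[i] == customer_id:
        return clustered[i]
    return None


# Nonclustered index on city: sorted (city, clustering key) pairs, separate from the table
nc_city = sorted((r[2], r[0]) for r in rows)


def rows_in_city(city):
    i = bisect.bisect_left(nc_city, (city,))  # (city,) sorts before (city, any_id)
    out = []
    while i < len(nc_city) and nc_city[i][0] == city:
        out.append(clustered_seek(nc_city[i][1]))  # the extra "key lookup" for other columns
        i += 1
    return out
```

For example, `rows_in_city("Oslo")` walks two index entries and performs two clustered-index lookups to return the full rows for customers 19 and 42.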
