Press "Enter" to skip to content

Category: Data Science

New Video: Multi-Class Classification

I have a new video:

In this video, I get past two-class classification and explain how things differ in the multi-class world.

What’s really interesting is that, in many cases, when it comes to code, the answer is “not much.” That’s because libraries like scikit-learn do a lot to smooth over the differences between two-class and multi-class classification. But there are still differences that can bite you if you don’t understand how the cases differ.
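
As a rough illustration of that point (a sketch of mine, not code from the video), the scikit-learn snippet below fits the same estimator on a two-class and a three-class version of the iris dataset without any code changes; the only visible difference is the number of columns predict_proba returns.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Two-class problem: keep only classes 0 and 1.
    mask = y < 2
    X2, y2 = X[mask], y[mask]

    for features, labels in [(X2, y2), (X, y)]:
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.3, random_state=42
        )
        # Identical code path for both cases; scikit-learn handles the
        # binary vs. multinomial details internally.
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        proba = clf.predict_proba(X_test)
        print(f"{len(np.unique(labels))} classes -> predict_proba shape {proba.shape}")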

Comments closed

An Introduction to the healthyR.ai Package

Steven Sanderson explains the purpose of a package:

The ultimate goal really is to make it easier to do data analysis and machine learning in R. The package is designed to be easy to use and to provide a wide range of functionality for data analysis. The package is also meant to help by providing some easy boilerplate functionality for machine learning. This package is in its early stages and will be updated frequently.

It also keeps with the same framework of all of the healthyverse packages in that it is meant for the user to be able to use the package without having to know a lot of R. Many rural hospitals do not have the resources to perform this sort of work, so I am working hard to build these types of things out for them for free.

Read on to see how it works, including several examples of the package in action.

Comments closed

Practical healthyR.ts Examples

Steven Sanderson provides some examples:

Today I am going to go over some quick yet practical examples of ways that you can use the healthyR.ts package. This package is designed to help you analyze time series data in a more efficient and effective manner.

Let’s just jump right into it!

Read on for a few common time series activities, such as testing for stationarity, extracting trends from noise, and performing lagged correlation.
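
If you want a sense of what those operations look like in general, here is a loose Python sketch of the same ideas, an illustration of mine rather than the healthyR.ts API: an augmented Dickey-Fuller test for stationarity, a rolling mean as a simple trend extraction, and a lagged correlation against a shifted copy of the series.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(42)
    # Simulated series: a slow trend plus noise.
    t = np.arange(200)
    series = pd.Series(0.05 * t + rng.normal(0, 1, size=200))

    # Stationarity: augmented Dickey-Fuller test (a small p-value suggests stationarity).
    stat, p_value, *_ = adfuller(series)
    print(f"ADF statistic {stat:.2f}, p-value {p_value:.3f}")

    # Trend extraction: a rolling mean separates the trend from the noise.
    trend = series.rolling(window=20, center=True).mean()

    # Lagged correlation: correlate the series with a shifted copy of itself.
    lag = 5
    lag_corr = series.corr(series.shift(lag))
    print(f"Correlation at lag {lag}: {lag_corr:.3f}")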

Comments closed

New Video: Online Passive-Aggressive Algorithms

I have a new video:

In this video, I cover the family of classification algorithms with the best possible name: online passive-aggressive algorithms.

I remember, when reading up on this, being incredulous that the idea even worked. But it turns out that it’s actually pretty good in practice, especially on constrained hardware. Still, this is definitely an algorithm you’d want to test in comparison to others before jumping right in, as there’s a risk you can end up with terrible results.
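
To give a feel for the online part, here is a short, hedged scikit-learn sketch of mine (not code from the video): PassiveAggressiveClassifier learns incrementally via partial_fit, one mini-batch at a time, which is what makes it appealing on constrained hardware.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import PassiveAggressiveClassifier

    X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
    classes = np.unique(y)

    clf = PassiveAggressiveClassifier(C=0.1, random_state=42)

    # Feed the data in small batches, as if it were arriving as a stream.
    batch_size = 500
    first_batch = True
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        if first_batch:
            # All possible labels must be declared on the first call.
            clf.partial_fit(X_batch, y_batch, classes=classes)
            first_batch = False
        else:
            clf.partial_fit(X_batch, y_batch)

    print(f"Training accuracy after streaming updates: {clf.score(X, y):.3f}")

In keeping with the caveat above, you would still want to benchmark it against something like logistic regression on your own data before committing to it.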

Comments closed

An Introduction to healthyR

Steven Sanderson covers a package:

This article will introduce you to the healthyR package. healthyR is a package that provides functions for analyzing and visualizing health-related data. It is designed to make it easier for health professionals and researchers to work with health data in R. It is an experimental package that is still under active development, so some functions may change in the future along with the package structure and scope.

Unfortunately, the package needs some love and attention, which I am trying to give it. Given that, I will be updating the package to include more functions and improve the existing ones. I will also be updating the documentation and adding more examples to help users get started with the package.

So let’s get started!

Read on for that overview, including an explanation of why the package exists and several examples of how to use it.

Comments closed

New Video: The Naive Bayes Set of Algorithms

I have a new video:

In this video, I cover a class of algorithms that is neither particularly naive nor particularly Bayesian: Naive Bayes.

I am a bit tongue in cheek with that description, as technically I’ll give you that the class of algorithms is “naive.” But I do still have some fun with the name and then show how we can use Naive Bayes to build a quick-and-dirty model that’s at least somewhat effective.
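
As a companion sketch (mine, not the code from the video), this is roughly what a quick-and-dirty Naive Bayes model looks like in scikit-learn: a few lines, essentially no tuning, and results that are often surprisingly respectable.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    # GaussianNB assumes each feature is normally distributed within each class
    # and that features are conditionally independent -- the "naive" part.
    model = GaussianNB().fit(X_train, y_train)
    print(f"Test accuracy: {model.score(X_test, y_test):.3f}")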

Comments closed

Tweedie Distributions and Generalized Linear Modeling

Christian Lorentzen talks about Tweedie distributions:

Tweedie distributions and Generalised Linear Models (GLM) have an intertwined relationship. While GLMs are, in my view, one of the best reference models for estimating expectations, Tweedie distributions lie at the heart of expectation estimation. In fact, basically all applied GLMs in practice use Tweedie distributions with three notable exceptions: the binomial, the multinomial and the negative binomial distribution.

Read on for a bit more about its history and how it ties in with several other distributions.
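
For a concrete, if hedged, illustration of that relationship, here is a short sketch of mine using scikit-learn's TweedieRegressor, where the power parameter picks out the member of the Tweedie family: 0 for the Normal, 1 for the Poisson, 2 for the Gamma, and values between 1 and 2 for the compound Poisson-Gamma case that shows up in insurance claims.

    import numpy as np
    from sklearn.linear_model import TweedieRegressor

    rng = np.random.default_rng(42)
    n = 2000
    X = rng.normal(size=(n, 3))

    # Simulate a non-negative, zero-inflated target, roughly like claim amounts:
    # a Poisson count of events times Gamma-distributed severities.
    rate = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1])
    counts = rng.poisson(rate)
    y = np.array([rng.gamma(shape=2.0, scale=50.0, size=c).sum() for c in counts])

    # power between 1 and 2 gives the compound Poisson-Gamma Tweedie GLM;
    # the log link keeps predictions positive.
    glm = TweedieRegressor(power=1.5, alpha=0.0, link="log", max_iter=1000)
    glm.fit(X, y)
    print("Coefficients:", glm.coef_)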

Comments closed

Distribution Parameter Wrangling in TidyDensity

Steven Sanderson introduces a new set of functions:

Greetings, fellow data enthusiasts! Today, we’re thrilled to unveil a fresh wave of functionalities in the ever-evolving TidyDensity package. Buckle up, as we delve into the realm of distribution statistics!

This update brings a bounty of new functions that streamline the process of extracting key parameters from various probability distributions. These functions adhere to the familiar naming convention util_distribution_name_stats_tbl(), making them easily discoverable within your R workflow.

Read on for the list and an example of how to use them.
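
For readers outside of R, here is a loose Python analogue of mine (not the TidyDensity API): scipy.stats will hand back the mean, variance, skewness, and kurtosis of a parameterized distribution in a similar one-call fashion.

    from scipy import stats

    # Pull the first four moments of a few parameterized distributions.
    distributions = {
        "normal(0, 1)": stats.norm(loc=0, scale=1),
        "gamma(shape=2, scale=3)": stats.gamma(a=2, scale=3),
        "beta(2, 5)": stats.beta(a=2, b=5),
    }

    for name, dist in distributions.items():
        mean, var, skew, kurt = dist.stats(moments="mvsk")
        print(f"{name}: mean={mean:.3f}, var={var:.3f}, skew={skew:.3f}, kurtosis={kurt:.3f}")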

Comments closed

Book Review of Bernoulli’s Fallacy

John Mount reviews a book:

First the conclusion: this is a well researched and important book. My rating is a strong buy, and Bernoulli’s Fallacy is already influencing how I approach my work.

My initial “judge the book by its back cover” impression of Bernoulli’s Fallacy was negative. The back cover writes some very large checks that I was initially (and wrongly) doubtful that “its fists could cash.” The thesis is that frequentist statistics (the dominant statistical practice) is far worse than is publicly admitted, and that Bayesian methods are the fix. However, other reviews and the snippets by people I respect (such as Andrew Gelman and Persi Diaconis) convinced me to buy and read the book. And I am glad that I read it. The back cover was, in my revised opinion, fully justified.

Read on for John’s full review of a book that is quite critical of frequentist statistics in favor of Bayesian statistics—so that already makes the book a winner for me.

Comments closed

An Overview of Logistic Regression

I have a new video:

In this video, I provide a primer on logistic regression, including a demystification of the name. Is it regression? Is it classification? Find out!

I have a lot of fun with the question “Is logistic regression actually a regression technique, or is it secretly a classification technique?” I think this video is the single clearest explanation I’ve given on that question, which probably says something about my prior explanations.
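
To make the distinction concrete (a sketch of mine, not the code from the video): in scikit-learn, the fitted model performs regression on class probabilities via predict_proba, and classification only happens when those probabilities pass through a threshold, which is what predict does with a 0.5 cutoff in the binary case.

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

    # The "regression" part: continuous probabilities for the positive class.
    probabilities = model.predict_proba(X_test)[:, 1]

    # The "classification" part: apply a threshold to those probabilities.
    labels = (probabilities >= 0.5).astype(int)

    # predict() does the same thresholding internally, so the two agree.
    print((labels == model.predict(X_test)).all())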

Comments closed