R – Page 46 – Curated SQL

Trying out AutoML in R

Published 2022-05-04 by Kevin Feasel

In this fourth (and hopefully final) entry in my “Icing the Kicker” series of posts, I’m going to jump back to the first post where I used tidymodels to predict whether or not a kick attempt would be iced. However, this time I see if using the h2o AutoML feature and the SuperLearner package can improve the predictive performance of my initial model.

The results are just about what I would have expected: they provide a good floor but a human with knowledge of the data and skill with techniques can still beat out-of-the-box AutoML processes. Still, knowing what that floor is can help a lot: run some AutoML tool for a few minutes/hours/days and you have an easy way of letting the business side know the expected model quality. If AutoML already exceeds expectations, you’re golden. If AutoML is close to expectations (on either end, just above or just below), you as a skilled human should be able to improve things a bit more, especially once you have a chance to analyze what the AutoML processes did. If AutoML is way below business expectations of quality, perhaps this isn’t the best project to spend time on. H/T R-Bloggers.

Playing with gganimate

Published 2022-05-02 by Kevin Feasel

Tomaz Kastrun tries out gganimate:

I firmly believe that animation and transition between different data states can give end-users much better insight into and understanding of the data than a single table with data points or correlation metrics.
With the help of ggplot and gganimate, you can quickly create an animation based on your needs. This is a simple example using the iris dataset.

You can find more at the gganimate website. The real downside is that I don’t think it’s being maintained any longer, as the last commit was a year ago.

Comparing R Package Versions with Diffify

Published 2022-05-02 by Kevin Feasel

Clarissa Barratt and Parisa Gregg announce an interesting tool:

You know that sinking feeling that you get when you’re months into a big project and you log in one day and nothing works? Turns out something has updated and things have been removed that you needed and now you need to spend hours-days figuring out what’s changed and your masters deadline is getting closer and … ok, apparently this took me back to a very specific event.
But I’m sure most of that sounds familiar to you if you’ve ever programmed something over a longer period of time.
Over the last few months, Jumping Rivers have been working on a tool that will make it easier to see differences between R package versions: Diffify.

Read on to see it in action. It looks quite useful for troubleshooting issues in which a package suddenly changed API functionality, something which tends to happen frequently in the R and Python worlds.

Combining flashlight and plotly in R

Published 2022-04-26 by Kevin Feasel

Michael Mayer analyzes candidate models:

Since almost all plots in flashlight are constructed with ggplot, it is super easy to turn them into interactive plotly objects: just add a simple ggplotly() to the end of the call.
However… it is not straightforward to show interactive plots in a blog! Thus, we show only screenshots of the resulting plots here and refer to the complete HTML report here: https://mayer79.github.io/flashlight_plotly/flashlight_plotly.html
We will use a sweet dataset with more than 20’000 houses to model house prices by a set of derived features such as the logarithmic living area. The location will be represented by the postal code.

Click through for the blog post or check out the report.

Updates to overviewR

Published 2022-04-25 by Kevin Feasel

Cosima Meyer has updates to the overviewR package:

The package is meant to serve as a Swiss army knife for exploratory data analysis. The basic functions allow you to investigate sample coverage across different time points, missing values across your variables, and the overlap between two data sets.

Read on for a fairly substantial set of changes. H/T R-Bloggers.

String Concatenation in R

Published 2022-04-22 by Kevin Feasel

Benjamin Smith creates a function:

While it is possible to use paste() or paste0() for string concatenation, I do understand how it can be messy to deal with, especially when working with loops and/or nested functions. In this short blog post, I share a remedy for this by writing a special function that can make for cleaner code than paste() or paste0().

It’s not quite as nice as a here string (e.g., @"{FirstName} just referenced the name here string at {UserTime}" user.FirstName DateTime.UtcNow) but this is a good reminder that operator creation in R is pretty easy. H/T R-Bloggers.
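
As a rough illustration of how little it takes to create an operator in R, here is a minimal sketch. The operator name `%+%` and the sample values are assumptions chosen for this example, not necessarily what the linked post uses.

```r
# Hypothetical infix operator for string concatenation. The name %+% is an
# illustrative choice and may differ from the function in the linked post.
`%+%` <- function(lhs, rhs) {
  paste0(lhs, rhs)
}

# Chained calls read left to right, like + in many other languages.
first_name <- "Ada"
greeting <- "Hello, " %+% first_name %+% "!"
print(greeting)

# paste0() is vectorized, so the operator is vectorized as well.
labels <- "col_" %+% seq_len(3)
print(labels)
```

Because the operator is a thin wrapper over paste0(), it inherits paste0()'s recycling and type coercion rules.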

Building a 3D Regression Plane with a Scatter Plot in R

Published 2022-04-21 by Kevin Feasel

Sebastian Sauer embraces the third dimension:

Here’s the linear model with 2 predictors, giving us a model that can be visualized in 3D:
lm1 <- lm(mpg ~ hp + disp, data = mtcars)
As is standard, we’ll predict mpg.

Click through for a simple implementation using plotly.
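
For a sense of what is involved, here is a minimal sketch of one way to turn the quoted model into a plane. It is not the linked post's code: the grid size, the variable names other than lm1, and the opacity value are assumptions made for illustration, and the plotting step only runs in an interactive session with plotly installed.

```r
# Fit the two-predictor model quoted above.
lm1 <- lm(mpg ~ hp + disp, data = mtcars)

# Build a regular grid over the observed range of both predictors.
# The 25 x 25 resolution is an arbitrary choice for this sketch.
hp_seq   <- seq(min(mtcars$hp),   max(mtcars$hp),   length.out = 25)
disp_seq <- seq(min(mtcars$disp), max(mtcars$disp), length.out = 25)
grid <- expand.grid(hp = hp_seq, disp = disp_seq)

# Predict mpg at each grid point. expand.grid() varies hp fastest, so
# filling the matrix column-wise gives rows = hp and columns = disp.
mpg_hat <- matrix(predict(lm1, newdata = grid),
                  nrow = length(hp_seq), ncol = length(disp_seq))

# Optional plot. A plotly surface pairs z[i, j] with y[i] and x[j],
# hence the transpose when hp is on the x axis and disp on the y axis.
if (interactive() && requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly()
  p <- plotly::add_surface(p, x = hp_seq, y = disp_seq, z = t(mpg_hat),
                           opacity = 0.6)
  p <- plotly::add_markers(p, data = mtcars, x = ~hp, y = ~disp, z = ~mpg)
  print(p)
}
```

The plane is simply the model's predictions evaluated over a grid, and the observed cars are overlaid as points so you can see where they sit relative to it.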

SQL Joins and R Merges

Published 2022-04-18 by Kevin Feasel

Tomaz Kastrun explains how to combine data in two languages:

The merge statement in the R language is a powerful, simple, straightforward method for joining data frames. It also comes with some neat features that give R users fast data wrangling.
I will be comparing this feature with the T-SQL language to show the simplicity of the merge method.

Click through for the comparison.
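
To make the mapping concrete, here is a small sketch of how base R's merge() lines up with the common SQL join types. The two data frames and their column names are made up for this example and do not come from the linked post.

```r
# Two small, made-up tables for illustration.
customers <- data.frame(customer_id = c(1, 2, 3),
                        name = c("Ann", "Bob", "Cy"))
orders <- data.frame(customer_id = c(1, 1, 3, 4),
                     amount = c(10, 25, 40, 5))

# INNER JOIN: only keys present in both tables.
inner <- merge(customers, orders, by = "customer_id")

# LEFT OUTER JOIN: keep every customer, with NA where no order exists.
left <- merge(customers, orders, by = "customer_id", all.x = TRUE)

# FULL OUTER JOIN: keep unmatched rows from both sides.
full <- merge(customers, orders, by = "customer_id", all = TRUE)

# CROSS JOIN: by = NULL yields the Cartesian product of the two tables.
cross <- merge(customers, orders, by = NULL)

print(left)
```

A right outer join follows the same pattern with all.y = TRUE.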

R and Python Interop via Reticulate

Published 2022-04-18 by Kevin Feasel

Fabian Scheler combines R and Python:

I am way more experienced with R than with Python and prefer to code in this language when possible. This applies especially to visualizations. Plotly and ggplot2 are fantastic packages that provide a lot of flexibility. However, every language has its limitations, and the best results stem from their efficient combination.
This week, I created the candlestick below, and I think it’s an excellent case study to illustrate a few things:

Read on to learn more about using reticulate to execute Python code and interact with the results in R.

Kubernetes for the R User

Published 2022-04-18 by Kevin Feasel

Roel M. Hogervorst provides an introduction to Kubernetes for R users:

Many R users come from an academic background, statistics and social sciences. That makes you an excellent problem solver with a deep knowledge of problems and a nuanced understanding of the world. You actually know what you are talking about!
But there is a different world, a world where the most important thing is showing an advertisement to as many people as possible. I’m joking, but the computer science world is where ideas like Kubernetes were born. And like every other group, it uses specific words that can be hard to understand without context. That means that you have to use those words to find answers to your questions. This post will introduce some of those words and I have a list at the bottom. And now it is available to all of us, through a cloud provider in your area.

Read on for a light introduction to using Kubernetes.
