R – Page 18 – Curated SQL

Working with Date Sequences in R

Published 2024-02-15 by Kevin Feasel

Steven Sanderson isn’t satisfied with a single date:

In the world of data analysis and manipulation, working with dates is a common and crucial task. Whether you’re analyzing financial data, tracking trends over time, or forecasting future events, understanding how to generate date sequences efficiently is essential. In this blog post, we’ll explore three powerful R packages—lubridate, timetk, and base R—that make working with dates a breeze. By the end of this guide, you’ll be equipped with the knowledge to generate date sequences effortlessly and efficiently in R.

Click through for several ways to generate date sequences, including weekly sequences.
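As a rough illustration of the base R route only (not necessarily the code from Steven's post), `seq()` can build daily, weekly, and monthly sequences from a start date. The variable names and dates here are made up for the example.

```r
# Minimal base R sketch; names and dates are illustrative assumptions.
start <- as.Date("2024-01-01")

# Seven consecutive days starting at `start`
daily <- seq(start, by = "day", length.out = 7)

# Four dates spaced one week apart
weekly <- seq(start, by = "week", length.out = 4)

# First of each month from January through June 2024
monthly <- seq(start, as.Date("2024-06-01"), by = "month")

print(weekly)
```

The `by` argument also accepts forms such as "2 weeks" or a number of days, which covers most regular intervals without extra packages.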

Checking for Date Columns in R

Published 2024-02-12 by Kevin Feasel

Steven Sanderson is looking for a date:

As an R programmer, you may often encounter datasets where you need to determine whether a column contains date values. This task is crucial for data cleaning, manipulation, and analysis. In this blog post, we’ll explore various methods to check if a column is a date in R, with a focus on using the lubridate package and the ts_is_date_class() function from the healthyR.ts package.

Click through to see how, using lubridate and healthyR.ts.
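For comparison, a base R check that needs neither package might look like the sketch below. It tests the class of each column with `inherits()`; the data frame and its column names are hypothetical.

```r
# Minimal base R sketch; `df` and its columns are illustrative assumptions.
df <- data.frame(
  id    = 1:3,
  when  = as.Date(c("2024-02-10", "2024-02-11", "2024-02-12")),
  label = c("a", "b", "c")
)

# TRUE for columns stored as Date or POSIXct, FALSE otherwise
is_date_col <- vapply(
  df,
  function(col) inherits(col, c("Date", "POSIXct")),
  logical(1)
)

print(is_date_col)
```

This only detects columns that already have a date class. A character column holding date-like strings would still report FALSE and would need parsing first.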

Finding the Week Number in R

Published 2024-02-08 by Kevin Feasel

Steven Sanderson checks the week:

When working with dates in R, you may need to extract the week number for any given date. This can be useful for doing time series analysis or visualizations by week.

In this post, I’ll demonstrate how to get the week number from dates in R using both base R and the lubridate package. I’ll provide simple examples so you can try it yourself.

Steven also makes a good point about ISO weeks (which are common in Europe) versus calendar weeks.
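To see the difference on a concrete date, here is a small base R sketch using `format()`. The `%V` code gives the ISO 8601 week and `%U` gives the week of the year counting from the first Sunday; the date is chosen for illustration.

```r
# Minimal base R sketch; the date is an illustrative assumption.
d <- as.Date("2021-01-01")  # a Friday

iso_week <- format(d, "%V")  # ISO 8601 week number
us_week  <- format(d, "%U")  # week of year, weeks start on Sunday

# Under ISO rules this date still belongs to the last week of 2020,
# while the Sunday-based count puts it in week 00 of 2021.
print(c(iso = iso_week, sunday_based = us_week))
```

Both values come back as character strings, so wrap them in `as.integer()` before doing arithmetic or grouping on them.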

Translating Excel Date Values into R Dates

Published 2024-02-07 by Kevin Feasel

Steven Sanderson reads an Excel file:

Have you ever battled with Excel’s quirky date formats in your R projects? If so, you’re not alone! Those cryptic numbers can be a real headache, but fear not, fellow R warriors! Today, we’ll conquer this challenge and transform those numbers into beautiful, usable dates.

This is a common pain point in a lot of libraries and Steven shows how to solve it in R using a pair of functions.
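One common base R approach, which may or may not be among the functions Steven covers, is to treat the Excel serial number as a day count from 1899-12-30. That origin is the usual convention for the Windows Excel date system; the serial value below is just an example.

```r
# Minimal base R sketch; assumes the Windows Excel (1900) date system.
excel_serial <- 45329

# Excel day counts line up with R dates when the origin is 1899-12-30
converted <- as.Date(excel_serial, origin = "1899-12-30")

print(converted)
```

Workbooks saved with the older Mac 1904 date system use a different origin, so it is worth confirming which system a file uses before converting.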

New Features in data.table

Published 2024-02-06 by Kevin Feasel

John MacKintosh takes us through what’s new in R’s data.table package:

The newest version of data.table has hit CRAN, and there are lots of great new features.

Among them, a %notin% function, a new let function that can be used instead of := (I wasn’t too fussed about this originally but have tried it a few times today and I may well adopt it – although I do like that := really stands out in my code when assigning / updating variables).

Read on for the big changes. H/T R-Bloggers.

Contrasting RDS and Parquet in R

Published 2024-02-05 by Kevin Feasel

Colin Gillespie contrasts two file formats:

The RDS format is a binary file format, native to R. It has been part of R for many years, and provides a convenient method for saving R objects, including data sets.

The obvious question is which file format should you use for storing tabular data? RDS or parquet? For this comparison, I’m interested in the following characteristics:

the time required to save the file;

the file size;

the time required to load the file.

Read on for the throw-down.

Where the Bayesian and Frequentist Approaches Meet

Published 2024-02-02 by Kevin Feasel

Sebastian Sauer bridges the gap:

However, a disadvantage of Bayes analysis, at least at its current state, is that it has higher technical and computational demands. For beginners in particular, this may present a substantial (entry) burden. Teaching statistics, I have found that students (and many colleagues) have had difficulties installing Stan (particularly the C++ compiler needed in order to run Stan); Stan is the probabilistic programming language which many front-end Bayes engines use such as brms in R.

Thus, because the installation process is not very user-friendly, beginners face a burden that may keep them from using Bayes methods.

In that light, this post explores the numerical similarities of Bayes regression models and Frequentist models. The idea is to use a Frequentist regression model as a proxy for a full Bayesian analysis. The value added is the quick computation and the simple technical setup.

Click through for the conditions where you’ll find very similar results, as well as a few examples of it in action.

Row Re-Ordering in Shiny Apps

Published 2024-02-02 by Kevin Feasel

Stephane Laurent does a bit of work:

The ‘RowReorder’ extension of datatables is available in the DT package. This extension allows you to reorder the rows of a DT table by dragging and dropping. However, if you enable this extension in a Shiny app for a table using server-side processing (option server=TRUE in renderDT), that won’t work: each time the rows are reordered, they will jump back to their original locations.

Read on to see what you need to do in that case, as well as an example of how to do it. H/T R-Bloggers.

Extracting the Month from a Date with R

Published 2024-02-01 by Kevin Feasel

Steven Sanderson asks what month it is:

Greetings fellow R enthusiasts! Today, we’re diving into a fundamental task: extracting the month from a date in R. Whether you’re new to R or a seasoned pro, understanding how to manipulate dates is essential. We’ll explore two popular methods: using base R and the powerful lubridate package. So, let’s roll up our sleeves and get started!

Read on for several examples across two solution spaces.
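For a quick sense of the base R side, the sketch below pulls the month out of a date in two ways. It is an illustration with a made-up date, not a copy of the examples in the post.

```r
# Minimal base R sketch; the date is an illustrative assumption.
d <- as.Date("2024-02-01")

# Month as an integer via a format string
month_num <- as.integer(format(d, "%m"))

# Same result via POSIXlt, whose `mon` field is zero-based
month_num_lt <- as.POSIXlt(d)$mon + 1

print(month_num)
```

`months(d)` and `format(d, "%B")` return the month name instead, but the text depends on the session locale.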

Preserving Non-Occurring Levels in R

Published 2024-01-31 by Kevin Feasel

Sebastian Sauer saves the levels:

The summary table does not show the level TRUE, as it is not occurring in the data. This can be problematic if the data is unknown before summarizing and you expect both/all levels (TRUE, FALSE) to occur. Just imagine that a subsequent function will count the level TRUE and the level FALSE. If one level is missing, your system may break down.

Click through for a solution, where, even if your dataset is missing a particular level (value of a categorical variable), you will still see it in the final output. That way, if you train a model on this data and the new level shows up in your test dataset or in the wild, it won’t cause an error.
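The general idea can be sketched in base R by declaring the full set of levels up front, so that `table()` reports a zero count for a level that never occurs. This is an assumption-laden illustration and may differ from the approach in Sebastian's post; the `flags` vector is hypothetical.

```r
# Minimal base R sketch; `flags` is an illustrative assumption.
flags <- c(FALSE, FALSE, FALSE)  # TRUE never occurs in this data

# Without explicit levels, only the observed value is tabulated
observed_only <- table(flags)

# With explicit levels, the missing level is kept with a count of zero
counts <- table(factor(flags, levels = c(FALSE, TRUE)))

print(counts)
```

Downstream code can then index `counts[["TRUE"]]` safely, since the entry exists even when its count is zero.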

Category: R