Press "Enter" to skip to content

Category: R

Logging in R

Himanshu Gupta walks us through the log4r package:

One of the most important aspect of an application is Logging. Since logs provide visibility into the behavior of a running app. Hence logs play a vital role in maintenance and enhancement of an application.

However, most of us are already aware with the importance of logging. That’s why we add them in our applications. But one thing that we are not aware of is that, the application should never be concerned with routing or storage of logs, i.e., it should not attempt to write to or manage logs or log files. Instead, each running process, within the application, writes logs to a stdout. In local environment, we can view the logs in the console whereas in staging/production environment, logs can be collated together in .log file(s).

Hence, in this blog post we will learn – how to collect, customize, and standardize R logs using log4r? But first let’s know what log4r is.

Read on for a demonstration of log4r and some of the settings you can choose.

Leave a Comment

VARCHAR Columns and Bytecode Version Mismatch in R

Dave Mason runs through a tricky problem with SQL Server Machine Learning Services:

During my testing, I’ve found R handles CHAR and VARCHAR data within the input data set as long as the ASCII codes comprising the data is in the range from 0 to 127. This much is not surprising–those are the character codes for the ASCII table. Starting with character code 128, R begins having some trouble. 

Read on to see the problem. Dave’s advice at the end is sound (and frankly, my advice for any string data in SQL Server).

Leave a Comment

Generating Random Numbers with R

The folks at Data Sharkie walk us through random number generation in R:

Why is random numbers generation important and where is it used?

Random numbers generations have application in various fields like statistical sampling, simulation, test designs, and so on. Generally, when a data scientist is in need of a set of random numbers, they will have in mind

R programming language allows users to generate random distributed numbers with a set of built-in functions: runif()rnorm()rbinom().

Read on to generate random numbers across two separate distributions.

Leave a Comment

Visualizing a Single Variable in R

Michaelino Mervisiano takes us through the types of visuals we can create to understand a single variable in R:

How to create a histogram in R? And what information that we can get from histogram?
Histogram shows a frequency distribution. It is a great graph for showing the mode, the spread, and the symmetry (skewness) of your data. Here is a histogram of 1,000 random points drawn from a normal distribution with a mean of 2.5

Of course I don’t like option number 4 and would replace it with something else (column/bar charts, Cleveland dot plots, or stacked column/bar depending on what you’re trying to observe). But this is a good way of thinking about how you can visualize a variable.

Leave a Comment

Tuning Random Forest HyperParameters with R

Julia Silge gives us an idea of how to tune random forest hyperparameters in R:

Our modeling goal here is to predict the legal status of the trees in San Francisco in the #TidyTuesday dataset. This isn’t this week’s dataset, but it’s one I have been wanting to return to. Because it seems almost wrong not to, we’ll be using a random forest model! 🌳

Let’s build a model to predict which trees are maintained by the San Francisco Department of Public Works and which are not. We can use parse_number() to get a rough estimate of the size of the plot from the plot_size column. Instead of trying any imputation, we will just keep observations with no NA values.

Click through to some data exploration, the initial model, and a process for using Grid Search with the caret package.

Leave a Comment

Using the Tune Package in R for Hyperparamter Optimization

Abderrahim Lyoubi-Idrissi takes us through a Bayesian approach to tune hyperparameters:

In contrast to the model parameters, which are discovered by the learning algorithm of the ML model, the so called Hyperparameter(HP) are not learned during the modeling process, but specified prior to training.

Hyperparameter tuning is the task of finding optimal hyperparameter(s) for a learning algorithm for a specific data set and at the end of the day to improve the model performance.

Abderrahim contrasts two different methods here: Grid Search and Bayesian Optimization. Definitely an interesting read if you develop data science models.

Leave a Comment

Faster Package Installation in R

Colin Gillespie has a few tips for making package installation in R a bit faster:

The bigger picture is that package installation time is starting to become more of an issue for a number of reasons. For example, packages are getting larger and more complex (tidyverse and friends), so installation just takes longer. Or we are using more continuous integration strategies such as Travis or GitLab-CI, and want quick feedback. Or we are simply updating a large number of packages via update.packages(). This is a problem we often solve for our clients – optimising their CI/CD pipelines.

The purpose of this blog post is to pull together a few different methods for tackling this problem.

Click through for the guidance.

Leave a Comment

Color Palettes in R

Paul van der Laken talks to us about paleteer:

I often cover tools to pick color palettes on my website (e.g. herehere, or here) and also host a comprehensive list of color packages in my R programming resources overview.

However, paletteer is by far my favorite package for customizing your colors in R!

The paletteer package offers direct access to 1759 color palettes, from 50 different packages!

Just make sure to run your graphics through something like Coblis afterward to ensure that they’re CVD-friendly. H/T R-Bloggers.

Leave a Comment

Correlation in easystats

The easystats team announces a new R package:

The easystats project continues to grow with its more recent addition, a package devoted to correlations. Check-out its webpage here!

It’s lightweight, easy to use, and allows for the computation of many different kinds of correlations, such as partial correlations, Bayesian correlations, multilevel correlations, polychoric correlations, biweightpercentage bend or Sheperd’s Pi correlations (types of robust correlation), distance correlation (a type of non-linear correlation) and more, also allowing for combinations between them (for instance, Bayesian partial multilevel correlation).

I’d recommend reading the examples on the GitHub repo due to formatting. Looks quite interesting. H/T R-Bloggers.

Leave a Comment