Press "Enter" to skip to content

Category: R

Creating Map Plots With ggmap

Laura Ellis shows how to use the ggmap package to create map plots in R:

In the last map, it was a bit tricky to see the density of the incidents because all the graphed points were sitting on top of each other.  In this scenario, we are going to make the data all one color and we are going to set the alpha variable which will make the dots transparent.  This helps display the density of points plotted.

Also note, we can re-use the base map created in the first step “p” to plot the new map.

Check it out.  This is a good introduction to map plotting in R and a solid starting point.
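To make the idea concrete, here is a minimal sketch of the alpha technique, assuming an incidents data frame with lon and lat columns (the names here are placeholders, not Laura's actual variables):

library(ggmap)

# Build the base map once and reuse it, as with "p" in the post.
# get_stamenmap() needs no API key; the bounding box is a placeholder.
p <- ggmap(get_stamenmap(bbox = c(left = -122.52, bottom = 37.70,
                                  right = -122.35, top = 37.82),
                         zoom = 12, maptype = "toner-lite"))

# One color plus a low alpha, so overplotted points read as density
p + geom_point(data = incidents, aes(x = lon, y = lat),
               color = "dodgerblue", alpha = 0.05, size = 2)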


R 3.5.0 Released

Tal Galili announces that R 3.5.0 is now available:

  • By default the (arbitrary) signs of the loadings from princomp() are chosen so the first element is non-negative.

  • If --default-packages is not used, then Rscript now checks the environment variable R_SCRIPT_DEFAULT_PACKAGES. If this is set, then it takes precedence over R_DEFAULT_PACKAGES. If default packages are not specified on the command line or by one of these environment variables, then Rscript now uses the same default packages as R. For now, the previous behavior of not including methods can be restored by setting the environment variable R_SCRIPT_LEGACY to yes.

  • When a package is found more than once, the warning from find.package(*, verbose=TRUE) lists all library locations.

  • POSIXt objects can now also be rounded or truncated to month or year.
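Two of those items are easy to verify interactively on 3.5.0; the expected results are noted in comments:

# POSIXt objects can now be rounded or truncated to month or year
x <- as.POSIXct("2018-04-23 14:37:00", tz = "UTC")
round(x, "months")   # nearest month boundary: "2018-05-01 UTC"
trunc(x, "years")    # start of the year: "2018-01-01 UTC"

# princomp() now picks loading signs so each first element is non-negative
pc <- princomp(USArrests, cor = TRUE)
pc$loadings[1, ]     # all values non-negative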

Click through for the long, long list of changes.  H/T R-Bloggers


Issues Starting ML Services

Jen Stirrup has a quick rundown of some reasons why Machine Learning Services might give you an error when you try to start it up:

Msg 39023, Level 16, State 1, Procedure sp_execute_external_script, Line 1 [Batch Start Line 3]

‘sp_execute_external_script’ is disabled on this instance of SQL Server. Use sp_configure ‘external scripts enabled’ to enable it.

Msg 11536, Level 16, State 1, Line 4

EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.

Grr! What’s happened here? We had installed R as part of the SQL installation, and we had run the command to enable it, too.
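The first error, at least, has its fix spelled out in the message itself; in sketch form (a restart of the instance, which also restarts the Launchpad service, is typically needed afterward):

-- Enable external script execution, then restart the instance
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;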

Click through for reasons.  One thing which might affect a small percentage of people is that ML Services creates a folder each time you run an R query.  Those folders are easy to clean up, and each time the Launchpad service starts up, it deletes the old folders as one of its steps.  The problem is that if you have a huge number of them (tens or hundreds of thousands), it might not finish deleting in time and the service will fail to start.  Deleting the folders manually does the trick, and the service can start up once more.


Using Have I Been Pwned In R

Maelle Salmon shows us how to use the HIBPwned library in R:

The alternative title of this blog post is HIBPwned version 0.1.7 has been released! W00t!. Steph’s HIBPwned package utilises the HaveIBeenPwned.com API to check whether email addresses and/or user names have been present in any publicly disclosed data breach. In other words, this package potentially delivers bad news, but useful bad news!

This release is mainly a maintenance release, with some cool code changes invisible to you, the user, but not only that: you can now get account_breaches for several accounts in a data.frame instead of a list, and you’ll be glad to know that results are cached inside an active R session. You can read about more functionalities of the package in the function reference.

Wouldn’t it be a pity, though, to echo the release notes without a nifty use case? Another blog post will give more details about the technical aspects of the release, but here, let’s make you curious! How many CRAN package maintainers have been pwned?

Read on to find out that answer.
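If you would like to try it against your own addresses, here is a minimal sketch (the as_df argument reflects the new data.frame behavior described above; treat the exact argument name as an assumption and check the package's function reference):

library(HIBPwned)

# Placeholder addresses; as_df = TRUE asks for one data.frame
# rather than a list (argument name assumed - see the docs)
breaches <- account_breaches(c("alice@example.com", "bob@example.com"),
                             as_df = TRUE)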


Tidy Anomaly Detection With Anomalize

Abdul Majed Raja walks us through an example using the anomalize package:

One of the important things to do with Time Series data before starting with Time Series forecasting or Modelling is Time Series Decomposition, where the Time series data is decomposed into Seasonal, Trend and remainder components.  anomalize has got a function time_decompose() to perform the same.  Once the components are decomposed, anomalize can detect and flag anomalies in the decomposed data of the remainder component, which then could be visualized with plot_anomaly_decomposition().

library(anomalize)  # btc_ts is a time tibble of Bitcoin prices, built earlier in the post

btc_ts %>%
  time_decompose(Price, method = "stl", frequency = "auto", trend = "auto") %>%
  anomalize(remainder, method = "gesd", alpha = 0.05, max_anoms = 0.2) %>%
  plot_anomaly_decomposition()

As you can see from the above code, the decomposition happens based on the ‘stl’ method, which is the common method of time series decomposition, but if you have been using Twitter’s AnomalyDetection, then the same can be implemented in anomalize by combining time_decompose(method = "twitter") with anomalize(method = "gesd").  The ‘stl’ method of decomposition can also be combined with anomalize(method = "iqr") for a different, IQR-based anomaly detection.
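For instance, the Twitter-style variant he mentions would look roughly like this (a sketch against the same btc_ts object):

btc_ts %>%
  time_decompose(Price, method = "twitter") %>%
  anomalize(remainder, method = "gesd", alpha = 0.05, max_anoms = 0.2) %>%
  plot_anomaly_decomposition()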

Read on to see what else you can do with anomalize.


Uploading Data Sets To Azure ML From R

Leila Etaati continues her series on the Azure ML R package by showing how to upload a data set:

There is a function in the AzureML package named “workspace” that creates a reference to an AzureML Studio workspace by getting the authentication token and workspace id, as below:

To work with other AzureML packages, you need to pass this object to them.

For instance, to explore all of the experiments in Azure ML, there is a function named “experiments” that takes the “ws” object as input to connect to the desired Azure ML environment, along with a filter.
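In code form, the pattern looks something like this (a sketch; the id and token are placeholders you would pull from ML Studio's settings):

library(AzureML)

# Create the workspace reference from your workspace id and auth token
ws <- workspace(id = "your-workspace-id", auth = "your-authorization-token")

# Other AzureML functions take this object as input; for example,
# listing experiments with a filter (filter value assumed - see ?experiments)
exps <- experiments(ws, filter = "samples")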

Click through for more.


The Theory Behind ARIMA

Bidyut Ghosh explains how the ARIMA forecasting method works:

The earlier models of time series are based on the assumption that the time series variable is stationary (at least in the weak sense).

But in practice, most time series variables will be non-stationary in nature; they are integrated series.

This implies that you need to take either the first or second difference of the non-stationary time series to convert them into stationary ones.
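A tiny base R illustration of that differencing step, using a simulated random walk (a series integrated of order one):

set.seed(42)
y  <- cumsum(rnorm(200))  # random walk: non-stationary, I(1)
dy <- diff(y)             # first difference: stationary

# If one difference isn't enough, take the second:
d2y <- diff(y, differences = 2)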

Bidyut ends with a little bit of implementation in R, but I’d guess that’ll be the focus of part 2.


Slicing In R

John Mount recommends learning about the array slicing system in R:

R has a very powerful array slicing ability that allows for some very slick data processing.

Suppose we have a data.frame “d”, and for every row where d$n_observations < 5 we wish to “NA-out” some other columns (mark them as not yet reliably available).  Using slicing techniques, this can be done quite quickly as follows.

library("wrapr")

d[d$n_observations < 5, 
  qc(mean_cost, mean_revenue, mean_duration)] <- NA

Read on for more.  In general, I prefer the pipeline mechanics offered with the Tidyverse for readability.  But this is a good example of why you should know both styles.
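For comparison, a rough Tidyverse version of the same NA-out (a sketch using current dplyr, not code from John's post):

library(dplyr)

d <- d %>%
  mutate(across(c(mean_cost, mean_revenue, mean_duration),
                ~ ifelse(n_observations < 5, NA, .x)))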


Matching Order In R

John Mount shows off the match_order function in wrapr:

Often we wish to work with such data aligned so each row in d2 has the same idx value as the same row (by row order) as d1. This is an important data wrangling task, so there are many ways to achieve it in R, such as base::merge(), dplyr::left_join(), or by sorting both tables into the same order and then using base::cbind().

However if you wish to preserve the order of the first table (which may not be sorted), you need one more trick.

Click through to see that one additional trick.
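If you want the flavor of it, the underlying base R pattern looks something like this (a sketch of the general match() idea, not necessarily John's exact wrapr implementation):

d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"))
d2 <- data.frame(idx = c(1, 2, 3), y = c("A", "B", "C"))

# For each row of d1 (order preserved), find the matching row of d2
d2_aligned <- d2[match(d1$idx, d2$idx), , drop = FALSE]
cbind(d1, y = d2_aligned$y)   # d1's row order is kept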


Faceting With R And SQL Server ML Services

Marlon Ribunal has a quick example showing how to build faceted plots with SQL Server ML Services and ggplot2:

In my previous post, I have demonstrated how easy it is to create a bar graph in SQL Server 2017 In-Database Machine Learning using R.

We’re going to build upon that basic graph.

Sometimes data analysis requires us to look at an overview of our data across specific partitions, say a year.  For example, we want to see how our product groups fare on a month-to-month basis across the last four years.

From a data analytics perspective, there are quite a handful of data points in this requirement: the data aggregate (quantity), monthly periods, and year partitions.

One approach to handling such a requirement is to use a facet.  Faceting is a way of plotting subsets of data into a matrix of panels based on one or more variables, or facets.
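In ggplot2 terms, that boils down to something like this (a sketch; sales and its column names are stand-ins for Marlon's actual data set):

library(ggplot2)

# Monthly quantity per product group, one panel per year
ggplot(sales, aes(x = Month, y = Quantity, fill = ProductGroup)) +
  geom_col(position = "dodge") +
  facet_wrap(~ Year)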

Click through for the example and code.  Facets are quite useful, but they run the risk of misleading if you squeeze too many onto the screen.  The same line can look quite different with a “tall” facet versus a “wide” facet, and that can change how people interpret your visual.
