Category: R

This results in a row per ride and visualises pretty well in SSMS. If you are familiar with the geography of London you can make out the river Thames toward the centre of the image and Regent's Park towards the top left:

This could be overlaid on a shape file of London or a map from another provider such as Google Maps or Mapbox.

However, when you try to load the dataset into Power BI, you find that Power BI does not natively support Geography data types. There is an idea you can vote on here to get them supported: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/12257955-support-sql-server-geometry-geography-data-types-i

Hit up that idea link if you want to see geography type support within Power BI.

Comments closed

Installing R Tools for Visual Studio 2017

Published 2018-05-07 by Kevin Feasel

Tech Junkie Blog shows how to install R tooling within Visual Studio 2017:

In this post we are going to go over the steps to install R Tools For Visual Studio 2017. RStudio has a development environment that is bare bones for the free version. Visual Studio 2017 offers a more robust development environment if you download the R Tools feature.

Here are the steps to install R Tools for Visual Studio:

R Tools for Visual Studio 2015 is still a separate download.


Microsoft R Open 3.4.4

Published 2018-05-03 by Kevin Feasel

David Smith announces Microsoft R Open 3.4.4:

An update to Microsoft R Open (MRO) is now available for download on Windows, Mac and Linux. This release upgrades the R language engine to version 3.4.4, which addresses some minor issues with timezone detection and some edge cases in some statistics functions. As a maintenance release, it’s backwards-compatible with scripts and packages from the prior release of MRO.

MRO 3.4.4 points to a fixed CRAN snapshot taken on April 1 2018, and you can see some highlights of new packages released since the prior version of MRO on the Spotlights page. As always, you can use the built-in checkpoint package to access packages from an earlier date (for reproducibility) or a later date (to access new and updated packages).

David also spills the beans on when we’ll see MRO 3.5.0.


Toward Interpretable Machine Learning

Published 2018-05-02 by Kevin Feasel

Christoph Molnar shows off a couple of R packages which help interpret ML models:

Machine learning models repeatedly outperform interpretable, parametric models like the linear regression model. The gains in performance have a price: The models operate as black boxes which are not interpretable.

Fortunately, there are many methods that can make machine learning models interpretable. The R package iml provides tools for analysing any black box machine learning model:

Feature importance: Which were the most important features?

Feature effects: How does a feature influence the prediction? (Partial dependence plots and individual conditional expectation curves)

Explanations for single predictions: How did the feature values of a single data point affect its prediction? (LIME and Shapley value)

Surrogate trees: Can we approximate the underlying black box model with a short decision tree?

The iml package works for any classification and regression machine learning model: random forests, linear models, neural networks, xgboost, etc.

This is a must-read if you’re getting into model-building. H/T R-Bloggers


Creating Seaborn Plots With R

Published 2018-04-26 by Kevin Feasel

Abdul Majed Raja shows how to call Python from R and build plots using the Seaborn Python package:

The reticulate package provides a comprehensive set of tools for interoperability between Python and R. The package includes facilities for:

Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.

Translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays).

Flexible binding to different versions of Python including virtual environments and Conda environments.

Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability.

The more common use of reticulate I’ve seen is running TensorFlow neural networks from R.


Creating Map Plots With ggmap

Published 2018-04-25 by Kevin Feasel

Laura Ellis shows how to use the ggmap package to create choropleth maps in R:

In the last map, it was a bit tricky to see the density of the incidents because all the graphed points were sitting on top of each other. In this scenario, we are going to make the data all one color and we are going to set the alpha variable which will make the dots transparent. This helps display the density of points plotted.

Also note, we can re-use the base map created in the first step “p” to plot the new map.

Check it out. This is an introduction to creating choropleths, making it a good start.


R 3.5.0 Released

Published 2018-04-24 by Kevin Feasel

Tal Galili announces that R 3.5.0 is now available:

By default the (arbitrary) signs of the loadings from princomp() are chosen so the first element is non-negative.
If --default-packages is not used, then Rscript now checks the environment variable R_SCRIPT_DEFAULT_PACKAGES. If this is set, then it takes precedence over R_DEFAULT_PACKAGES. If default packages are not specified on the command line or by one of these environment variables, then Rscript now uses the same default packages as R. For now, the previous behavior of not including methods can be restored by setting the environment variable R_SCRIPT_LEGACY to yes.
When a package is found more than once, the warning from find.package(*, verbose=TRUE) lists all library locations.
POSIXt objects can now also be rounded or truncated to month or year.
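
The last item is easy to try out. Here is a minimal sketch in base R, assuming R 3.5.0 or later; the timestamp is made up for illustration and is not from the release notes:

```r
# Assumes R >= 3.5.0, where "months" and "years" became valid units.
# The timestamp below is a hypothetical example value.
x <- as.POSIXct("2018-04-24 15:30:00", tz = "UTC")

# trunc() drops everything below the requested unit.
month_start <- trunc(x, "months")
year_start <- trunc(x, "years")

format(month_start, "%Y-%m-%d")  # "2018-04-01"
format(year_start, "%Y-%m-%d")   # "2018-01-01"

# round() moves to the nearest unit boundary instead of always going down.
month_round <- round(x, "months")
format(month_round, "%Y-%m-%d")
```

The unit is passed positionally here because the name of that argument is not the same in every R version.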

Click through for the long, long list of changes. H/T R-Bloggers


Issues Starting ML Services

Published 2018-04-20 by Kevin Feasel

Jen Stirrup has a quick rundown of some reasons why Machine Learning Services might give you an error when you try to start it up:

Msg 39023, Level 16, State 1, Procedure sp_execute_external_script, Line 1 [Batch Start Line 3]

'sp_execute_external_script' is disabled on this instance of SQL Server. Use sp_configure 'external scripts enabled' to enable it.

Msg 11536, Level 16, State 1, Line 4

EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.

Grr! What’s happened here? We had installed R as part of the SQL installation, and we had run the command to enable it, too.

Click through for reasons. One thing which might affect a small percentage of people is that ML Services creates a folder each time you run an R query. Those folders are easy to clean up, and each time the Launchpad service starts up, it deletes the old folders as a step. The problem is that if you have a huge number (tens or hundreds of thousands), it might not get finished deleting in time and the service will fail. Deleting the folders manually does the trick and the service can start up once more.


Using Have I Been Pwned In R

Published 2018-04-19 by Kevin Feasel

Maelle Salmon shows us how to use the HIBPwned library in R:

The alternative title of this blog post is HIBPwned version 0.1.7 has been released! W00t!. Steph’s HIBPwned package utilises the HaveIBeenPwned.com API to check whether email addresses and/or user names have been present in any publicly disclosed data breach. In other words, this package potentially delivers bad news, but useful bad news!

This release is mainly a maintenance release, with some cool code changes invisible to you, the user, but not only that: you can now get account_breaches for several accounts in a data.frame instead of a list, and you’ll be glad to know that results are cached inside an active R session. You can read about more functionalities of the package in the function reference.

Wouldn’t it be a pity, though, to echo the release notes without a nifty use case? Another blog post will give more details about the technical aspects of the release, but here, let’s make you curious! How many CRAN package maintainers have been pwned?

Read on to find out that answer.


Tidy Anomaly Detection With Anomalize

Published 2018-04-18 by Kevin Feasel

Abdul Majed Raja walks us through an example using the anomalize package:

One of the important things to do with Time Series data before starting with Time Series forecasting or Modelling is Time Series Decomposition, where the Time Series data is decomposed into Seasonal, Trend and remainder components. anomalize has a function, time_decompose(), to perform this. Once the components are decomposed, anomalize can detect and flag anomalies in the remainder component, which can then be visualized with plot_anomaly_decomposition().

library(dplyr)      # provides the %>% pipe
library(anomalize)
# btc_ts is assumed to be the Bitcoin price tibble built earlier in the linked post
btc_ts %>%
  time_decompose(Price, method = "stl", frequency = "auto", trend = "auto") %>%
  anomalize(remainder, method = "gesd", alpha = 0.05, max_anoms = 0.2) %>%
  plot_anomaly_decomposition()

As you can see from the above code, the decomposition is based on the "stl" method, which is the common method of time series decomposition. If you have been using Twitter's AnomalyDetection, the same can be implemented in anomalize by combining time_decompose(method = "twitter") with anomalize(method = "gesd"). The "stl" method of decomposition can also be combined with anomalize(method = "iqr") for IQR-based anomaly detection.
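
To get a feel for what "decompose, then flag the remainder" means, here is a rough sketch using only base R rather than anomalize itself. The series is synthetic, the two spikes are injected by hand, and the IQR multiplier of 3 is an illustrative assumption, not a value taken from the post:

```r
# Synthetic monthly series (hypothetical data): trend + seasonality + noise,
# with two spikes injected at known positions.
set.seed(42)
n <- 120
values <- 0.05 * seq_len(n) + sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 0.1)
spike_at <- c(30, 90)
values[spike_at] <- values[spike_at] + 5
series <- ts(values, frequency = 12)

# STL decomposition into seasonal, trend, and remainder components.
fit <- stl(series, s.window = "periodic", robust = TRUE)
remainder <- as.numeric(fit$time.series[, "remainder"])

# IQR fences on the remainder; the multiplier of 3 is an assumption
# chosen for illustration.
q <- unname(quantile(remainder, c(0.25, 0.75)))
fence <- 3 * (q[2] - q[1])
is_anomaly <- remainder < q[1] - fence | remainder > q[2] + fence

# Positions flagged as anomalous.
which(is_anomaly)
```

This is not anomalize's implementation: the package derives its limits from the alpha argument rather than from a fixed multiplier, so its flags may differ from what this sketch produces.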

Read on to see what else you can do with anomalize.
