Press "Enter" to skip to content

Category: R

HIBPwned

Steph Locke has a new CRAN package out:

HIBPwned is a feature complete R package that allows you to use every (currently) available endpoint of the API. It’s vectorised so no need to loop through email addresses, and it requires no fiddling with authentication or keys.

You can use HIBPwned to do things like:

  1. Set up your own notification system for account breaches of myriad email addresses & user names that you have

  2. Check for compromised company email accounts from within your company Active Directory

  3. Analyse past data breaches and produce charts like Dave McCandless’ Breach chart

The regular service is extremely useful and Steph’s wrapper looks like it’s worth checking out.

Comments closed

Power BI + Interactive R Charts

Leila Etaati shows us how to integrate R charts in the Power BI experience:

In previous videos you’ve learned that we can demonstrate R visualization in Power BI, In this video you will learn how R visualization is working interactively with other elements in Power BI report. In fact Power BI works with R charts as a regular visualization and highlighting and selecting items in other elements of report will effect on that. Here is a quick video about this functionality

Check out the five-minute video.

Comments closed

Finding Malicious Domains

Rafael San Miguel Carrasco uses dimensionality reduction to figure out if a domain is malicious:

Dimensionality reduction is a common techique to visualize observations in a dataset, by combining all features into two, that can then be used to draw the observation in an scatter plot.

One popular algorithm that implements this technique is PCA (Principal Components Analysis), which is available in R through the prcomp() function.

The algorithm was applied to observations of sthe dataset, and ggplot2’s geom_point() function was used to draw the results in a 2D chart.

I would want to see this done for a couple hundred thousand domains, but I do like the idea of taking advantage of statistical modeling tools to find security threats.

Comments closed

Machine Learning Packages In R

Khushbu Shah discusses good R packages to help with your machine learning projects:

If missing values are something which haunts you then MICE package is the real friend of yours.

When we face an issue of missing values we generally go ahead with basic imputations such as replacing with 0, replacing with mean, replacing with mode etc. but each of these methods are not versatile and could result into a possible data discrepancy.

MICE package helps you to impute missing values by using multiple techniques, depending on the kind of data you are working with.

I’d heard of a couple of these, but most of them are new to me.

Comments closed

San Francisco Crime Analysis

Vimal Natarajan shows off some R charts using crime incident data:

By analyzing the plot above, we can arrive at the following insights:

  • The number of crimes steadily decline from midnight and are at the lowest during the early morning hours and then they start increasing and peak around 6 PM in the evening. This is the same insight we arrived in my previous analysis but here we have categorized by the Police district and still see the same pattern.

  • As seen in the previous plot, Park and Richmond districts have the lowest number of crimes throughout the day.

  • As highlighted in red in the plot above, the maximum number of crimes happens in Southern district around 6 PM in the evening.

I would prefer to see code here, but it does serve to give you an idea of what R can do.

Comments closed

Flood Visualization

David Smith points out an animated flood chart using R:

As more settlements in Texas and France are impacted by severe flooding, this is a good time to thank the hydrologists at the NOAA who forecast river level rises in advance and give residents in affected areas time to move to higher ground. Along with topgraphic, rainfall, and weather data, monitoring stations maintained by NOAA and the USGS along rivers provide critical real-time information about river levels. NOAA scientists access this data using the dataRetrieval package for R, which they then incorporate into flood prediction models and use to generate animations like this one of the flood of the Delaware in February this year

Looks like I’ve got a new blog to follow…

Comments closed

R: Using Images As Labels

Jonathan Carroll shows how to use images as labels in R:

There are probably very few cases for which this is technically a good idea (trying to be a featured author on JunkCharts might very well be one of those reasons). Nonetheless, there are at least a couple of requests for this floating around on stackoverflow; here and here for example. I struggled to find any satisfactory solutions that were in current working order (though perhaps my Google-fu has failed me).

Jonathan is rather against this idea, and it does seem like the answer is a hack.  I suppose the real answer is “sometimes an image isn’t worth a thousand words.”

Comments closed

R For The Microsoft Developer

Ginger Grant has a great intro on setting up R Tools for Visual Studio:

For those people who haven’t made the decision as far as which tool to use, let me offer two compelling reasons to pick Visual Studio [VS] instead of R Studio: Intellisense and Improved Debugging Tools. R studio does not have intellisense and it is not possible to debug your code by stepping through it in the manner that many developers of VS are already quite familiar. You will need to configure VS to use R tools, which are detailed below.

Those are nice features.  I’m still a big fan of R Studio, but have seen big improvements in R Tools for Visual Studio, so I imagine I’ll make the switch by the end of the year.

Comments closed

Crime Analysis

Raghavan Madabusi combines Zeppelin, R, and Spark to perform crime analysis:

Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. Zeppelin interpreter concept allows any language/data-processing-backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with existing Spark eco-system and share SparkContext across Scala, Python, and R.

This links to a rather long post on how to set up and use all of these pieces.  I’m more familiar with Jupyter than Zeppelin, but regardless of the notebook you choose, this is a good exercise to become familiar with the process.

Comments closed

Predictive Maintenance

David Smith shows off a predictive maintenance gallery for dealing with aircraft engines:

In each case, a number of different models are trained in R (decision forests, boosted decision trees, multinomial models, neural networks and poisson regression) and compared for performance; the best model is automatically selected for predictions.

On a related note, Microsoft recently teamed up with aircraft engine manufacturer Rolls-Royceto help airlines get the most out of their engines. Rolls-Royce is turning to Microsoft’s Azure cloud-based services — Stream Analytics, Machine Learning and Power BI — to make recommendations to airline executives on the most efficient way to use their engines in flight and on the ground. This short video gives an overview.

Check out the data set and play around a bit.

Comments closed