Reticulate: Python-R Interop

Adnan Fiaz walks us through an example of using the reticulate library to call Python from R:

So what exactly does reticulate do? It’s goal is to facilitate interoperability between Python and R. It does this by embedding a Python session within the R session which enables you to call Python functionality from within R. I’m not going to go into the nitty gritty of how the package works here; RStudio have done a great job in providing some excellent documentation and a webinar. Instead I’ll show a few examples of the main functionality.

Just like R, the House of Python was built upon packages. Except in Python you don’t load functionality from a package through a call to librarybut instead you import a module. reticulate mimics this behaviour and opens up all the goodness from the module that is imported.

This is a good intro to a package which is already useful but I think will be even better over time as R & Python interoperability becomes the norm.  H/T R-Bloggers

Building A Neural Network In R With Keras

Pablo Casas walks us through Keras on R:

One of the key points in Deep Learning is to understand the dimensions of the vector, matrices and/or arrays that the model needs. I found that these are the types supported by Keras.

In Python’s words, it is the shape of the array.

To do a binary classification task, we are going to create a one-hot vector. It works the same way for more than 2 classes.

For instance:

  • The value 1 will be the vector [0,1]
  • The value 0 will be the vector [1,0]

Keras provides the to_categorical function to achieve this goal.

This example doesn’t include using CUDA, but the data sizes are small enough that it doesn’t matter much.  H/T R-Bloggers

Running Google Translate Inside SQL Server ML Services

David Fowler shows us how to use SQL Server Machine Learning Services to call the Google Translate API via Python:

In my recent post, Installing External Modules into SQL Server’s Python I had a look at just how simple it is to import external modules into Python so that they can be used within SQL Server.

In this post I’d like to show you a little something to demonstrate how we can integrate one of these modules into SQL Server and just how powerful this can be.

This is really just for fun and it may not really be something that you’d want to put out into production but when I happened to notice that there’s a Python module that interfaces with Google Translate, I wondered to myself if it’d be possible to write a procedure that could take a string and automatically translate it into our native language.

There are a couple of setup steps but once you get past them, it’s easy going.

Network Analysis With Python In Power BI

Tori Tompkins shows us how to use the NetworkX package in Power BI:

The data I used was created to demonstrate this task in Power BI but there are many real-world network datasets to experiment with provided by Stanford Network Analysis Project. This small dummy dataset represents a co-purchasing network of books.

The data I loaded into Power BI consisted of two separate CSVs. One, Books.csv, consisted of metadata pertaining to the top 40 bestselling books according to Wikipedia and their assigned IDs. The other, Relationship.csv, was an edgelist of the book IDs which is a popular method for storing/ delivering network data. The graph I wanted to create was an undirected, unweighted graph which I wanted to be able to cross-filter accurately. Because of this, I duplicated this edgelist and reversed the columns so the ToNodeId and FromNodeId were swapped. Adding this new edge list onto the end of the original edgelist has created a dataset with can be filtered on both columns later down the line. For directed graphs, this step is unnecessary and can be ignored.

Once loaded into Power BI, I duplicated the Books table to create the following relationship diagram as it isn’t possible to replicate the relationship between FromNodeId to Book ID and ToNodeId to Book ID with only one Books table.

Read on for an example using this data set.

Installing External Python Modules In SQL Server

Kevin Feasel

2018-09-04

Python

David Fowler shows how to import an external Python module into SQL Server Machine Learning Services:

But how do we go about installing them into SQL Server?  Now I’m a DBA and not a Python wizz so had to do a little digging to figure it out but to be honest, it’s fairly easy.

I don’t know how many other DBAs know that we can install these modules or even how to do it so I thought I’d write up a quick post explaining it.

First things first, you’re going to need to do this from your SQL Server.

Read on for the instructions.

Explaining Keras Models With LIME

Shirin Glander shares her slide deck on explaining Keras image classification models with LIME:

Here I am sharing the slides for a webinar I gave for SAP about Explaining Keras Image Classification Models with LIME.

Slides can be found here: https://www.slideshare.net/ShirinGlander/sap-webinar-explaining-keras-image-classification-models-with-lime

Read on for links to additional resources as well.

Examples Of Charts In Different Languages

David Smith points out a great repository of information on generating different types of charts in different libraries:

The visualization tools include applications like Excel, Power BI and Tableau; languages and libraries including R, Stata, and Python’s matplotlib); and frameworks like D3. The data visualizations range from the standard to the esoteric, and follow the taxonomy of the book Data Visualisation (also by Andy Kirk). The chart categories are color coded by row: categorical (including bar charts, dot plots); hierarchical (donut charts, treemaps); relational (scatterplots, sankey diagrams); temporal (line charts, stream graphs) and spatial (choropleths, cartograms).

Check out the Chartmaker Directory.

Performing Linear Regression With Power BI

Jason Cantrell shows how to create a simple linear regression in Power BI:

Linear Regression is a very useful statistical tool that helps us understand the relationship between variables and the effects they have on each other. It can be used across many industries in a variety of ways – from spurring value to gaining customer insight – to benefit business.

The Simple Linear Regression model allows us to summarize and examine relationships between two variables. It uses a single independent variable and a single dependent variable and finds a linear function that predicts the dependent variable values as a function of the independent variables.

If you want real linear regression, drop in an R or Python script.

Naive Bayes In Python

Kislay Keshari explains the Naive Bayes algorithm and shows an implementation in Python:

Naive Bayes in the Industry

Now that you have an idea of what exactly Naive Bayes is and how it works, let’s see where it is used in the industry.

RSS Feeds

Our first industrial use case is News Categorization, or we can use the term ‘text classification’ to broaden the spectrum of this algorithm. News on the web is rapidly growing where each news site has its own different layout and categorization for grouping news. Companies use a web crawler to extract useful text from HTML pages of news articles to construct a Full Text RSS. The contents of each news article is tokenized (categorized). In order to achieve better classification results, we remove the less significant words, i.e. stop, from the document. We apply the naive Bayes classifier for classification of news content based on news code.

It’s a good overview of the topic and a particular implementation in Python.  Naive Bayes is a technique which you want in the bag:  there are a lot of techniques which tend to be better in specific domains, but Naive Bayes is easy to implement and usually provides acceptable performance.

A Geometric Depiction Of Covariance

Nikolai Janakiev explains the concept of the covariance matrix using a bit of Python and some graphs:

In this article we saw the relationship of the covariance matrix with linear transformation which is an important building block for understanding and using PCASVD, the Bayes Classifier, the Mahalanobis distance and other topics in statistics and pattern recognition. I found the covariance matrix to be a helpful cornerstone in the understanding of the many concepts and methods in pattern recognition and statistics.

Many of the matrix identities can be found in The Matrix Cookbook. The relationship between SVD, PCA and the covariance matrix are elegantly shown in this question.

Understanding covariance is critical for a number of statistical techniques, and this is a good way of describing it.

Categories

September 2018
MTWTFSS
« Aug  
 12
3456789
10111213141516
17181920212223
24252627282930