Press "Enter" to skip to content

Author: Kevin Feasel

Getting Ahead Of Corruption

Mike Walsh has some recommendations to follow before you end up with corrupt databases:

So. Always. Always. ALWAYS choose to be proactive and prepared. Don’t wait for corruption to catch you! When we do our SQL Server health assessments, seeing the findings that together mean you aren’t prepared for corruption is a huge red flag. Partially it is because, as a consultant, I end up seeing corruption a lot – and it is always “after the fact” and usually from clients who either chose, or more likely didn’t realize they were choosing, the option with less preparation.

So this post won’t really talk about recovering from corruption. It will focus on prevention and preparedness. A follow-on post will talk about some initial steps to take if you get a report of corruption.

If you already know how you’ll solve the problem (and ideally, have a step-by-step runbook so you don’t miss anything), corruption is more of an annoyance than a catastrophe.
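
As a rough sketch of what being prepared can look like on the SQL Server side (the database name and backup path below are placeholders), the basics come down to page verification, regularly scheduled consistency checks, and backups that validate checksums:

-- [YourDatabase] and the backup path are placeholders.
-- Make sure the database writes and verifies page checksums.
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;

-- Run consistency checks regularly (for example, from a SQL Agent job).
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Back up with CHECKSUM so page-level corruption surfaces at backup time.
BACKUP DATABASE [YourDatabase]
    TO DISK = N'D:\Backups\YourDatabase.bak'
    WITH CHECKSUM, COMPRESSION, INIT;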

Restoring A Database To A Different Location

Mike Fal shows how to restore a database to a different location using PowerShell:

The cmdlet is straightforward in its use. Fundamentally, all we need to declare is an instance, database name, and backup file. However, if we don’t declare anything else, the cmdlet will try to restore the database files to their original locations. Keep in mind this is no different from how a normal RESTORE DATABASE command works.

This is where we make our lives easier with PowerShell. First off, to move files using Restore-SqlDatabase, we need to create a collection of RelocateFile objects. Don’t let the .Net-ness of this freak you out. All we’re doing is creating something that has the logical file name and the new physical file name. In other words, it’s just an abstraction of the MOVE statement in RESTORE DATABASE.

Read the whole thing.
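
If the PowerShell side is unfamiliar, it can help to keep the underlying T-SQL in mind: the collection of RelocateFile objects maps to the MOVE clauses of RESTORE DATABASE. A minimal sketch, with made-up logical names and paths:

-- Logical file names and target paths here are hypothetical.
RESTORE DATABASE [SampleDB]
    FROM DISK = N'D:\Backups\SampleDB.bak'
    WITH MOVE N'SampleDB_Data' TO N'E:\Data\SampleDB.mdf',
         MOVE N'SampleDB_Log' TO N'F:\Logs\SampleDB_log.ldf',
         RECOVERY;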

Backing Up To Azure Storage

Neil Gelder shows how to back up directly to Azure blob storage:

The URL is the one from the container we made a note of and the credential is the one we created in the last step.

Now if we return to the container screen in the Azure Console and refresh the screen, you’ll see your backup file like below.

My personal preference here would be to back up locally and then have a job migrate backups to Azure or S3. That storage is 1-3 cents per GB per month (and even cheaper if you’re willing to store the data in Glacier), so for most small to mid-sized organizations running databases in the tens of gigs, it’s a great way of getting around only being able to store a week or two’s worth of backups on-site.
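
For reference, the T-SQL form of a backup straight to blob storage looks roughly like this, assuming the older access-key style of credential (the storage account, container, and credential name are placeholders):

-- Storage account URL, container, and credential name below are placeholders.
BACKUP DATABASE [SampleDB]
    TO URL = N'https://mystorageaccount.blob.core.windows.net/backups/SampleDB.bak'
    WITH CREDENTIAL = N'AzureBackupCredential',
         CHECKSUM, COMPRESSION;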

Understanding DATEADD And DATEDIFF

Matan Yungman and Guy Glantser take a hack at DATEDIFF versus DATEADD for date calculations.  First up is Matan:

Pretty simple right?

Well, it is, and since this problem is pretty common, I used this solution in many performance tuning sessions I performed over the years.

There’s a slight problem though: This solution isn’t 100% accurate.

When carefully looking at the results, I find out that for the first query, I get 5859 rows, and for the second query, I get 5988 rows. Where does this difference come from?

Then, Guy gives his take on the problem:

I tested both queries on a sample table, which has millions of rows, and only around 500 rows in the last 90 days. The first query produced a table scan, while the second query produced an index seek. Of course, the execution time of the second query was much lower than the first query.

Both queries were supposed to return the orders in the last 90 days, but the first query returned 523 rows, and the second query returned 497 rows. So what’s going on?

The answer has to do with the way DATEDIFF works. This function returns the number of date parts (days, years, seconds, etc.) between two date & time values. It does that by first rounding down each one of the date & time values to the nearest date part value, and then counting the number of date parts between them.

They both start from the same base problem, but end up with slightly different formulations of a solution.
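
The general shape of the comparison, as Guy describes it, is roughly the following (table and column names here are made up). The first predicate wraps the column in DATEDIFF, which counts the day boundaries crossed between OrderDate and now, so any time on the 90th calendar day back still qualifies, and the function call on the column rules out an index seek. The second predicate leaves the column alone and goes back exactly 90 days from the current moment, so it can seek but covers a slightly different range:

-- dbo.Orders and OrderDate are hypothetical names.
-- Counts whole-day boundaries crossed; not sargable, so expect a scan.
SELECT COUNT(*)
FROM dbo.Orders
WHERE DATEDIFF(DAY, OrderDate, GETDATE()) <= 90;

-- Exactly 90 days back from the current date and time; sargable, so an
-- index on OrderDate can be used for a seek.
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -90, GETDATE());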

Structured Streaming With Spark

Tathagata Das, et al., discuss enterprise-grade streaming of structured data using Spark:

Fortunately, Structured Streaming makes it easy to convert these periodic batch jobs to a real-time data pipeline. Streaming jobs are expressed using the same APIs as batch data. Additionally, the engine provides the same fault-tolerance and data consistency guarantees as periodic batch jobs, while providing much lower end-to-end latency.

In the rest of this post, we dive into the details of how we transform AWS CloudTrail audit logs into an efficient, partitioned, parquet data warehouse. AWS CloudTrail allows us to track all actions performed in a variety of AWS accounts, by delivering gzipped JSON log files to an S3 bucket. These files enable a variety of business and mission critical intelligence, such as cost attribution and security monitoring. However, in their original form, they are very costly to query, even with the capabilities of Apache Spark. To enable rapid insight, we run a Continuous Application that transforms the raw JSON log files into an optimized Parquet table. Let’s dive in and look at how to write this pipeline. If you want to see the full code, here are the Scala and Python notebooks. Import them into Databricks and run them yourselves.

This introductory post discusses some of the architecture and setup, and they promise additional posts getting into finer details.

Searching For A Query

Kendra Little shows how to search the plan cache and query store for a particular query:

If I’m looking in SQL Server’s Execution Plan Cache, I like to use the sys.dm_exec_text_query_plan dynamic management view. This stores those XML query plans as text.

I learned about using this DMV from Grant Fritchey in his post, “Querying the Plan Cache, Simplified.” Grant points out that while doing wildcard searches in the text version of a query plan isn’t fast, querying it as XML is often even slower.

This is definitely worth a read.
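
A sketch of the kind of plan cache search Kendra describes, built around the DMV she names (the search term in the LIKE pattern is a placeholder for whatever table, index, or literal you are hunting):

-- Wildcard search over the text form of each cached plan.
SELECT TOP (50)
       st.text AS query_text,
       qs.execution_count,
       tqp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS tqp
WHERE tqp.query_plan LIKE N'%MySearchTerm%'  -- placeholder search term
ORDER BY qs.execution_count DESC;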

Thinking About Availability Group Outages

Brent Ozar reminds us to think about graceful degradation of applications:

There’s a gray bar across the top that says, “This site is currently in read-only mode; we’ll return with full functionality soon.”

That’s not a hidden feature of Always On Availability Groups. Rather, it’s a hidden feature of really dedicated developers whose application:

This is where a bit of foresight and hard work can really pay off.  Read the whole thing.
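
One small building block for that kind of degradation is having the application check whether the database it connected to is currently writable. A minimal sketch of the idea (the database name is a placeholder):

-- Should report READ_ONLY on a read-only database or readable secondary,
-- READ_WRITE otherwise; the application can switch modes accordingly.
SELECT DATABASEPROPERTYEX(N'SampleDB', 'Updateability') AS updateability;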

Beginning With Amazon Athena

Jen Underwood looks at the basics behind Amazon Athena:

Today early adopters of Amazon Athena are using it for big data analytics pipeline projects along with Kinesis streaming data and other Amazon data sources.

Athena is a serverless, pay-per-use parallel query service. There is no infrastructure to set up or manage. It scales automatically and can handle large datasets or complex distributed queries.

The easy way of thinking about Athena is that it’s ElasticMapReduce (a pay-as-you-go Hadoop cluster) without the ceremony of administering or spinning up the cluster.
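
Since Athena runs ordinary SQL directly against files sitting in S3, using it is mostly a matter of defining a table over a bucket and querying it. A rough sketch (the bucket, table, and column names below are made up):

-- Define a table over existing files in S3; nothing is loaded or moved.
CREATE EXTERNAL TABLE logs (
    event_time string,
    user_id    string,
    action     string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-example-bucket/logs/';

-- You pay for the data each query scans.
SELECT action, COUNT(*) AS events
FROM logs
GROUP BY action;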

R Visuals In Power BI

Ryan Wade ties ggplot2 visuals into Power BI:

The package that we are going to use to develop our custom visualization is ggplot2. The ggplot2 package is arguably the most popular data visualization package in R. It is based on the “grammar of graphics” concept that was created by the statistician Leland Wilkinson. The ggplot2 package allows you to approach creating charts and graphs in the same manner that Bob Ross approached painting trees in the forest. With ggplot2 you are able to start with a blank canvas and add layer upon layer via short code snippets that build on each other until you end up with the desired visualization.

The pbix file that is being used in this blog can be found here: http://bit.ly/2jwoCyP. The GentleIntroToR_ChartExample.pbix file contains an example of using R to create a box plot chart that shows the distribution of player scores for the L.A. Lakers. Chiclet slicers were added that allow you to filter by division and/or opponent. The R visualization was created in four steps.

Check out the PBIX file.
