Author: Kevin Feasel

An Introduction To seplyr

John Mount guest blogs on the Revolutions blog about seplyr:

seplyr is an R package that supplies improved standard evaluation interfaces for many common data wrangling tasks.

The core of seplyr is a re-skinning of dplyr’s functionality to seplyr conventions (similar to how stringr re-skins the implementing package stringi).

Read on for a couple of examples where seplyr can be easier to program with than dplyr.

Creating Dynamic Pivot Tables

Ben Richardson shows how to use dynamic SQL to create pivot tables with arbitrary numbers of pivot elements:

The headings of the columns are the individual values inside the city column. We specified these values inside the pivot operator in our query.

The most tedious part of creating pivot tables is specifying the values for the column headings manually. This is the part most prone to errors, particularly if the data in your online data source changes. We cannot be sure that the values we specified in the pivot operator will still be in the database the next time we create this pivot table.

For instance, in our script, we specified London, Liverpool, Leeds and Manchester as values for headings of our pivot table. These values existed in the City column of the student table. What if somehow one or more of these values are deleted or updated? In such cases, null will be returned.

A better approach would be to create a dynamic query that will return a full set of values from the column from which you are trying to generate your pivot table.
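As a rough illustration of that idea (not taken from the linked post), the sketch below builds the PIVOT statement from whatever city values are supplied, rather than hard-coding them. The student table and city column come from the excerpt; the total_score column, the AVG aggregate, and both function names are assumptions made up for the example.

```python
def quote_identifier(name: str) -> str:
    """Bracket-quote a value for use as a T-SQL column name, similar to QUOTENAME."""
    return "[" + name.replace("]", "]]") + "]"


def build_pivot_query(cities: list[str]) -> str:
    """Build a PIVOT statement whose headings are the supplied city values."""
    # Deduplicate and sort so the column order is stable between runs.
    columns = ", ".join(quote_identifier(c) for c in sorted(set(cities)))
    # total_score is a hypothetical column; substitute the real measure.
    return (
        f"SELECT {columns} "
        "FROM (SELECT city, total_score FROM student) AS src "
        f"PIVOT (AVG(total_score) FOR city IN ({columns})) AS pvt;"
    )


# In practice the list would come from SELECT DISTINCT city FROM student.
print(build_pivot_query(["London", "Liverpool", "Leeds", "Manchester"]))
```

Because the column list is regenerated each time, a city that is added, renamed, or removed shows up in the headings without anyone editing the query by hand.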

Click through to see how to build this.

Selecting Specific Characters In M

Chris Webb points out a new function in Power BI:

It’s very easy to use: the first parameter takes a text value, the second parameter takes either a text value containing a single character or a list of single characters, and it returns the text from the first parameter minus all characters that are not in the second parameter. For example, the expression:

Text.Select("Hello", "l")

…returns the text value “ll”
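For readers who don't write M, the behavior described above can be approximated in Python. This is a minimal sketch of the same filtering idea, not the Power Query implementation; the name text_select is made up for illustration, and treating a multi-character string as a set of allowed characters is an assumption.

```python
import string


def text_select(text: str, select_chars) -> str:
    """Keep only the characters of text that appear in select_chars."""
    # select_chars may be a single-character string or a list of characters.
    allowed = set(select_chars)
    return "".join(ch for ch in text if ch in allowed)


print(text_select("Hello", "l"))  # ll

# Stripping punctuation and spaces by allowing only ASCII letters.
print(text_select("Hello, World!", list(string.ascii_letters)))  # HelloWorld
```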

Click through to see an example of how you can use this to filter out punctuation and other unwanted characters.

Reducing Reads In Queries

Bert Wagner has a few tips for improving query performance by reducing the number of reads:

If SQL Server thinks it is only going to read 1 row of data, but instead needs to read many more rows, it might choose a poor execution plan, which results in more reads.

You might get a suboptimal execution plan like above for a variety of reasons, but here are the most common ones I see:

If you had a query that previously ran fine but doesn’t anymore, you might be able to utilize Query Store to help identify why SQL Server started generating suboptimal plans.

Click through for a few more ideas as well.

Capsule Neural Networks

Saurabh Kulshrestha covers the topic of capsule neural networks:

This is the problem with Convolutional Neural Networks as well. CNN is good at detecting features, but will wrongly activate the neuron for face detection. This is because it is less effective at exploring the spatial relationships among features.

A simple CNN model can extract the features for nose, eyes and mouth correctly but will wrongly activate the neuron for face detection. Because it does not register the mismatch in spatial orientation and size, the activation for face detection will be too high.
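A toy sketch may make the failure mode concrete. It is neither a CNN nor a capsule network: it only contrasts a detector that checks which features are present with one that also checks where they sit relative to each other. The feature names, coordinates, and both scoring functions are assumptions invented for the example.

```python
REQUIRED = {"left_eye", "right_eye", "nose", "mouth"}


def presence_only_score(features):
    """Fire if every facial feature was detected, wherever it is."""
    names = {name for name, _x, _y in features}
    return 1.0 if REQUIRED <= names else 0.0


def spatially_aware_score(features):
    """Fire only if the features are also stacked eyes, nose, mouth."""
    pos = {name: (x, y) for name, x, y in features}
    if not REQUIRED <= pos.keys():
        return 0.0
    # y grows downward, as in image coordinates.
    eyes_above_nose = max(pos["left_eye"][1], pos["right_eye"][1]) < pos["nose"][1]
    nose_above_mouth = pos["nose"][1] < pos["mouth"][1]
    return 1.0 if eyes_above_nose and nose_above_mouth else 0.0


face = [("left_eye", 3, 2), ("right_eye", 7, 2), ("nose", 5, 5), ("mouth", 5, 8)]
scrambled = [("left_eye", 3, 8), ("right_eye", 7, 5), ("nose", 5, 2), ("mouth", 5, 1)]

print(presence_only_score(face), presence_only_score(scrambled))      # 1.0 1.0
print(spatially_aware_score(face), spatially_aware_score(scrambled))  # 1.0 0.0
```

The presence-only detector scores the scrambled arrangement as confidently as the real face, which is the over-activation the excerpt describes.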

Read on to see how capsule networks can help solve issues with convolutional neural networks.

Matrix Transposition In T-SQL

Phil Factor has some fun transposing a matrix using T-SQL:

What I’m doing is simply converting the table into its JSON form, and then using this to create a table using the multi-row VALUES syntax which paradoxically allows expressions. The expression I’m using is JSON_Value, which allows me to effectively dictate the source within the table, via that JSON Path expression, and the destination. As it is an expression, I can do all sorts of manipulation as well as a transpose. I could, if I wanted, (in SQL 2017) provide that path parameter as a variable. This sort of technique can be used for several other reporting purposes, and it is well worth experimenting with because it is so versatile.
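The shape of that idea, serializing the data to JSON and then choosing each output cell by addressing a source cell, can be sketched outside of T-SQL. The Python below is only an analogy under stated assumptions: it is not Phil Factor's code, transpose_via_json is a made-up name, and the input is assumed to be a non-empty rectangular list of rows.

```python
import json


def transpose_via_json(rows):
    """Transpose a non-empty rectangular list of rows by way of its JSON form."""
    doc = json.loads(json.dumps(rows))  # serialize, then parse back
    n_rows, n_cols = len(doc), len(doc[0])
    # Output cell (c, r) reads source cell (r, c), analogous to a path like $[r][c].
    return [[doc[r][c] for r in range(n_rows)] for c in range(n_cols)]


matrix = [[1, 2, 3], [4, 5, 6]]
print(transpose_via_json(matrix))  # [[1, 4], [2, 5], [3, 6]]
```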

That is not at all what I would have thought up; very interesting approach. I’d probably just be lazy and shell out to R Services.

Creating An Azure Chat Bot

Dustin Ryan shows how to build a QnA bot:

After you’ve created your knowledge base you can then edit and update it. There are a few different ways to update your knowledge base.

a. Manually edit the knowledge base directly within QnAMaker.ai. You can do this by modifying the question text in your knowledge base.

b. Edit the source of your knowledge base. Click the Settings tab on the left to edit the URL of your FAQs or upload a new document.

Building a bot is pretty easy, and Dustin shows you just how to do it.

Data Lake Archive Tier

Ust Oldfield looks at an important part of a data lake:

The Archive access tier in blob storage was made generally available today (13th December 2017) and with it comes the final piece in the puzzle to archiving data from the data lake.

Where Hot and Cool access tiers can be applied at a storage account level, the Archive access tier can only be applied to a blob storage container. To understand why the Archive access tier can only be applied to a container, you need to understand the features of the Archive access tier. It is intended for data that has no or low SLAs for availability within an organisation and the data is stored offline (Hot and Cool access tiers are online). Therefore, it can take up to 15 hours for data to be brought online and made available. Bringing Archive data online is a process called rehydration (fitting for the data lake). If you have lots of blob containers in a storage account, you can archive them and rehydrate them as required, rather than having to rehydrate the entire storage account.

Read on for more details, including a pattern for archiving data lake data.

R In Linux For Windows

David Smith shows how to install and use R in the Windows Subsystem for Linux:

R has been available for Windows since the very beginning, but if you have a Windows machine and want to use R within a Linux ecosystem, that’s easy to do with the new Fall Creators Update (version 1709). If you need access to the gcc toolchain for building R packages, or simply prefer the bash environment, it’s easy to get things up and running.

Once you have things set up, you can launch a bash shell and run R at the terminal like you would in any Linux system. And that’s because this is a Linux system: the Windows Subsystem for Linux is a complete Linux distribution running within Windows. This page provides the details on installing Linux on Windows, but here are the basic steps you need and how to get the latest version of R up and running within it.

Click through for a quick tutorial.
