Press "Enter" to skip to content

Author: Kevin Feasel

Error Handling In Scala

Manish Mishra gives a few examples of how to handle errors in Scala:

Try[T] is another construct to capture the success or a failure scenarios. It returns a value in both cases. Put any expression in Try and it will return Success[T] if the expression is successfully evaluated and will return Failure[T] in the other case meaning you are allowed to return the exception as a value. However with one restriction that it in case of failures it will only return Throwable types:

def validateZipCode(zipCode:String): Try[Int] = Try(zipCode.toInt)

But Throwing an exception doesn’t make much sense here since it is not much of a calculation. Although we can take this example to understand the use case. If the given string is not a number, it will be a failure. The value from the Try can be extracted in same as Option. It can be matched

As you write more complicated Spark operations, handling errors becomes critical.

Comments closed

An Introduction To seplyr

John Mount guest blogs on the Revolutions blog about seplyr:

seplyr is an R package that supplies improved standard evaluation interfaces for many common data wrangling tasks.

The core of seplyr is a re-skinning of dplyr‘s functionality to seplyr conventions (similar to how stringr re-skins the implementing package stringi).

Read on for a couple of examples of where seplyr can make it easier for you to program with than dplyr.

Comments closed

Creating Dynamic Pivot Tables

Ben Richardson shows how to use dynamic SQL to create pivot tables with arbitrary numbers of pivot elements:

The headings of the columns are the individual values inside the city column. We specified these values inside the pivot operator in our query.

The most tedious part of creating pivot tables is specifying the values for the column headings manually. This is the part that is prone to most errors, particularly if the data in your online data source changes. We can not be sure that the values we specified in the pivot operator will remain in the database until we create this pivot table next time.

For instance, in our script, we specified London, Liverpool, Leeds and Manchester as values for headings of our pivot table. These values existed in the Сity column of the student table. What if somehow one or more of these values are deleted or updated? In such cases, null will be returned.

A better approach would be to create a dynamic query that will return a full set of values from the column from which you are trying to generate your pivot table.

Click through to see how to build this.

Comments closed

Selecting Specific Characters In M

Chris Webb points out a new function in Power BI:

It’s very easy to use: the first parameter takes a text value, the second parameter takes either a text value containing a single text value or a list of single characters, and it returns the text from the first parameter minus all characters that are not in the second parameter. For example, the expression:

Text.Select("Hello", "l")

…returns the text value “ll”

Click through to see an example of how you can use this to filter out punctuation and other unwanted characters.

Comments closed

Reducing Reads In Queries

Bert Wagner has a few tips for improving query performance by reducing the number of reads:

If SQL Server thinks it only is going to read 1 row of data, but instead needs to read way more rows of data, it might choose a poor execution plan which results in more reads.

You might get a suboptimal execution plan like above for a variety of reasons, but here are the most common ones I see:

If you had a query that previously ran fine but doesn’t anymore, you might be able to utilize Query Store to help identify why SQL Server started generating suboptimal plans.

Click through for a few more ideas as well.

Comments closed

Capsule Neural Networks

Saurabh Kulshrestha covers the topic of capsule neural networks:

This is the problem with Convolutional Neural Networks as well. CNN is good at detecting features, but will wrongly activate the neuron for face detection. This is because it is less effective at exploring the spatial relationships among features.

A simple CNN model can extract the features for nose, eyes and mouth correctly but will wrongly activate the neuron for the face detection. Without realizing the mis-match in spatial orientation and size, the activation for the face detection will be too high.

Read on to see how capsule networks can help solve issues with convolutional neural networks.

Comments closed

Matrix Transposition In T-SQL

Phil Factor has some fun transposing a matrix using T-SQL:

What I’m doing is simply converting the table into its JSON form, and then using this to create a table using the multi-row VALUES  syntax which paradoxically allows expressions. The expression I’m using is JSON_Value, which allows me do effectively dictate the source within the table, via that JSON Path expression, and the destination. As it is an expression, I can do all sorts of manipulation as well as a transpose.  I could, if I wanted, (in SQL 2017)provide that path parameter as a variable. This sort of technique can be used for several other reporting purposes, and it is well-worth experimenting with it because it is so versatile.

That is not at all what I would have thought up; very interesting approach.  I’d probably just be lazy and shell out to R Services.

Comments closed

Creating An Azure Chat Bot

Dustin Ryan shows how to build a QnA bot:

After you’ve created your knowledge base you can then edit and update your knowledge base. There’s a few different ways to update your knowledge.

a. Manually edit the knowledge base directly within QnAMaker.ai. You can do this by directly editing the questions by modifying the text of your knowledge base.

b. Edit the source of your knowledge base. Click the Settings tab on the left to edit the URL of your FAQs or upload a new document.

Building a bot is pretty easy, and Dustin shows you just how to do it.

Comments closed

Data Lake Archive Tier

Ust Oldfeld looks at an important part of a data lake:

The Archive access tier in blob storage was made generally available today (13th December 2017) and with it comes the final piece in the puzzle to archiving data from the data lake.

Where Hot and Cool access tiers can be applied at a storage account level, the Archive access tier can only be applied to a blob storage container. To understand why the Archive access tier can only be applied to a container, you need to understand the features of the Archive access tier. It is intended for data that has no or low SLAs for availability within an organisation and the data is stored offline (Hot and Cool access tiers are online). Therefore, it can take up to 15 hours for data to be made online and available. Brining Archive data online is a process called rehydration (fitting for the data lake). If you have lots of blob containers in a storage account, you can archive them and rehydrate them as required, rather than having to rehydrate the entire storage account.

Read on for more details, including a pattern for archiving data lake data.

Comments closed