ACID Transactions on Spark

Achilleus explains one of the big announcements at Spark+AI Summit 2019:

Delta Lake is basically a compute layer that would sit on top of your existing On Prem HDFS cluster, your favourite Cloud storage or even run it locally on your laptop(Best part)! Data is stored on the above-mentioned storage as versioned Parquet files. Any data that is read using Spark can be used to read and write with Delta Lake. Delta lakes provides an unified platform to support both Batch Processing and Stream processing workloads on a single platform.

Read on to understand just how useful this is.

AzureGraph: Microsoft Graph in R

Kevin Feasel

2019-05-01

Cloud, R

Hong Ooi takes us through AzureGraph:

Microsoft Graph is a comprehensive framework for accessing data in various online Microsoft services, including Azure Active Directory (AAD), Office 365, OneDrive, Teams, and more. AzureGraph is an R package that provides a simple R6-based interface to the Graph REST API, and is the companion package to AzureRMR and AzureAuth.

Currently, AzureGraph aims to provide an R interface only to the AAD part, with a view to supporting R interoperability with Azure: registered apps and service principals, users and groups. Like AzureRMR, it could potentially be extended to support other services.

Just to clarify, this is like Facebook Graph API for Azure components, not a graph database that you can store your own data in.

Analyzing Data by the Numbers

I am close to wrapping up my series on forensic accounting techniques:

Round number analysis focuses on the final digits of a number, specifically looking at how many 0s there are at the end after rounding off the decimal. We’re interested in the percentage of records in each rank (or Type). For example, $10,820 is a Type 1 because there is a single 0 at the end. $8,300 is a Type 2 because there are two 0s at the end. $11,000 is a Type 3. And $9009 is a Type 0 because there are no 0s at the end.

With that in mind, how do we do this kind of calculation in SQL Server? With one of my favorite operators: APPLY.

Click through to watch me look at round numbers, last digits, and first digits.

Spark+AI Summit 2019 Announcements

Victoria Holt creates a roundup of Spark+AI Summit 2019 announcements:

Rohan Kumar  of Microsoft announced .NET for Apache Spark, making Apache Spark accessible to .NET developers – Git Hub

I’m very happy that the Spark for .NET team added F# support. Spark is so much nicer when using functional programming.

Breaking Up Queries with UNION ALL

Bert Wagner takes us through a scenario where it can be faster to combine queries with UNION ALL rather than using IN:

Even though this query reads the whole clustered index to get the Benefactor rows, the total number of logical reads is still smaller than the seek/key lookup pattern seen in the combined query with IN(). This UNION ALL version gives SQL Server the ability to build a hybrid execution plan, combining two different techniques to generate a plan with fewer overall reads.

Click through for the example.

Power BI: NFL Draft History

Dustin Ryan gives us a Power BI dashboard covering NFL draft history:

With the 2019 NFL Draft upon us, of course I wanted to visualize some NFL draft data in Power BI. So I put together this interesting set of visualizations based on some data I scraped from Pro Football Reference. The dataset includes drafts from 1936 to 2019 including picks through round 6 where applicable. I’ll update the dataset as the remaining rounds of the 2019 draft are completed. So feel free to take a look, interact with the dashboards, and let me know if you have any questions.

Click through for the dashboard.

Extracting the First Element from an Array in ADF

Kevin Feasel

2019-05-01

Cloud, ETL

Rayis Imayev shows how you can find the first element in an array using Azure Data Factory:

A user recently asked me a question on my previous blog post (Setting Variables in Azure Data Factory Pipelines) about possibility extracting the first element of a variable if this variable is set of elements (array).

So as a spoiler alert, before writing a blog post and adding a bit more clarity to the existing Microsoft ADF documentation, here is a quick answer to this question.

You’ll have to click through even for the quick answer.

DAX Error: Multiple Columns and Scalar Values

Eugene Meidinger walks us through an error message in DAX:

Sometimes, when working with DAX, you might get the following error:

The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value.

This error occurs whenever the DAX engine was expecting a single value, or scalar, and instead received  a table of values instead. This is an easy error to make because many DAX functions, such as FILTERSUMMARIZE and ALL, return table values. 

Eugene lays out when each scenario occurs, so check it out.

Categories

May 2019
MTWTFSS
« Apr  
 12345
6789101112
13141516171819
20212223242526
2728293031