2019-05-01 – Curated SQL

ACID Transactions on Spark

Published 2019-05-01 by Kevin Feasel

Achilleus explains one of the big announcements at Spark+AI Summit 2019:

Delta Lake is basically a compute layer that would sit on top of your existing On Prem HDFS cluster, your favourite Cloud storage or even run it locally on your laptop(Best part)! Data is stored on the above-mentioned storage as versioned Parquet files. Any data that is read using Spark can be used to read and write with Delta Lake. Delta lakes provides an unified platform to support both Batch Processing and Stream processing workloads on a single platform.

Read on to understand just how useful this is.

Comments closed

AzureGraph: Microsoft Graph in R

Published 2019-05-01 by Kevin Feasel

Hong Ooi takes us through AzureGraph:

Microsoft Graph is a comprehensive framework for accessing data in various online Microsoft services, including Azure Active Directory (AAD), Office 365, OneDrive, Teams, and more. AzureGraph is an R package that provides a simple R6-based interface to the Graph REST API, and is the companion package to AzureRMR and AzureAuth.
Currently, AzureGraph aims to provide an R interface only to the AAD part, with a view to supporting R interoperability with Azure: registered apps and service principals, users and groups. Like AzureRMR, it could potentially be extended to support other services.

Just to clarify, this is like Facebook Graph API for Azure components, not a graph database that you can store your own data in.

Comments closed

Analyzing Data by the Numbers

Published 2019-05-01 by Kevin Feasel

I am close to wrapping up my series on forensic accounting techniques:

Round number analysis focuses on the final digits of a number, specifically looking at how many 0s there are at the end after rounding off the decimal. We’re interested in the percentage of records in each rank (or Type). For example, $10,820 is a Type 1 because there is a single 0 at the end. $8,300 is a Type 2 because there are two 0s at the end. $11,000 is a Type 3. And $9009 is a Type 0 because there are no 0s at the end.
With that in mind, how do we do this kind of calculation in SQL Server? With one of my favorite operators: APPLY.

Click through to watch me look at round numbers, last digits, and first digits.

Comments closed

Spark+AI Summit 2019 Announcements

Published 2019-05-01 by Kevin Feasel

Victoria Holt creates a roundup of Spark+AI Summit 2019 announcements:

Rohan Kumar of Microsoft announced .NET for Apache Spark, making Apache Spark accessible to .NET developers – Git Hub

I’m very happy that the Spark for .NET team added F# support. Spark is so much nicer when using functional programming.

Comments closed

Breaking Up Queries with UNION ALL

Published 2019-05-01 by Kevin Feasel

Bert Wagner takes us through a scenario where it can be faster to combine queries with UNION ALL rather than using IN:

Even though this query reads the whole clustered index to get the Benefactor rows, the total number of logical reads is still smaller than the seek/key lookup pattern seen in the combined query with IN(). This UNION ALL version gives SQL Server the ability to build a hybrid execution plan, combining two different techniques to generate a plan with fewer overall reads.

Click through for the example.

Comments closed

Power BI: NFL Draft History

Published 2019-05-01 by Kevin Feasel

Dustin Ryan gives us a Power BI dashboard covering NFL draft history:

With the 2019 NFL Draft upon us, of course I wanted to visualize some NFL draft data in Power BI. So I put together this interesting set of visualizations based on some data I scraped from Pro Football Reference. The dataset includes drafts from 1936 to 2019 including picks through round 6 where applicable. I’ll update the dataset as the remaining rounds of the 2019 draft are completed. So feel free to take a look, interact with the dashboards, and let me know if you have any questions.

Click through for the dashboard.

Comments closed

Extracting the First Element from an Array in ADF

Published 2019-05-01 by Kevin Feasel

Rayis Imayev shows how you can find the first element in an array using Azure Data Factory:

A user recently asked me a question on my previous blog post (Setting Variables in Azure Data Factory Pipelines) about possibility extracting the first element of a variable if this variable is set of elements (array).

So as a spoiler alert, before writing a blog post and adding a bit more clarity to the existing Microsoft ADF documentation, here is a quick answer to this question.

You’ll have to click through even for the quick answer.

Comments closed

DAX Error: Multiple Columns and Scalar Values

Published 2019-05-01 by Kevin Feasel

Eugene Meidinger walks us through an error message in DAX:

Sometimes, when working with DAX, you might get the following error:
The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value.
This error occurs whenever the DAX engine was expecting a single value, or scalar, and instead received a table of values instead. This is an easy error to make because many DAX functions, such as FILTER, SUMMARIZE and ALL, return table values.

Eugene lays out when each scenario occurs, so check it out.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Day: May 1, 2019