Delta Lake is basically a compute layer that would sit on top of your existing On Prem HDFS cluster, your favourite Cloud storage or even run it locally on your laptop(Best part)! Data is stored on the above-mentioned storage as versioned Parquet files. Any data that is read using Spark can be used to read and write with Delta Lake. Delta lakes provides an unified platform to support both Batch Processing and Stream processing workloads on a single platform.
Read on to understand just how useful this is.
Microsoft Graph is a comprehensive framework for accessing data in various online Microsoft services, including Azure Active Directory (AAD), Office 365, OneDrive, Teams, and more. AzureGraph is an R package that provides a simple R6-based interface to the Graph REST API, and is the companion package to AzureRMR and AzureAuth.
Currently, AzureGraph aims to provide an R interface only to the AAD part, with a view to supporting R interoperability with Azure: registered apps and service principals, users and groups. Like AzureRMR, it could potentially be extended to support other services.
Just to clarify, this is like Facebook Graph API for Azure components, not a graph database that you can store your own data in.
Round number analysis focuses on the final digits of a number, specifically looking at how many 0s there are at the end after rounding off the decimal. We’re interested in the percentage of records in each rank (or Type). For example, $10,820 is a Type 1 because there is a single 0 at the end. $8,300 is a Type 2 because there are two 0s at the end. $11,000 is a Type 3. And $9009 is a Type 0 because there are no 0s at the end.
With that in mind, how do we do this kind of calculation in SQL Server? With one of my favorite operators:
Click through to watch me look at round numbers, last digits, and first digits.
Rohan Kumar of Microsoft announced .NET for Apache Spark, making Apache Spark accessible to .NET developers – Git Hub
I’m very happy that the Spark for .NET team added F# support. Spark is so much nicer when using functional programming.
Even though this query reads the whole clustered index to get the
Benefactorrows, the total number of logical reads is still smaller than the seek/key lookup pattern seen in the combined query with IN(). This UNION ALL version gives SQL Server the ability to build a hybrid execution plan, combining two different techniques to generate a plan with fewer overall reads.
Click through for the example.
With the 2019 NFL Draft upon us, of course I wanted to visualize some NFL draft data in Power BI. So I put together this interesting set of visualizations based on some data I scraped from Pro Football Reference. The dataset includes drafts from 1936 to 2019 including picks through round 6 where applicable. I’ll update the dataset as the remaining rounds of the 2019 draft are completed. So feel free to take a look, interact with the dashboards, and let me know if you have any questions.
Click through for the dashboard.
A user recently asked me a question on my previous blog post (Setting Variables in Azure Data Factory Pipelines) about possibility extracting the first element of a variable if this variable is set of elements (array).
So as a spoiler alert, before writing a blog post and adding a bit more clarity to the existing Microsoft ADF documentation, here is a quick answer to this question.
You’ll have to click through even for the quick answer.
Sometimes, when working with DAX, you might get the following error:
The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value.
This error occurs whenever the DAX engine was expecting a single value, or scalar, and instead received a table of values instead. This is an easy error to make because many DAX functions, such as FILTER, SUMMARIZE and ALL, return table values.
Eugene lays out when each scenario occurs, so check it out.