Day: January 19, 2021

Grouping Data with Spark

Ed Elliott has two quick examples of grouping data in Spark:

I have been playing around with the new Azure Synapse Analytics, and I realised that this is an excellent opportunity for people to move to Apache Spark. Synapse Analytics ships with .NET for Apache Spark C# support, so many people will surely try to convert T-SQL code or SSIS code into Apache Spark code. I thought it would be awesome if there were a set of examples of how to do something in T-SQL, then translated into how to do that same thing in Spark SQL and the Spark DataFrame API in C#.

Click through for the first example, GROUP BY.

T-SQL Tuesday 134 Roundup

James McGillivray summarizes the results from T-SQL Tuesday #134:

When I volunteered to host a T-SQL Tuesday, I had a very different topic in mind. However, the incredible events of the last year, and in particular, the immense pressure that my wife faced at work, made me realise how important it is to have ways to take breaks, both mental and physical. And while we were away in December, and we both recharged, I thought it would make a good topic for this event. It was wonderful to see the response from the #sqlfamily to my invitation, and by my count 29 different people contributed to the blog party.

I’ve tried to group posts with similar themes in this summary, and since some posts fall into multiple categories, I may mention a single post more than once. Links on names point to Twitter handles, links on descriptions point to the respective blog posts.

Click through for a rather large roundup.

The Concatenation Operator

Hugo Kornelis explains what the Concatenation operator does:

The Concatenation operator reads and returns all rows from all its inputs, in order, and without modification.

This operator is most commonly used to execute queries that use UNION or UNION ALL. In the former case, other operators are required to remove the duplicates, because Concatenation doesn’t provide that functionality. You may also find the Concatenation operator in queries on partitioned views.

Read on to see the algorithm and lots of details about the operator.
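
As a rough illustration of the behavior described in the quote, here is a minimal Python sketch. It is not SQL Server's implementation: the names `concatenation` and `distinct` and the sample rows are hypothetical, and the `distinct` step merely stands in for whichever duplicate-removing operator the optimizer would add for UNION.

```python
def concatenation(*inputs):
    """Yield every row from each input, in input order, unmodified."""
    for rows in inputs:
        yield from rows


def distinct(rows):
    """Stand-in for a separate duplicate-removing operator (assumption)."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


# Hypothetical sample inputs; the row (2, "b") appears in both.
first = [(1, "a"), (2, "b")]
second = [(2, "b"), (3, "c")]

# UNION ALL: concatenation alone, duplicates preserved.
union_all = list(concatenation(first, second))

# UNION: concatenation plus a separate step to remove duplicates.
union = list(distinct(concatenation(first, second)))
```

In this sketch `union_all` keeps both copies of the shared row, while `union` keeps only one, and only because of the extra `distinct` step layered on top.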

Using DEFINE COLUMN in DAX Queries

Marco Russo and Alberto Ferrari show off some new DAX syntax:

Introduced in December 2020, the DEFINE COLUMN statement lets you define a calculated column local to a query. The column is not persisted in the model; it exists only for the lifetime of the query. Apart from that, it is a calculated column in every sense of the term.

The extension of DAX with the capability to define calculated columns local to a query is needed in order to support composite models over Analysis Services (AS). There are no limitations in the use of the feature. For this reason, you can take advantage of local columns in any DAX query. We refer to calculated columns defined in a query as query calculated columns, or query columns for short.

Click through to see it in action. I like this idea a lot, though do read their note regarding performance, contrasting it with ADDCOLUMNS.
