Press "Enter" to skip to content

Day: June 18, 2020

The Key Concepts of Azure Synapse Analytics

Simon Whiteley takes a look at what Azure Synapse Analytics really is:

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the spark engine, SQL serverless/On-Demand and various other bits… but I’m still getting the same question of “Cool!…. but what actually is it?”. One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”… but it’s not the same as the full workspace. Having two products, both talked about in Marketing, one generally available, one still in preview – it’s no wonder people are still confused!

Simon also has a video, which I recommend so that you can enjoy the funny way he pronounces “Synapse.” That said, next time I’m in the UK, it’ll be just as fair for someone to point out the funny way I pronounce “Synapse.” Also, you should watch the video because Simon knows the topic cold and does a great job of explaining things.

Comments closed

Window Functions in Spark SQL

Juoko Virtanen walks us through window functions in Spark SQL:

When you think of windows in Spark you might think of Spark Streaming, but windows can be used on regular DataFrames. Window functions calculate an output value for every row of a DataFrame based on a group of rows. I have been working on optimizing some Spark code and have noticed a few places where the use of a window function eliminates the need for a join and speeds up the code. A common pattern where a window can be used to replace a join is when an aggregation is performed on a DataFrame and then the DataFrame resulting from the aggregation is joined to the original DataFrame. Let’s take a look at an example.

Read on for a few examples using the Scala flavor of Spark SQL.

Comments closed

Three Rants for the Price of One

Nate Johnson is a few months early for Festivus:

Another thing. This isn’t a “new update” thing; this is a long-standing “Oh my lord I can’t believe they DESIGNED it this way, WHAT were they THINKING?!?!” thing. Files. Sending & receiving files. I get an picture, like a screenshot, from a user (that’s NOT a OneDrive link, because that’s a whole ‘nother can-of-worms). I click to download it. It goes.. where?

OH RIGHT. It goes to my ‘Downloads’ folder. That dumpster-fire, where everything from anywhere goes into, and nobody keeps it organized, and nobody knows how to find anything unless they’ve gotten smart enough to sort by Date Modified descending. Right, that.

I can’t say I disagree about any of what Nate covers.

Comments closed

Uses of ROLLUP and CUBE

Greg Dodd wraps up a series on ROLLUP and CUBE:

I don’t think ROLLUP has a great use in the real world (feel free to comment below if you have a real use for it). The best I can see is that you have SQL Server emailing the results of a query to someone and they want subtotals. Adding totals can be done by any reporting app, probably more efficiently than SQL will do it, and it’s probably easier to debug there as well.

I do, however, think CUBE has an amazing purpose.

The funny thing is, I disagree and find ROLLUP a lot more useful than CUBE because ROLLUP gives me aggregations along a hierarchy. If I set up my GROUP BY clause correctly, I can see the results of my Product Category –> Product Subcategory –> Product hierarchy quite easily. Yeah, I can also do this in reporting, but if I just need a quick printout or need to do something special with the subtotals, ROLLUP is great. I definitely use GROUPING SETS more than ROLLUP and CUBE combined, but I’ve had production code useROLLUP and, to the best of my recollection, never have I created a procedure or report which is based off of CUBE.

Comments closed

Differences between Tableau and Power BI Data Models

David Eldersveld lays out a few differences between Tableau and Power BI’s data models:

Prior to 2020.2, Tableau had a physical data model that allowed joins between tables. This layer is still present, and it is similar to Power BI’s merge capability within Power Query to join multiple tables into a single table.

With the introduction of the logical model in Tableau 2020.2, a single logical table may consist of one or more physical tables. Two or more logical tables may be related to each other. This approach shares many similarities with Microsoft’s tabular model, but there are currently some core differences that may impact how you approach data modeling in one tool versus another. It’s not always going to be a straightforward scenario to apply knowledge of how to build a data model in one product to building it in the other.

NOTE: This post considers some core differences between Tableau and Power BI modeling as of June 2020. It does not go into detail on announced roadmap features such as Power BI’s enhanced composite models coming later in 2020, which will allow scenarios like combining a “live connection” composite model with additional data sources, or even multiple live connection models.

Read on for four differences David has found.

Comments closed


Randolph West digs into what a UNIQUEIDENTIFIER looks like in storage:

Let’s take our example GUID again: CC05E271-BACF-4472-901C-957568484405. If we look at the table storage for this row, we’ll find it persisted as follows: 0x71E205CCCFBA7244901C957568484405 (alternating octets are highlighted in bold).

If you haven’t been following this series, this is a good place to remind you that SQL Server stores data using little-endian sequencing on disk and in memory. In the vast majority of cases, bytes are stored in reverse order because that’s how Intel CPUs like their data. However GUIDs are persisted slightly differently because of their sort order.

This is probably the most GUIDs I’ve seen in a single blog post.

Comments closed

SQL Server Assessment Extension for Azure Data Studio

Ebru Ersan announces a new extension for Azure Data Studio:

SQL Server Assessment Extension for Azure Data Studio provides a user interface for evaluating your SQL Server instances and databases for best practices. It uses SQL Assessment API to achieve this. In this preview version, you can:

– Assess a SQL Server or Azure SQL Managed Instance and its databases with built-in rules (Invoke Assessment)

– Get a list of all built-in rules applicable to an instance and its databases (View applicable rules)

– Export assessment results and list of applicable rules as script to further store it in a SQL table

It’s in preview status and requires version 1.19.0 (this month’s version) of Azure Data Studio at a minimum.

Comments closed

Improving the system_health Session

Aaron Bertrand takes a look at a pre-configured tool and gives us ways to make it better:

I was recently trying to troubleshoot a SQL Server replication-related deadlock that our monitoring tool didn’t capture, and tried to find information about it in the system_health Extended Events session. With the default retention settings and the amount of noise contributed by security ring buffer events, I quickly discovered that the session only had data going back less than two hours. Meaning unless I started investigating an event immediately after it happened, all evidence had rolled out forever.

Read on for Aaron’s guidance around this. The natural next step is to build out your own extended events which capture what you need.

Comments closed