Author: Kevin Feasel

Breaking out of Azure Data Factory ForEach Activities

Andy Leonard is planning a jailbreak:

“What if something fails inside the ForEach activity’s inner activities, Andy?”

That is an excellent question! I’m glad you asked. The answer is: The ForEach activity continues iterating.

I can now hear some of you asking…

“What if I want the ForEach activity to fail when an inner activity fails, Andy?”

Another excellent question, and you’ve found a post discussing one way to “break out” of a ForEach activity’s iteration.

Read on for the process.

Securing the Data Prep Area

Tim Mitchell explains why you should limit access to your staging area:

First things first, let’s define what a data prep area is. Data preparation (prep) is a common phase of extract, transform, and load (ETL) operations in which data is temporarily written for cleansing, deduplication, reshaping, or other data modifications. Also sometimes referred to as a landing area or a staging area, this is a common design pattern when moving data from a data store optimized for online transaction processing (OLTP) to a data model more friendly to analytics or reporting.

The data prep area really is a lot like a restaurant kitchen: it’s sometimes chaotic, it’s not consumer friendly, and there is a legitimate risk of consuming half-prepared goods.

Tim lays out why that is, so check it out.

The Origins of SentryOne Plan Explorer

Jason Hall gives us a bit of history:

Greg saw a need in his own work, and I was seeing a need in the field with our customers, for a way to go beyond identifying high-impact queries. DBAs and developers needed a way to tune queries surfaced by SentryOne SQL Sentry’s Top SQL without fiddling with a lot of extra tools to get there. We were already building integration with SQL Server Management Studio (SSMS), which included graphical query plans, so the original thought was to extend that integration from SentryOne with a link that opened plans in SSMS from Top SQL in SQL Sentry.

It seemed like an elegant solution that would allow us to reuse some code, but it wasn’t long before Brooke Philpott discovered that we wouldn’t be able to get what we needed this way. That particular part of SSMS wasn’t exposed to us in the manner we needed. Par for the course, we weren’t going to let that stop us from filling the need. Greg and Brooke dug into the problem to discover a mix of documentation, flow controls, and ingenuity that would provide the foundation for building our own query plan visuals.

Read on for the story and a bit about how the product has morphed through the years.

Azure Data Studio June 2020 Release

Alan Yu announces a new release of Azure Data Studio:

The Data Virtualization extension for Azure Data Studio is now updated with more functionality and a new logo. This update allows you to use the data virtualization wizard to virtualize MongoDB and Teradata data sources into your SQL Server. This new functionality is available for SQL Server 2019 instances running CU5 or later.

To install the extension, search for Data Virtualization in the extension viewlet in Azure Data Studio and click install.

Of course I’m going to clip the bit about PolyBase.

Tips for Reducing Cloud Costs

Manas Narkar has a few tips for reducing the amount of money you spend on cloud infrastructure:

Cost optimization is a continuous process that evolves as you build your solutions. It starts with the initial architecture and continues throughout the entire solution lifecycle. Getting the architecture right will save you a lot of effort and money down the road. Having said that, you should regularly review your architectural approach and selection of services to adapt to business changes.

A fully cost-optimized system optimizes cloud resources without sacrificing performance and efficiency. When it comes to cost optimization, you can use several tools and techniques. The information below lists some of the core principles that you can apply to any cloud solution.

Costing items in the cloud is a good bit different than on-premises, to the point where entirely different architectures succeed.

The Key Concepts of Azure Synapse Analytics

Simon Whiteley takes a look at what Azure Synapse Analytics really is:

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the spark engine, SQL serverless/On-Demand and various other bits… but I’m still getting the same question of “Cool!…. but what actually is it?”. One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”… but it’s not the same as the full workspace. Having two products, both talked about in Marketing, one generally available, one still in preview – it’s no wonder people are still confused!

Simon also has a video, which I recommend so that you can enjoy the funny way he pronounces “Synapse.” That said, next time I’m in the UK, it’ll be just as fair for someone to point out the funny way I pronounce “Synapse.” Also, you should watch the video because Simon knows the topic cold and does a great job of explaining things.

Window Functions in Spark SQL

Juoko Virtanen walks us through window functions in Spark SQL:

When you think of windows in Spark you might think of Spark Streaming, but windows can be used on regular DataFrames. Window functions calculate an output value for every row of a DataFrame based on a group of rows. I have been working on optimizing some Spark code and have noticed a few places where the use of a window function eliminates the need for a join and speeds up the code. A common pattern where a window can be used to replace a join is when an aggregation is performed on a DataFrame and then the DataFrame resulting from the aggregation is joined to the original DataFrame. Let’s take a look at an example.

Read on for a few examples using the Scala flavor of Spark SQL.
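
The aggregate-then-join pattern described in the excerpt can be sketched outside of Spark as well. The following is a minimal illustration in Python using the standard library's sqlite3 module (it assumes the bundled SQLite is version 3.25 or later, which added window functions). It is not the Scala code from the linked post, and the sales table and its columns are made up for the example. Both queries attach a per-region total to every row; the second one does it without the self-join.

```python
import sqlite3

# Hypothetical sample data: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "ann", 100), ("east", "bob", 300),
     ("west", "cy", 200), ("west", "di", 50)],
)

# Pattern 1: aggregate, then join the aggregate back to the original rows.
join_sql = """
SELECT s.region, s.rep, s.amount, t.region_total
FROM sales AS s
JOIN (SELECT region, SUM(amount) AS region_total
      FROM sales GROUP BY region) AS t
  ON t.region = s.region
ORDER BY s.region, s.rep
"""

# Pattern 2: a window function computes the same total per row, no join needed.
window_sql = """
SELECT region, rep, amount,
       SUM(amount) OVER (PARTITION BY region) AS region_total
FROM sales
ORDER BY region, rep
"""

join_rows = conn.execute(join_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()

for row in window_rows:
    print(row)
```

Both result sets should match row for row, which is the point of the substitution. Whether the window version is actually faster depends on the engine and the data.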

Uses of ROLLUP and CUBE

Greg Dodd wraps up a series on ROLLUP and CUBE:

I don’t think ROLLUP has a great use in the real world (feel free to comment below if you have a real use for it). The best I can see is that you have SQL Server emailing the results of a query to someone and they want subtotals. Adding totals can be done by any reporting app, probably more efficiently than SQL will do it, and it’s probably easier to debug there as well.

I do, however, think CUBE has an amazing purpose.

The funny thing is, I disagree and find ROLLUP a lot more useful than CUBE because ROLLUP gives me aggregations along a hierarchy. If I set up my GROUP BY clause correctly, I can see the results of my Product Category –> Product Subcategory –> Product hierarchy quite easily. Yeah, I can also do this in reporting, but if I just need a quick printout or need to do something special with the subtotals, ROLLUP is great. I definitely use GROUPING SETS more than ROLLUP and CUBE combined, but I’ve had production code use ROLLUP and, to the best of my recollection, never have I created a procedure or report which is based off of CUBE.
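
To make the difference concrete, here is a small Python sketch of which grouping sets each operator expands to. It only enumerates the column combinations and runs no SQL. The function names and the column names are hypothetical, chosen to mirror the hierarchy above.

```python
from itertools import combinations


def rollup_sets(columns):
    """Grouping sets for ROLLUP: every prefix of the column list, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Grouping sets for CUBE: every subset of the column list."""
    return [combo
            for n in range(len(columns), -1, -1)
            for combo in combinations(columns, n)]


# Illustrative column names for a category -> subcategory -> product hierarchy.
hierarchy = ["category", "subcategory", "product"]

for grouping in rollup_sets(hierarchy):
    print("ROLLUP:", grouping)

print("CUBE produces", len(cube_sets(hierarchy)), "grouping sets")
```

For three columns, ROLLUP yields four grouping sets that walk up the hierarchy to the grand total, while CUBE yields all eight combinations, including ones such as (subcategory, product) that ignore the hierarchy.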

Three Rants for the Price of One

Nate Johnson is a few months early for Festivus:

Another thing. This isn’t a “new update” thing; this is a long-standing “Oh my lord I can’t believe they DESIGNED it this way, WHAT were they THINKING?!?!” thing. Files. Sending & receiving files. I get a picture, like a screenshot, from a user (that’s NOT a OneDrive link, because that’s a whole ‘nother can-of-worms). I click to download it. It goes.. where?

OH RIGHT. It goes to my ‘Downloads’ folder. That dumpster-fire, where everything from anywhere goes into, and nobody keeps it organized, and nobody knows how to find anything unless they’ve gotten smart enough to sort by Date Modified descending. Right, that.

I can’t say I disagree about any of what Nate covers.

How SQL Server Stores UNIQUEIDENTIFIERS

Randolph West digs into what a UNIQUEIDENTIFIER looks like in storage:

Let’s take our example GUID again: CC05E271-BACF-4472-901C-957568484405. If we look at the table storage for this row, we’ll find it persisted as follows: 0x71E205CCCFBA7244901C957568484405 (alternating octets are highlighted in bold).

If you haven’t been following this series, this is a good place to remind you that SQL Server stores data using little-endian sequencing on disk and in memory. In the vast majority of cases, bytes are stored in reverse order because that’s how Intel CPUs like their data. However, GUIDs are persisted slightly differently because of their sort order.

This is probably the most GUIDs I’ve seen in a single blog post.
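
The byte order in the excerpt can be reproduced with Python's standard uuid module, whose bytes_le property writes the first three fields of a UUID in little-endian order and leaves the remaining eight bytes as they are. This is only an illustration of the layout using the example GUID quoted above, not a look at SQL Server's own code.

```python
import uuid

# The example GUID from the excerpt above.
guid = uuid.UUID("CC05E271-BACF-4472-901C-957568484405")

# bytes_le reverses the first three fields (4, 2, and 2 bytes)
# and keeps the trailing 8 bytes in their written order.
stored = guid.bytes_le.hex().upper()
print("0x" + stored)

# Round trip: rebuilding from the stored bytes gives back the same GUID.
restored = uuid.UUID(bytes_le=bytes.fromhex(stored))
print(str(restored).upper())
```

Only the first three groups are byte-swapped; the last two groups read the same in the GUID string and in the stored bytes.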
