Press "Enter" to skip to content

Curated SQL Posts

Bayesian Modeling Of Hardware Failure Rates

Sean Owen shows how you can use Bayesian statistical approaches with Spark Streaming, using the example of hard drive failure rates:

This data doesn’t arrive all at once, in reality. It arrives in a stream, and so it’s natural to run these kinds of queries continuously. This is simple with Apache Spark’s Structured Streaming, and proceeds almost identically.

Of course, on the first day this streaming analysis is rolled out, it starts from nothing. Even after two quarters of data here, there’s still significant uncertainty about failure rates, because failures are rare.

An organization that’s transitioning this kind of offline data science to an online streaming context probably does have plenty of historical data. This is just the kind of prior belief about failure rates that can be injected as a prior distribution on failure rates!

Bayesian approaches work really well with streaming data if you think of the streams as sampling events used to update your priors to a new posterior distribution.
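
As a minimal sketch of that idea (plain Python rather than Owen's Spark code, with made-up counts), here is the conjugate Beta-Binomial update: seed the prior from historical failure data, then fold in each micro-batch of (failures, survivals) counts as it arrives.

    from dataclasses import dataclass

    @dataclass
    class BetaBelief:
        """Beta(alpha, beta) belief about a drive's failure probability."""
        alpha: float  # pseudo-count of failures
        beta: float   # pseudo-count of survivals

        def update(self, failures: int, survivals: int) -> None:
            # Conjugate Beta-Binomial update: observed counts simply add
            # to the pseudo-counts, so streaming updates are trivial.
            self.alpha += failures
            self.beta += survivals

        @property
        def mean(self) -> float:
            return self.alpha / (self.alpha + self.beta)

    # Prior injected from historical data: roughly a 2% failure rate,
    # held with the weight of about 100 prior observations.
    belief = BetaBelief(alpha=2.0, beta=98.0)

    # Each streaming micro-batch contributes new (failures, survivals) counts.
    for failures, survivals in [(1, 499), (0, 520), (2, 478)]:
        belief.update(failures, survivals)
        print(f"posterior mean failure rate: {belief.mean:.4f}")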


Handling Definitional Changes In Predictive Variables

Vincent Granville explains how you can blend two different definitions of a variable of interest together:

The reason scores can become meaningless over time is that data evolves. New features (variables) are added that were not available before, or the definition of a metric is suddenly changed (for instance, the way income is measured), resulting in new data that is not compatible with prior data, and in faulty scores. Also, when external data is gathered across multiple sources, each source may compute it differently, resulting in incompatibilities: for instance, when comparing individual credit scores from two people who are customers at two different banks, each bank computes the base metrics (income, recency, net worth, and so on) used to build the score in a different way. Sometimes the issue is caused by missing data, especially when users with missing data are very different from those with full data attached to them.

Click through for a description of the approach and links showing how it works in practice.
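
To make the blending idea concrete (a toy illustration only, not necessarily Granville's exact method; the function name and the 180-day window are invented for the example), one simple approach is to compute the metric under both the old and the new definition during a transition window and shift the weight linearly from one to the other:

    from datetime import date

    def blended_income(old_income: float, new_income: float,
                       observed: date, cutover: date,
                       window_days: int = 180) -> float:
        """Blend two definitions of 'income' around a definition change.

        Before the cutover we trust the old definition entirely; over the
        transition window the weight ramps linearly to the new definition.
        """
        elapsed = (observed - cutover).days
        weight = min(max(elapsed / window_days, 0.0), 1.0)
        return (1.0 - weight) * old_income + weight * new_income

    # Halfway through the window, both definitions get equal weight.
    print(blended_income(52_000.0, 61_000.0,
                         observed=date(2019, 4, 1), cutover=date(2019, 1, 1)))
    # -> 56500.0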


Access Violation Error In SQL Server 2016 SP2 CU4

Lonny Niederstadt tracked down an ugly bug in SQL Server 2016 SP2 CU4:

When I started investigating, the error was known only as an access violation, preventing some operations related to data cleansing or fact table versioning.

It occurred deep within a series of stored procedures.  The execution environment included cross-database DELETE statements, cross-database synonyms, lots of SELECT statements against system views, scalar UDFs, and lots and lots of dynamic SQL.

And… I don’t have access to the four systems where the access violation occurred.

I was able to have some investigation performed on those systems – we learned that by disabling ‘auto stats update’ for the duration of the sensitive operations, the error was avoided.  We also learned that reverting a system from SQL Server 2016 SP2 CU4 to SP2 CU2 avoided the errors.  On those systems, reverting to SP2 CU2 or temporarily disabling ‘auto stats update’ were sufficient temporary mitigations.

Very interesting sleuthing work. It also appears the issue might have been limited to SP2 CU4, as SP2 CU3 and SP2 CU5 return different results in Lonny’s repro.
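
For reference, the temporary mitigation is a one-line database setting. Here is a sketch of wrapping a sensitive operation with it, assuming pyodbc, a database named StackOverflow, and a hypothetical stored procedure standing in for the real workload:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=StackOverflow;Trusted_Connection=yes;",
        autocommit=True,  # ALTER DATABASE shouldn't run inside a transaction
    )
    cur = conn.cursor()
    try:
        # The mitigation from the post: keep auto stats updates off only
        # for the duration of the sensitive operations.
        cur.execute("ALTER DATABASE [StackOverflow] SET AUTO_UPDATE_STATISTICS OFF;")
        cur.execute("EXEC dbo.RunSensitiveOperations;")  # hypothetical procedure
    finally:
        cur.execute("ALTER DATABASE [StackOverflow] SET AUTO_UPDATE_STATISTICS ON;")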


Attempted To Read Or Write Protected Memory

Kenneth Fisher explains a nasty-looking error to us:

So, are you seeing this error?

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

If you read the error it might freak you out a bit. The key words memory and corrupt can be a bit... concerning. Fortunately, in this case, they are also rather misleading.

Click through to understand what’s going on and how you can fix the problem if you see this error.


Tooling For SQL Server Automation With PowerShell

Max Trinidad shares some tools you can use to automate SQL Server processes with PowerShell:

For script automation we could install either or both versions of PowerShell Core (as of February 19th, 2019):
PowerShell Core GA version 6.1.3
PowerShell Core Preview 6.2.0 Preview 4

Here are some important PowerShell modules to use for SQL Server management scripting:
SQLServer – This module can currently be used on SQL Server 2017 and greater.
DBATools – This is a community-supported module that will work with SQL Server 2000 and greater.
DBAReports – Supports Windows SQL Server.
DBCheck – Supports Windows SQL Server.

Automation is a great DBA’s best weapon. Knowing the tools which help you automate your tasks is critical.


Looking At Compressed Pages

Jess Pomfret shows us what compressed data looks like in SQL Server:

We first need to switch on trace flag 3604: this will write the output of our DBCC PAGE command to the messages tab instead of the error log.

There are 4 parameters for DBCC PAGE: we will need to pass in the database name (or id), the file number, the page id and the print option.  Using a print option of 0 will give us just the page header. In these examples I’m going to use option 3 which gives us more details on the rows stored on the page. For more information on using DBCC PAGE I’d recommend Paul Randal’s post “How to use DBCC PAGE“.

This kind of investigation lets you see how compression really works.
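
If you want to script that inspection, here's a sketch using pyodbc; the database name and page reference (file 1, page 312) are placeholders, and you'd find a real page id via DBCC IND or sys.dm_db_database_page_allocations. The WITH TABLERESULTS option returns the dump as a rowset, which is easier to consume from a client than the messages output:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=StackOverflow;Trusted_Connection=yes;",
        autocommit=True,
    )
    cur = conn.cursor()

    # Trace flag 3604 redirects DBCC output to the client; TABLERESULTS
    # additionally shapes it as (ParentObject, Object, Field, VALUE) rows.
    cur.execute("DBCC TRACEON(3604);")
    cur.execute("DBCC PAGE('StackOverflow', 1, 312, 3) WITH TABLERESULTS;")

    for parent, obj, field, value in cur.fetchall():
        print(parent, obj, field, value, sep=" | ")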


Improving Plots With ggformula

Sebastian Sauer shows how you can use the ggformula package combined with ggplot2 to enhance your R visuals:

For some time now, there has been a wrapper for ggplot2 available, bundled in the package ggformula. One nice thing is that it plays nicely with the popular R package mosaic; mosaic provides some useful functions for modeling, along with a tamed and consistent syntax. In this post, we will discuss some “ornaments”, that is, some details of beautification of a plot. I confess that not everyone will deem it central, but in some cases it comes in handy to know how to “refine” a plot using ggformula.

Note that this “refinement” is primarily controlled via the functions gf_refine() (most things), gf_lab() (for labels), and gf_lims() (for axis limits). Themes can be adjusted using gf_theme().

Click through for several examples.


Using Convolutional Neural Networks To Recognize Features In Images

Michael Grogan shows how you can use Keras to perform image recognition with a convolutional neural network:

VGG16 is a built-in neural network in Keras that is pre-trained for image recognition.

Technically, it is possible to gather training and test data independently to build the classifier. However, this would necessitate at least 1,000 images, with 10,000 or greater being preferable.

In this regard, it is much easier to use a pre-trained neural network that has already been designed for image classification purposes.

This is probably the best generally available technique for image classification.
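
For reference, using the pre-trained model really is only a few lines in Keras. A sketch assuming TensorFlow 2.x and a local image file named elephant.jpg:

    import numpy as np
    from tensorflow.keras.applications.vgg16 import (
        VGG16, decode_predictions, preprocess_input)
    from tensorflow.keras.preprocessing import image

    # Downloads VGG16 with ImageNet weights on first use; no training needed.
    model = VGG16(weights="imagenet")

    # VGG16 expects 224x224 RGB input, preprocessed as it was during training.
    img = image.load_img("elephant.jpg", target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    # Top-3 ImageNet classes with confidence scores.
    for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
        print(f"{label}: {score:.3f}")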
