Press "Enter" to skip to content

Curated SQL Posts

Extended Events in Azure SQL Database

Grant Fritchey compares and contrasts extended events on-premises to extended events in Azure SQL Database:

I have long advocated for the use of Extended Events. I’ve been posting all sorts of blog posts on how to implement them, how they present unique opportunities for new and interesting data, and how they do so much more than the old trace events, yet, put less of a load on the system. All of that is true, until we hit Azure SQL Database.

Now, don’t get me wrong, Extended Events are still awesome, amazing and wonderful. It’s just that, Azure SQL Database is going to force us to hop through a few hoops. I want to be up front with these… I’m trying to find the right word here, challenges? Maybe. Frustrations? Yeah, kind of. Limitations? Again, sort of, but not quite. We’ll settle on as neutral a term as possible: differences. For the moment.

Read on for the first part in an ongoing series.

Leave a Comment

QUOTENAME and Dynamic SQL

Chris Johnson looks at one way to protect dynamic SQL statements:

I don’t have the usual aversion to dynamic SQL that you find a lot of developers have. I use it quite regularly as part of my ETL processes, usually to define how data will move from one stage to another. So, for instance, I might have a single Validate procedure that takes a number of parameters, including an import table name, and moves data from that import table to a valid table according to my validation rules. Or I might have a single SCDMerge procedure that takes a transformation view and merges that into a slowly changing dimension (although not using the actual MERGE function for various reasons). These procedures allow me to have confidence that data will always move from one stage to another in the same way, and saves me from writing essentially the same statement 50 times, and having to update it 50 times when I need to change the way we do something, and inevitably missing something and introducing some error.

This always feels like a pretty safe use of dynamic SQL to me, because it avoids some of the more common objections people like to raise to it:

Click through to see how QUOTENAME() can help sanitize user inputs. I personally prefer the route of using sp_executesql but QUOTENAME() can also do the trick.

Leave a Comment

Querying Serverless SQL Pools from Spark Notebooks in Scala

Jovan Popovic shows off one integration point between the data services in Azure Synapse Analytics:

Azure Synapse Analytics provides multiple query runtimes that you can use to query in-database or external data. You have the choice to use T-SQL queries using a serverless Synapse SQL pool or notebooks in Apache Spark for Synapse analytics to analyze your data.

You can also connect these runtimes and run the queries from Spark notebooks on a dedicated SQL pool.

In this post, you will see how to create Scala code in a Spark notebook that executes a T-SQL query on a serverless SQL pool.

Read on to see how. You can also query Spark pool and dedicated SQL pool tables from serverless SQL pools.

4 Comments

Removing a Node from a Hadoop Cluster

The Hadoop in Real World team shows us the proper way to remove a node from a Hadoop cluster:

This post will list out the steps to properly remove a node from a Hadoop cluster. It is not advisable to just shut down the node abruptly.

Node exclusions should be properly recorded in a file that is referred to by the property dfs.hosts.exclude. This property doesn’t have default value so in the absence of a file location and a file, the Hadoop cluster will not exclude any nodes.

Read on for more information, including what happens if you simply turn off the node.

Leave a Comment

Enabling and Disabling Table Load in Power Query

Jon Fletcher shows how to disable query processing in Power Query:

In Power BI Power Query there is an option to enable or disable whether a table is loaded into the report. The option ‘Enable load’ can be found by right clicking on the table. Typically, by default, the load is already enabled.

Click through to see how you can disable query processing or remove a particular query from report refreshes.

Leave a Comment

Capturing Deadlocks with the system_health Extended Event

Jack Vamvas is hunting deadlocks:

An application using SQL Server as the database backend was experiencing some application rollbacks. I decided to investigate the SQL Server to identify any errors which could be correlated to the application timeouts experienced by the users. 

I started reviewing the errors in the Extended Events system health logs, which are normally running by default on a SQL Server. They have a ton of useful information . I noticed a steady stream of deadlocks . This is the code used to create a permanent table to store the deadlock details , for review by the application team. 

Click through for the script.

Leave a Comment

Understanding Long Failover Times for Availability Groups

Sean Gallardy has answers to your Availability Group questions, as long as you ask the specific question in this post:

One of the most common issues I look at from day to day is some variation of the question. “Why did it take a long time for my AG/Database to failover?”. There are many different meanings for this innocuously simple looking statement, for example was it that the failover time was long or was it a long time bringing the database online, or was it that it took a long time because a failover wasn’t possible, and what *exactly* is a long time? Are we talking a long time means 10 seconds, 1 minute, 5 minutes, 30 minutes? To each different business and their needs, “long” dramatically fluctuates. I’d like to go through at a high level, some of the most common reasons that I troubleshoot and if they might apply to your environment. FYI, if you tell me 1 second is a long time then I’m going to point you toward different architectures with multiple layers of caches and front-end servers/services which isn’t going to be cheap, but that’s what you want so you’re _willing_ to pay for it, right? Yeah, I thought not.

Click through for several factors which may affect how long it takes for a failover to occur.

Leave a Comment

Alternatives to GREATEST() and LEAST()

Mike Scalise has an alternative to using the GREATEST() and LEAST() functions in SQL Server:

As of 4/14/21, Microsoft has officially announced that the GREATEST and LEAST functions are in Azure SQL Database and SQL Managed Instance. Unofficially, it seems they had silently included these functions in at least (no pun intended) SQL Managed Instance several months prior. In any case, here we are today with official Microsoft documentation on GREATEST and LEAST. This is all great news. What’s also great is that, in their statement, Microsoft stated they would be including these two functions in the next version of SQL Server.

But what about all of us on SQL Server 2019 and prior? Fortunately, there’s a way to mimic these two functions in your queries using a correlated subquery in the SELECT clause.

Click through for examples. This is a bit different from getting the largest or smallest value in a window, which you can do with MIN(val) OVER () or MAX(val) OVER (). But I’m looking forward to seeing GREATEST() and LEAST() in the box product.

Leave a Comment

Improving Dataflow Performance in Power BI

Chris Webb shows how you can improve dataflow performance in Power BI after switching to a Premium Per User model:

Over the years I have written a lot about Power BI/Power Query performance but it has always been in the context of loading data direct into datasets, not dataflows. A lot of cool things have been happening in dataflows recently, though, and now that Premium Per User has made Premium features to a much wider audience I thought it would be interesting to look at an example of how PPU can help dataflow performance and specifically how and when the Enhanced Compute Engine can make dataflow refresh faster.

Click through for some interesting findings.

Leave a Comment

Check Those Feature Distributions

Antoine Rebecq shares a warning:

I was recently working on a cool dataset that looked unusually friendly. It was tidy, neat, interesting… the kind of things that you rarely encounter in the wild! My goal was to build a super simple predictor for one of the features. However, I kept getting poor results and at first couldn’t figure out what was happening.

There’s some good, practical advice in there, so check it out. H/T R-Bloggers

Leave a Comment