Press "Enter" to skip to content

Day: April 19, 2021

Querying Serverless SQL Pools from Spark Notebooks in Scala

Jovan Popovic shows off one integration point between the data services in Azure Synapse Analytics:

Azure Synapse Analytics provides multiple query runtimes that you can use to query in-database or external data. You have the choice to use T-SQL queries using a serverless Synapse SQL pool or notebooks in Apache Spark for Synapse analytics to analyze your data.

You can also connect these runtimes and run the queries from Spark notebooks on a dedicated SQL pool.

In this post, you will see how to create Scala code in a Spark notebook that executes a T-SQL query on a serverless SQL pool.

Read on to see how. You can also query Spark pool and dedicated SQL pool tables from serverless SQL pools.

4 Comments

Removing a Node from a Hadoop Cluster

The Hadoop in Real World team shows us the proper way to remove a node from a Hadoop cluster:

This post will list out the steps to properly remove a node from a Hadoop cluster. It is not advisable to just shut down the node abruptly.

Node exclusions should be properly recorded in a file that is referred to by the property dfs.hosts.exclude. This property doesn’t have default value so in the absence of a file location and a file, the Hadoop cluster will not exclude any nodes.

Read on for more information, including what happens if you simply turn off the node.

Comments closed

Enabling and Disabling Table Load in Power Query

Jon Fletcher shows how to disable query processing in Power Query:

In Power BI Power Query there is an option to enable or disable whether a table is loaded into the report. The option ‘Enable load’ can be found by right clicking on the table. Typically, by default, the load is already enabled.

Click through to see how you can disable query processing or remove a particular query from report refreshes.

Comments closed

Capturing Deadlocks with the system_health Extended Event

Jack Vamvas is hunting deadlocks:

An application using SQL Server as the database backend was experiencing some application rollbacks. I decided to investigate the SQL Server to identify any errors which could be correlated to the application timeouts experienced by the users. 

I started reviewing the errors in the Extended Events system health logs, which are normally running by default on a SQL Server. They have a ton of useful information . I noticed a steady stream of deadlocks . This is the code used to create a permanent table to store the deadlock details , for review by the application team. 

Click through for the script.

Comments closed

Understanding Long Failover Times for Availability Groups

Sean Gallardy has answers to your Availability Group questions, as long as you ask the specific question in this post:

One of the most common issues I look at from day to day is some variation of the question. “Why did it take a long time for my AG/Database to failover?”. There are many different meanings for this innocuously simple looking statement, for example was it that the failover time was long or was it a long time bringing the database online, or was it that it took a long time because a failover wasn’t possible, and what *exactly* is a long time? Are we talking a long time means 10 seconds, 1 minute, 5 minutes, 30 minutes? To each different business and their needs, “long” dramatically fluctuates. I’d like to go through at a high level, some of the most common reasons that I troubleshoot and if they might apply to your environment. FYI, if you tell me 1 second is a long time then I’m going to point you toward different architectures with multiple layers of caches and front-end servers/services which isn’t going to be cheap, but that’s what you want so you’re _willing_ to pay for it, right? Yeah, I thought not.

Click through for several factors which may affect how long it takes for a failover to occur.

Comments closed

Alternatives to GREATEST() and LEAST()

Mike Scalise has an alternative to using the GREATEST() and LEAST() functions in SQL Server:

As of 4/14/21, Microsoft has officially announced that the GREATEST and LEAST functions are in Azure SQL Database and SQL Managed Instance. Unofficially, it seems they had silently included these functions in at least (no pun intended) SQL Managed Instance several months prior. In any case, here we are today with official Microsoft documentation on GREATEST and LEAST. This is all great news. What’s also great is that, in their statement, Microsoft stated they would be including these two functions in the next version of SQL Server.

But what about all of us on SQL Server 2019 and prior? Fortunately, there’s a way to mimic these two functions in your queries using a correlated subquery in the SELECT clause.

Click through for examples. This is a bit different from getting the largest or smallest value in a window, which you can do with MIN(val) OVER () or MAX(val) OVER (). But I’m looking forward to seeing GREATEST() and LEAST() in the box product.

Comments closed

Improving Dataflow Performance in Power BI

Chris Webb shows how you can improve dataflow performance in Power BI after switching to a Premium Per User model:

Over the years I have written a lot about Power BI/Power Query performance but it has always been in the context of loading data direct into datasets, not dataflows. A lot of cool things have been happening in dataflows recently, though, and now that Premium Per User has made Premium features to a much wider audience I thought it would be interesting to look at an example of how PPU can help dataflow performance and specifically how and when the Enhanced Compute Engine can make dataflow refresh faster.

Click through for some interesting findings.

Comments closed