Curated SQL – Page 992 – A Fine Slice Of SQL Server

Diagnosing PolyBase Errors

Published 2019-12-31 by Kevin Feasel

Niels Berglund takes us through an odd “incorrect syntax” error with PolyBase:

What we see in Figure 1, incorrect syntax exception, is strange, as I have executed the same code in a SQL Server 2019 Big Data Cluster, (BDC), without any issues, and the forum poster executed the same in SQL Server 2019 Enterprise Edition also without any issues.
Ok, but what about creating an external table against a relational data source – where we do not need to define an external file format?

There is a straightforward answer as to why the specific error message pops up, but I agree with Niels that it’d be nice to have “here’s the problem and here’s the solution” types of error messages. The deeper you get into the product—especially the older Hadoop external data source—the worse the error messages get.

Comments closed

Finding the Largest Profit or Loss with R

Published 2019-12-30 by Kevin Feasel

David Robinson takes us through a brainteaser:

I recently came across an interview problem from A Cool SQL Problem: Avoiding For-Loops . Avoiding loops is a topic I always enjoy reading about, and the blog post didn’t disappoint. I’ll quote that post’s description of the problem:
You have a table of trading days (with no gaps) and close prices for a stock.
Find the highest and lowest profits (or losses) you could have made if you bought the stock at one close price and sold it at another close price, i.e. a total of exactly two transactions.
You cannot sell a stock before it has been purchased. Your solution can allow buying and selling on the same trading_date (i.e. profit or loss of $0 is always, by definition, an available option); however, for some bonus points, you may write a more general solution for this problem that requires you to hold the stock for at least N days.

The SQL solution is another scenario where window functions save the day. In R, there are a couple straightforward options (if you happen to know about the functions!) which have radically different performance profiles.

Comments closed

Mapping in R with mapply

Published 2019-12-30 by Kevin Feasel

Andrew Treadway shows how to use the Map() and mapply() functions in R:

An older post on this blog talked about several alternative base apply functions. This post will talk about how to apply a function across multiple vectors or lists with Map and mapply in R. These functions are generalizations of sapply and lapply, which allow you to more easily loop over multiple vectors or lists simultaneously.

The idea of Map in functional programming takes a bit of time to really wrap your head around, but once you do, it becomes extremely powerful. H/T R-bloggers

Comments closed

Creating a Custom Power BI Filter Pane

Published 2019-12-30 by Kevin Feasel

Tomaz Kastrun walks us through creating a custom filter pane in Power BI:

Having as much possible Power BI estate when it comes to putting the visuals for data exploration, is everyone’s dream. But the slicers for “slicing and dicing” the data is also very important. Having all the slicers hidden has always been mine go-to design.
Several similar ideas have been shown, Guy in a cube (Adam and Patric) have both done similar ideas over past years, many questions have also been answered on Power BI community website. And mine requirements were similar:
1. have the ability to hide the filter pane
2. have the selected slicer items listed

Read on to see how Tomaz took care of the issue.

Comments closed

When OBJECT_NAME() Can Block

Published 2019-12-30 by Kevin Feasel

Andy Mallon shows us that the OBJECT_NAME() function does not care about your READ UNCOMMITTED transaction isolation level:

That’s just going to sit there and wait, and wait, and wait, and wait. Because the table was created inside a transaction, the metadata about the table is uncommitted, and thus unavailable to the second session. The result is that my second session waits & waits & waits until the first session is committed (or rolled back).
OK, so it’s being blocked due to an uncommitted transaction. We could try doing dirty reads, right? I’ll kill my second session that’s been blocked, and I’ll throw in SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;. That’s like just like using a nolock hint:
USE AM2_WTF;
GO
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT OBJECT_NAME(581577110);
What the… It’s still being blocked by the uncommitted CREATE TABLE.

Read on to understand why, and what you can do about it.

Comments closed

The JOIN Function in SSRS

Published 2019-12-30 by Kevin Feasel

Tim Mitchell explains what the JOIN() function does in SQL Server Reporting Services:

The JOIN() function in SQL Server Reporting Services is a handy tool that allows you to turn a list into a delimited string value. This function accepts two parameters, a list and a delimiter with which to separate the output, and returns a string with that list separated by the specified delimiter character.

It’s not an inaccurate name, but I’m surprised it’s not named CONCAT_WS() or something like that given how the term “join” has such a strong connotation in the relational database world.

Comments closed

Query Concurrency + Azure Synapse Analytics

Published 2019-12-30 by Kevin Feasel

James Serra takes us through query concurrency with Azure Synapse Analytics:

A common question I here from customers is because of the performance of Azure Synapse Analytics (formally called Azure SQL Data Warehouse or SQL DW), can they run Power BI dashboards against it using DirectQuery (and not have to use Azure Analysis Services (AAS), Import the data into Power BI, or use Power BI aggregation tables), avoiding having another copy of the data (saving money), and having data “real time” (as of the last refresh of the data warehouse)?
There are two things to think of in considering an answer this question. The first is if you will get the performance you need (discussed in my last blog), the second is if a certain amount of concurrent queries or connections will cause a problem (the subject of this blog).

Read the whole thing.

Comments closed

Diagnosing Power Query Steps

Published 2019-12-30 by Kevin Feasel

Chris Webb takes us through the Diagnose Step button in Power Query:

As you might have guessed, it’s closely related to the Query Diagnostics functionality that was introduced back in October. Whereas the existing Query Diagnostics functionality allows you to see what happens inside the Power Query when a query is executed, this new feature does something similar but allows you to run a query up to a specific step. This is useful in scenarios where you want to reduce the diagnostics data you are collecting to a subset of the steps in the query without having to comment out a lot of M code.

It looks pretty useful.

Comments closed

TempDB Usage and WhoIsActive

Published 2019-12-30 by Kevin Feasel

Josh Darnell takes us through a weird scenario where WhoIsActive can’t catch the real culprit:

Let’s say you are informed that tempdb is getting hammered on a production SQL Server instance (in the “lots of reads and writes” sense, not the “lots of shots of tequila” sense), and it’s disrupting other workloads on the system. You may have found this out through the power of monitoring (tempdb files are growing or full), or your favorite DMV queries, or just from being really smart.
You spring into action to find the offending query, and run EXEC sp_WhoIsActive but get…nothin’:

I did not successfully guess why this might be, but Josh explains it well.

Comments closed

How Spark Runs on YARN with HDFS

Published 2019-12-27 by Kevin Feasel

Sarfaraz Hussain explains how some of the pieces of the Hadoop ecosystem fit together:

Once it verifies that everything is in place, it will assign a Job ID to the Job and then allocate the Job ID into a Job Queue.
So, in Job Queue there can be multiple jobs waiting to get processed.
As soon as a job is assigned to the Job Queue, it’s corresponding information about the Job like Input/Output Path, the location of the Jar, etc. are written into the temp location of HDFS.

Read the whole thing.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Curated SQL Posts