Press "Enter" to skip to content

Curated SQL Posts

Power Bi Dataflows and the Right Tool for the Job

Matthew Roche answers a reader question and waxes philosophical at the same time:

– Power BI dataflows and CDM folders provide capabilities for bridging the low-code/no-code world of self-service BI with managed central corporate BI in Azure.
– Power BI dataflows enable Excel-like composition of ETL processes with linked and computed entities.
– Power BI dataflows can scale beyond the desktop and leverage the power of the cloud to become part of an end-to-end BI application.

But… This is just a list of features.

Read the whole thing.

Comments closed

Using Graph + Spatial to Find Closest Points

Hasan Savran shows how you can combine graph tables with spatial data types in SQL Server to find the nearest thing—in this case, a distribution center:

Today, I want to show you how Graph Processing Tables can make your data models flexible and smart. Let’s say we work in a e-commerce company, we have many users and products just like Amazon. We also have many warehouses, same product might be located in multiple warehouses. Whenever we want to ship a product, we want to pick the closest warehouse to buyer. In this way, we should be able save good amount of money for shipping and products will arrive to our customers locations faster.

Click through for the demo.

Comments closed

Top Products Per Group in Power BI

Marco Russo shows how you can display just the top few products in each grouping using Power BI:

A common approach to this scenario is to create a complex measure that hides the result if the element should not be displayed. In other words, it computes the ranking of the product and it blanks out the result if the ranking is larger than three. Though easy to implement, this approach requires modifying all the measures that should be displayed in a visual and negatively impacts performance. A better solution is to create a specific measure to use in the visual-level filter: that measure is executed only once for each product to define the set of visible items, without requiring any change to the other measures used in the visualization. We will see different options to solve the scenario.

Marco shows off a few techniques to get this done.

Comments closed

PolyBase in SQL Server 2019

Ben Weissman takes us through SQL Server 2019’s PolyBase enhancements:

Isn’t that the same thing, as a linked server?
At first sight, it sure looks like it. But there are a couple of differences. Linked Servers are instance scoped, whereas PolyBase is database scoped, which also means that PolyBase will automatically work across availability groups. Linked Servers use OLEDB providers, while PolyBase uses ODBC. There are a couple more, like the fact that PolyBase doesn’t support integrated security, but the most significant difference from a performance perspective is PolyBase’s capability to scale out – Linked Servers are single-threaded.

Read the whole thing. Ben asks and answers the question of whether PolyBase replaces ETL. You’ll want to read his answer. My answer (and I won’t tell you how close it is to his because I want you to read his article) is that PolyBase will only replace a fraction of total ETL and will act as an ETL process in a larger percentage of cases. I can see a pattern where you virtualize the data as external tables and then connect them together locally to insert into local facts and dimensions, for example. But there are too many things you can do with other ETL platforms which make me say this will never be a full replacement.

Comments closed

Polishing Uncalibrated Models

Nina Zumel takes an uncalibrated random forest model and applies a calibration technique to improve the estimate on one variable:

In the previous article in this series, we showed that common ensemble models like random forest and gradient boosting are uncalibrated: they are not guaranteed to estimate aggregates or rollups of the data in an unbiased way. However, they can be preferable to calibrated models such as linear or generalized linear regression, when they make more accurate predictions on individuals. In this article, we’ll demonstrate one ad-hoc method for calibrating an uncalibrated model with respect to specific grouping variables. This “polishing step” potentially returns a model that estimates certain rollups in an unbiased way, while retaining good performance on individual predictions.

This is a great explanation of the process as well as its risks and limitations.

Comments closed

Generating Excel Spreadsheets from Shiny

Richard Hill and Andy Merlino show how you can export data from a Shiny app into Excel:

R is great for report generation. Shiny allows us to easily create web apps that generate a variety of reports with R.

This post details a demo Shiny app that generates an Excel report, a PowerPoint report, and a PDF report:

The full Shiny app source code is available here. Also, we included a more basic Shiny app that generates an Excel report at the end of this post. Follow up posts will include similar simple Shiny apps generating PowerPoint and PDF reports.

Excel is still the most popular business intelligence tool and Excel support tends to be one of the first requests people get with third-party apps, so it’s good to know you can do this in Shiny without too much rigmarole.

Comments closed

When tempdb Spills Attack

Josh Darnell ran into a problem with a SQL Agent job:

One of my colleagues reached out to me recently about a production issue where a SQL Server Agent job had failed with this error message:

Msg 1105, Level 17, State 2, Line 15
Could not allocate space for object ‘dbo.SORT temporary run storage: 140737513062400’ in database ‘tempdb’ because the ‘PRIMARY’ filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

I fully expected this to be a scheduled maintenance task, like index rebuilds or statistics updates. I’ve seen this error before in those contexts (rebuilding large indexes in tempdb, or updating statistics with FULLSCAN).

But watch as SQL Server subverts Josh’s expectations.

Comments closed

Power BI Conditional Formatting and Icons

Matt Allington shows how you can now use icons as the output of conditional formatting in Power BI:

Note how the icons above have both shape and colour so you can differentiate between them even if you are colour blind.  This is best practice.

You can also change the default formatting to work on the hard coded number settings that you specify. In the example below I have changed the settings to work on absolute numbers instead of percentages (note the changes in the highlighted boxes).  Also note that I have set the minimum and maximum numbers shown as 1 and 2.  To do this, simply delete the value in these boxes.  Thanks to Chris Webb for finally helping me understand how this works.

It’s easy to go overboard with this, but I’m happy to see conditional formatted icons in place; done right, you can pack a lot of information into a small space with them.

Comments closed

The SQL Notebook Experience, Featuring Powershell

Rob Sewell takes a break from book-writing and talks about using Powershell in SQL Notebooks:

Yes, it’s funny but also it carries a serious warning. Without understanding what it is doing, please don’t enable PowerShell to be run in a SQL Notebook that someone sent you in an email or you find on a GitHub. In the same way as you don’t open the word document attachment which will get a thousand million trillion pounddollars into your bank account or run code you copy from the internet on production without understanding what it does, this could be a very dangerous thing to do.

With that warning out of the way, there are loads of really useful and fantastic use cases for this. SQL Notebooks make great run-books or incident response recorders and PowerShell is an obvious tool for this. (If only we could save the PowerShell output in a SQL Notebook, this would be even better)

“It’s a bit hacky” is a generous statement, but it’s really cool that Rob figured out a way to do this. There is a Powershell kernel for Jupyter, but I’ve not had the best experience adding new kernels to Azure Data Studio (at least not F#’s kernel, which I tried).

Comments closed

Simple Query Zen

Erik Darling wants you to simplify your life queries:

See, when a query is big and complicated to you, there’s a pretty good chance you’re gonna get a big and complicated query plan, because it’s big and complicated to the optimizer, too.

This isn’t to say the optimizer is dumb or bad or ugly; it’s just that there’s only so long it’s willing to spend coming up with a plan.

Remember, cheap plan fast. Not perfect, not great, maybe good enough.

It’s a good operating philosophy: if you have a query which has gone off the rails, one of the best things you can do is try to turn the query into several small steps. It’s possible to reduce complexity that way…though you may also gain complexity in the process if you do it wrong.

Comments closed