Press "Enter" to skip to content

Curated SQL Posts

Finding The Last Non-Null Value With Snowflake

Koen Verbeeck shows how two words makes solving a problem with Snowflake a lot easier than with SQL Server:

Sometimes you need to find the previous value in a column. Easy enough, the LAG window function makes this a breeze (available since SQL Server 2012). But what if the previous value cannot be null? You can pass a default, but we actually need the previous value that was not null, even if it is a few rows back. This makes it a bit harder. T-SQL guru Itzik Ben-Gan has written about the solution to this problem: The Last non NULL Puzzle. It’s a bit of tricky solution. 

Click through for the magic words and if you’re on the SQL Server side, upvote this issue to get that functionality in SQL Server too.

Comments closed

Syncing Slicers In Power BI

Prathy Kamasani takes us through a recently added feature in Power BI:

As per Microsoft docs:
“This feature lets you create a custom group of slicers to keep synchronized. A default name is provided, but you can use any name you prefer.
The group name provides additional flexibility with slicers. You can create separate groups to sync slicers that use the same field, or put slicers that use different fields into the same group.”

First, let’s look at creating groups to sync slicers that use the same field. The use case Syncing within a page, we can easily use the group functionality to do this.

Click through for a few demos of increasing complexity.

Comments closed

A Rant About ORMs

Ned Otter is not a fan of ORMs:

I’ve seen a lot of tech come and go in my time, but nothing I’ve seen vexes me more than “framework generated SQL”.  No doubt I’m ignorant about some aspects of it, but its usage continues to confound many a DBA.

To troubleshoot one of these bad boys, you might consider Google Glass, but it will fail you. The first issue is that these crappy frameworks generate a code tsunami that’s almost (or actually) unreadable by humans. The tables you know and love are aliased with names such as “Extent1” and the like. Multiple nestings of that, and it’s all gobbledygook aka spaghetti code.

These work great as long as you have more hardware to throw at the problem.

I would differentiate here a micro-ORM like Dapper from a Hibernate or Entity Framework like Ned has in mind, where the difference is that Dapper acts as a way of automating the data access layer but you still write the SQL queries or stored procedures.

Comments closed

Using The ROWVERSION Type For ETL

Max Vernon shows us how to use the ROWVERSION data type to tell how much work you have to do to ETL data over from one table to another:

The OLTP table implements a rowversion column that is automatically updated whenever a row is updated or inserted. The rowversion number is unique at the database level, and increments monotonically for all transactions that take place within the context of that database. The dbo.OLTP_Updates table is used to store the minimum row version available inside the transaction used to copy data from the OLTP table into the OLAP table. Each time this code runs it captures incremental changes. This is far more efficient than comparing all the rows in both tables using a hashing function since this method doesn’t require reading any data other than the source data that is either new, or has changed.

I think this is the first time I’ve seen someone use ROWVERSION types successfully.

Comments closed

Effective Identities And Power BI Embedded

Angela Henry shows how you can use Power BI Embedded for row-level security even when the accessing users don’t have Power BI accounts:

Now that you familiar with Row Level Security in Power BI, how do you make it work when you want to pass in your customer’s identifier because your customers don’t have Power BI accounts?  It seems like the only way to make dynamic row level security is to use the Username() DAX function?  But wait, doesn’t that require the user to have a Power BI account?  Sigh, it seems we are going in circles.

The one thing these articles don’t talk about is that when you are using Power BI Embedded, you can pass in whatever you like for the EffectiveIdentity via the Power BI API and it will “overwrite” the Username() function.  What?!  That’s right, it will completely ignore the Username() function and use whatever you give it.  WooHoo!

Read on for the details.

Comments closed

No-Code ML On Cloudera Data Science Workbench

Tim Spann has a post covering ML on the Cloudera Data Science Workbench:

Using Cloudera Data Science Workbench with Apache NiFi, we can easily call functions within our deployed models from Apache NiFi as part of flows. I am working against CDSW on HDP (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_hdp.html),  but it will work for all CDSW regardless of install type.
In my simple example, I built a Python model that uses TextBlob to run sentiment analysis against a passed-in sentence. It returns Sentiment Polarity and Subjectivity, which we can immediately act upon in our flow.
CDSW is extremely easy to work with and I was up and running in a few minutes. For my model, I created a python 3 script and a shell script for install details. Both of these artifacts are available here: https://github.com/tspannhw/nifi-cdsw.

The “no code” portion was less interesting to me than the scalable ML portion, as “no code” either drops into tedium or ends up being replaced by code.

Comments closed

Generating Plots Like The BBC

David Smith has some notes on bbplot, a ggplot2 extension the BBC uses for its graphics:

If you’re looking a guide to making publication-ready data visualizations in R, check out the BBC Visual and Data Journalism cookbook for R graphics. Announced in a BBC blog post this week, it provides scripts for making line charts, bar charts, and other visualizations like those below used in the BBC’s data journalism. 

I’m still reading through the linked cookbook but it’s a good one.

Comments closed

Generating SSRS Subscription Agent Job Commands

Craig Porteous has a quick script to generate T-SQL commands to start and stop SQL Agent jobs tied to Reporting Services subscriptions:

This is a query I would run when I needed to quickly make bulk changes to Reporting Services subscriptions. It’s part of an “emergency fix” toolkit. 

Maybe a DB has went down and I have to quickly suspend specific subscriptions or locate Agent jobs for subscriptions. This was always a quick starting point.

I could take the generated StartEnable and Disable commands and record these in tickets or email threads to demonstrate actions taken. There are other ways to make bulk changes to SSRS subscriptions involving custom queries but this can be run immediately, I don’t have to tailor a WHERE clause first. I also wrote previously on managing subscription failures.

Click through for the script.

Comments closed

Generating Fake Data

Rich Benner shows us how to use the Faker library in Python to generate test data:

There are far more options when using Faker. Looking at the official documentation you’ll see the list of different data types you can generate as well as options such as region specific data.

Go have fun trying this, it’s a small setup for a large amount of time saved.

These types of tools can be great for generating a bunch of data but come with a couple of risks. One is that in the fake addresses Rich shows, ZIP codes don’t match their states at all, so if your application needs valid combos, it can cause issues. The other problem comes from distributions: generated data often gets created off of a uniform distribution, so you might not find skewness-related problems (e.g., parameter sniffing issues) strictly in your test data.

That said, easily generating test data is powerful and I don’t want to let the good be the enemy of the great.

Comments closed

Improving The Azure Automated AG Experience

Allan Hirt would like to see a few improvements to the experience when creating availability groups on Azure VMs:

What does this mean? To have a supported WSFC-based configuration (doesn’t matter what you are running on it – could be something non-SQL Server), you need to pass validation. xFailOverCluster does not allow this to be run. You can create the WSFC, you just can’t validate it. The point from a support view is that the WSFC has to be vetted before you create it. Could you run it after? Sure, but you still have no proof you had a valid configuration to start with which is what matters. This is a crucial step for all AGs and FCIs, especially since AGs do not check this whereas the installation process for FCIs does.

If you look at MSFT_xCluster, you’ll see what I am saying is true. It builds the WSFC without a whiff of Test-Cluster. To be fair, this can be done in non-Azure environments, too, but Microsoft givs you warnings not to do that for good reason. I understand why Microsoft did it this way. There is currently no tool, parser, or cmdlet to examine the output of Test-Cluster results. This goes back to why building WSFCs is *very* hard to automate.

I’m not sure how easy some of these fixes would be, but they’d definitely be nice.

Comments closed