Press "Enter" to skip to content

Curated SQL Posts

Eliminate the DeWitt Clause

Justin Olsson and Reynold Xin throw down the gauntlet:

At Databricks, we often use the phrase “the future is open” to refer to technology; it reflects our belief that open data architecture will win out and subsume proprietary ones (we just set a new official record on TPC-DS). But “open” isn’t just about code. It’s about how we as an industry operate and foster debate. Today, many companies in tech have tried to control the narrative on their products’ performance through a legal maneuver called the DeWitt Clause, which prevents comparative benchmarking. We think this practice is bad for customers and bad for innovation, and it’s time for it to go. That’s why we are removing the DeWitt Clause from our service terms, and calling upon the rest of the industry to follow.

One example of how you can tell if you’re influential is how many legal terms are named after you, which I’m pretty sure makes Dr. DeWitt the Steve Tasker of the database industry. So put David DeWitt in the Data Platform Hall of Fame.

And good of Databricks to eliminate their DeWitt Clause. Vendors put the clause in ostensibly to prevent rigged or invalid comparisons between products, but there’s a much better way to do this: publish the benchmark configuration and allow peer validation. If you put out garbage numbers (including on accident because you didn’t know the right way to do something), people are smart enough to catch that. And if people aren’t willing to publish the process, call for them to do it and if they still don’t, ignore the results. 100 times out of 100, that’s the right way to do it…assuming that you’re looking for the truth and not just trying to hide inferiorities in your product *cough* Oracle *cough*.

1 Comment

Ordered String Splitting with OPENJSON

Aaron Bertrand splits and cares about sort order:

Last year, I wrote about replacing all your CLR or custom string splitting functions with native calls to STRING_SPLIT. As I work on a project migrating several Microsoft SQL Server instances to Linux, I am encountering one of the roadblocks I mentioned last time: the need to provide an element in the output to indicate the order of the elements in the input string. This means STRING_SPLIT in its current form is out, because 1) it offers no such column; and, 2) the results are not guaranteed to be returned in any specific order. Are there other ways to achieve this functionality at scale and without CLR?

As Koen mentions in the comments, you can now get STRING_SPLIT with a sort parameter, but Aaron’s response is also valid: not everybody will have access to that today, so it still makes sense to understand the options.

Comments closed

Backing UP Power BI Premium—Couldn’t Connect to Azure

Gilbert Quevauvilliers troubleshoots an error:

What I did learn when working through the blog post is that I ran into some errors when trying to re-connect or trying to connect to the Azure Storage in my Premium App Workspace and it failed.

The errors that I got were, “We couldn’t connect to Azure, but it’s likely temporary. Try again later or see details.”

Read on for the cause and the solution.

Comments closed

Alerting on Log Shipping Failures

Andrea Allred is waiting for an e-mail:

I don’t want emails that tell me everything is ok, only when things are bad and I want them to be helpful emails. Not only did I want an alert, I wanted an email with actual information that I can use to make my decisions. Decisions like, can I just apply a few logs to get caught up or did everything burn down and I need to pull a full backup plus all the logs to be up and running again?

This was a task for some super fancy alerts on my agent job.

Click through to see how.

Comments closed

ML Updates in Azure Synapse Analytics

Aria Jelinek and Nellie Gustafsson have some announcements for us:

Announced last week at Ignite 2021, data teams now have a handful of new opportunities to drive value with machine learning built directly into their Apache Spark pools in Azure Synapse Analytics.

With the general availability of our machine learning library for Apache Spark on Azure Synapse, data teams now have expanded access to both code-first and code-free ML tools for forecasting, model training, and pre-built AI. This library provides both familiar open-source tools such as LightGBM as well as proprietary solutions to provide a comprehensive, streamlined approach to ML workloads. Updates include PREDICT, a new keyword that supports scoring AzureML and MLFlow models directly in Azure Synapse, and integration with Azure Cognitive Services, now generally available.

Click through for all of the announcements.

1 Comment

The Importance of Data Governance

Rob Farley riffs on another T-SQL Tuesday topic:

But the checks that we do are more about things that the database can allow, but are business scenarios that should never happen.

Plenty of businesses seem to recognise these scenarios all too well, and can point them out when they come across them. You hear phrases like “Oh, we know that’s not right, it should be XYZ instead”. And they become reasons why they don’t really trust their data. It’s a data quality issue, and every time someone comes across a data quality issue, they trust the data a little less.

Click through for Rob’s thoughts.

Comments closed

Azure Network Gateway Logging

Denny Cherry walks us through gateway logging in Azure:

If you’ve ever set up an Azure Network Gateway for Site to Site or Person to Site VPNing you’ve probably wanted to be able to see logging from the gateway. In the Azure portal, you can see a Logs option, but all it does is tell you to set up log analytics and the link that it gives you is … less than helpful.

Denny, however, has helpful instructions, so check it out.

Comments closed

Show All Merge Replication Articles

Steve Stedman prods the demons of merge replication:

At Stedman Solutions, we do a lot of work with SQL Server replication, mostly transactional and merge replication.

The other day I needed a query to show all the merge replication publication on a SQL Server, not just a single database, but to see it for all databases on the SQL Server.

Here is the query that I came up with.

Merge replication can be really great if you know what you’re doing. But it can also turn into a train wreck easily, and it’s really tough to get a good understanding of why something’s going wrong or how long it will take to be fixed (if at all).

Comments closed