Press "Enter" to skip to content

Author: Kevin Feasel

T-SQL Bugs with Joins

Itzik Ben-Gan takes us through four bugs or oddities around joins:

The order counts are now correct, but the total freight values are not. Can you spot the new bug?

The new bug is more elusive because it manifests itself only when the same customer has at least one case where multiple orders happen to have the exact same freight values. In such a case, you are now taking the freight into account only once per customer, and not once per order as you should.

Click through to avoid accidentally introducing bugs in your T-SQL code.
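To make the class of bug concrete, here is a minimal sketch of the pattern described above, using hypothetical table variables rather than Itzik's actual schema or queries: a DISTINCT added to repair an order count that was inflated by a join ends up collapsing freight values that happen to repeat.

```sql
-- Hypothetical illustration of the bug described above (not Itzik's exact query).
DECLARE @Orders TABLE (orderid INT, custid INT, freight MONEY);
DECLARE @OrderDetails TABLE (orderid INT, productid INT);

INSERT INTO @Orders VALUES (1, 1, 10.00), (2, 1, 10.00), (3, 1, 25.00);
INSERT INTO @OrderDetails VALUES (1, 100), (1, 101), (2, 100), (3, 102);

-- The join multiplies order rows, so DISTINCT was added to fix the order count.
-- But SUM(DISTINCT O.freight) collapses the two identical 10.00 freight values,
-- returning 35.00 for customer 1 instead of the correct 45.00.
SELECT O.custid,
       COUNT(DISTINCT O.orderid) AS numorders,
       SUM(DISTINCT O.freight)   AS totalfreight
FROM @Orders AS O
JOIN @OrderDetails AS OD ON OD.orderid = O.orderid
GROUP BY O.custid;
```

One common way out is to aggregate the detail rows before joining (in a derived table or APPLY), so freight is only ever counted once per order; Itzik's article walks through the proper fixes.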


Usability Issues with Query Store

Lonny Niederstadt has a new series on usability “soft spots” with Query Store. Part one looks at plan identifiers:

Yeah. That’s a lotta plans in the right-hand legend.  22 of them.  In a not very helpful order.  In fact… though I’ve tried to figure it out, I don’t know what type of order is used for that right-hand legend.  It’s not chronological.  It’s not based on duration which is the metric displayed by the graph. I dunno.

Part two looks at “morally equivalent plans”:

Let’s refresh the “Tracked Queries” activity.

Ohhhh.  I forced plan_id 2 (in the purple box below) but what showed up was plan_id 3220 (in the yellow box below).

Lonny promises more, so keep on the lookout.
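If the ordering in the GUI's plan legend is as opaque to you as it is to Lonny, one workaround (a sketch of my own, not from his posts) is to query the Query Store catalog views directly and impose whatever order you like:

```sql
-- List Query Store plans for a given query, newest execution first.
-- @query_id is a placeholder; plug in the query of interest.
DECLARE @query_id BIGINT = 42;

SELECT p.plan_id,
       p.is_forced_plan,
       p.last_execution_time,
       TRY_CONVERT(XML, p.query_plan) AS query_plan  -- query_plan is stored as nvarchar(max)
FROM sys.query_store_plan AS p
WHERE p.query_id = @query_id
ORDER BY p.last_execution_time DESC;
```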


Using Biml to Retrieve Query Metadata

David Stein shows how you can use the GetQuerySchema() function in Biml to retrieve metadata for a query:

This post is part of a series, Short Cuts to Biml Mastery, which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so.

The GetQuerySchema method provides an excellent way to import metadata from a query, in a similar way to how GetDatabaseSchema imports schemas, tables, and views. GetQuerySchema was introduced last year with the 2018 editions of BimlStudio and BimlExpress.

This kind of higher-level programming takes some getting used to, but once you understand it, you can find common data access patterns and build one solution to work through many versions of the pattern.
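GetQuerySchema itself is a Biml API, so see David's post for the BimlScript syntax. If you just want to see the kind of result-set metadata it surfaces, SQL Server exposes a rough analogue you can use for ad hoc inspection; this is a substitute technique for illustration, not what the post uses, and the query text is hypothetical:

```sql
-- Describe the result-set metadata of an arbitrary query.
SELECT name, system_type_name, is_nullable
FROM sys.dm_exec_describe_first_result_set(
         N'SELECT orderid, custid, freight FROM Sales.Orders;',  -- hypothetical query
         NULL,   -- no parameter declarations
         0);     -- no browse-mode information
```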


What Compatibility Level 150 Gets You

Erik Darling explains the upsides and downsides of moving to SQL Server 2019 and compatibility level 150:

In those versions, flipping compatibility level uses the new Cardinality Estimator (CE). That new Cardinality Estimator is real hit or miss.

The worst part is that there’s practically no gain to be realized for using higher compatibility levels — that changes with SQL Server 2019.

Read on to see what those new features are. As far as the compatibility level switch goes, there comes a time when you just need to bite the bullet and use the new cardinality estimator. Erik has a few tips to help with that too.
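As a starting point for that bullet-biting, the switch itself is a one-liner, and a query that regresses under the new cardinality estimator can be pinned to the legacy estimator while you tune it. A sketch, with placeholder database, table, and column names:

```sql
-- Move the database to the SQL Server 2019 compatibility level.
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 150;

-- If a specific query regresses under the new cardinality estimator,
-- it can opt back into the legacy CE without changing the whole database.
SELECT COUNT(*)
FROM dbo.SomeTable
WHERE SomeColumn = 'some value'
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```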


Create Reports in Power BI Desktop Instead of Service

Melissa Coates explains why you should create reports in Power BI Desktop rather than directly through the Power BI Service:

I always recommend to Power BI authors that report creation & editing should happen in Power BI Desktop and to just ignore the edit capability in the Power BI Service. Usually my reasons are concerned with (a) which version is the latest version, (b) having versioning history, and (c) reducing the risk of someone overwriting someone else’s work because two different report editing/publishing options are in use. These are very valid concerns, but I’ve discovered one additional problem that could easily confuse some people: (d) the side effects experienced if you download then re-upload a web-created report. Let me explain…

Read on for that explanation.


Monte Carlo Simulation in Python

Kristian Larsen has a couple of posts on Monte Carlo style simulation in Python. First up is a post which covers how to generate data from different distributions:

One method that is very useful for data scientists/data analysts in order to validate methods or data is Monte Carlo simulation. In this article, you learn how to do a Monte Carlo simulation in Python. Furthermore, you learn how to generate different statistical probability distributions in Python.

You can also bootstrap your data, reusing data points when building a set of samples:

A useful method for data scientists/data analysts to validate methods or data is the bootstrap with Monte Carlo simulation. In this article, you learn how to do a bootstrap with Monte Carlo simulation in Python.

Both posts are worth the read.


Hyperparameter Tuning with MLflow

Joseph Bradley shows how you can perform hyperparameter tuning of an MLlib model with MLflow:

Apache Spark MLlib users often tune hyperparameters using MLlib’s built-in tools CrossValidator and TrainValidationSplit.  These use grid search to try out a user-specified set of hyperparameter values; see the Spark docs on tuning for more info.

Databricks Runtime 5.3 and 5.3 ML and above support automatic MLflow tracking for MLlib tuning in Python.

With this feature, PySpark CrossValidator and TrainValidationSplit will automatically log to MLflow, organizing runs in a hierarchy and logging hyperparameters and the evaluation metric.  For example, calling CrossValidator.fit() will log one parent run.  Under this run, CrossValidator will log one child run for each hyperparameter setting, and each of those child runs will include the hyperparameter setting and the evaluation metric.  Comparing these runs in the MLflow UI helps with visualizing the effect of tuning each hyperparameter.

Hyperparameter tuning is critical for some of the more complex algorithms like random forests, gradient boosting, and neural networks.


Default Schemas in SQL Server

Max Vernon takes us through the order in which SQL Server searches for tables given a single-part name:

Default schemas in SQL Server can be a blessing, since they reduce the need to specify the schema when creating DDL statements in T-SQL. However, relying on the default schema when creating DML statements can be problematic. A recent question on dba.stackexchange.com asked “Does T-SQL have a Schema search path?”, similar to the way PostgreSQL implements the search_path parameter. This post shows how schemas are implemented in SQL Server. We’ll also see why it’s important to always specify the schema when using SQL Server.

A lot of this behavior goes back to the pre-2005 era. 2005 introduced schemas as logical separators, whereas pre-2005 they were more of a security measure (and dbo was the database owner’s schema). I completely agree that you should specify two-part names in-database. It’s a tiny bit faster (which adds up when you’re doing thousands of transactions per second) and reduces ambiguity.
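A quick sketch of the resolution behavior Max covers: for a single-part name, SQL Server checks the caller's default schema first and falls back to dbo, so two users can silently hit different tables. All object and user names below are hypothetical:

```sql
-- Hypothetical demo: the same single-part name resolves differently per user.
CREATE SCHEMA Sales;
GO
CREATE TABLE dbo.Config   (source VARCHAR(20) DEFAULT 'dbo',   id INT);
CREATE TABLE Sales.Config (source VARCHAR(20) DEFAULT 'Sales', id INT);
GO
CREATE USER SalesUser WITHOUT LOGIN WITH DEFAULT_SCHEMA = Sales;
GRANT SELECT ON SCHEMA::Sales TO SalesUser;
GO
EXECUTE AS USER = 'SalesUser';
SELECT * FROM Config;        -- resolves to Sales.Config (the user's default schema)
REVERT;
SELECT * FROM Config;        -- for a dbo-defaulted caller, resolves to dbo.Config
SELECT * FROM dbo.Config;    -- two-part name: no ambiguity
```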


Extended Event Filters Outlive Sessions

Dave Bland ran into an interesting problem during a demo:

Recently, during a demo at a SQL Saturday, the query to pull the Extended Event session data didn't return the expected results. The session I used for the demo captured the CREATE DATABASE statement.

Prior to the session, I deleted the Create Database session; however, I did not delete the target files because they are part of the demo. Then I recreated the session, just as I had done before. However, this time there was a difference when I attempted to read the target data. The entry for the newly created database was not showing up when I used the GUI, but it was showing up when I read the XML. During the session, I was not able to figure out why that was the case.

Click through to see the root cause and how Dave fixed the problem.
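For reference, this is the usual shape of the "read the XML" side of such a demo: querying an event_file target's .xel files directly rather than through the GUI. A generic sketch, not Dave's exact script, and the file path is a placeholder:

```sql
-- Read the event_file target's .xel files directly, bypassing "View Target Data".
SELECT x.event_data.value('(event/@name)[1]', 'nvarchar(128)')          AS event_name,
       x.event_data.value('(event/@timestamp)[1]', 'datetimeoffset(7)') AS event_time,
       x.event_data                                                      AS full_event_xml
FROM (
    SELECT CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file(N'C:\Temp\CreateDatabase*.xel', NULL, NULL, NULL)
) AS x;
```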
