Press "Enter" to skip to content

Author: Kevin Feasel

Problems with Pivoting

Itzik Ben-Gan wraps up an outstanding series:

When people want to pivot data using T-SQL, they either use a standard solution with a grouped query and CASE expressions, or the proprietary PIVOT table operator. The main benefit of the PIVOT operator is that it tends to result in shorter code. However, this operator has a few shortcomings, among them an inherent design trap that can result in bugs in your code. Here I’ll describe the trap, the potential bug, and a best practice that prevents the bug. I’ll also describe a suggestion to enhance the PIVOT operator’s syntax in a way that helps avoid the bug.

If you use the PIVOT operator, you definitely want to read this article.
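
To make the trap concrete, here's a minimal T-SQL sketch (Sales.Orders and its columns are hypothetical stand-ins for your own schema). PIVOT implicitly treats every column that is neither the spreading element nor the aggregation element as a grouping column, so pivoting a base table directly changes meaning the moment the table gains a column. The best practice is to feed PIVOT a derived table that projects only the three elements you need:

```sql
-- Hypothetical schema: Sales.Orders(custid, orderdate, freight, ...).
-- Pivoting the derived table D pins the implicit grouping to custid;
-- pivoting Sales.Orders directly would group by every non-pivoted column,
-- including any column added to the table later.
SELECT custid, [2021], [2022], [2023]
FROM (SELECT custid, YEAR(orderdate) AS orderyear, freight
      FROM Sales.Orders) AS D
PIVOT (SUM(freight) FOR orderyear IN ([2021], [2022], [2023])) AS P;
```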

Scaling Out Continuous Integration

Chris Adkin shows off parallelism in Azure DevOps continuous integration pipelines:

A SQL Server Data Tools project is checked out of GitHub and built into a DacPac, and four containerized SQL Server instances are spun up using clones of the ‘Seed’ docker volume. The DacPac is applied to a database running inside each container, a tSQLt test is executed against each database, and at the very end the tSQLt results are aggregated and published.

This is an interesting approach to the problem of lengthy tests: run them on several separate machines concurrently.
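
For reference, the kind of test each container ends up running looks something like this minimal tSQLt sketch (the class, table, and function names are hypothetical, not from Chris's pipeline):

```sql
-- Define a test class; tSQLt discovers and runs every test in it.
EXEC tSQLt.NewTestClass 'OrderTests';
GO
CREATE PROCEDURE OrderTests.[test GetOrderTotal sums line amounts]
AS
BEGIN
    -- Isolate the table under test, then seed known rows.
    EXEC tSQLt.FakeTable 'dbo.OrderLines';
    INSERT INTO dbo.OrderLines (OrderID, Amount) VALUES (1, 10.00), (1, 15.00);

    DECLARE @actual money = dbo.GetOrderTotal(1);  -- hypothetical function under test

    EXEC tSQLt.AssertEquals @Expected = 25.00, @Actual = @actual;
END;
GO
EXEC tSQLt.Run 'OrderTests';
```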

Renaming Multiple Columns with Power Query

Matt Allington shows how you can use M to rename all columns at once in a table in Power Query:

When you are using a matrix like this, it can be difficult to tell which “Year” column is coming from which table, as shown below.

One solution to this problem is to rename all the columns in each table by prepending Order or Delivery to the existing column names. Once that is done, it is much clearer which column is which.

Matt describes the concept for you, but also has a video showing how to do this.

Getting the Last Actual Plan

Grant Fritchey shows off an improvement in SQL Server 2019:

I’ve always felt responsible for making such a big deal about the differences between estimated and actual plans. I implied in the first edition of the execution plans book (get the new, vastly improved, 3rd edition in digital form for free here, or you can pay for the print version) that these things were so radically different that the estimated plan was useless. This is false. All plans are estimated plans. However, actual plans have some added runtime metrics. It’s not that we’re going to get a completely different execution plan when we look at an actual plan, it’s just going to have those very valuable runtime metrics. The problem with getting those metrics is, you have to execute the query. However, this is no longer true in SQL Server 2019 (CTP 2.4 and greater) thanks to sys.dm_exec_query_plan_stats.

Click through for an example, as well as what you need to do to enable this.
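
As a rough sketch of what this looks like once enabled (on SQL Server 2019 RTM it's the LAST_QUERY_PLAN_STATS database-scoped configuration; in the CTP Grant writes about, a trace flag does the job, so click through for the specifics):

```sql
-- Enable lightweight last-plan capture for the current database
-- (SQL Server 2019 RTM syntax).
ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON;

-- Retrieve the last actual plan for cached queries without re-executing them.
SELECT st.text, ps.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan_stats(cp.plan_handle) AS ps;
```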

AutoSetDefaultInitialCatalog in Analysis Services

Chris Webb goes on a journey to understand what the AutoSetDefaultInitialCatalog property in Analysis Services does:

In Shabnam Watson’s recent blog post on a bug she found when trying to create a Live connection from Power BI to Analysis Services, she mentioned that the AutoSetDefaultInitialCatalog server property could be used to solve her problem. This piqued my interest because I’d seen this property but had no idea what it did exactly or why it was there. Luckily, now that I work for Microsoft, it’s even easier for me to find out about things like this from the dev team, and Akshai Mirchandani was able to help.

First of all, what does it do? The documentation on this property has just been added here, and this is what it says:

Chris has connections, and we get to benefit from that.

R User Salaries By Country

Capri Granville shares a chart showing a box plot of salaries for professional R users by country:

Interesting analysis done in R about the salaries of R developers, broken down by country and featuring salary range and median salary.

The dataset consists of survey answers from nearly 90,000 respondents. About 5,000 of them reported using R for “extensive development work over the past year”. The first filter used reduces the dataset from 88,883 respondents to 5,048. The second filter excludes students, hobby programmers and former developers. This reduces the dataset to 4,047 respondents. The third filter excludes unemployed and retired respondents and the dataset is further reduced to 3,871 respondents. Finally, we exclude respondents from an unknown country and respondents with unknown or zero salary.

Check out Tomaz Weiss's detailed post, which dives into these numbers for United States respondents.

Logistic Regression Defaults and sklearn

Giovanni Lanzani shares some thoughts on scikit-learn defaults for Logistic Regression:

If you read the post, you can see that the biggest problem with the choice is that, unless your data is regularized, you will train a model that probably underperforms: you are unnecessarily penalizing it by making it learn less than it could from the data.

The second problem with the default behavior of LogisticRegression is about choosing a regularization constant that is, in effect, a magic number (equal to 1.0). This hides the fact that the regularization constant should be tuned by hyperparameter search, not set in advance without knowing what the data and problem look like.

Knowledge is power. Also read the post Giovanni links to in order to learn more about the issue.
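
To make the defaults visible, here's a hedged Python sketch (synthetic data; scikit-learn 0.21 or later assumed, where the no-penalty option is spelled penalty='none'; newer versions spell it penalty=None) showing the implicit penalty and one way to tune C properly instead:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# The default is NOT plain logistic regression: it quietly applies
# L2 regularization with the magic constant C=1.0.
default_model = LogisticRegression(solver='lbfgs', max_iter=1000)

# A genuinely unpenalized model has to ask for it explicitly.
unpenalized = LogisticRegression(penalty='none', solver='lbfgs', max_iter=1000)

# Better: treat C as a hyperparameter and search for it.
search = GridSearchCV(LogisticRegression(solver='lbfgs', max_iter=1000),
                      param_grid={'C': [0.01, 0.1, 1.0, 10.0, 100.0]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)
```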

Azure Data Studio September Release

Alan Yu announces the September release of Azure Data Studio:

As we continue to bring over key features from SQL Server Management Studio, one highly requested feature was enabling SQL Server command line (SQLCMD) mode in our Query Editor. SQLCMD mode allows users to write and edit queries as SQLCMD scripts. In addition, users can execute the SQLCMD scripts.

This feature is now possible in Azure Data Studio.
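
For example, once SQLCMD mode is enabled in the query editor, a script like this (the server and database names are hypothetical) runs as-is:

```sql
-- SQLCMD directives (lines beginning with a colon) mix with ordinary T-SQL.
:setvar DatabaseName "AdventureWorks"
USE [$(DatabaseName)];
GO
SELECT COUNT(*) AS TableCount FROM sys.tables;
GO
```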

Looks like there were several good improvements this month.

The Value of Query Store

Erin Stellato has started a series on the benefits of Query Store:

The Query Store feature previewed in Azure SQL Database in summer 2015, was made generally available that fall, and was part of the SQL Server 2016 release the following summer.  Over the past four years (has it really been that long?!) I have devoted significant time to learning Query Store – not just understanding how it works from the bottom up, but also why it works the way it does.  I’ve also spent a lot of time sharing that information and helping customers understand and implement the feature, and then working with them to use the query store data to troubleshoot issues and stabilize performance.  During this time I have developed a deep appreciation for the Query Store feature, as its capabilities go far beyond its original marketing.  Is it perfect?  No.  But it’s a feature that Microsoft continues to invest in, and in this series of blog posts my aim is to help you understand why Query Store is a tool you need to leverage in your environment.

Read on for a high-level overview of how Query Store is useful.
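
If you haven't turned it on yet, enabling Query Store is a one-liner, and the captured history is exposed through ordinary catalog views. A minimal sketch (the database name is hypothetical):

```sql
-- Turn Query Store on for a database (SQL Server 2016 and later).
ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);

-- The captured data is then queryable: e.g., the slowest queries on average.
SELECT TOP (10) qt.query_sql_text, rs.avg_duration, rs.count_executions
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;
```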
