Press "Enter" to skip to content

Category: T-SQL

Ordering In Views

Kenneth Fisher explains why you shouldn’t order in views:

For many years it’s been a best practice to never put an ORDER BY in a view. The idea is that a view shouldn’t have an inherent order, just like any other query. If you want the data from a view ordered, then you query the view with an ORDER BY clause. In fact, if you put an ORDER BY in a view you’ll get an error:

Msg 1033, Level 15, State 1, Procedure MyView, Line 4 [Batch Start Line 2]
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

I knew about the TOP 100 PERCENT bit, but had no idea that order was outright ignored.  Read the comments for additional information.
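
For a quick illustration, here’s a minimal sketch of both behaviors Kenneth describes (the table and view names are hypothetical):

-- this fails with Msg 1033, as described above
CREATE VIEW dbo.MyView
AS
SELECT Name
FROM dbo.Widgets
ORDER BY Name;
GO

-- this compiles because of the TOP, but the ORDER BY only qualifies the TOP;
-- a plain SELECT * FROM dbo.MyViewTop still carries no guaranteed order
CREATE VIEW dbo.MyViewTop
AS
SELECT TOP (100) PERCENT Name
FROM dbo.Widgets
ORDER BY Name;
GO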


Chi Square Tests

Mala Mahadevan discusses how to perform a Chi Square test:

For any dataset to lend itself to the Chi Square test, it has to fit the following conditions:

1. Both variables are categorical (in this case, exposure to smoking – yes/no – and health condition – sick/not sick – are both categorical).
2. Researchers used a random sample to collect data.
3. Researchers had an adequate sample size. Generally the sample size should be at least 100.
4. The number of respondents in each cell should be at least 5.

This is an easy case for using R over T-SQL—the Chi Square test is built in, whereas you have to roll your own T-SQL code.  Mala does show you how to do this from within SQL Server R Services as well.
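
If you want to try it without leaving T-SQL, a minimal sketch of calling chisq.test through SQL Server R Services might look like this (the table and column names are made up for illustration; Mala’s post has the real code):

-- requires SQL Server R Services and the 'external scripts enabled' configuration
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # build a 2x2 contingency table from the two categorical columns
        tbl <- table(InputDataSet$exposed, InputDataSet$sick)
        res <- chisq.test(tbl)
        OutputDataSet <- data.frame(statistic = res$statistic, p_value = res$p.value)
    ',
    @input_data_1 = N'SELECT exposed, sick FROM dbo.SmokingStudy'
WITH RESULT SETS ((statistic float, p_value float));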


Finding Destinations In SSISDB

Bill Fellows has a script to figure out the name of that table throwing errors upon insertion:

There is a rich set of tables and views available in the SSISDB that operate as a flight recorder for SSIS packages as they execute. Markus Ehrenmüller (t) had a great question in Slack: in short, can you figure out what table is being used as a destination? I took a few minutes to slice through the tables to see if I could find it.

If it’s going to be anywhere, it looks like you can find it in catalog.event_message_context.

If someone is using an OLE DB Destination with the “Table or view” or “Table or view – fast load” setting, the name of the table will show up in the event_message_context table. If they are using a variable name, then it’s going to be trickier.

Read on for the script.
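
As a rough idea of what that looks like (this is not Bill’s script; the column names come from the standard SSISDB catalog view, and the OpenRowset property filter is my assumption about how the OLE DB Destination records its target):

-- the destination table name tends to surface as a property value
-- in catalog.event_message_context for the destination component
SELECT emc.package_path,
       emc.context_source_name,
       emc.property_name,
       emc.property_value
FROM SSISDB.catalog.event_message_context AS emc
WHERE emc.property_name = N'OpenRowset';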


Try-Catch Blocks

Dave Mason discusses a major problem with try-catch blocks:

We didn’t just lose one error. We lost them all! DBCC CHECKDB on a corrupt database just came back “clean” and the CATCH block never executed. We’ve moved into dangerous territory now. This is no longer an inconvenience or an annoyance. This could lead to data loss, financial harm to your company, maybe even some risk to your career.

Dave promises a better approach in his next post.
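
The shape of the problem pattern is roughly this (a hedged sketch; the database name is hypothetical, and the exact behavior depends on your version and the nature of the corruption):

BEGIN TRY
    -- against a corrupt database, this can appear to report nothing wrong...
    DBCC CHECKDB ('SomeCorruptDB') WITH NO_INFOMSGS;
END TRY
BEGIN CATCH
    -- ...and, per Dave's findings, this block may never execute,
    -- so the corruption errors are silently lost
    SELECT ERROR_NUMBER()  AS ErrorNumber,
           ERROR_MESSAGE() AS ErrorMessage;
END CATCH;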


Pattern Matching

Aaron Bertrand has an article on performance testing various pattern matching techniques:

I wanted to prove to myself that the above procedure was undeniably better overall for all possible searches than any of the queries that use OR conditionals, regardless of whether they use LIKE, LEFT/RIGHT, SUBSTRING, or CHARINDEX. I took David’s basic query structures and put them in stored procedures (with the caveat that I can’t really test “contains” without his input, and that I had to make his OR logic a little more flexible to get the same number of rows), along with a version of my logic. I also planned to test the procedures with and without an index I would create on the search column, and under both a warm and a cold cache.
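
To make the comparison concrete, the competing query shapes look roughly like this (table, column, and parameter names are hypothetical; this is not Aaron’s or David’s actual code):

DECLARE @SearchTerm nvarchar(100) = N'rand';

-- OR-based "starts with or ends with" using LIKE
SELECT c.CompanyName
FROM dbo.Customers AS c
WHERE c.CompanyName LIKE @SearchTerm + N'%'
   OR c.CompanyName LIKE N'%' + @SearchTerm;

-- the same filter expressed with LEFT/RIGHT
SELECT c.CompanyName
FROM dbo.Customers AS c
WHERE LEFT(c.CompanyName, LEN(@SearchTerm))  = @SearchTerm
   OR RIGHT(c.CompanyName, LEN(@SearchTerm)) = @SearchTerm;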

The conclusion is ambiguous.  Read on for the full story.


Pearson’s Correlation Coefficient

Mala Mahadevan explains correlation coefficients:

The statistical definition of Pearson’s R Coefficient, as it is called, can be found in detail here for those interested. A value of 1 indicates that there is a strong positive correlation (the two variables in question increase together), 0 indicates no correlation between them, and -1 indicates a strong negative correlation (one variable increases as the other decreases). But you rarely get a perfect -1, 0 or 1. Most values are fractional and interpreted as follows:
High correlation: .5 to 1.0 or -0.5 to -1.0.
Medium correlation: .3 to .5 or -0.3 to -0.5.
Low correlation: .1 to .3 or -0.1 to -0.3.

Mala includes R and T-SQL code so you can follow along.
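
If you just want the number without leaving T-SQL, here is a minimal sketch of the computational formula (dbo.Observations and its x/y columns are hypothetical):

SELECT (n * sxy - sx * sy)
       / (SQRT(n * sxx - sx * sx) * SQRT(n * syy - sy * sy)) AS PearsonR
FROM (
    SELECT COUNT(*) * 1.0             AS n,
           SUM(CAST(x AS float))      AS sx,
           SUM(CAST(y AS float))      AS sy,
           SUM(CAST(x AS float) * y)  AS sxy,
           SUM(CAST(x AS float) * x)  AS sxx,
           SUM(CAST(y AS float) * y)  AS syy
    FROM dbo.Observations
) AS agg;   -- note: this divides by zero if either column has no variance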


Arrays And Lists In SQL Server

Erland Sommarskog has updated his essay on Arrays and Lists in SQL Server.  He’s broken it down into a few parts.  First, the short version:

Now you know why IN (@list) does not work as you hoped, but if you have a comma-separated list you still need to know how to work with it.

The best approach in my opinion is to reconsider having a comma-separated list at all. After all, you are in a relational database, so why not use a table instead? That is, you should pass the data in a table-valued parameter (TVP) instead of that comma-separated list. If you have never used TVPs before, I have an article, Using Table-Valued Parameters in SQL Server and .NET, where I give a tutorial on passing TVPs from .NET to SQL Server, including a detailed description of exactly this case: passing a comma-separated list to a TVP. You will find that it is astonishingly simple.

Unfortunately, not all environments support TVPs – Entity Framework has no real support for TVPs, and reportedly neither does Reporting Services. The same applies if you are on SQL 2005 or earlier, since TVPs were added in SQL 2008. Or you may just be plain stubborn and want to use your comma-separated list. Or you are simply pressed for time and don’t have the time to learn something new right now.
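
For reference, the TVP approach Erland recommends looks roughly like this in T-SQL (the type, procedure, and table names here are hypothetical):

CREATE TYPE dbo.IntList AS TABLE (Value int NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerIDs dbo.IntList READONLY   -- TVP parameters must be READONLY
AS
SELECT o.OrderID, o.CustomerID
FROM dbo.Orders AS o
JOIN @CustomerIDs AS c
  ON c.Value = o.CustomerID;
GO
-- calling it from T-SQL; from .NET you would pass a DataTable or
-- IEnumerable<SqlDataRecord> rather than doing a literal INSERT
DECLARE @ids dbo.IntList;
INSERT @ids (Value) VALUES (1), (3), (7);
EXEC dbo.GetOrdersByCustomer @CustomerIDs = @ids;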

If you want a longer article on using table-valued parameters, Erland has one of those as well:

This is an article that is intended to get you started with passing table-valued parameters (TVPs) from .NET to SQL Server. I begin by presenting how you use table-valued parameters in SQL Server itself, whereupon I give a quick overview of the mechanisms for passing TVPs from ADO .NET to SQL Server.

The main meat of this article is two real-world examples where I use TVPs. The first example is the classical problem of passing a comma-separated list of values to SQL Server, this time through a table-valued parameter. You will be amazed at how simple it is. In the second example I show two ways to load a file with master-detail data into tables in SQL Server. In addition to the examples, there is also some discussion on how you can improve performance when loading large amounts of data.

Despite the appearance of .NET in the title of this article, there is a final chapter that explores the possibilities in other APIs, some of which support TVPs and some of which do not. This includes Entity Framework, which has no support for TVPs. In this chapter I briefly discuss workarounds when TVPs are not available to you.

And for the advanced look at arrays and lists, you have the long-form article:

A problem that has been popular over the years with SQL Server is how to handle a list of values. In the majority of the cases, people have a comma-separated list, because this format is produced by commonly used tools like multi-choice controls in .NET, Reporting Services and other places.

When I say that the problem is popular, I don’t only mean that the questions are commonplace – so are the solutions. You can find no end of blog posts that present string-splitting functions, including performance tests of such functions, and there are functions that are known to be the fastest, etc.

The aim of this article is two-fold: 1) to give a general discussion of how to design string-splitting functions, and 2) to present and discuss each method from the angles I bring up in the general discussion. This includes performance, but not only performance.
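
If you simply need to consume a comma-separated list on SQL Server 2016 or later, the built-in STRING_SPLIT covers the basic case (a hedged sketch with hypothetical table names; Erland’s article covers the pre-2016 options and the performance trade-offs):

DECLARE @list nvarchar(max) = N'1,3,7';

SELECT o.OrderID, o.CustomerID
FROM dbo.Orders AS o
JOIN STRING_SPLIT(@list, N',') AS s
  ON o.CustomerID = TRY_CONVERT(int, s.value);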

Even if you’ve read this article before, it’s worth checking again to refresh your memory and to see his changes.


Graphing Customer Churn

Fang Zhou and Wee Hyong Tok have released a case study on a telephone company’s customer churn:

In the case of telco customer churn, we collected a combination of the call detail record data and customer profile data from a mobile carrier, and then followed the data science process (data exploration and visualization, data pre-processing and feature engineering, model training, scoring and evaluation) in order to achieve the churn prediction. With a churn indicator in the dataset taking the value 1 when the customer has churned and 0 when the customer has not, we addressed the problem as a binary classification problem and tried various tree-based models along with methods like bagging, random forests and boosting. Because the number of churned customers is much less than that of non-churned customers (making the data set quite unbalanced), SMOTE (Synthetic Minority Oversampling Technique) was applied to adjust the proportion of majority class over minority class in the training data set, thus further improving model performance, especially precision and recall.

All the above data science procedures could be implemented with base R. Rather than moving the data out from the database to an external machine running R, we instead run R scripts directly on SQL Server data by leveraging the in-database analytics capability provided by SQL Server R Services, taking advantage of the rich and powerful CRAN R packages plus the parallel external memory algorithms in the RevoScaleR library. In what follows, we will describe the specific R packages and algorithms that we used to implement the data science solution for predicting telco customer churn.

They have provided the relevant materials in GitHub as well.


Parsing DBCC MEMORYSTATUS

Slava Murygin has a script to turn DBCC MEMORYSTATUS output into one result set:

When I wanted to research a memory problem on a server, I started to dig deeper into the "DBCC MEMORYSTATUS" command. Very useful links for understanding that command were these from Microsoft:
https://support.microsoft.com/en-us/kb/271624
https://support.microsoft.com/en-us/kb/907877

During the research I faced two problems:
1. I had to wait several seconds to get the full result set.
2. I had to scroll down and shuffle through 115 different data sets to find the counter I wanted.

To eliminate both of these problems, all these different counters have to be in the same table/data set.
That will make research easier, and the data can be stored or compared with a baseline.

Read on for the T-SQL script.


Pivoting Data

Jeffrey Verheul shows how to use the PIVOT operator in T-SQL:

The idea of using key-value pairs to store data isn’t new, but with the rapid development of cloud solutions like Azure and the hype around NoSQL databases, using key-value pairs to store data got a big boost. Developers especially (in my experience) love using key-value pairs to store their data, because it’s easy for them to consume the data in an application. But it gives the database professional an extra challenge, because we’re used to retrieving columns with values instead of a record per value. So how can we turn those key-value pairs into rows?
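
A minimal example of what that pivot looks like (the table and key names here are made up, not Jeffrey’s):

CREATE TABLE #Settings (EntityID int, [Key] varchar(50), [Value] varchar(100));
INSERT #Settings (EntityID, [Key], [Value]) VALUES
    (1, 'FirstName', 'Jeffrey'), (1, 'Country', 'NL'),
    (2, 'FirstName', 'Kenneth'), (2, 'Country', 'US');

-- one row per entity, with the keys promoted to columns
SELECT p.EntityID, p.FirstName, p.Country
FROM #Settings
PIVOT (MAX([Value]) FOR [Key] IN ([FirstName], [Country])) AS p;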

This is a good example of using PIVOT.  I’m not a big fan of storing data in key-value pairs and using pivoting operators because you’re burning CPU on that very expensive SQL Server instance (and you’re not taking advantage of what relational databases do well); if you really need to store data as key-value, I’d recommend doing the pivot in cheaper application servers.
