Category: T-SQL

T-SQL And R Performance Comparisons

Tomaz Kastrun does several performance comparisons between various R packages and T-SQL constructs:

A couple of packages I will mention for data manipulation are plyr, dplyr and data.table, comparing their execution time, simplicity and ease of writing with general T-SQL code and the RevoScaleR package. For this blog post I will use the R package dplyr and T-SQL with the possibilities of RevoScaleR computation functions.

My initial query will be. Available in WideWorldImportersDW database. No other alterations have been done to underlying tables (fact.sale or dimension.city).

Read on for code and conclusions.  I don’t think there are any shocking conclusions:  the upshot is to filter data as early as possible.

Comments closed

ISNULL And COALESCE Behavior Difference

Vladimir Oselsky notes an edge case where ISNULL and COALESCE can behave differently:

Even though we would expect to see both records returned, we only get 1 record. Huh? This is exactly what puzzled a coworker; of course the query was not as simple as this one, but the same issue caused him to hit a roadblock.

In the case of COALESCE and OR methods, results are identical.

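The underlying issue here is that the variable’s data type differs from the column’s data type, which exposes a difference in how the two functions work: ISNULL returns the data type of its first argument, whereas COALESCE returns the type with the highest precedence among its arguments.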
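To make the type rule concrete, here is a toy Python model. It is not SQL Server itself and not the query from Vladimir's post. It assumes a NULL VARCHAR(5) variable compared against a wider column, roughly `WHERE col = ISNULL(@var, col)`. The function names, the length 5, and the sample values are all made up for illustration. The point it sketches: ISNULL takes the data type of its first argument, so a longer replacement gets cut to fit, while COALESCE keeps the wider value intact.

```python
from typing import Optional

def isnull_varchar(check: Optional[str], check_len: int, replacement: str) -> str:
    """Toy model of ISNULL where the first argument is VARCHAR(check_len).

    The result takes the first argument's type, so a longer replacement
    is silently truncated to check_len characters.
    """
    if check is not None:
        return check[:check_len]
    return replacement[:check_len]

def coalesce(*values: Optional[str]) -> Optional[str]:
    """Toy model of COALESCE: first non-NULL value, with no truncation."""
    for value in values:
        if value is not None:
            return value
    return None

column_values = ["abc", "abcdefgh"]  # hypothetical rows in a wider column
variable = None                      # hypothetical @var VARCHAR(5) = NULL

# Equivalent of: WHERE col = ISNULL(@var, col)
isnull_matches = [v for v in column_values if v == isnull_varchar(variable, 5, v)]
# Equivalent of: WHERE col = COALESCE(@var, col)
coalesce_matches = [v for v in column_values if v == coalesce(variable, v)]

print(isnull_matches)
print(coalesce_matches)
```

In this model the ISNULL filter drops the longer value because the truncated replacement no longer equals the column, which is one way to end up with a single record where two were expected.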

Rounding

Kenneth Fisher looks at decimal truncation methods:

ROUND

This is the most complicated of the three. It does a standard rounding. If value is .5 or over then you get back 1. If it’s less than .5 you get back 0. On top of that you get to pass the place you want to round to. So for example 0 rounds to the nearest ones place, -1 rounds to the tens place, 2 rounds to the hundredths.

There’s a bonus here:  if you use ROUND in T-SQL, the results can differ from the default Math.Round method in .NET at midpoint values. That method uses banker’s rounding (round half to even) by default, whereas T-SQL ROUND rounds halves away from zero.
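The two midpoint rules are easy to compare side by side. The sketch below uses Python's `decimal` module as a stand-in: `ROUND_HALF_UP` rounds halves away from zero (the behavior described for T-SQL ROUND) and `ROUND_HALF_EVEN` is banker's rounding (the .NET default). The helper name `round_to_place` is made up here; its `places` argument follows the same convention as the quote above (0 for ones, -1 for tens, 2 for hundredths).

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to_place(value: str, places: int, mode: str) -> Decimal:
    """Round a decimal string to the given place using the given midpoint rule.

    places=0 rounds to ones, places=-1 to tens, places=2 to hundredths.
    """
    quantum = Decimal(1).scaleb(-places)  # e.g. places=2 gives 0.01
    return Decimal(value).quantize(quantum, rounding=mode)

for value, places in [("2.5", 0), ("125", -1), ("0.125", 2)]:
    half_away = round_to_place(value, places, ROUND_HALF_UP)
    bankers = round_to_place(value, places, ROUND_HALF_EVEN)
    # Show plain numbers rather than scientific notation such as 1.3E+2
    print(value, places, format(half_away, "f"), format(bankers, "f"))
```

The two rules only disagree when the digit being dropped is exactly a 5 with nothing after it; for every other input they return the same result.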

Returning Default Rows

Christopher Huntley wants to return a default record when there are no results:

Or if you’re ready to take it to the ╰[ ⁰﹏⁰ ]╯level then change the column to NChar and use the hex of your favorite emoji like:

DECLARE @testtable1 TABLE (
    testid INT IDENTITY(1,1),
    testvalue NCHAR(255)
);

-- use the below for the final query
-- NCHAR(0xD83D) + NCHAR(0xDE20) is the UTF-16 surrogate pair for U+1F620 (angry face)
SELECT
    ISNULL((SELECT testvalue FROM @testtable1 WHERE testvalue > 101), NCHAR(0xD83D) + NCHAR(0xDE20)) AS testvaluethatworks;

There are a few other alternatives, such as loading results into a temp table and inserting a default row if the temp table is empty.
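Both ideas can be tried without a SQL Server instance. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so the syntax is SQLite's rather than T-SQL (COALESCE and LIMIT instead of ISNULL and a table variable). The table contents and the function names `value_or_default` and `rows_or_default` are made up for illustration. The first function wraps a scalar subquery so that an empty result becomes a default value; the second is a client-side take on the "insert a default row if the result is empty" alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable1 (testid INTEGER PRIMARY KEY, testvalue INTEGER)")
conn.executemany("INSERT INTO testtable1 (testvalue) VALUES (?)", [(50,), (100,)])

DEFAULT_VALUE = "\U0001F620"  # the same angry-face emoji as the surrogate pair above

def value_or_default(threshold: int):
    """Scalar subquery wrapped in COALESCE: an empty subquery yields NULL, then the default."""
    row = conn.execute(
        "SELECT COALESCE("
        "(SELECT testvalue FROM testtable1 WHERE testvalue > ? ORDER BY testid LIMIT 1), ?)",
        (threshold, DEFAULT_VALUE),
    ).fetchone()
    return row[0]

def rows_or_default(threshold: int) -> list:
    """Fetch all matching rows, or a single default row if nothing matched."""
    rows = conn.execute(
        "SELECT testvalue FROM testtable1 WHERE testvalue > ? ORDER BY testid",
        (threshold,),
    ).fetchall()
    return [r[0] for r in rows] or [DEFAULT_VALUE]

print(value_or_default(60))   # a row matches, so its value comes back
print(value_or_default(101))  # nothing matches, so the default comes back
```

The scalar-subquery form only works when at most one value is expected; the second form is the one to reach for when the query can return many rows.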

Shred That XML

Steve Jones has an intro-level post on shredding an extended event to get to the relevant portion:

I was playing with some Extended Events recently. If you haven’t tried, I’d encourage you to do so. However, working with XML is not my favorite. I know I can get the GUI in SSMS 16.x to show me events, but I sometimes want to query.

Here was my quick adventure in XML and XQUERY. I should know this stuff better, but I think I’m working with XML so rarely that I’m constantly re-learning things.

Read on for the code.

Don’t Use Double Dot

Chris Bell warns against using double dot syntax:

I am finding more and more cases where SQL code is being created using the double dot or period for the 2 part naming convention.

For example, instead of using dbo.table1 I am seeing ..table1.

I don’t know who suggested this in the first place, but it is not a good idea. Sure it works and does what you expect, but there is a HUGE risk with doing this. When you use the .. syntax, you are telling the code to use whatever the default schema is for the user that is running the query. By default that is the dbo schema, but there is no guarantee that all systems are going to be that way.

Read on to understand why this is a big deal.

One-Sample T Tests

Mala Mahadevan shows how to perform one-sample T Tests:

For this post I decided to go with a simple example of how many steps I walked per day for the month of August. My goal is 10,000 steps per day – that has been my average over the year but is this true of the data I gathered in August? I have a simple table with two columns – day and steps. Each record has how many steps I took in August per day, for 30 days. So – SELECT AVG(steps) FROM [dbo].[mala-steps] gives me 8262 as my average number of steps per day in August. I want to know if I am consistently underperforming my goal, or if this is a result of my being less active in August alone. Let me state my problem first – or state what is called the ‘null hypothesis’:

I walk 10,000 steps per day on average over the year.

Read on for T test operations in T-SQL (although not all operations are available) and R.
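For a feel of the arithmetic behind a one-sample t test, here is a minimal Python sketch using only the standard library. The step counts are made up (the excerpt gives only the August average of 8262, not the daily values), and the function name `one_sample_t` is hypothetical. It computes the t statistic as the gap between the sample mean and the hypothesized mean, divided by the standard error.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)      # sample standard deviation (n - 1)
    standard_error = sample_std / math.sqrt(n)
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    return t_stat, n - 1

# Made-up daily step counts; a real run would use all 30 days from the table.
steps = [8000, 8500, 9000, 7500, 8300]
t_stat, degrees_of_freedom = one_sample_t(steps, 10000)
print(t_stat, degrees_of_freedom)
```

A large negative t statistic means the sample mean sits well below the hypothesized 10,000; turning it into a p-value needs the t distribution, which is where R (or a statistics library) takes over.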

Ordering In Views

Kenneth Fisher explains why you shouldn’t order in views:

For many years it’s been a best practice to never put an ORDER BY in a view. The idea is that a view shouldn’t have an inherent order. Just like any other query. If you want the data from a view ordered then you query the view with an ORDER BY clause. In fact if you put an ORDER BY in a view you’ll get an error:

Msg 1033, Level 15, State 1, Procedure MyView, Line 4 [Batch Start Line 2]
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

I knew about the TOP 100 PERCENT bit, but had no idea that order was outright ignored.  Read the comments for additional information.

Chi Square Tests

Mala Mahadevan discusses how to perform a Chi Square test:

For any dataset to lend itself to the Chi Square test it has to fit the following conditions –

1. Both variables are categorical (in this case – exposure to smoking – yes/no, and health condition – sick/not sick are both categorical).
2. Researchers used a random sample to collect data.
3. Researchers had an adequate sample size. Generally the sample size should be at least 100.
4. The number of respondents in each cell should be at least 5.

This is an easy case for using R over T-SQL—the Chi Square test is built in, whereas you have to roll your own T-SQL code.  Mala does show you how to do this from within SQL Server R Services as well.
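To show what "rolling your own" involves, here is a small Python sketch of the Chi Square statistic for a contingency table. The counts are made up (they are not Mala's data), and the function names are hypothetical. Expected counts come from row total times column total divided by the grand total, and the checks mirror the sample-size and cell-count conditions quoted above.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, expected counts) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, count in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            expected_row.append(e)
            chi2 += (count - e) ** 2 / e
        expected.append(expected_row)
    return chi2, expected

def meets_conditions(observed, min_total=100, min_cell=5):
    """Check the quoted rules of thumb: overall sample size and per-cell count."""
    cells = [count for row in observed for count in row]
    return sum(cells) >= min_total and min(cells) >= min_cell

# Made-up 2x2 table: rows = exposed / not exposed, columns = sick / not sick.
observed = [[30, 20], [20, 30]]
chi2, expected = chi_square_statistic(observed)
print(meets_conditions(observed), chi2)
```

For a 2x2 table there is 1 degree of freedom, so the statistic is compared against the critical value 3.841 at the 0.05 level.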
