Press "Enter" to skip to content

Category: T-SQL Tuesday

T-SQL Tuesday 137 Round-Up

Steve Jones wraps up the latest T-SQL Tuesday:

I hosted the blog party this month, with the invite to write about notebooks. These are a neat technology, and I’ve written about them at SQLServerCentral.

This post is a wrap-up of the various responses to my invitation. First, quite a few people give credit to either Aaron Nelson or Rob Sewell for their writings and work with notebooks, so check out their blogs.

Click through for the list of respondents.

Leave a Comment

Lessons from using Notebooks

Glenn Berry takes us through some of the past (and sometimes present) challenges of running notebooks in Azure Data Studio:

I have to admit that I do not use Jupyter notebooks or Azure Data Studio (ADS) everyday. Last August, I made separate Jupyter notebook versions of my SQL Server Diagnostic Information Queries. There was a separate version for SQL Server 2012 through SQL Server 2019, along with one for Azure SQL Database. This was after a number of requests from people in the community.

Creating these notebooks was a pretty decent amount of work. Luckily, this was right around the time that Azure Data Studio was making it much easier to edit and format markdown for the text blocks. Since then, Azure Data Studio is even easier to use for editing and formatting. Even more fortuitous was the fact that Julie Koesmarno (@MsSQLGirl) volunteered to greatly improve my formatting!

Unfortunately, there has not been as much interest in my Jupyter notebooks as I hoped for. There are probably a number of reasons for this.

Read on for Glenn’s notes.

Comments closed

Running Jupyter Notebooks from Powershell

Rob Farley has a change of heart:

The concept is that if I have a notebook with a bunch of queries in it, I can easily call that using Invoke-SqlNotebook, and get the results of the queries to be stored in an easily-viewable file. But I can also just call Invoke-SqlCmd and get the results stored. Or I can create an RDL to create something that will email me based on a subscription. And I wasn’t sure I needed another method for running something.

Read on to see what changed Rob’s mind.

Comments closed

T-SQL Tuesday 136 Wrap-Up

Brent Ozar rounds up the usual suspects, plus several more:

For this month’s T-SQL Tuesday, I asked you to blog about your most-loved and least-loved data types.

Crazy, right? How could people possibly love or hate data types? Well, if you’ve been working with them for a while, I figured you’d have built up an array of tips or pain points, and y’all delivered with 29 interesting blog posts.

Click through for a lengthy list of interesting posts.

Comments closed

Disliking User-Defined Data Types

Andy Levy has a bone to pick:

Here’s the thing – these types are really just aliases for native types in SQL Server, but more constrained. Constrain yourself to UDTs and you’ll have trouble right-sizing your fields. Let’s say you’ve got three data types for text data:

– myShortString (varchar(10))
– myString (varchar(256))
– myBigString (varchar(8000))

These lengths are not helping anyone. You can’t store email addresses or names in myShortString. But myString is probably way too much for that data. You’re going to waste memory because of how SQL Server estimates memory grants and your indexes will be bloated. Maybe you just need to create more UDTs to cover these situations. But that just compounds the other problems, doesn’t it?

Pushes glasses up the bridge of his nose. Teeeeechnically, an e-mail address may be up to 256 characters long, including a username of up to 64 characters (and maybe two angle brackets, depending on the host). So myString would actually be perfect. Steve Jones has a comment about 300, but that was probably the original standard of 320. Regardless, I realize how far beside the point that is, and Andy’s point is a good one, as well as the several others he makes in the post.

One quick note on defined types: they really do make a lot of sense in a domain-driven design, especially when working with functional programming languages. Defining a CustomerID as an int is fine, but if I know my customer IDs are natural numbers (1, 2, 3, …), 9 digits long, and do not contain the sequence 2345 (because my company considers this an unlucky number sequence), creating a CustomerID type which provides this sort of type checking is great because you keep the rules as close to the data as possible and ensure consistency. It’s also more restrictive than int, so you can cast back down to an int when you’re ready to interact with some remote system. So short answer, do this all day in F#, but not in the database.

Comments closed

In Defense of Float

Hugo Kornelis levies a defense of floating point data types:

Let’s return to the database. Let’s figure out a way to store these numbers appropriately.

Could we use decimal (or its synonym numeric)? Well, yes. We can. We need 25 digits after the decimal place for 0.0000000000000000000004052, and 9 digits before the decimal place for 299900000, so that would fit in a decimal(34,25). But if you try to compute c2 so you can then multiply that to the m, you’ll run into an overflow error.

Hugo does a good job of defending the float data type.

Comments closed

Two Disliked Data Types

Aaron Bertrand has two bones to pick:

I am not often one to do the bare minimum, unless it comes to looking for things around the house. After about 22 seconds I throw my hands in the air and exclaim, “I can’t find it!” As my wife so kindly added: It usually turns out to be in a spot I already looked.

But when it revolves around SQL Server and opinions, I’m all over it. So I’m not going to talk about a data type today; I’m going to talk about two of them. One of them Brent already mentioned in his invite:

I kinda-sorta disagree with Aaron’s second choice. By that I mean that I fully agree with his premise: use UTC everywhere. But if you don’t use UTC everywhere, then use DATETIMEOFFSET everywhere and apply the time zones.

Comments closed

The Problems with VARCHAR(MAX)

Deepthi Goguri has a complaint about VARCHAR(MAX):

Though VARCHAR(MAX) is suitable in situations with large strings of data, it has its own complications that we need to consider. In my career as a DBA, I at least saw couple of times SQL developers using VARCHAR(MAX) when they should not. Fixing the datatypes once in production is painful and causes risks.

I think Deepthi’s advice is sound: use VARCHAR(MAX) when necessary but not as a starting point. I don’t shy away from VARCHAR(MAX) on principle (except with columnstore indexes—get that noise right out of here) and I don’t think you should either, as long as you understand the ramifications.

Comments closed

Fragmentation and GUIDs

Chad Callihan doesn’t llike the UNIQUEIDENTIFIER data type:

My mind quickly went to the uniqueidentifier (GUID) data type. It may not be fair but I think of it as my least favorite. The reasoning is more of a pet peeve. Most of the time there’s nothing wrong with the uniqueidentifier data type; however, it makes me cringe if it is the clustering key on a table when an INT would do just fine because it ends up wasting disk space.

On this topic, Jeff Moden had a really great presentation for our SQL Server user group. He has a rather contrarian take and interesting findings on fragmentation in practice. It’s a lengthy and advanced talk, but definitely worth the nearly 2 1/2 hours.

Comments closed

Against VARBINARY(MAX)

Travis Page wants you to think twice before using VARBINARY(MAX):

Brent Ozar put out a call for blogging about your data type of choice. This might be a favorite (or least favorite) datatype. Naturally, varchar(max) and nvarchar(max) are going to have their punishment, deservedly. The datatype I’m not so sure is going to be getting the bludgeoning it deserves is varbinary(max). Sure, there are valid uses for the data type, but having the potential for just shy of 2GB of binary LOB data stored in a database has some negative potentials. Let’s take a look at some of these pitfalls.

Click through for a good reckoning of the downsides. In terms of upside, two good ones that I actively use are:

  • VARBINARY(MAX), when combined with the COMPRESS() function, can efficiently store a lot of text data, as that text data gets gzipped. This works best if you just need to store data but not search through it. Then, DECOMPRESS() when it’s time for that data to show up in an app somewhere.
  • SQL Server Machine Learning Services models are stored in VARBINARY(MAX).

Comments closed