Press "Enter" to skip to content

Category: Data Types

Uniqueidentifier Ordering in SQL Server

Jose Manuel Jurado Diaz clears up the mystery:

Today, I worked on a service request that our customer asked about how SQL Server sorts the uniqueidentifier data type. We know that uniqueidentifier store globally unique identifiers (GUIDs). GUIDs are widely used for unique keys due to their extremely low probability of duplication. One common method to generate a GUID in SQL Server is by using the NEWID() function. However, the ordering of GUIDs, especially those generated by NEWID(), can appear non-intuitive. I would like to share my lessons learned how to determine the ordering method using uniqueidentifier and NEWID().

Yeah, things get pretty weird because x86-x64 processors (Intel and AMD both) are Little Endian, and only the fourth chunk of a GUID is Big Endian.

Comments closed

Fun with Implicit Conversion and Table Partitioning

Rod Edwards takes us through an issue:

CONVERT_IMPLICIT(nvarchar(200),[xxxxxxxxxxxxxxxxxxx].[Category,0)=[@Category] AND CONVERT_IMPLICIT(nvarchar(200),[xxxxxxxxxxxxxxxxxxx].[Id],0)=[@Id]

Oh dear, what’s all that then, we have a fat Residual Predicate, where SQL is performing an Implicit conversion on our query predicates.

Time to look at our datatypes.

This is one of several reasons why I espouse the philosophy of NVARCHAR Everywhere. You can’t have implicit conversion if you’re always using NVARCHAR over VARCHAR.

Comments closed

Data Compression and Data Type Changes

Bob Pusateri asks the important questions:

A few different times I have been asked one or more forms of the following question:

Can datatypes be changed faster with data compression enabled?

I’ve always replied that I’m pretty sure compression will help in this situation, because based on my understanding, it should. But I’ve never had any actual data to back up this belief. Until now. I recently set up a demonstration to test this, and I’m very happy to share the results.

If you want to see the results, you’re going to have to read Bob’s article.

Comments closed

SHA_256 Hashes and Data Type

Reitse Eskens hit an interesting issue:

The issue is quite simple. A text needs to be converted into a SHA2_256 hash for some authentication reasons. The example shown here is simplified. The thing is, the outcome of the hash isn’t accepted by the authorising party and when the input is checked via an online MD5 hashing site, there’s a difference between that output and that from the SQL Script.

Read on to see what the problem is and how it can affect you.

Comments closed

Inferring Data Types in SQL Server Tables

Emanuele Meazzo tidies up the database:

Do you know the feeling when someone gives you a flat file to import, without any schema information? You have to dump it in your database as-is, then spend precious time figuring out the data type for each column
Let me help you with that, as I’m helping myself too.

Click through for the scenario, as well as a link to the script Emanuele has created. It looks like the script doesn’t try to estimate string lengths, instead focusing on types alone.

Comments closed

Implicit Conversion Works in Both Directions

Deborah Melkin shares a lesson:

I’ve been working on a project where I have to do a lot of interesting data manipulation. Unfortunately, dealing with implicit conversions has been part of the norm. So naturally, I managed to run across an interesting scenario that had me stumped, thinking that I ran into a strange edge case.

The tl;dr version is that this wasn’t an edge case or some undocumented issue or a bug or anything other than the db engine doing its job. I was looking at one scenario and missed the rest of the clues. But it’s worth sharing how these things are easily missed.

Click through for the full story.

Comments closed

Contrasting Three Unique Identifiers in Postgres

Laetitia Avrot shares some advice:

This month’s PGSQLPhriday event is about UUId thanks to my post calling for a fight debate on the topic.

I will answer a question a friend developer asked me: “What is the best when we need a primary key? UUID, CUID, or TSID?”

I hadn’t heard of two of these, but Laetitia provides some links to learn more about them and then offers up some advice on whether to use any of them. And the advice sounds a lot like the advice for SQL Server.

Comments closed

VARCHAR() in Microsoft Fabric Lakehouses and SQL Endpoints

Gerhard Brueckl models some data:

Defining data types and knowing the schema of your data has always been a crucial factor for performant data platforms, especially when it comes to string datatypes which can potentially consume a lot of space and memory. For Lakehouses in general (not only Fabric Lakehouses), there is usually only one data type for text data which is a generic STRING of an arbitrary length. In terms of Apache Spark, this is StringType(). While this applies to Spark dataframes, this is not entirely true for Spark tables – here is what the docs say:

Read through for more information on that, as well as how to define a table in a Microsoft Fabric lakehouse using VARCHAR(). The display is a little weird, but Greg Low explains why in the comments.

Comments closed

Data Types and Stored Procedures

Erik Darling plays the roles of both Goofus and Gallant here:

All sorts of bad things happen when you do this. You can’t index for this in any meaningful way, and comparing non-string data types (numbers, dates, etc.) with a double wildcard string means implicit conversion hell.

You don’t want to do this. Ever.

Unless you want to hire me.

Click through for good advice on the proper use of data types and input parameters.

Comments closed

GPS Data in PostGIS

Ryan Lambert clues us in:

One of the key elements to using PostGIS is having spatial data to work with! Lucky for us, one big difference today compared to the not-so-distant past is that essentially everyone is carrying a GPS unit with them nearly everywhere. This makes it easy to create your own GPS data that you can then load into PostGIS! This post explores some basics of loading GPS data to PostGIS and cleaning it for use. It turns out, GPS data fr om nearly any GPS-enabled device comes with some… character. Getting from the raw input to usable spatial data takes a bit of effort.

This post starts with using ogr2ogr to load the .gpx data to PostGIS. Once the data is in PostGIS then we actually want to do something with it. Before the data is completely usable, we should spend some time cleaning the data first. Technically you can start querying the data right away, however, I have found there is always data cleanup and processing involved first to make the data truly useful.

Click through for an example of how it all fits together.

Comments closed