Press "Enter" to skip to content

Category: Data Types

SHA_256 Hashes and Data Type

Reitse Eskens hit an interesting issue:

The issue is quite simple. A text needs to be converted into a SHA2_256 hash for some authentication reasons. The example shown here is simplified. The thing is, the outcome of the hash isn’t accepted by the authorising party and when the input is checked via an online MD5 hashing site, there’s a difference between that output and that from the SQL Script.

Read on to see what the problem is and how it can affect you.

Leave a Comment

Inferring Data Types in SQL Server Tables

Emanuele Meazzo tidies up the database:

Do you know the feeling when someone gives you a flat file to import, without any schema information? You have to dump it in your database as-is, then spend precious time figuring out the data type for each column
Let me help you with that, as I’m helping myself too.

Click through for the scenario, as well as a link to the script Emanuele has created. It looks like the script doesn’t try to estimate string lengths, instead focusing on types alone.

Leave a Comment

Implicit Conversion Works in Both Directions

Deborah Melkin shares a lesson:

I’ve been working on a project where I have to do a lot of interesting data manipulation. Unfortunately, dealing with implicit conversions has been part of the norm. So naturally, I managed to run across an interesting scenario that had me stumped, thinking that I ran into a strange edge case.

The tl;dr version is that this wasn’t an edge case or some undocumented issue or a bug or anything other than the db engine doing its job. I was looking at one scenario and missed the rest of the clues. But it’s worth sharing how these things are easily missed.

Click through for the full story.

Comments closed

Contrasting Three Unique Identifiers in Postgres

Laetitia Avrot shares some advice:

This month’s PGSQLPhriday event is about UUId thanks to my post calling for a fight debate on the topic.

I will answer a question a friend developer asked me: “What is the best when we need a primary key? UUID, CUID, or TSID?”

I hadn’t heard of two of these, but Laetitia provides some links to learn more about them and then offers up some advice on whether to use any of them. And the advice sounds a lot like the advice for SQL Server.

Comments closed

VARCHAR() in Microsoft Fabric Lakehouses and SQL Endpoints

Gerhard Brueckl models some data:

Defining data types and knowing the schema of your data has always been a crucial factor for performant data platforms, especially when it comes to string datatypes which can potentially consume a lot of space and memory. For Lakehouses in general (not only Fabric Lakehouses), there is usually only one data type for text data which is a generic STRING of an arbitrary length. In terms of Apache Spark, this is StringType(). While this applies to Spark dataframes, this is not entirely true for Spark tables – here is what the docs say:

Read through for more information on that, as well as how to define a table in a Microsoft Fabric lakehouse using VARCHAR(). The display is a little weird, but Greg Low explains why in the comments.

Comments closed

Data Types and Stored Procedures

Erik Darling plays the roles of both Goofus and Gallant here:

All sorts of bad things happen when you do this. You can’t index for this in any meaningful way, and comparing non-string data types (numbers, dates, etc.) with a double wildcard string means implicit conversion hell.

You don’t want to do this. Ever.

Unless you want to hire me.

Click through for good advice on the proper use of data types and input parameters.

Comments closed

GPS Data in PostGIS

Ryan Lambert clues us in:

One of the key elements to using PostGIS is having spatial data to work with! Lucky for us, one big difference today compared to the not-so-distant past is that essentially everyone is carrying a GPS unit with them nearly everywhere. This makes it easy to create your own GPS data that you can then load into PostGIS! This post explores some basics of loading GPS data to PostGIS and cleaning it for use. It turns out, GPS data fr om nearly any GPS-enabled device comes with some… character. Getting from the raw input to usable spatial data takes a bit of effort.

This post starts with using ogr2ogr to load the .gpx data to PostGIS. Once the data is in PostGIS then we actually want to do something with it. Before the data is completely usable, we should spend some time cleaning the data first. Technically you can start querying the data right away, however, I have found there is always data cleanup and processing involved first to make the data truly useful.

Click through for an example of how it all fits together.

Comments closed

A Look at UTF-8 in SQL Server

Daniel Hutmacher takes a peek:

A client asked me about SQL Server collations, and if they should consider the new UTF8 collations (new since SQL Server 2019). I tried to hide my blank stare of ignorance, and promised them I’d look it up and get back to them.

Not gonna lie, I think UTF and Unicode can be pretty confusing at times, so I did some googling and some testing, and here’s what I found.

Read on to see what Daniel found, and check the comments for one person’s unfortunate experience. For a deep dive into UTF-8 in SQL Server, I also recommend Solomon Rutzky’s blog post on the topic, noting that it was last updated in 2020 so some of the incompatibility section may not be relevant any longer.

Comments closed

Fast Key Optimization in SQL Server

Paul White explains the performance benefits of fast key optimization and when you can get them:

SQL Server can be called upon to sort a variety of data types. To facilitate this, the sorting code normally calls out to a specific comparator to determine how two compared values should sort: lower, higher, or equal.

Although calling comparator code has low overhead, performing enough of them can cause noticeable performance differences.

To address this, SQL Server has always (since at least version 7) supported a fast key optimization for simple data types. 

Click through to learn which data types support fast key optimization and to gain a feeling of the performance impact.

Comments closed