Category: Data Types

Reading Pipe-Delimited Files in SSIS

Greg Low does a bit of parsing:

There was a question on the Q&A forums today, asking how to read data using SSIS, when it’s in this format:

|Col1| |Col2|Col3|Col|
|101| |A|21|DC|

One of the concerns was that there was a leading pipe. This is not a problem. When you have data like that, and you set | as the delimiter, because there are 6 delimiters, then there are 7 columns output. 

Read on for Greg’s answer. This is a big part of why I recommend that people use Parquet (or ORC, if you’re an iconoclast like I am): you don’t need to deal with whatever silly, arbitrary file formats people come up with.
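
The delimiter arithmetic in the excerpt is easy to check outside of SSIS. The following is a minimal Python sketch, not Greg's SSIS solution: it splits the sample data row on the pipe character and shows that six delimiters yield seven fields, with the leading and trailing fields empty.

```python
# Sample data row from the excerpt above.
line = "|101| |A|21|DC|"

# Splitting on the delimiter produces one more field than there are delimiters.
delimiter_count = line.count("|")
fields = line.split("|")

print(delimiter_count)  # 6
print(len(fields))      # 7
print(fields)           # ['', '101', ' ', 'A', '21', 'DC', '']

# The leading and trailing pipes produce empty first and last fields,
# so the real data sits in the five fields between them.
data_fields = fields[1:-1]
print(data_fields)      # ['101', ' ', 'A', '21', 'DC']
```

In other words, the leading pipe simply produces an extra empty column that can be ignored downstream.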

Removing Leading Zeroes from a String in T-SQL

Steve Stedman gets rid of leading zeroes:

When working with data in SQL Server, there may be times when you need to remove leading zeros from a string. This task can be particularly common when dealing with numerical data stored as strings, such as ZIP codes, product codes, or other formatted numbers. In this blog post, we’ll explore several methods to remove leading zeros in SQL Server.

I’m not sure I see the reason to use anything other than CAST() (or, better yet, TRY_CAST()), but Steve does show two other methods.
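
To illustrate the trade-off between casting and string manipulation, here is a rough Python analogy (not T-SQL, and not Steve's code). The helper names `try_cast_int` and `strip_leading_zeros` are hypothetical: the first mimics how TRY_CAST() returns NULL rather than raising an error on bad input, and the second works purely on the string.

```python
def try_cast_int(text):
    """Return the integer value of text, or None if it is not an integer.

    Loosely mirrors TRY_CAST(... AS INT) returning NULL on failure.
    """
    try:
        return int(text)
    except ValueError:
        return None


def strip_leading_zeros(text):
    """Remove leading zeros as a pure string operation."""
    stripped = text.lstrip("0")
    # Keep a single zero when the input was nothing but zeros.
    if text and not stripped:
        return "0"
    return stripped


print(try_cast_int("000123"))        # 123
print(try_cast_int("00A12"))         # None
print(strip_leading_zeros("00A12"))  # A12
print(strip_leading_zeros("0000"))   # 0
```

The cast is the shortest route when every value is numeric. The string-based approach matters mainly for codes that contain non-numeric characters.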

Bit Column Order and Data Length

Brent Ozar performs an experiment:

At the PGConf.dev, where Postgres developers get together and strategize the work they wanna do for the next version, I attended a session where Matthias van de Meent talked about changing the way Postgres stores columns. As of right now (Postgres 17), columns are aligned in 8-bit intervals, so if you create a table with alternating columns:

Read on to see an example, and then Brent performs a test to see how SQL Server handles this scenario. The comments also mention that at least older versions of Oracle behaved like Postgres.
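
For a feel of why column order can change storage size when values must be aligned, here is an analogy using Python's standard `struct` module, which follows the platform's C alignment rules in native (`@`) mode. This is not Postgres's or SQL Server's on-disk format, only a sketch of alignment padding in general. The field layouts are illustrative assumptions.

```python
import struct

# "b" is a 1-byte integer and "q" is an 8-byte integer.
# In native mode ("@"), each field is padded to its natural alignment.
interleaved = struct.calcsize("@bqbq")  # small, big, small, big
grouped = struct.calcsize("@qqbb")      # big values first, small ones last

# In standard mode ("="), no alignment padding is added at all.
unpadded = struct.calcsize("=bqbq")

print(interleaved, grouped, unpadded)
```

On a typical 64-bit build the interleaved layout comes out noticeably larger than the grouped one, because padding bytes are inserted before each 8-byte value.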

Working with XML in SQL Server

Ed Pollack talks XML:

XML is a common storage format for data, metadata, parameters, or other semi-structured data. Because of this, it often finds its way into SQL Server databases and needs to be managed alongside other data types.

Even though a relational database is not the optimal place to store and manage XML data, it is often needed due to application requirements, convenience, or a need to maintain this information in close proximity to other app data.

This article dives into a variety of common XML challenges and the functionality included in SQL Server to help make managing them as simple as possible.

Ed does a good job of walking through what you can do. My general philosophy on XML and JSON in the database is simple: if you simply want a place to store some JSON or XML outputs and retrieve the results exactly as they are without performing any searches or transformations, write as JSON/XML. If you want to use the database to search through JSON/XML records for particular attributes and values, or if you want to reshape the JSON/XML data within the database, create a proper data model for this input.

Uniqueidentifier Ordering in SQL Server

Jose Manuel Jurado Diaz clears up the mystery:

Today, I worked on a service request in which our customer asked how SQL Server sorts the uniqueidentifier data type. We know that uniqueidentifier stores globally unique identifiers (GUIDs). GUIDs are widely used for unique keys due to their extremely low probability of duplication. One common method to generate a GUID in SQL Server is by using the NEWID() function. However, the ordering of GUIDs, especially those generated by NEWID(), can appear non-intuitive. I would like to share my lessons learned on how to determine the ordering method using uniqueidentifier and NEWID().

Yeah, things get pretty weird because x86-x64 processors (Intel and AMD both) are Little Endian, and only the fourth chunk of a GUID is Big Endian.
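
The mixed byte order can be seen with Python's standard `uuid` module, whose `bytes_le` attribute stores the first three fields in little-endian order and leaves the remaining eight bytes as they are. In the text form of a GUID, that fourth field covers the last two hyphen-separated groups. This is a sketch of the storage layout only, using an arbitrary sample value, and it does not reproduce SQL Server's comparison rules.

```python
import uuid

# Arbitrary sample GUID, chosen so each byte is easy to track.
g = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# Straight big-endian layout: bytes appear in the same order as the text.
big_endian = g.bytes.hex()

# Mixed layout: the first three fields are byte-swapped (little-endian),
# while the final eight bytes keep their original order.
mixed = g.bytes_le.hex()

print(big_endian)  # 00112233445566778899aabbccddeeff
print(mixed)       # 33221100554477668899aabbccddeeff
```

Because the in-memory byte order differs from the order in which the characters are printed, sorting by bytes does not match sorting by the text representation.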

Fun with Implicit Conversion and Table Partitioning

Rod Edwards takes us through an issue:

CONVERT_IMPLICIT(nvarchar(200),[xxxxxxxxxxxxxxxxxxx].[Category],0)=[@Category] AND CONVERT_IMPLICIT(nvarchar(200),[xxxxxxxxxxxxxxxxxxx].[Id],0)=[@Id]

Oh dear, what’s all that then, we have a fat Residual Predicate, where SQL is performing an Implicit conversion on our query predicates.

Time to look at our datatypes.

This is one of several reasons why I espouse the philosophy of NVARCHAR Everywhere. You can’t have implicit conversion between string types if you’re always using NVARCHAR over VARCHAR.

Data Compression and Data Type Changes

Bob Pusateri asks the important questions:

A few different times I have been asked one or more forms of the following question:

Can datatypes be changed faster with data compression enabled?

I’ve always replied that I’m pretty sure compression will help in this situation, because based on my understanding, it should. But I’ve never had any actual data to back up this belief. Until now. I recently set up a demonstration to test this, and I’m very happy to share the results.

If you want to see the results, you’re going to have to read Bob’s article.

SHA_256 Hashes and Data Type

Reitse Eskens hit an interesting issue:

The issue is quite simple. A text needs to be converted into a SHA2_256 hash for some authentication reasons. The example shown here is simplified. The thing is, the outcome of the hash isn’t accepted by the authorising party and when the input is checked via an online MD5 hashing site, there’s a difference between that output and that from the SQL Script.

Read on to see what the problem is and how it can affect you.
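
One general way a string's data type can change a hash (which may or may not be the exact cause in Reitse's post) is that a hash function works on bytes, and the same text encodes to different bytes in a single-byte encoding than in UTF-16, which SQL Server uses for NVARCHAR. A minimal Python sketch with an arbitrary sample string:

```python
import hashlib

text = "Hello"

# Same characters, two different byte sequences.
single_byte = text.encode("utf-8")     # 5 bytes for this ASCII-only text
double_byte = text.encode("utf-16-le") # 10 bytes: two bytes per character

hash_single = hashlib.sha256(single_byte).hexdigest()
hash_double = hashlib.sha256(double_byte).hexdigest()

print(len(single_byte), hash_single)
print(len(double_byte), hash_double)
print(hash_single == hash_double)  # False
```

When two systems disagree on a hash of the "same" text, checking which encoding each side feeds into the hash function is a reasonable first step.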

Inferring Data Types in SQL Server Tables

Emanuele Meazzo tidies up the database:

Do you know the feeling when someone gives you a flat file to import, without any schema information? You have to dump it in your database as-is, then spend precious time figuring out the data type for each column. Let me help you with that, as I’m helping myself too.

Click through for the scenario, as well as a link to the script Emanuele has created. It looks like the script doesn’t try to estimate string lengths, instead focusing on types alone.
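
The general idea of type inference can be sketched in a few lines of Python. This is not Emanuele's script, and the function name `infer_type`, the type labels, and the sample columns are all hypothetical. Like the script described above, it guesses types only and makes no attempt at string lengths.

```python
import re
from datetime import datetime

INT_PATTERN = re.compile(r"[+-]?[0-9]+")
DECIMAL_PATTERN = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def is_iso_date(value):
    """Return True if value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess the narrowest type that fits every non-empty value in a column."""
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    if not cleaned:
        return "unknown"
    if all(INT_PATTERN.fullmatch(v) for v in cleaned):
        return "int"
    if all(DECIMAL_PATTERN.fullmatch(v) for v in cleaned):
        return "decimal"
    if all(is_iso_date(v) for v in cleaned):
        return "date"
    return "text"


# Hypothetical columns, as they might look after a raw flat-file import.
columns = {
    "id": ["1", "2", "3"],
    "price": ["9.99", "12", "0.5"],
    "created": ["2024-01-05", "2024-02-10", ""],
    "note": ["abc", "42"],
}

inferred = {name: infer_type(values) for name, values in columns.items()}
print(inferred)
```

Checking candidate types from narrowest to widest means a single non-conforming value pushes the whole column to the next, more permissive type.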

Implicit Conversion Works in Both Directions

Deborah Melkin shares a lesson:

I’ve been working on a project where I have to do a lot of interesting data manipulation. Unfortunately, dealing with implicit conversions has been part of the norm. So naturally, I managed to run across an interesting scenario that had me stumped, thinking that I ran into a strange edge case.

The tl;dr version is that this wasn’t an edge case or some undocumented issue or a bug or anything other than the db engine doing its job. I was looking at one scenario and missed the rest of the clues. But it’s worth sharing how these things are easily missed.

Click through for the full story.
