Press "Enter" to skip to content

Category: Data Types

Power BI Data Type Optimization

Nikola Ilic shows how important it can be to choose the right data types:

For demo purposes, I’ll be using a fact table that contains the data about chats performed by a customer support department of the fictitious company Customer First. This table includes approximately 9 million rows, which is not considered a large table in the context of Power BI and analytical workloads. For the sake of simplicity, let’s pretend that our model consists of only this single table. Finally, a semantic model is configured as an Import mode model. If you want to learn how your data is stored in Power BI, I suggest you start by reading this article first.

Data was loaded into Power BI from the underlying data source (SQL Server database) as-is, without any additional optimizations applied.

Nikola walks through the process of finding the most expensive columns in terms of data size and switching them to the least precise data type that still works. One other thing I commonly see is identity columns or other keys on fact tables. Those are very rarely necessary, because the point of a fact table is typically to aggregate it in some fashion, and these keys are unique by design, meaning they won't compress well and will take up a lot of space. Looking at Nikola's example, my next question would be: given that the table is named factChat, does chatID tie to some chat dimension? If not, is it actually necessary for reporting? And if it isn't, dropping it could shave another 60 MB or so off the data model.
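
If you want to hunt down the expensive columns yourself, one rough option is to query the VertiPaq DMVs from DAX Studio or SSMS while connected to the semantic model. This is only a sketch, not Nikola's method, and the column list is the generic DMV output:

-- Per-segment size of each column in the model, from the VertiPaq storage DMV.
SELECT DIMENSION_NAME, COLUMN_ID, SEGMENT_NUMBER, USED_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS

Sort the results by USED_SIZE and high-cardinality key columns like chatID tend to float straight to the top.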

Comments closed

Changing the Data Type of a PostgreSQL Column Used in a View

Andrea Gnemmi makes a change:

We have all encountered the need to change a column data type in a table. Unfortunately, in PostgreSQL, you cannot change it in a single step if the column is used in a view. In this article, we cover the steps to change the data type for a column in a PostgreSQL table.

Read on for those steps. This is a bit more complex than in SQL Server, at least for non-materialized views. In that case, the view is little more than a saved T-SQL query, so you would want to run sp_refreshview [view_schema].[view_name] to ensure that the view has the latest metadata and doesn't fail when you run the query.
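
For the SQL Server side of that comparison, here's a minimal sketch; the table, view, and column names are made up:

CREATE TABLE dbo.Orders (OrderId int, Amount decimal(10, 2));
GO
CREATE VIEW dbo.vOrders AS SELECT OrderId, Amount FROM dbo.Orders;
GO
-- Change the column's type; a non-materialized view doesn't block this.
ALTER TABLE dbo.Orders ALTER COLUMN Amount decimal(19, 4);
-- Refresh the view's metadata so its cached column definitions match the table again.
EXEC sp_refreshview N'dbo.vOrders';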

Comments closed

Avoid Mixing DATETIME with other Date Types

Paul White shares some advice:

Microsoft encourages us not to use the datetime data type: 

Avoid using datetime for new work. Instead, use the time, date, datetime2, and datetimeoffset data types. These types align with the SQL Standard, and are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.

Well, ok. Sensible and well-informed people might still choose to use datetime for performance reasons. Common date and time functions have optimised implementations in the SQL Server expression service for the datetime and smalldatetime data types.

Paul has posted the full article on X.
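
To make the mixing hazard concrete, here's an example of my own (not Paul's): datetime has lower type precedence than datetime2, so in a mixed comparison the datetime value is the one that gets converted, and on compatibility level 130 and later that conversion is more precise than the old .000/.003/.007 rounding.

DECLARE @dt  datetime     = '2024-01-01 00:00:00.003';
DECLARE @dt2 datetime2(7) = '2024-01-01 00:00:00.0030000';
-- datetime converts to datetime2 for the comparison; on compatibility level 130+ it
-- converts to 00:00:00.0033333, so these two equal-looking values don't match.
SELECT CASE WHEN @dt = @dt2 THEN 'equal' ELSE 'not equal' END AS comparison;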

Comments closed

The Internals of DATETIME2

Chad Baldwin digs in:

I noticed in sys.column_store_segments the min_data_id and max_data_id columns store very large bigint values in the segments for datetime2 columns. After doing a bit more googling and tinkering, I found for bit/tinyint/smallint/int/bigint it stores the min/max of the actual values rather than dictionary lookup values. So I assume it’s likely doing the same for date/time/datetime/datetime2 and storing some sort of bigint representation of the actual value.

This post is going to focus on datetime2(7) datatypes mainly because that’s what I was dealing with. Though I’m sure it wouldn’t be much work to figure out the other types.

Click through to learn more about the datatype and see how this wraps into a discussion of temporal table cleanup and columnstore indexes.
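
If you want to poke at the same metadata Chad started from, here's a rough sketch; the table name is hypothetical:

-- Per-segment min/max data IDs for a columnstore index; for datetime2 columns these
-- show up as large bigint encodings of the underlying values.
SELECT p.object_id, s.column_id, s.segment_id, s.encoding_type,
       s.min_data_id, s.max_data_id
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON p.partition_id = s.partition_id
WHERE p.object_id = OBJECT_ID(N'dbo.SomeColumnstoreTable');

One caution: column_id in sys.column_store_segments reflects the column's position within the columnstore index, so mapping it back to sys.columns can take an extra join through sys.index_columns.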

Comments closed

Reading Pipe-Delimited Files in SSIS

Greg Low does a bit of parsing:

There was a question on the Q&A forums today, asking how to read data using SSIS, when it’s in this format:

|Col1| |Col2|Col3|Col|
|101| |A|21|DC|

One of the concerns was that there was a leading pipe. This is not a problem. When you have data like that, and you set | as the delimiter, because there are 6 delimiters, then there are 7 columns output. 

Read on for Greg’s answer. This is a big part of why I recommend that people use Parquet (or ORC, if you’re an iconoclast like I am): you don’t have to deal with whatever silly, arbitrary file formats people come up with.
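
If you want to sanity-check the delimiter math from Greg's excerpt outside of SSIS, STRING_SPLIT with the ordinal argument (SQL Server 2022 and later) shows the same thing:

SELECT ordinal, value
FROM STRING_SPLIT('|101| |A|21|DC|', '|', 1);
-- Six pipes produce seven fields: an empty string before the leading pipe, an empty
-- string after the trailing pipe, and the five middle fields (one of which is just a space).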

Comments closed

Removing Leading Zeroes from a String in T-SQL

Steve Stedman gets rid of leading zeroes:

When working with data in SQL Server, there may be times when you need to remove leading zeros from a string. This task can be particularly common when dealing with numerical data stored as strings, such as ZIP codes, product codes, or other formatted numbers. In this blog post, we’ll explore several methods to remove leading zeros in SQL Server.

I’m not sure I see the reason to use anything other than CAST() (or, better yet, TRY_CAST()), but Steve does show two other methods.
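
For reference, this is the CAST()/TRY_CAST() approach I'd reach for, assuming the strings really are numeric:

SELECT CAST('000123' AS int) AS cast_result;          -- 123
SELECT TRY_CAST('00012A' AS int) AS try_cast_result;  -- NULL instead of a conversion error
-- If the value needs to stay a string (product codes, etc.), convert it back afterward.
SELECT CONVERT(varchar(20), TRY_CAST('000123' AS bigint)) AS trimmed_string;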

2 Comments

Bit Column Order and Data Length

Brent Ozar performs an experiment:

At the PGConf.dev, where Postgres developers get together and strategize the work they wanna do for the next version, I attended a session where Matthias van de Meent talked about changing the way Postgres stores columns. As of right now (Postgres 17), columns are aligned in 8-byte intervals, so if you create a table with alternating columns:

Read on to see an example, and then Brent performs a test to see how SQL Server handles this scenario. The comments also mention that at least older versions of Oracle behaved like Postgres.
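
Here's a rough outline of the kind of test you can run yourself; the table names are mine, not Brent's:

-- Two layouts of the same columns: bit and int columns interleaved versus grouped.
CREATE TABLE dbo.BitsInterleaved (b1 bit, i1 int, b2 bit, i2 int, b3 bit, i3 int);
CREATE TABLE dbo.BitsGrouped     (b1 bit, b2 bit, b3 bit, i1 int, i2 int, i3 int);
INSERT dbo.BitsInterleaved VALUES (1, 1, 1, 2, 1, 3);
INSERT dbo.BitsGrouped     VALUES (1, 1, 1, 1, 2, 3);
-- Compare the physical record size of each layout to see whether declaration order
-- changes how much space a row takes.
SELECT OBJECT_NAME(object_id) AS table_name, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.BitsInterleaved'), NULL, NULL, 'DETAILED')
UNION ALL
SELECT OBJECT_NAME(object_id), avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.BitsGrouped'), NULL, NULL, 'DETAILED');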

Comments closed

Working with XML in SQL Server

Ed Pollack talks XML:

XML is a common storage format for data, metadata, parameters, or other semi-structured data. Because of this, it often finds its way into SQL Server databases and needs to be managed alongside other data types.

Even though a relational database is not the optimal place to store and manage XML data, it is often needed due to application requirements, convenience, or a need to maintain this information in close proximity to other app data.

This article dives into a variety of common XML challenges and the functionality included in SQL Server to help make managing them as simple as possible.

Ed does a good job of walking through what you can do. My general philosophy on XML and JSON in the database is simple: if you just want a place to store some JSON or XML outputs and retrieve them exactly as they are, without performing any searches or transformations, store them as JSON/XML. If you want to use the database to search through JSON/XML records for particular attributes and values, or to reshape the JSON/XML data within the database, create a proper data model for that input instead.
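
As a small illustration of that second case, this is the kind of shredding you end up doing once the XML holds values you actually query on; the document shape here is made up:

DECLARE @x xml = N'<order id="42"><customer name="Contoso" /><total>19.99</total></order>';
-- Pull specific attributes and elements out of the XML with the xml type's value() method.
SELECT @x.value('(/order/@id)[1]', 'int') AS order_id,
       @x.value('(/order/customer/@name)[1]', 'nvarchar(100)') AS customer_name,
       @x.value('(/order/total)[1]', 'decimal(10, 2)') AS order_total;

If queries like that show up all over the codebase, that's usually the sign to promote those attributes into real columns.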

Comments closed

Uniqueidentifier Ordering in SQL Server

Jose Manuel Jurado Diaz clears up the mystery:

Today, I worked on a service request where our customer asked how SQL Server sorts the uniqueidentifier data type. We know that uniqueidentifier stores globally unique identifiers (GUIDs). GUIDs are widely used for unique keys due to their extremely low probability of duplication. One common method to generate a GUID in SQL Server is by using the NEWID() function. However, the ordering of GUIDs, especially those generated by NEWID(), can appear non-intuitive. I would like to share my lessons learned about how to determine the ordering method using uniqueidentifier and NEWID().

Yeah, things get pretty weird because x86-x64 processors (Intel and AMD both) are Little Endian, and only the fourth chunk of a GUID is Big Endian.
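
A quick way to see the ordering for yourself (the GUID values here are contrived): SQL Server compares uniqueidentifier values by byte groups from right to left, so the last six bytes are the most significant.

DECLARE @g table (g uniqueidentifier);
INSERT @g VALUES
    ('01000000-0000-0000-0000-000000000000'),
    ('00000000-0000-0000-0001-000000000000'),
    ('00000000-0000-0000-0000-000000000001');
-- The row that differs only in the final byte group sorts last, even though every
-- other byte is zero; the leading group is the least significant for sorting.
SELECT g FROM @g ORDER BY g;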

Comments closed

Fun with Implicit Conversion and Table Partitioning

Rod Edwards takes us through an issue:

CONVERT_IMPLICIT(nvarchar(200),[xxxxxxxxxxxxxxxxxxx].[Category],0)=[@Category] AND CONVERT_IMPLICIT(nvarchar(200),[xxxxxxxxxxxxxxxxxxx].[Id],0)=[@Id]

Oh dear, what’s all that then, we have a fat Residual Predicate, where SQL is performing an Implicit conversion on our query predicates.

Time to look at our datatypes.

This is one of several reasons why I espouse the philosophy of NVARCHAR Everywhere. You can’t have implicit conversion if you’re always using NVARCHAR over VARCHAR.
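
Here's a small illustration of the mismatch Rod ran into; the table and column names are mine, not his. Comparing a varchar column to an nvarchar parameter forces the conversion onto the column side, because nvarchar has higher data type precedence.

CREATE TABLE dbo.Items (Id int, Category varchar(200));

-- The varchar column gets wrapped in CONVERT_IMPLICIT to match the nvarchar parameter,
-- which can turn a seek (and, in Rod's case, partition elimination) into a scan with a
-- residual predicate.
DECLARE @Category nvarchar(200) = N'Books';
SELECT Id FROM dbo.Items WHERE Category = @Category;

-- Matching the types (either a varchar parameter or an nvarchar column) avoids the conversion.
DECLARE @CategoryV varchar(200) = 'Books';
SELECT Id FROM dbo.Items WHERE Category = @CategoryV;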

Comments closed