Press "Enter" to skip to content

Category: Dates and Numbers

Converting SQL Audit FileTime to DateTime Format

Patrick Keisler helps a customer:

One of my customers recently wanted to rename each of the SQL audit files will the datetime stamp of when it was created. I explained to them the filename already contains a datetime stamp. While it does not look like a typical timestamp, it is based on the Windows Filetime data structure that is a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (UTC). Nonetheless, they still wanted a traditional datetime stamp in the file name.

Read on to see how. I can understand the displeasure in adding redundancy to a filename, though I also understand the reasoning from the customer’s point of view: FileTime isn’t human-readable in any meaningful way.

Leave a Comment

Using Week-Based Calendars in Power BI

Marco Russo and Alberto Ferrari work in weeks:

Weekly calendars are common in manufacturing, retail, and any business that is sensitive to weekends or to the number of working days. For example, the scenario described in this article uses the number of pageviews on a website from 2019 to 2024, with data available until September 3, 2024. The website analyzed has a clear weekly trend, with slower traffic over the weekend, as shown in the following line chart with a daily granularity. It seems like a business website. A sports website would probably display the opposite trend.

Read on to see some of the challenges around week-based calendars. There’s a reason I have a “Dates and Numbers” category on Curated SQL and it’s exactly for things like this: some of the most common things we as humans work with are extremely complex and fraught with exceptions, including calendars.

Comments closed

Power BI Data Type Optimization

Nikola Ilic shows how important it can be to choose the right data types:

For demo purposes, I’ll be using a fact table that contains the data about chats performed by a customer support department of the fictitious company Customer First. This table includes approximately 9 million rows, which is not considered a large table in the context of Power BI and analytical workloads. For the sake of simplicity, let’s pretend that our model consists of only this single table. Finally, a semantic model is configured as an Import mode model. If you want to learn how your data is stored in Power BI, I suggest you start by reading this article first.

Data was loaded into Power BI from the underlying data source (SQL Server database) as-is, without any additional optimizations applied.

Nikola walks through the process of finding the most expensive columns in terms of data size and using the least precise acceptable value. One other thing that I commonly see is identity columns or other keys on fact tables. Those are very rarely necessary, because the point of a fact table is typically to aggregate it in some fashion. And these keys are unique (by design), meaning they won’t compress very well and will take up a lot of space. Looking at Nikola’s example, my next question would be, knowing that the name of the table is factChat, does chatID tie to some chat dimension? If not, is it actually necessary for reporting? Again, if not, that could shave off another 60 MB or so from the data model.

Comments closed

Avoid Mixing DATETIME with other Date Types

Paul White shares some advice:

Microsoft encourages us not to use the datetime data type: 

Avoid using datetime for new work. Instead, use the time, date, datetime2, and datetimeoffset data types. These types align with the SQL Standard, and are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.

Well, ok. Sensible and well-informed people might still choose to use datetime for performance reasons. Common date and time functions have optimised implementations in the SQL Server expression service for the datetime and smalldatetime data types.

Paul has posted the full article on X.

Comments closed

Rounding Options in T-SQL

Rick Dobson talks rounding:

Please compare the SQL Server round function to banker’s rounding in T-SQL for converting decimal values to integer values. I seek a framework for assessing how closely banker’s rounding results versus SQL Server Round function results match the underlying decimal values. Please provide a couple of empirical comparisons with the framework to indicate which set of rounded values are closer to the underlying decimals and by how much.

Rick talks about what banker’s rounding is and shows how its results adhere more closely to the underlying distribution. Rick does show a user-defined function that generates a rounded number, but if you’re doing this with large enough amounts of data, using CLR and the System.Math.Round() function will likely give you better performance. Incidentally, this is also why if you write T-SQL code to round decimal numbers and .NET code to round numbers, your results may be a little different: T-SQL rounds to the nearest integer, whereas .NET uses banker’s rounding by default.

Comments closed

Translating Numbers to Words in T-SQL

Sebastiao Pereira needs no number:

Sometimes, it is necessary to have numbers spelled out in words, like when writing a sentence (“Two hundred sixty-one victims were hospitalized.”). How do you express numbers in words with T-SQL code?

I could see a bit more recursion in the solution making it tidier, but in fairness, a bit more recursion would make it a lot slower in T-SQL.

For moderate difficulty mode, try it in German. For hard mode, switch to French.

Comments closed

Removing Leading Zeroes from a String in T-SQL

Steve Stedman gets rid of leading zeroes:

When working with data in SQL Server, there may be times when you need to remove leading zeros from a string. This task can be particularly common when dealing with numerical data stored as strings, such as ZIP codes, product codes, or other formatted numbers. In this blog post, we’ll explore several methods to remove leading zeros in SQL Server.

I’m not sure I see the reason to use anything other than CAST() (or, better yet, TRY_CAST()), but Steve does show two other methods.

2 Comments

Random Date Generation in Python

Chris LaGreca spits out some dates:

I often work with time series data and find it useful to have a variety of ways to randomly generate dates. This particular example is great for evenly distributed date partitions. Running the script below with the default arguments will output a list of random dates, one for each month of the year.

It looks like this is generating based off of a uniform distribution, which probably makes the most sense for “give me a day of the month” data generation.

Comments closed

An Introduction to pg_timeseries

Samay Sharma and Jason Petersen have an announcement:

We are excited to launch pg_timeseries: a PostgreSQL extension focused on creating a cohesive user experience around the creation, maintenance, and use of time-series tables. You can now use pg_timeseries to create time-series tables, configure the compression and retention of older data, monitor time-series partitions, and run complex time-series analytics functions with a user-friendly syntax. pg_timeseries is open-sourced under the PostgreSQL license and can be added to your existing PostgreSQL installation or tried as a part of the Timeseries Stack on Tembo Cloud.

Read on to learn more about how it works. The syntax and concepts do remind me a good bit of InfluxDB, as well.

Comments closed

Arbitrary Intervals for Partitioning in Postgres

Keith Fiske does a bit of interval math:

Whether you are managing a large table or setting up automatic archiving, time based partitioning in Postgres is incredibly powerful. pg_partman’s newest versions support a huge variety of custom time internals. Marco just published a post on using pg_partman with our new database product for doing analytics with PostgresCrunchy Bridge for Analytics. So I thought this would be a great time to review the basic and complex options for the time based partitioning.

Read on for a note of how pg_partman works and interval management, especially for versions earlier than 5.0.

Comments closed