
Category: Dates and Numbers

Finding the Earliest Date in R

Steven Sanderson puts on the archaeologist’s fedora and bullwhip:

Greetings, fellow data enthusiasts! Today, we embark on a quest to uncover the earliest date lurking within a column of dates using the power of R. Whether you’re a seasoned R programmer or a curious newcomer, fear not, for we shall navigate through this journey step by step, unraveling the mysteries of date manipulation along the way.

Imagine you have a dataset filled with dates, and you’re tasked with finding the earliest one among them. How would you tackle this challenge? Fear not, for R comes to our rescue with its arsenal of functions and packages.

Click through to see how, keeping those pernicious missing values in mind.
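
As a rough illustration of the missing-value concern, here is a minimal sketch in Python rather than R; it is not taken from Steven's post. The sample dates and the `earliest` helper are hypothetical. In R, `min()` over a vector containing `NA` returns `NA` unless `na.rm = TRUE` is passed; the sketch mirrors that idea by filtering out `None` before taking the minimum.

```python
from datetime import date

# Hypothetical column of dates with some missing entries (None).
dates = [date(2024, 3, 5), None, date(2023, 11, 20), date(2024, 1, 1), None]

def earliest(values):
    """Return the earliest non-missing date, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return min(present) if present else None

print(earliest(dates))         # 2023-11-20
print(earliest([None, None]))  # None
```

Returning `None` for an all-missing column is one possible design choice; raising an error would be equally defensible, depending on how the caller wants to handle empty input.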


Calculating Date Differences in Months with R

Steven Sanderson has ways to track months:

Greetings fellow R enthusiasts! Today, let’s dive into the fascinating world of date calculations. Whether you’re a data scientist, analyst, or just someone who loves coding in R, understanding how to calculate the number of months between dates is a valuable skill. In this blog post, we’ll explore two approaches using both base R and the lubridate package, ensuring you have the tools to tackle any date-related challenge that comes your way.

Read on to see how to do this in base R as well as the lubridate package.


2024 Data Professional Salary Survey Results

Brent Ozar counts the cash:

This is the 8th year now that we’ve been running our annual Data Professional Salary Survey, and I was really curious to see what the results would hold this year. How would inflation and layoffs impact the database world? Download the raw data here and slice & dice it to see what’s important to you. Here’s what I found.

Read on for the results and Brent’s analysis.


Aggregating by Month and Year in R

Steven Sanderson groups by month and year:

Taming the beast of daily data can be daunting. While it captures every detail, sometimes you need a bird’s-eye view. Enter aggregation, your secret weapon for transforming daily data into monthly and yearly insights. In this post, we’ll dive into the world of R, where you’ll wield powerful tools like dplyr and lubridate to master this data wrangling art.

Click through for examples of summarizing daily data into monthly and annual data. One thing to keep in mind, however, is that the monthly aggregation in these examples groups on month alone, without the year, so if you have July 2023 and July 2024 data, you’ll get a single combined row back for July. It’s all about understanding what the grain of your data is, as well as your desired grain.
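
To make the grain point concrete, here is a small sketch in Python rather than R; it is not the code from the linked post. The sample rows and the `total_by` helper are made up for illustration. It sums the same daily values twice, once keyed on month alone and once keyed on year and month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily rows: (date, amount).
daily = [
    (date(2023, 7, 1), 10.0),
    (date(2023, 7, 15), 5.0),
    (date(2024, 7, 1), 20.0),
    (date(2024, 8, 1), 7.0),
]

def total_by(rows, key):
    """Sum amounts grouped by key(date)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[key(day)] += amount
    return dict(totals)

# Month-only grain: July 2023 and July 2024 collapse into one row.
by_month = total_by(daily, lambda d: d.month)

# Year-and-month grain: each calendar month stays separate.
by_year_month = total_by(daily, lambda d: (d.year, d.month))

print(by_month)       # {7: 35.0, 8: 7.0}
print(by_year_month)  # {(2023, 7): 15.0, (2024, 7): 20.0, (2024, 8): 7.0}
```

Neither grain is wrong in itself. Month-only grouping suits seasonality questions ("what does a typical July look like?"), while year-and-month grouping suits a timeline.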


Time Series Data in Postgres with TimescaleDB

Semab Tariq keeps track of time:

TimescaleDB is an open-source time-series database extension for PostgreSQL. It is designed to efficiently manage and query time-series data, offering features such as automatic data partitioning, data retention policies, and specialized time-series functions. 

This extension provides scalability, improved performance, and seamless integration with PostgreSQL, making it a powerful choice for applications dealing with large volumes of time-stamped data, including IoT, monitoring, and analytics.

Read on to learn how to install it (on Linux), some of the tuning parameters available, and how to create time series hypertables and chunk tables.


CAST() and CONVERT() for Dates

Chad Callihan converts a date:

CAST and CONVERT can both be used to switch a value to a new data type. They are similar, but certainly not identical. While CAST is considered ANSI SQL and will get you across the finish line, CONVERT can give you more flexibility when it comes to formatting date values. Let’s look at an example comparing the usage of CAST and CONVERT with dates.

Most of the time, I’ll use CAST() over CONVERT(), not so much because the former is ANSI compliant, but rather because I think it’s more intuitive to remember. Date formatting is one of the few occasions in which I usually prefer CONVERT() and that’s precisely because of the format options. Of course, if you want more custom formatting options, you can use FORMAT(), though that function uses .NET in the background and is remarkably slow. It’s fine if you’re formatting a few dates, but if you’re outputting millions of rows, you will certainly see a marked difference.


PostgreSQL 16 and Infinity

Ryan Lambert goes to infinity and beyond:

This month, Ryan Booz chose the topic: What Excites You About PostgreSQL 16? With the release of Postgres 16 expected in the near(ish) future, it’s starting to get real. It won’t be long until casual users are upgrading their Postgres instances. To decide what to write about I headed to the Postgres 16 release notes to scan through the documents. Through all of the items, I picked this item attributed to Vik Fearing.

  • Accept the spelling “+infinity” in datetime input

The rest of this post looks at what this means, and why I think this matters.

Read on to see what’s new about this and what it all means.


ADX Date and Time Representations in Power Query and Power BI

Dany Hoter does some explaining:

Data in ADX (aka Kusto aka RTA in Fabric) almost always has columns that contain datetime values like 2023-08-01 16:45 and sometimes timespan values like 2 hours or 36 minutes.

In this article I’ll describe how these values are represented in ADX, in Power Query, and in Power BI.

Notice that I don’t just say Power BI because timespan values have different types in Power Query and in Power BI.

Read on for those details.


DATEDIFF() and Month Boundaries

Deb Melkin fed the mogwai after midnight:

I was working on a query this week that reminded me of a fun quirk when working with dates and the DATEDIFF function in particular.

I have a process that takes a while to run. Because of all of the moving parts to keep track of, I have an audit table to track what I’m doing to collect basic info like when did it start, when did it end, etc. I created a simple report for myself to break things down so I can report back to the team. I threw together a simple SQL statement, using DATEDIFF to figure out how long things took. Looking at the results, I saw some interesting results.

Read on for two queries, one which has a bit of a problem and one which strives to correct that problem.
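
For context, SQL Server’s DATEDIFF returns the number of datepart boundaries crossed between two values, not the amount of elapsed time. The sketch below imitates that behavior for months in Python, on the assumption that this is the quirk in question; it does not reproduce Deb’s queries. Both function names and the sample timestamps are hypothetical, and the second function is just one possible way to count only completed months.

```python
from datetime import datetime

def month_boundaries_crossed(start, end):
    """Count month boundaries between two datetimes, ignoring day and time."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def full_months_elapsed(start, end):
    """Count completed months, assuming end >= start."""
    months = month_boundaries_crossed(start, end)
    # Step back one month if end has not yet reached start's day and time.
    if (end.day, end.time()) < (start.day, start.time()):
        months -= 1
    return months

start = datetime(2024, 1, 31, 23, 59)
end = datetime(2024, 2, 1, 0, 1)

print(month_boundaries_crossed(start, end))  # 1, though only two minutes passed
print(full_months_elapsed(start, end))       # 0
```

The same boundary-counting effect shows up with other dateparts, such as hours or days, whenever the two values straddle a boundary.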
