Time travel in Snowflake is similar to temporal tables in SQL Server: it allows you to query the history rows of a table. If you delete or update some rows, you can retrieve the status of the table at the point in time before you executed that statement. The biggest difference is that time travel is applied by default on all tables in Snowflake, while in SQL Server you have to enable it for each table specifically. Another difference is Snowflake only keeps history for 1 day, configurable up to 90 days. In SQL Server, history is kept forever unless you specify a retention policy.
How does time travel work? Snowflake is built for the cloud and its storage is designed for working with immutable blobs. You can imagine that for every statement you execute on a table, a copy of the file is made. This means you have multiple copies of your table, for different points in time. Retrieving time travel data is then quite easy: the system has only to search for the specific file that was valid for that point in time. Let’s take a look at how it works.
It looks interesting, though the “Snowflake doesn’t have backups like you know them in SQL Server” gives pause.
Last month I was fortunate to have my first ever article published on Simple-Talk, among the best quality website for sql server articles ever. During the process of writing this article I ran into several errors related to temporal tables that I have not seen before. Some of these are documented by Microsoft, some are fairly obvious to understand and others are not. Below I summarize the list of errors you can possibly run into if you are using this really cool feature.
Click through for the list.
A common example of a slowly changing dimension would be a Person table. When storing information about a person, you’ll often store attributes such as name, address, marital status, date of birth, and social security number. Depending on the domain in which you’re working, you may even store extended information such as hair color, eye color, homeownership status, and income level.
Some of these attributes would certainly change over time. Since the average person moves 11.4 times in their life, you can expect that the address component of your Person table will need to be updated. The same applies to name attributes, since people often change their names after marriage or other life events. By definition, the load process for a slowly changing dimension table would update those attributes when they change.
I’m not as big a fan of the concept as Tim is (mostly because the date is a system time, not an application time, so fixing dates is not an easy problem), but it’s an interesting idea.
SQL Server has had many different methods to track changes to data. There has been old-fashioned trigger-based logging, Change Data Capture, and Change Tracking. All of these features allow you to see how data has changed with varying degrees of how far you can go back historically to see the changes. However, there has never been the ability to see how the entire table looked at any given point in time. That is what Temporal Tables do – they log every change that happens to the table. When you want to query it for any specific period of time, SQL Server will do the hard work of getting the snapshot of how the data in the entire table looked at that time.
A great introduction on how to set up temporal tables in different ways with various limitations can be found here. In this article, you will learn how to set up versioning when creating a new table and how to convert a table with an existing history to a system-versioned temporal table.
Read on to learn how to use temporal tables.
When you want to get latest (actual) state of data in a temporal table, you can query the same way as you query a normal table. If the PERIOD columns are not hidden, their values will appear in a SELECT * query. If you specified PERIOD columns as hidden, their values won’t appear in a SELECT * query. When the PERIOD columns are hidden, reference the PERIOD columns specifically in the SELECT clause to return the values for these columns.
To perform any type of time-based analysis, use the new FOR SYSTEM_TIME clause with four temporal-specific sub-clauses to query data across the current and history tables.
AS OF <date_time>
FROM <start_date_time> TO <end_date_time>
BETWEEN <start_date_time> AND <end_date_time>
CONTAINED IN (<start_date_time> , <end_date_time>)
Do check it out. The big downside I’ve seen to temporal tables is that I can’t query a temporal table as of a per-row timestamp in another table.
You delete data in the current table with a regular DELETE statement. The end period column for deleted rows will be populated with the begin time of underlying transaction.
You cannot directly delete rows from history table while SYSTEM_VERSIONING = ON.
Set SYSTEM_VERSIONING = OFF and delete rows from current and history table but keep in mind that way system will not preserve history of changes. TRUNCATE, SWITCH PARTITION OUT of current table and SWITCH PARTITION IN history table are not supported while SYSTEM_VERSIONING = ON.
Data modification is reasonably straightforward with temporal tables. Read on for examples.
To audit the data changes for a specific table, we should perform the time-based analysis of temporal tables. To do that, we must use the ‘FOR SYSTEM_TIME’ clause with below temporal-specific sub-clauses to the query data across the current and history tables. Let me explain the output of queries using different sub-clauses. Below is the setup:
I inserted a product named ‘Flat Washer 8’ with List price 0.00 in the temporal table at 09:02:25 AM.
I changed the List Price at 10:13:56 AM. New Price is 500.00.
Temporal tables aren’t going to solve all of your auditing problems but it can be useful.
Msg 13538, Level 16, State 3, Line 6
You do not have the required permissions to complete the operation.
Well, that’s not good. What permissions do I need exactly? Well, again, according to BOL I need CONTROL on the table and its history table. For those that don’t know CONTROL is the top level permission for any object. You can do anything at all with it.
Read the whole thing. I gather the reason for requiring this level of access is that you don’t want people to go monkeying with data collected for auditing purposes.
This post is another in the continuing theme of “making things consistent.” We were voluntold to help another team get their staging environment set up. Piece of cake, SQL Compare made it trivial to snap the tables over.
Oh, we don’t want these tables in Custom schema, we want them in dbo. No problem, SQL Compare again and change owner mappings and bam, out come all the tables.
Oh, can we get this in near real-time? Say every 15 minutes. … Transaction replication to the rescue!
Oh, we don’t know what data we need yet so could you keep it all, forever? … Temporal tables to the rescue?
Yes, temporal tables is perfect. But don’t put the history table in the same schema as the table, put in this one. And put all of that in its own file group.
Click through for a helpful script, and tune in next time, when the other team has Bill move their furniture around. Maybe move the couch just a hair to the right…no, a little more, oops, too much…
In my first 5 blogs on Temporal, I failed to note something pretty important about their usage. The start and end times for the row (and it follows, the historical rows) are likely not in your local time zone. They are stored in UTC time zone. This is obviously good because of that pesky daylight saving time issue where 1:30 AM occurs twice each year (here in the US, the time occurs on the first Sunday of November).
Unless you live in London, England or someplace where the offset from UTC is 0, it can make working with these tables confusing, because most people rarely work in UTC time, and even rarer is to think in UTC time when most of your data is likely in your local time zone. So you write your query and use local time…and then, most likely, data is returned…but not necessarily the data you actually desired.
Click through to see ways of translating those values.