The TO_DATE function in PostgreSQL is used to convert strings into dates. Its syntax is TO_DATE(text, text) and it returns a date.
In contrast to MS SQL Server, which has strictly specified date formats, Redshift can correctly interpret any format constructed from the patterns in the table found in the corresponding documentation.
When using TO_DATE(), pay attention: even if an invalid date is passed, it will be converted into a nominally valid date without raising any error.
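For example, on Redshift (or, if memory serves, on PostgreSQL versions before 10, which added range checking to TO_DATE), a nonexistent date silently rolls over:

-- February 30th doesn't exist, but no error is raised;
-- the result rolls over into March
SELECT TO_DATE('2018-02-30', 'YYYY-MM-DD');
-- returns 2018-03-02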
There are a few other tricks in SQL Server for some of these (for example, on 2012 or newer, I’d use TRY_CONVERT rather than CONVERT). That said, it’s a good overview of how to translate skills in one relational system to another.
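The difference in a nutshell: TRY_CONVERT returns NULL on bad input instead of throwing an error.

-- CONVERT raises a conversion error on an invalid date string
SELECT CONVERT(DATE, '2018-02-30');

-- TRY_CONVERT (SQL Server 2012 and newer) returns NULL instead
SELECT TRY_CONVERT(DATE, '2018-02-30');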
I was editing an article recently that talked about ROLLUP, and I wanted to play with this a bit more. I hadn’t used this much in my career, but it’s a handy operator that’s worth including in your toolkit.
ROLLUP is used to provide additional totals for your aggregates while using GROUP BY. Here’s a little example. I’ve got some data for American Football quarterbacks. In this case, I’ve extracted some stats for a few noteworthy players today.
I’ll probably end up using ROLLUP about once every three months and be very pleased when I remember that it exists. I use GROUPING SETS more often and almost never use CUBE.
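If you want to play along, here’s a minimal sketch of the idea, using a hypothetical #QBStats table with made-up numbers rather than the data from the article:

CREATE TABLE #QBStats (Player VARCHAR(50), Season INT, TDs INT);

-- numbers are made up for illustration
INSERT INTO #QBStats VALUES
    ('Brees', 2016, 37), ('Brees', 2017, 23),
    ('Brady', 2016, 28), ('Brady', 2017, 32);

-- ROLLUP adds a subtotal row per player plus a grand total row
SELECT Player, Season, SUM(TDs) AS TotalTDs
FROM #QBStats
GROUP BY ROLLUP(Player, Season);

-- the equivalent GROUPING SETS spelling, which lets you
-- choose exactly which levels of totals you want
SELECT Player, Season, SUM(TDs) AS TotalTDs
FROM #QBStats
GROUP BY GROUPING SETS ((Player, Season), (Player), ());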
Each time the value from that column is queried, the calculation is performed so the result can be returned. This makes sense when the value changes regularly but is queried infrequently.
However, according to my completely subjective statistics, most computed columns are queried much more than they are ever changed. So why work the value out each and every time?
One really nice thing about persisted computed columns is that you can then build non-clustered indexes using these columns. It’s a great way of pre-computing work that you need to do often but which would violate rules of database normalization.
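As a quick illustration (my example, not the linked post’s), a persisted computed column and an index built on it look like this:

CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY PRIMARY KEY,
    Quantity INT NOT NULL,
    UnitPrice MONEY NOT NULL,
    -- computed and stored when the row is written,
    -- instead of being recalculated on every read
    TotalPrice AS (Quantity * UnitPrice) PERSISTED
);

-- the persisted computed column can back a nonclustered index
CREATE NONCLUSTERED INDEX IX_Orders_TotalPrice
    ON dbo.Orders (TotalPrice);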
CTEs are cool things. You can essentially create one or more in-line views within your query. One thing that isn’t overly well known is that you can actually update the data within the CTE. No, I don’t mean using the UPDATE statement with a CTE, but actually running the update through the CTE.
This is really powerful when combined with window functions, like updating only the first record in a particular partition. You can also delete, which makes duplicate detection and deletion fairly straightforward.
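Here’s a minimal sketch of the duplicate-deletion trick, assuming a hypothetical dbo.Customers table with duplicated email addresses:

WITH Dupes AS
(
    SELECT Email,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY CustomerID) AS RowNum
    FROM dbo.Customers
)
DELETE FROM Dupes
WHERE RowNum > 1;  -- keeps the first row per email, removes the rest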
Doug Kline has a new series on window functions. First, he looks at differences between RANK, DENSE_RANK, and ROW_NUMBER:
-- Quick! What’s the difference between RANK, DENSE_RANK, and ROW_NUMBER?
-- in short, they are only different when there are ties...
-- here’s a table that will help show the difference
-- between the ranking functions
-- note the [Score] column,
-- it will be the basis of the ranking
-- here’s a simple SELECT statement from the Products table
SELECT ProductID, ProductName, UnitPrice
FROM Products
ORDER BY UnitPrice DESC
-- this shows that the highest priced product is Cote de Blaye, ProductID 38
-- but sometimes the *relative* price is more important than the actual price
-- in other words, we want to know how products *rank*, based on price
Doug’s posts are written entirely as T-SQL scripts, with embedded videos alongside.
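To see the difference concretely, here’s a minimal sketch against the same Products table (my query, not Doug’s):

SELECT ProductID, ProductName, UnitPrice,
       ROW_NUMBER() OVER (ORDER BY UnitPrice DESC) AS RowNum,     -- always unique: 1, 2, 3, 4
       RANK()       OVER (ORDER BY UnitPrice DESC) AS PriceRank,  -- ties share a value, then a gap: 1, 2, 2, 4
       DENSE_RANK() OVER (ORDER BY UnitPrice DESC) AS DenseRank   -- ties share a value, no gap: 1, 2, 2, 3
FROM Products;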
So which should you use? Well, like most things in the database world, it depends. Do you need to cross databases? Do you need to use NOLOCK for a diagnostic query? Do you need additional information available in one of the views that isn’t available in one of the helper functions?
Click through for the comparison. I agree with his end result, that you should at least know both methods, even if you lean toward one.
You may be asking yourself “What the…?!”.
I asked myself the same question, but then I thought of a better idea and asked my Senior instead (he usually has better answers than I do). He proceeded to tell me that the final test would only work exactly at midnight!
…so I repeated my question to him.
It’s an interesting read, and not something you’d commonly think about.
Apache Kafka’s Streams API provides very sophisticated support for joins, handling many use cases in a scalable way. However, some join semantics might be surprising to developers, as streaming join semantics differ from SQL semantics. Furthermore, the semantics of changelog streams and tombstone messages (which are used for deletes) are a new concept in stream processing.
Kafka’s journey from Pub/Sub broker to distributed streaming platform is well underway, and our times as engineers are very exciting!
I didn’t know you could join streams together in Kafka, so that’s really cool.
It wasn’t until a few years ago, when I started contributing to the First Responder Kit at Brent Ozar Unlimited, that I noticed every status message in the kit scripts was thrown with something other than PRINT.
Strange, I thought, since those scripts like to report on what statements are running. Turns out, they avoided PRINT because it has some serious drawbacks:
- PRINT doesn’t necessarily output anything at the moment it’s called.
- PRINT statements won’t show up in Profiler.
- PRINT can’t be given variable information without CAST or CONVERT.
Those are important limitations, as Doug shows.
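The usual alternative (and, as I recall, what the First Responder Kit scripts use) is RAISERROR with severity 0 and the NOWAIT option, which sidesteps all three drawbacks:

-- severity 0 behaves like an informational message rather than an error;
-- WITH NOWAIT flushes the message buffer immediately;
-- printf-style placeholders take variables without CAST or CONVERT
RAISERROR(N'Processing table %d of %d...', 0, 1, 3, 10) WITH NOWAIT;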
And if you ever run into a development team that complains that the duration is all over the place when they time their code, this little gem will make sure their query always takes the same amount of time (assuming its normal run time is under 90 seconds).
It’s the T-SQL equivalent of speed-up loops.
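I won’t spoil the linked gem, but the general shape of the trick is something like this sketch (my reconstruction, not the original): pad every run out to a constant 90 seconds.

DECLARE @start DATETIME2(3) = SYSDATETIME();

-- ... the actual query goes here ...

-- wait out whatever is left of the 90 seconds
DECLARE @remaining INT = 90 - DATEDIFF(SECOND, @start, SYSDATETIME());
IF @remaining > 0
BEGIN
    DECLARE @delay CHAR(8) =
        CONVERT(CHAR(8), DATEADD(SECOND, @remaining, CAST('00:00:00' AS TIME)), 108);
    WAITFOR DELAY @delay;
END;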