Astronomers wanted a tool that would be able to quickly answer questions like: “find asteroid candidates” or “find other objects like this one”, which originally gave the motive to build the SQL-based backend. Indeed, right from the beginning Jim Gray asked Alex Szalay to define 20 typical queries astronomers might want to ask and then together they designed the SkyServer database to answer those queries. The anecdote is that the conversation went as follows:
Jim: What are the 20 questions you want to ask?
Alex: Astronomers want to ask anything! Not just 20 queries.
Jim: Ok, start with 5 queries.
[it took Alex 30 minutes to write them all down]
Jim: Ok, add another 5 queries.
[it took Alex 1 hour to write them all down]
Jim: Ok, now add another 5 queries.
[Alex gave up and went home to think about them]
Alex (said later): In 1.5 hours, Jim taught me a lot of humility!
Alex (said later): It also taught us the importance of long-tail distribution and how to prioritize.
This is my favorite part of the article.
When I started this blog a friend of mine suggested I write a really complicated query as a header. Now I’m not sure how complicated it really is, but I find it fairly amusing, and the whole point of it is to manipulate some text to generate a different set of text. So this seems like a good time to go through it and explain what’s going on.
There’s a bit to unpack, but it’s a fun experiment.
At 1am the time jumps straight to 2am. Got any jobs scheduled to run at 01:30? They ain’t going to happen. I hope they weren’t important.
He also has the start of a script which helps fix timing issues, either from losing an hour in spring or gaining an hour in fall. This is a timely reminder (no pun intended) that Daylight Savings Time begins on Sunday, March 13th this year in the US and March 27th in most of Europe.
What is a distributed availability group? Distributed availability groups allows you to create two different AGs on different Windows Server failover clusters (WSFCs) but join them together – if you will, an AG of AGs. This is great for disaster recovery scenarios where you do not want to worry (more than you should) about things like voters and quorum in a single WSFC which could make the configuration more complex. This also allows for different versions of Windows Server (one WSFC is Windows Server 2012 R2 and another Windows Server 2016). Heck, it could even facilitate migrations to new hardware/the public cloud/virtualization assuming the same major version of SQL Server. It’s a cool feature. You can only manually fail over the AG between the WSFCs, but that’s OK, and you do need a listener for each AG. If you do not plan on using a listener, you cannot create a distributed AG.
Side note: when I read DAG, I think directed acyclical graph. Maybe I’m just weird that way…
Main note: the idea of “seeding” an Availability Group sounds wonderful.
Each of these is part of a striped backup, a piece of a backup file. To restore the backup, I need all the files to reassemble the backup. This is fairly simple, but you should be aware of how this works and how to perform a restore.
In my case, you can see I have 7 files for each database. They are the same name with an _0x at the end, with x being the number of the file.
Take Steve’s advice and script out the restore process. That way you know how to do it next time, can start building automation scripts, and can make your life easier.
Mockaroo is a really impressive service with a wide spread of different data types. They also have simple ways of adding things like within group differences to data so that you can mock realistic class differences. They use the freemium model so you can get a thousand rows per download, which is pretty sweet. The big BUT you can feel coming on is this – it’s a GUI! I don’t want to have spend time hand cranking a data extract.
Thankfully, they have a GUI for getting data too and it’s pretty simply to use so I’ve started making a package for it.
Steph is working on an R package, so this is pretty exciting.
SQL Server is implicitly converting my date value to DATETIME2(7). That is a larger, more precise value than the data type I have in the table–FakeBirthDateStamp is DATETIME2(0).
That data type mismatch is preventing partition elimination!
This is a nasty issue to catch in production, especially after you spend a bunch of time arguing with devs that DATETIME2 is the way of the future, that it’s better because of the variable precision, etc.