In the field, I see a lot of people using Availability Groups to have a near real-time replica for reporting. I talked a little bit about this above. What isn’t mentioned here is that you have to maintain a Windows Failover Cluster, quorum, Active Directory (unless using the Windows 2016 Preview), and more. This gets you a replica that is just a copy of the database. What does this mean? You cannot change database objects like security, indexes, etc. Also, what if you don’t need the whole database(s) for reporting? With replication, you can replicate only the data you truly need.
So, let’s recap here. You only have to replicate the data that you need. You can have different security and indexes on your reporting subscriber database(s). The reporting subscriber database can be scaled up or down based on your needs. The reporting database can now be an Azure Database. Folks, I call this a huge win!
There’s a lot more replication love out there than I’d expect. John promises to follow up with a guide on how to implement this, so keep an eye out for that.
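As a rough sketch of the "replicate only the data you need" idea, here is what a filtered article in a transactional publication can look like. All names here (the publication, the `dbo.Sales` table, the filter) are hypothetical placeholders, and real setups need a distributor and subscription configured as well:

```sql
-- Publish a single table, and only the rows reporting actually needs.
-- Publication and table names are illustrative only.
EXEC sp_addpublication
    @publication = N'ReportingPub',
    @status      = N'active';

EXEC sp_addarticle
    @publication   = N'ReportingPub',
    @article       = N'Sales',
    @source_owner  = N'dbo',
    @source_object = N'Sales',
    @filter_clause = N'[OrderDate] >= ''20150101''';  -- row filter: only recent rows flow to subscribers
```

The `@filter_clause` is the key bit: the subscriber receives a horizontal slice of the table rather than the whole thing, and you can then index and secure that subscriber copy however reporting requires.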
With the end of SQL Server 2005, we will also soon see the end of database mirroring. There is a new feature releasing with SQL Server 2016 called Basic Availability Groups, and it is the replacement for database mirroring. The use cases and limitations will appear very similar to database mirroring, but it will use the Availability Group technology. In theory this will be like a stim-pack for the database mirroring feature while leaving it available in Standard Edition. Let’s cross our fingers that the Windows Failover Cluster components don’t slow down the failovers as they did with AGs.
A bold choice, but that “available in Standard Edition” thing is huge for smaller organizations that can’t afford Enterprise (especially with The Licensing Changes of 2012).
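For the curious, the setup looks much like a regular Availability Group with a `BASIC` option. This is a sketch only; the AG name, database, and node endpoints are hypothetical, and a Basic AG is limited to one database and two replicas with no readable secondary:

```sql
-- Basic Availability Group: one database, two replicas, mirroring-style failover.
-- All names and endpoint URLs below are placeholders.
CREATE AVAILABILITY GROUP BAG1
    WITH (BASIC)
    FOR DATABASE SalesDB
    REPLICA ON
        N'SQLNODE1' WITH (ENDPOINT_URL      = N'TCP://SQLNODE1.contoso.com:5022',
                          AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                          FAILOVER_MODE     = AUTOMATIC),
        N'SQLNODE2' WITH (ENDPOINT_URL      = N'TCP://SQLNODE2.contoso.com:5022',
                          AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                          FAILOVER_MODE     = AUTOMATIC);
```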
In the end I decided it would be fun to post about one of the newer features: Delayed Durability. To understand DD (delayed durability) you have to understand how SQL Server implements durability. To start with, there are two pieces to any data change: the actual data page and any log records that are created. In order to keep SQL Server as fast as possible, the data pages are changed in memory and then written to disk as part of a background process. Unfortunately this means that you could lose those changes if the server crashes or shuts down before they are written to disk. In order to avoid this, and keep the changes durable, the transaction cannot complete until the log information is actually written to disk. This means that one of the big bottlenecks in any given transaction is writing that log record to disk. The DD option basically changes that write from a synchronous to an asynchronous operation. The log writes are bundled together in 60KB chunks before being written to disk. This can (depending on your workload) provide a huge increase in speed. I’m not going to bother discussing that part of it since Aaron Bertrand (b/t) and Melissa Connors (b/t) wrote great posts on just that (Aaron’s and Melissa’s).
What I want to discuss is the actual risk of data loss.
My philosophy on this is simple: if you need delayed durability and you can afford the data loss, then maybe there’s a better data platform. I want my SQL Server like I want my Grateful Dead concert: ACID compliant.
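For reference, delayed durability is controlled at the database level and, optionally, per transaction. A minimal sketch, using a hypothetical `SalesDB` database and counter table:

```sql
-- Database-level: allow (or force) delayed durability for transactions in this database.
ALTER DATABASE SalesDB SET DELAYED_DURABILITY = ALLOWED;  -- other options: FORCED, DISABLED

-- Transaction-level: opt in per commit when the database setting is ALLOWED.
-- The commit returns before the log record is hardened to disk.
BEGIN TRANSACTION;
    UPDATE dbo.Counters SET Hits = Hits + 1 WHERE CounterID = 1;
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);

-- Flush any pending in-memory log records to disk on demand.
EXEC sys.sp_flush_log;
```

The risk window is exactly those buffered log records: a crash before the flush loses every delayed-durable transaction that hadn’t yet been written, which is why the “can you afford the data loss?” question above matters so much.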
Seriously though, replication has been around since the beginning and it’s not going anywhere. I can’t think of any other feature more prolific than replication. Name another SQL Server HA/DR technology that is as extensible as replication. Replication has gotten a bad rap over the years, mostly from anecdotal comments that it “breaks all the time” or “takes too much time to manage”. I’ve worked in many environments and have set up dozens and dozens of instances of log shipping, mirroring, clusters, availability groups, and replication. From my anecdotal experience, I can tell you I’ve had more trouble with availability groups than I have with replication. If you have a good DBA who understands replication, uses it correctly, and is provided the correct tools (read: $ for hardware/infrastructure), replication works just fine. I have set up replication in a global environment in which multiple databases, publications, subscriptions, and agents ran around the clock and without issue.
Replication is very powerful, I agree…but it still breaks. A lot. I’m grateful for its existence and also for the fact that I’m not the one maintaining it…
A quick one today, just looking for strings. I wrote an article on this, so there’s more detail there, but here’s a bit of code you can look through and see what it does.
He didn’t tag his post T-SQL Tuesday, but it certainly is apropos.
When I started this blog a friend of mine suggested I write a really complicated query as a header. Now I’m not sure how complicated it really is, but I find it fairly amusing, and the whole point of it is to manipulate some text to generate a different set of text. So this seems like a good time to go through it and explain what’s going on.
There’s a bit to unpack, but it’s a fun experiment.
Looking into what was needed, I quickly realized there was a LOT of data; I guess 50+ years of news broadcasts will do that. Considering this was in the early 2000s, some innovation beyond anything I had coded before was needed. Obviously LIKE searches were out of the question, and full-text search was not available. So what to do?
Basically I decided to break each broadcast down into words stored in a separate table; the entire application fit in two tables: Story and Words.
This is a case in which thinking about the grain of data helps solve an otherwise-intractable problem.
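The post only names the two tables, so this is a guess at the shape described, with all column names hypothetical. The point is the grain: one row per word per story, which turns a wildcard text scan into an index seek:

```sql
-- Hypothetical reconstruction of the two-table design described above.
CREATE TABLE dbo.Story (
    StoryID int IDENTITY PRIMARY KEY,
    AirDate date NOT NULL,
    Body    nvarchar(max) NOT NULL
);

CREATE TABLE dbo.Words (
    Word    varchar(100) NOT NULL,
    StoryID int NOT NULL REFERENCES dbo.Story (StoryID),
    PRIMARY KEY (Word, StoryID)   -- leading Word column makes word lookups a seek
);

-- Finding every story containing a word avoids LIKE '%...%' entirely:
SELECT s.StoryID, s.AirDate
FROM dbo.Words AS w
JOIN dbo.Story AS s ON s.StoryID = w.StoryID
WHERE w.Word = 'election';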
If there’s an index on the column you’re grouping, then changing the collation is going to hurt a bit. Grouping could take advantage of a Stream Aggregate under our indexed collation, but changing the collation throws away the index order (ORDER BY doesn’t get handled well by a collation change either), which means a Hash is required. But comparing two query plans that both use a Hash Match (Aggregate), one on a case-insensitive collation and one on a binary collation, I found the latter was slightly faster. Not as drastic a change as searching, but still 10-30% better: one would run in about 12 seconds, and one in about 10.
Be sure to check out his comments for more details.
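To make the comparison concrete, here is the general shape of the two queries, with a hypothetical table and column; the first can use an index-order Stream Aggregate, while the explicit binary collation in the second forces a Hash Match (Aggregate):

```sql
-- Grouping in the column's own (indexed, case-insensitive) collation.
SELECT SomeText, COUNT(*) AS Cnt
FROM dbo.BigTable
GROUP BY SomeText;

-- Grouping after a collation change: index order no longer applies.
SELECT SomeText COLLATE Latin1_General_BIN2 AS SomeText, COUNT(*) AS Cnt
FROM dbo.BigTable
GROUP BY SomeText COLLATE Latin1_General_BIN2;
```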
This month I challenged the blogging community to share their own creations in Power BI. We got a ton of great entries; thank you to everyone who participated! My overarching goal for this month’s topic was to get folks who may not normally play in the BI space to use this fantastic solution, and maybe get some ideas flowing on how they may be able to apply it in their everyday work.
The part I like most about T-SQL Tuesday is that it introduces you to a whole new set of bloggers and a whole new set of perspectives on any particular topic.
My story with this half-baked product (the Dashboard you are about to see) is that I needed some way of tracking performance on a couple of Analysis Services (SSAS) query servers. There are a lot of good posts and talks about how to collect and store performance counters and SSAS logs out there, and I suggest you look into this, this or that, if you need inspiration.
The current data set is about 200K rows, as I am sampling each server every five minutes.
Both of these are valuable tools in a Microsoft BI environment.