Whenever I’m teaching and recovering from corruption, I always stress that if the REPAIR_ALLOW_DATA_LOSS option of DBCC CHECKDB (which I’ll just call ‘repair’ from now on) has to be used, you should do the following:
- Run another DBCC CHECKDB to make sure that repair fixed everything and no new problems were exposed (by the repair fixing something that was blocking DBCC CHECKDB‘s access to a portion of the database)
- Run DBCC CHECKCONSTRAINTS on any tables or indexes that were repaired and are involved in constraints (stated in Books Online too)
- Reinitialize any replication subscriptions that are based on tables that were repaired (stated in Books Online too)
- Reevaluate your HA/DR strategy so you don’t have to use repair again in future
One question I’m often asked is about why replication can be broken by running repair.
Read on for the answer.
I have a love-hate relationship with replication. Mostly hate due to latency and errors, but it does serve its purpose. Before Availability Groups came out, I used Transactional Replication to copy data from the production OLTP database to another server so that we could offload reports.
Will it work with a Managed Instance?
My relationship is closer to hate-hate-love-hate-hate-love-hate. Tara also provides some good advice about watching replication latency stats.
When a new agent user profile is created and assigned to an agent, relevant entries are added to the [dbo].[MSagent_profiles] and the [dbo].[MSagent_parameters] tables in the MSDB database on the Distributor server. The values in the tables are updated whenever a parameter value is updated for the Profile. These values are reloaded from the tables, whenever a replication agent is restarted. As part of the improvement a new profile parameter “ProfileReloadInterval” with a default value of 300 seconds, was added to the profiles. This parameter dictates the interval at which the agent reloads the values from the tables. If using the default value, the agent will try and reload the values from the MSDB tables every 5 minutes. If it encounters that any of the above-mentioned parameter values have changed, the new values will be used by the agents.
I don’t work with replication enough to really see the benefit, but I’m sure somebody is giddy about this.
It’s a right laugh when replication goes wrong, if you haven’t been there and use replication heavily, you’re in for the night / early morning of your life. Like any issue, the cause of replication problems could be one of many, but checking what commands are being passed through the distributor is always worthwhile, as an inefficient query could be updating the same data over and over, putting unnecessary strain on the distributor, infrastructure or the subscribers.
Read on for those queries.
Microsoft have stated that transactional replication isn’t supported on Linux and we’re not sure if they intend to in the future. This means that if you try to add the server to a publisher, you get the following message. So you can’t use the GUI and it also means that you can’t use pull as the necessary files won’t be there.
Despite that warning, there is a way to set up a push subscription; click through for that way.
Drew Furgiuele (who always starts off with the red ring) explains three important replication stored procedures:
We all have that one scenario where replication caused us some major headaches. It’s through these headaches, though, that the search for better ways to handle problems comes up. Through my time with replication, I’ve researched and used a lot of system tables, views, and stored procedures to diagnose or fix replication problems. There’s actually a lot of things you can get into if you’re so inclined. If you put me on the spot though, and asked me “Drew, what do you think are the most helpful replication internal objects to know?” I’d tell you I have three, and they’re all stored procedures.
Here then, in no particular order, are three system stored procedures that I feel any DBA that has to deal with transactional replication should at least be aware of. They each have a unique purpose, and each can be used to quickly monitor, fix, or even destroy replication forcibly, respectively.
Read the whole thing.
So let’s say you ran this script (or, maybe someone checked it in as a database change to production). For a while, things are great: you’re making changes to data on your publisher and things are flowing nicely to your subscribers. Sooner or later though, someone’s going to ask you to set up a new subscription (or maybe you need to reinitialize one). Let’s simulate that on my lab: we’re going to remove Person.Address from replication and we’re going to put it back, and then create a snapshot. The key difference here is that now, Person.Address has system versioning turned on. When we try and add the table back to the publication, we’re in for a shock:
This could come back to bite you, so if you use replication and are interested in temporal tables, read this closely.
In the above picture, we can see that it did replicate the execute statement, and that it affected 19,972 rows on the replica, and it only took 67ms! Sounds awesome, doesn’t it? Here’s a way to handle large batch updates at your publishers without overwhelming your replication setup. But before you go changing everything, you should probably understand that this has some really, really bad side effects if you’re not careful. Let’s look at three really big ones.
All in all, it’s a fairly risky move but might be worth the performance improvements.
But what happens if we set up a transactional replication publication on this database and do a snapshot? Remember that when you create a publication, your distributor and subscriber(s) need to know which network share (or FTP server) to drop all the data and schema definitions to so they can be read in by the distribution agent and recreated. In my example, I’m dropping them to a network share. Once the snapshot completes, let’s go check out our subscriber database…
Uh oh. The same query returned zero results at the subscriber. Which means no encryption! Replication won’t replicate encryption, at all. So if you have a requirement to encrypt your data at the source, you’ll need to do it on your subscribers too.
Drew points out a couple important gotchas which might lead to you exposing information you didn’t intend to make available.
Extended events are great; they have all the goodness of profiler except you don’t use profiler. Win/win! More to the point, extended events let you quickly and easily view, sort, and aggregate events that occur on your instances. They also have powerful filters (really, a “where” clause) to limit noise. You have way more control over what you monitor, how you store the data, and how you view and use it. This makes them perfect use to track replicated transactions, since we want to measure at both an individual level and the aggregate.
I fired up management studio and went to “New Session” looking for some replication event goodness and I found…
… nothing. I tried looking for events that had even parts of the name replication in it. No such thing, apparently.
This doesn’t deter Drew and he ends up building some interesting events to infer the correct answers.