SQL Server offers two flavors of optimistic locking for traditional disk-based tables: Read Committed Snapshot Isolation (RCSI), and Snapshot Isolation. They are each great tools to reduce blocking and make applications faster, particularly in databases that have lots of tiny reads and writes that need to be quick, or mixed-use patterns (lots of little reads and writes + larger reporting queries).
Both isolation levels are available in SQL Server Standard Edition and Web Edition as well as Enterprise. That’s right, they’re cheap and easy. Each of them are controlled by database level settings, and default to disabled when you install SQL Server.
The moral of the story: both of these are awesome, both have potential drawbacks, and both need testing. I’ve had good experiences with RCSI, but even then, maybe about 1% of procedures need specific locking hints (either NOLOCK or an explicit lock) to maintain previous application behavior and to deal with the problem Kendra brought up. Moral of the story: test, test, test.
My story with this half-baked product (the Dashboard you are about to see), is that I needed some way of tracking performance on a couple of Analysis Services (SSAS) query servers. There are a lot of good posts and talks about how to collect and store performance counters and SSAS logs out there, and I suggest you look into this, this or that, if you need inspiration.
The current data set is about 200K rows, as I am sampling each server every 5th minute.
Both of these are valuable tools in a Microsoft BI environment.
So here is the thing. When you change one you change them all. That means if you don’t specify a precision when you can then you get the default. That’s not exactly a common problem though. Usually what you are changing is the precision (or possibly the datatype). What is a common mistake is not specifying the nullability.
When modifying DDL, make sure that you keep it consistent and complete.
Capacity planning is difficult for DBAs who expect growth. Will there be enough CPU, Memory or I/O to serve the anticipated load? One category falls outside those three, logical contention.
Logical contention is a problem where excessive blocking causes throughput to suffer. It would be great to get advanced warning. One essential strategy is to make use of the blocked process report. The problem is that blocked process reports are an alarm metric, not a guage metric. In other words, the blocked process report can indicate when there is a problem, but it is poor at giving advanced notice.
This is a nice visual tool to begin to understand the topic. The same principles apply to road traffic, water flow, etc.
Do you see the problem? Really, there are two but the one I’m focused on is the use of GETDATE to determine which branch of logic is executed. Today is Monday and I need to test the logic that runs on Friday. Yes, I can run these steps in isolation and given that I’m not updating the logic that fiddles with the branches, my change shouldn’t have an adverse effect but by golly, that sucks from an testing perspective. It’s also really hard to develop unit tests when your input data is server date. What are you going to do, allocate 5 to 7 days for testing or change the server clock. I believe the answer is No and OH HELL NAH!
This isn’t just an SSIS thing, either. I’ve seen the above logic in TSQL as well. If you pin your logic to getdate/current_timestamp calls, then your testing is going to be painful.
I liken this to solving dependency injection problems in general: make the caller define the date or date part. That way, your test callers can define other dates and the “smarts” around which branch to take move up to a more swappable layer.
If you landed on this post, I’m guessing you know the answer to this, so I’ll be quick. xp_sqlagent_enum_jobs is an undocumented (and thus, unsupported) extended procedure that gives information on whether a job is currently running, when it last ran, when it runs next, etc.
The supported alternative is to use sp_help_job (which calls xp_sqlagent_enum_jobs), but sp_help_job is difficult to use in automation because trying to do INSERT #temp EXEC sp_help_job will result in an error due to nested INSERT…EXEC statements. As a result, people often turn to calling the unsupported xp directly.
I never actually knew about this procedure; I always just wrote queries against the msdb tables. Andy’s solution is more elegant than what I normally come up with, though.
If you specify COPY_ONLY for a full backup, it will not affect the following differential backups. I.e., the following differential backups will be based on the last full backup which was not performed with COPY_ONLY. Another way of looking at this is that a full backup using COPY_ONLY will not reset the DIFF (also known as DCM) page, which is page 6 and repeated approximately every 4 GB in each database file.
If you specify COPY_ONLY for a log backup, it will not affect the following log backups. I.e., the log backup chain will be without the one(s) produced using COPY_ONLY. Another way of looking at this is that a log backup using COPY_ONLY does not empty (truncate) the log.
That’s it! COPY_ONLY for a full backup does not, in any way, affect the restoreability for log backups.
The copy-only is a great feature, but understand what it does and how it works.
This architecture can be used when your organization does not value their secondary data center the same as the primary. It is a best practice to have matched or similar hardware between your primary and disaster recovery sites but that is not always possible. When costs need to be reduced it is better to have one failover server that you know can handle the work load rather than two servers which are under powered. Under powered hardware can easily become an effective outage if they cause timeouts as soon as a production work load is placed on them.
For many small to medium-sized enterprises, this might be the easiest route to sell to management—it’s hard to get management to have “redundant” servers which normally don’t get used.
As you can see, Page Life Expectancy (PLE) on this graph dips, gradually climbs, and then dips again. With a collection every five minutes you may not catch the exact peak – all you know is that the PLE was 50,000 at 12:55am and then only 100 at 1:00am on 03/13. It may have climbed higher than that before it dipped, but by 1:00am it had dipped down to around 100 (coincidentally at 1am the CheckDB job had kicked off on a large database).
If you really need to know (in this example) exactly how high PLE gets before it dips, or exactly how low it dips, or at what specific time it valleys or dips, you need to actively watch or set up a collector with a more frequent collection. You will find that in most cases this absolute value isn’t important – it is sufficient to know that a certain item peaks/valleys in a certain five minute interval, or that during a certain five minute interval (“The server was slow last night at 3am”) a value was in an acceptable/unacceptable range.
Andy also gives us a set of counters he uses by default and how to set up automated counter collection. Left to the reader is integrating that into an administrator’s workflow.
In this post we ran through the list of Extended Events that you’ll want to collect along with an explanation on the type of performance-related information you can derive from them. Depending on the type of instance you’re monitoring, we’re only really only talking about 5 (multidimensional) or 6 (tabular) events that need to be captured.
If you are in charge of administering Analysis Services cubes, Bill’s series is an excellent way of making sure you’re keeping up on good monitoring practices.