Press "Enter" to skip to content

Preparing For Disaster Recovery

Kendra Little has a 30-minute video and explanation of how to prepare for a failover event:

The fact that you’re thinking about this is great!

You’re right; there are two major types of failovers that you have to think about:

  • Planned failover, when you can get to the original production system (at least for a short time)
  • Unplanned failover, when you cannot get to it

Even when you’re doing a planned failover, you don’t have time to go in and script out settings and jobs and logins and all that stuff.

Time is of the essence, so you need minimal manual actions.

And you really should have documentation so that whoever is on call can perform the failover, even if they aren’t you.

The short answer is: test, test, test.  Test where it can’t hurt, and then test where it can.  But do read/watch the whole thing.
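
As a general illustration (not from Kendra’s video), part of that preparation is knowing which server-level objects live outside the user database and will not fail over with it. A quick T-SQL inventory along these lines, run ahead of time on the primary, gives the on-call person a reference:

-- Server-level logins: these live in master, not in the user database
SELECT name, type_desc, create_date
FROM sys.server_principals
WHERE type IN ('S', 'U', 'G')   -- SQL logins, Windows logins, Windows groups
  AND name NOT LIKE '##%';      -- skip internal certificate-based logins

-- SQL Agent jobs: these live in msdb and must be recreated on the new primary
SELECT j.name, j.enabled, c.name AS category
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.syscategories AS c
  ON j.category_id = c.category_id;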

Monitoring Apache Spark

Swaroop Ramachandra has started a series on monitoring Apache Spark:

Spark provides metrics for each of the above components through different endpoints. For example, if you want to look at the Spark driver details, you need to know the exact URL, which keeps changing over time–Spark keeps you guessing on the URL. The typical problem is when you start your driver in cluster mode. How do you detect on which worker node the driver was started? Once there, how do you identify the port on which the Spark driver exposes its UI? This seems to be a common annoying issue for most developers and DevOps professionals who are managing Spark clusters. In fact, most end up running their driver in client mode as a workaround, so they have a fixed URL endpoint to look at. However, this is being done at the cost of losing failover protection for the driver. Your monitoring solution should be automatically able to figure out where the driver for your application is running, find out the port for the application and automatically configure itself to start collecting metrics.

For a dynamic infrastructure like Spark, your cluster can get resized on the fly. You must ensure your newly spawned components (Workers, executors) are automatically configured for monitoring. There is no room for manual intervention here. You shouldn’t miss out monitoring newer processes that show up on the cluster. On the other hand, you shouldn’t be generating false alerts when executors get moved around. A general monitoring solution will typically start alerting you if an executor gets killed and starts up on a new worker–this is because generic monitoring solutions just monitor your port to check if it’s up or down. With a real time streaming system like Spark, the core idea is that things can move around all the time.

Spark does add a bit of complexity to monitoring, but there are solutions in place.  Read the whole thing.

Data Protection Manager

Tom Roush discusses gotchas around Microsoft’s Data Protection Manager:

You’ve got DPM installed, and for the most part, configured.  It’s working, but you have transaction log drives filling up on some of your servers, and it’s not really clear why.

Wanna know why?

Here’s the answer:

It’s because the UI is very unclear, because the documentation is unclear (there was a hint of it on page 83), and because the things that would be obvious to a DBA simply aren’t mentioned.

Tom has a very detailed post on the topic, making it a must-read if you use this tool.
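
Tom’s post has the DPM-specific details. As a general illustration of the underlying symptom (this query is not from his post), a database in the FULL recovery model whose log is never backed up will grow its log file indefinitely, and a check like this flags likely candidates:

-- Full-recovery databases and their most recent log backup, if any;
-- a NULL here is a common reason transaction log drives fill up
SELECT d.name, d.recovery_model_desc,
       MAX(b.backup_finish_date) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
  ON b.database_name = d.name
 AND b.type = 'L'               -- 'L' = transaction log backup
WHERE d.recovery_model_desc = 'FULL'
GROUP BY d.name, d.recovery_model_desc;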

Database Snapshots

Kenneth Fisher discusses database snapshots:

Here is where it starts getting interesting. A snapshot initially takes up little to no space. As changes are made to the source database the snapshot grows in size. In fact the snapshot is the size of all of the pages changed in the source database since the creation of the snapshot.

Basically as a page is changed in the source database a copy of the original page is made and stored in the snapshot, but only the first time. (Note: The files used to store these pages are called sparse files.) This means that if you change the same page over and over again it will only be written to the snapshot once. It then logically follows that the largest a snapshot can get is the size of the source database at the time the snapshot was taken.

Since most of the time we change a very small portion of the database at any given point in time this means that snapshots tend to be much smaller than the source database. In fact you could load millions of rows into the source database (assuming they are mostly/all in new pages) and it will have little to no effect on the size of the snapshot.

My favorite use of database snapshots was letting developers test their changes in QA and then revert to the pre-snapshot state.  That way, they could preserve data for future runs.
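
As a minimal sketch of that workflow (the database, file, and snapshot names here are illustrative, and the example assumes a single data file with the logical name QA_Data):

-- Take a snapshot of the QA database before a test run
CREATE DATABASE QA_Snapshot
ON (NAME = QA_Data, FILENAME = 'D:\Snapshots\QA_Data.ss')
AS SNAPSHOT OF QA;

-- After testing, revert QA to its pre-test state
RESTORE DATABASE QA
FROM DATABASE_SNAPSHOT = 'QA_Snapshot';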

Indirect Checkpoint

Kendra Little notes that indirect checkpointing is now the default in SQL Server 2016:

SQL Server 2016 introduces big new features, but it also includes small improvements as well. Many of these features are described in the “It Just Runs Faster” series of blog posts by Bob Ward and Bob Dorr.

One article in this series explained that new databases created in SQL Server 2016 will use “Indirect Checkpoint” by default. Indirect checkpoint was added in SQL Server 2012, but has not previously been enabled by default for new databases. The article emphasizes this point:

Check out the comments as well.
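
For reference, you can see which databases already use indirect checkpoint, and turn it on for an existing database, with something like the following (the database name is a placeholder):

-- 0 means classic automatic checkpoints;
-- a nonzero value means indirect checkpoint is in use
SELECT name, target_recovery_time_in_seconds
FROM sys.databases;

-- Enable indirect checkpoint with the SQL Server 2016 default of 60 seconds
ALTER DATABASE [YourDatabase] SET TARGET_RECOVERY_TIME = 60 SECONDS;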

ElasticSearch Production Checklist

Darren Perucci has a checklist for spinning up a production ElasticSearch environment:

When you are looking for a node or a cluster, it is a good idea to have a name which describes what you will need to find and separates one from another.

The default cluster name of “elasticsearch” could allow any nodes to join the cluster, even if this was not intended. Thus, it is a good idea to give the cluster a distinct identifier instead.

There are a few things to think about in this post.

Create An SSIS Catalog

Andy Leonard shows how to create an SSIS catalog:

Check the “Enable CLR Integration” checkbox to enable the other controls on the form.

I recommend you also check the “Enable automatic execution of Integration Services stored procedure at SQL Server startup” checkbox. This feature causes a stored procedure to execute whenever SQL Server starts. The stored procedure will identify any SSIS packages in a running (or other “active”) status and mark them as “Ended Unexpectedly.” You want this. Trust me. (As my friend Kevin Boles (LinkedIn | @thesqlguru) says, “Push the trust me button and let’s move on” (paraphrased).)

You cannot alter the name of the SSIS Catalog database. It is SSISDB. And, as in Highlander, there can be only one SSIS Catalog per instance of SQL Server.

This post is full of helpful notes if you’ve never used the SSISDB database before.
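
The catalog itself is created through the SSMS dialog Andy walks through, but as a quick sanity check afterward (a sketch, not from Andy’s post), you can confirm that CLR integration is enabled and that the catalog database exists:

-- The SSIS Catalog requires CLR integration
EXEC sp_configure 'clr enabled';

-- There can be only one, and it is always named SSISDB
SELECT name, create_date
FROM sys.databases
WHERE name = N'SSISDB';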

Finding File Growth

Andy Galbraith shows how to use extended events to track file growth events:

…but what does this have to do with Extended Events?

As I started out, I was looking for something slick I do with a trace that I could replicate in Extended Events, and this was a great candidate.

The catch, as I discovered, is that while file growths are captured in the default trace, they are *not* in the system health session…

Fortunately, you can create your own XEvents session, and Andy gives us the details.
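
As a minimal sketch of such a session (the session and file names are illustrative; see Andy’s post for his full definition):

-- Capture file size changes (growths and shrinks) to an event file target
CREATE EVENT SESSION [FileSizeChanges] ON SERVER
ADD EVENT sqlserver.database_file_size_change
    (ACTION (sqlserver.database_name, sqlserver.client_hostname, sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'FileSizeChanges')
WITH (STARTUP_STATE = ON);

ALTER EVENT SESSION [FileSizeChanges] ON SERVER STATE = START;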

Database Scoped Configurations

Niko Neugebauer looks at a new feature in SQL Server 2016:

In the first version/iteration of the Database Scoped Configurations, there are 5 functionalities supported on a per-database level:
– Procedure Cache cleaning
– MAXDOP control
– Query Optimizer Cardinality Estimation Model activation control
– Parameter Sniffing control
– Query Optimizer Hotfixes enabling

Niko also shows an example of how the different MAXDOP settings interact.
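
For reference, all five settings are exposed through ALTER DATABASE SCOPED CONFIGURATION, run in the context of the target database. A quick syntax sketch (the database name and values are illustrative):

USE [YourDatabase];
GO
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;                   -- procedure cache cleaning
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;                          -- MAXDOP control
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;  -- CE model control
ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;            -- parameter sniffing control
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;       -- optimizer hotfixes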
