Jepsen: MongoDB 3.4.0-rc3

Kyle Kingsbury takes a new look at MongoDB:

In April 2015, we discussed stale and dirty reads in MongoDB 2.6.7. However, writes appeared to be safe; update-only workloads with majority write concern were linearizable. This conclusion was not entirely correct. In this Jepsen analysis, we develop new tests which show the MongoDB v0 replication protocol is intrinsically unsafe, allowing the loss of majority-committed documents. In addition, we show that the new v1 replication protocol has multiple bugs, allowing data loss in all versions up to MongoDB 3.2.11 and 3.4.0-rc4. While the v0 protocol remains broken, patches for v1 are available in MongoDB 3.2.12 and 3.4.0, and now pass the expanded Jepsen test suite. This work was funded by MongoDB, and conducted in accordance with the Jepsen ethics policy.

Mongo has grown up when it comes to data integrity, though be sure you’re using the v1 replication protocol.

Memory-Optimized Table Warnings

Robert Davis looks at messages in the error log related to memory-optimized tables:

The server on which we are running in-memory OLTP is a really hefty server with 128 logical cores and 1.5 TB of RAM (1.4 TB allocated to SQL Server). We are limiting in-memory’s memory usage with Resource Governor, which also makes it easy to see how much it is using. Needless to say, even with a limited percentage of 1.4 TB of RAM is still a lot of memory. The highest I have seen in-memory usage for this one database reach at peak activity levels is ~43 GB. In production, when the heavy in-memory OLTP processes complete, I see the system reclaim the in-memory buffers pretty quickly, though not completely. During a normal day, I often see the in-memory memory usage hovering between 1 and 3 GB even when there is virtually no traffic.

When testing in-memory on a dev server that only I was using before deploying to production, I noticed that the memory usage would stay at whatever high level it reached. This makes me believe that in-memory buffers are cleaned up and reclaimed as needed, and if not needed, they just hang around as in-memory buffers. And it appears that some of the buffers end up hanging around. Perhaps they wouldn’t if the server was memory starved. I have not tested that theory.

It’s a conjecture, but seems pretty solid.  Also worth reiterating is that they’re warnings, not errors.

Doomed Transactions

Michael Swart talks about doomed transactions:

So the procedure was complicated and it used explicit transactions, but I couldn’t find any TRY/CATCH blocks anywhere! What I needed was a stack trace, but for T-SQL. People don’t talk about T-SQL stack traces very often. Probably because they don’t program like this in T-SQL. We can’t get a T-SQL stack trace from the SQLException (the error given to the client), so we have to get it from the server.

Michael shows how to get stack trace information and provides some advice on the process (mostly, “don’t do what we did”).

Error Refreshing Views

Kenneth Fisher diagnoses an error when calling sp_refreshview:

Msg 297, Level 16, State 3, Procedure sp_refreshsqlmodule_internal, Line 74 [Batch Start Line 2]
The user does not have permission to perform this action.

Click through for the solution to this problem.

SSIS Fast Load

Chris Taylor runs into an issue with the OLE DB Destination’s fast load option in Integration Services:

What I do want to bring to your attention is the differences between the two when it comes to redirecting error rows, specifically rows that are truncated. One of the beauties of SSIS is the ability to output rows that fail to import through the error pipeline and push them into an error table for example. With fast load there is a downside to this, the whole batch will be output even if there is only 1 row that fails, there are ways to handle this and a tried and tested method is to push those rows into another OLE DB Destination where you can run them either in smaller batches and keep getting smaller or simply push that batch to run in row-by-row to eventually output the 1 error you want. Take a look at Marco Schreuder’s blog for how this can be done.

One of the issues we have exerienced in the past is that any truncation of a column’s data in fast load will not force the package to fail. What? So a package can succeed when in fact the data itself could potentially not be complete!?! Yes this is certainly the case, lets take a quick look with an example.

Read on for details and potential workarounds.

Error Handling With Extended Events, Part 2

Dave Mason continues his discussion of using Extended Events to handle errors:

In the last post, we explored a couple of examples of using Extended Events to enhance T-SQL error handling. There was some potential there. But a hard-coded SPID was necessary: we couldn’t use the code examples for anything automated. It was cumbersome, too. Let’s change that, shall we?

To make the code easier to work with, I moved most of it into three stored procs: one each to create an XEvent session, get the XEvent session data, and drop the XEvent session. There’s also a table type. This will negate the need to declare a temp table over and over. The four objects can be created in any database you choose. I opted to create them in [tempdb]. The code for each is below in the four tabs.

This is a very interesting solution.

Error Handling Extended Event

Dave Mason shows how to use an Extended Event to capture error data:

Here’s an example for DBCC CHECKDB on a corrupt database. Remember from the last post that in this scenario, control never passes to the CATCH block. So we’ll need to check the Event Session data after END CATCH. You can also run this as a single batch in SSMS, but you’ll need a corrupt database to get similar results. As before, replace “2016” with your SPID.

There are a lot of working parts to this, so read the scripts carefully if you’re interested in implementing something similar yourself.

Troubleshooting Event Hub Issues

Ginger Grant walks through a couple of issues you might run into with Event Hub:

The input for this stream is set to an event hub which has a standard subscription. The basic subscription, which is of course cheaper, has one default consumer group. With a standard subscription multiple consumer groups can be created and more importantly named. When setting up the inputs there is a blank for the name of the consumer group. If you have a basic subscription this will be empty. If it is empty, then the event hub won’t pass data to the stream analytics job. Perhaps there is a way to get a basic event hub to work with a stream analytics job, but I couldn’t make it happen. When I created an event hub with a standard subscription and created a consumer group and added that name to the input of a streaming analytics job, it worked.

Read on for details.

Powershell Failing On Error

Michael Bourgon has a helpful tip for those CmdExec SQL Agent jobs which run Powershell scripts which won’t fail on error:

However, when run via SQL Agent, it succeeds.  GAH!
I tried 50 different variations; modifying the script, various TRY..CATCH blocks found on the internet.  Nothing.  Every single one of them succeeded.

Then I remembered that by default, even though it had an error, by default errors always continue.  ($ErrorActionPreference=”Continue”.  So I added this line at the top:

Read on for the answer.

Try-Catch Blocks

Dave Mason discusses a major problem with try-catch blocks:

We didn’t just lose one error. We lost them all! DBCC CHECKDB on a corrupt database just came back “clean” and the CATCH block never executed. We’ve moved into dangerous territory now. This is no longer an inconvenience or an annoyance. This could lead to data loss, financial harm to your company, maybe even some risk to your career.

Dave promises a better approach in his next post.

Categories

August 2019
MTWTFSS
« Jul  
 1234
567891011
12131415161718
19202122232425
262728293031