
Category: Error Handling

Surviving a Kafka Outage

Jakub Korab walks us through availability features in Kafka as well as what to expect if your brokers are unavailable:

In the case of an outage, you have to ensure that these messages can be processed eventually. Keeping unsent messages around and retrying indefinitely in the hopes that the outage will rectify itself may eventually result in your application running out of memory. This is a crucial consideration in high-throughput applications.

If business functions are performed by systems downstream of Kafka, and the sending application only acts as an ingestion point, the situation is slightly more relaxed. If Kafka is unavailable to send messages to, then no external activity has taken place. For these systems, a Kafka outage might mean that you do not accept new transactions. In such a case, it may be reasonable to return an error message and allow the external third party to retry later. Retail applications typically fall into this category.
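To make the trade-off in the excerpt concrete, here is a minimal Python sketch of a bounded buffer for unsent messages. It is not taken from the linked article and does not use any real Kafka client API: the `send` callable, the `BoundedOutageBuffer` and `BufferFull` names, and the `max_pending` limit are all hypothetical, and `send` is assumed to raise `ConnectionError` while the brokers are unreachable. Messages are held only up to a fixed limit, after which the caller gets an error it can pass back to the third party for a later retry.

```python
from collections import deque


class BufferFull(Exception):
    """Raised when the outage buffer cannot accept more messages."""


class BoundedOutageBuffer:
    """Holds unsent messages up to a fixed limit while the broker is down."""

    def __init__(self, send, max_pending=1000):
        # `send` is a hypothetical callable assumed to raise ConnectionError
        # when the broker is unavailable; it is not a real Kafka client call.
        self._send = send
        self._pending = deque()
        self._max_pending = max_pending

    def publish(self, message):
        # Drain older messages first so ordering is preserved.
        self.flush()
        if not self._pending:
            try:
                self._send(message)
                return
            except ConnectionError:
                pass  # fall through and buffer the message
        if len(self._pending) >= self._max_pending:
            # Refuse new work instead of growing memory without bound.
            raise BufferFull("broker unavailable and buffer is full; retry later")
        self._pending.append(message)

    def flush(self):
        """Send buffered messages in order; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False
            self._pending.popleft()
        return True
```

An ingestion-only application of the kind described in the excerpt could set `max_pending` to zero and simply turn `BufferFull` into an error response, since no downstream activity has happened yet.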

Read the whole thing.


Windows Server Failover Clustering Error Code 5054

Josh Darnell walks us through an error when setting up an Availability Group:

For setting up the environment, I was following this really in-depth guide from former Data Platform MVP and current Microsoft employee Ryan J. Adams: Build a SQL Cluster Lab Part 1

The guide is generally fantastic, and provides a lot of good insight into the non-SQL Server related aspects of setting up an Availability Group. I’d highly recommend checking it out if you’re interested in that sort of thing.

Relevant to this post, he has provided a diagram of how the different networks are configured:

If you’re very experienced with networking, you may already have some idea of what the problem is going to be. Don’t spoil it for everyone else, okay?

I’ll admit I did not have an idea of what the problem was.



Alex Stuart hits a weird error:

Conversion/overflow errors aren’t that unusual – normally a data flow broken by some unexpected data (“no, there’s no chance that field would ever have a character in it”), or perhaps a column hitting max size (“INT will be enough for years, like, 5 years. I’ll have left the company by then”)

But that wasn’t the case here – the package and user tables involved were checked by the dev team and there was no possible overflow. I’d checked system databases for maxed-out identity columns and found nothing. Heads were scratched.

Read on for the post-head-scratch answer.


THROW and Linked Servers

Chad Baldwin hits on an interesting result when using THROW across a linked server:

The THROW command is non-terminating if it is used in a stored procedure over a linked-server.

I don’t know the details of why it works this way. The THROW command returns an error message with a severity level of 16, which, according to my RAISERROR Cheatsheet, does not stop execution.

There’s something special about the THROW command beyond raising an error message. Behind the scenes, there is likely some extra information being passed to tell SQL Server that execution needs to stop in that moment, and that extra bit of information does not appear to be passed between linked servers.

Click through for a demo.


When You Can’t Win for Trying: SQL Agent Failures

Garry Bargsley troubleshoots a strange issue:

What are your troubleshooting steps when a job failure is reported?

1. Open SQL Agent Job History for the failed job
2. Look at the SQL Server Log
3. This was an Ola job, so look at the CommandLog table
4. Look at the text log file stored on the file system
5. Open the job step and get the code being executed and run in a new query window

Now that you and I have performed the same steps and found no smoking gun, where do you go next?

But what happens when all of your indicators look fine, yet the job is still failing? Read on for one possible answer.


Error Messages on SSDT Database Project Deployments

Chris Johnson has some advice if you’re hitting an error when deploying a SQL Server Data Tools database project:

Today I’d like to talk about three error messages you might see when deploying an SSDT database project, either through Visual Studio or via a dacpac and script. I’m going to focus here on what you see from inside of Visual Studio, but you will see similar errors returned when you deploy using a script and the reasons behind them will be the same.

Read on for Chris’s findings. These errors definitely aren’t a complete survey of possible messages, but they do hit some of the less obvious cases.


Configurable Retry in Microsoft.Data.SqlClient

Hasan Savran notes an improvement to the Microsoft.Data.SqlClient library:

You need to watch for transient errors if you use SQL Server in Azure. Transient (or retriable) errors can occur at any time, and your application should be smart enough to retry these failed operations. Azure might quickly shift the hardware resources of your database to give you a better load balance; when this happens, your application might not be able to connect to the database. Since these reconfiguration events complete quickly, your application needs to be designed to handle these faults. This adds more complexity to your code because you need to write code to handle this manually.

The preview version of the Microsoft.Data.SqlClient library now supports a RetryLogic function, so you no longer need to write manual code to handle transient or retriable errors.
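For a sense of the manual code the excerpt says a built-in retry feature replaces, here is a minimal retry-with-backoff sketch in Python. It is a language-neutral illustration only, not the Microsoft.Data.SqlClient API: `TransientError`, `run_with_retry`, and every parameter name are hypothetical stand-ins, and a real application would need to decide which errors actually count as transient.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retriable fault, such as a dropped connection."""


def run_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff.

    The delay doubles after each failed attempt. Errors that are not
    TransientError propagate immediately, and the last transient error
    is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the waits can be swapped out in tests. Wrapping every database call this way is the kind of repetition that is better kept inside a library than spread through business code.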

Click through for more details as well as a demonstration. I’m surprised it took this long, to be honest—useful retry logic is exactly the type of thing which should be in the bowels of a library rather than littered throughout business code (or worse, not even in business code).


Bad Request when Debugging an Azure Data Factory Pipeline

Ed Elliott ran into a problem:

Now, whenever I am troubleshooting something in Azure and I come to the activity logs I am always hopeful but also always disappointed that they don’t show more details. The bit that really annoys me is that I know Microsoft see more detailed error information as I have been screen sharing with a support tech who used log explorer to see more detailed error messages than I see – grrrr, just show us the data! Anyway, I digress – so in the activity log, does it give a clue as to what is wrong?

No, in a word, no it doesn’t.

Read on for the conclusion, which rates as “Should have been an easy fix but the error message was completely unhelpful.”


Row-Level Security and UseRelationship

Teo Lachev points out an issue when combining row-level security with the USERELATIONSHIP() function in a Tabular model:

You’ve created a beautiful, wide-open Tabular model. You use USERELATIONSHIP() to switch relationships on and off. Everything works and everyone is pleased. Then RLS sneaks in, such as when external users need access, and you must secure on some dimension table. You create a role, specify a row filter, test the role, and get greeted with:

The UseRelationship() and CrossFilter() functions may not be used when querying ‘<dimension table>’ because it is constrained by row-level security defined on ‘<dimension table>’ or related tables.

Read on to learn what the issue is and one potential workaround.
