Press "Enter" to skip to content

Curated SQL Posts

Parallel CHECKDB

Arun Sirpal shows us when DBCC CHECKDB can go parallel:

DBCC CHECKDB has the ability to perform parallel checking of objects. However, it absolutely depends on the edition of SQL Server: it only happens when using Enterprise Edition.

Let’s see this in action. I propose the following tests for this blog post:

  • Test on a SQL Server Enterprise Edition.
  • Test on a non-enterprise edition of SQL Server.

I don’t have 2017 Enterprise at hand, but I do have 2014 Enterprise and Express handy, so it makes sense to use these versions for my “experiment”.

Read on to see the difference.
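If you want to try this yourself, one way to observe the behavior is to kick off a CHECKDB in one session and, from a second session, count the worker tasks attached to it. This is a minimal T-SQL sketch of the idea (TestDB is a placeholder database name): on Enterprise Edition you would expect to see multiple tasks for the request, while on Express you should see only one.

    -- Session 1: run the consistency check.
    DBCC CHECKDB (N'TestDB') WITH NO_INFOMSGS;

    -- Session 2: while the check runs, count its worker tasks.
    -- More than one task for the request suggests parallel checking.
    SELECT r.session_id,
           r.command,
           COUNT(t.task_address) AS task_count
    FROM sys.dm_exec_requests AS r
        JOIN sys.dm_os_tasks AS t
            ON t.session_id = r.session_id
    WHERE r.command = N'DBCC'
    GROUP BY r.session_id, r.command;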


DataRow To JSON With Powershell

Rob Sewell shows how to convert a .NET DataRow into its JSON form using Powershell:

I wanted to be able to Mock $variable, so I wrapped the code above in a function; let’s call it Run-Query.

This meant that I could easily separate it for mocking in my test. I ran the code and investigated the $variable variable to ensure it had what I wanted for my test, and then decided to convert it into JSON using ConvertTo-Json.

Read on to see the fun mess that ConvertTo-Json made and then Rob’s simplification.
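As an aside, if the data lives in SQL Server 2016 or later, another way to sidestep DataRow serialization quirks entirely is to have the engine emit the JSON itself with FOR JSON. That is a different technique from Rob’s, and the table below is purely hypothetical, but the shape of the query looks like this:

    -- Let SQL Server return a JSON array directly,
    -- so the client never has to serialize a DataRow.
    SELECT UserId,
           UserName,
           CreatedDate
    FROM dbo.Users
    FOR JSON PATH, ROOT('users');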


Perceiving Forms: Gestalt Principles

Meagan Longoria describes the basics of the Gestalt school of thought:

Gestalt principles can help us highlight patterns and reduce noise in data visualization. They can also help us create a visual hierarchy and employ symmetry in our designs for a more pleasing user experience.

Some Gestalt principles are very similar to our understanding of preattentive attributes. This is demonstrated in the set of 3 charts below.

Meagan gives a good overview of the concepts and uses a Power BI example to apply them.  There’s some really good advice in here.


Alerting On SQL Server Blocking

Andy Mallon has built a process to e-mail him when there’s excessive blocking:

  1. Only alert if there’s actually a problem that is actionable. Please don’t throw email alerts at me if there’s nothing for me to do. Don’t be the boy who cried wolf. You probably already have an email rule that ignores alerts you don’t care about. Don’t make more spam.

  2. Only alert if the problem is critical. In addition to my #1 requirement that it be actionable, I should have to act now. Email alerts are a cry for help, and aren’t appropriate for every problem. For lower-priority problems, I prefer digest reports, where I can set aside time in my day to work on many lower-priority issues and take care of them all at once.

I cannot agree enough with the cited section: this is so important for alerts, but something we tend not to think about. That’s why I have tens of thousands of unread alerts in my inbox tagged for auto-deletion; most of them are alerts I didn’t create and that don’t relate to me at all, but I can’t go breaking somebody else’s workflow.
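The heart of a check like this can be a single DMV query with a wait-time threshold, so that only blocking long enough to be actionable fires the alert. The sketch below is a minimal illustration of the idea rather than Andy’s actual procedure, and the 60-second threshold is an arbitrary assumption; you would wire any rows returned here into something like sp_send_dbmail.

    -- Find sessions that have been blocked for more than 60 seconds.
    -- Only send an alert e-mail if this returns rows.
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time / 1000 AS wait_seconds,
           t.text AS blocked_statement
    FROM sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.blocking_session_id <> 0
        AND r.wait_time > 60000;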


So You *Really* Want To Monitor Kafka…

Yeva Byzek walks through Confluent Platform:

Kafka exposes hundreds of metrics. Some of them are per broker, per client, per topic, and per partition, and so the number of metrics scales up as the cluster grows. For an average-size Kafka cluster, the number of metrics very quickly bloats to the thousands.

Warning: I am about to disappoint you. You probably recognize that you realistically cannot monitor every single available metric. So you are probably hoping that in this blog post I will filter down the list of metrics to a dozen of the most critical ones, which you would then push through some generic monitoring tool, and then be done with setting up “monitoring.” However, monitoring distributed systems like Kafka is not that simple, and so there is no such list. Keep reading to understand the problems you should be solving, and how to solve them in a robust monitoring solution specifically designed for Kafka.

A common pitfall of generic monitoring tools is that they import all available metrics from a variety of systems into a metrics swamp. Even with a comprehensive list of metrics, there is a limit to what can be achieved with no Kafka context nor Kafka expertise to determine which metrics are important and which ones are not. A metrics swamp cannot produce valuable insight from the data nor provide answers to the critical business questions we asked earlier.

This is an information-dense post that you’ll want to read if you work with Apache Kafka.


Hadoop 3.0 Ships

Alex Woodie reports that Hadoop 3.0 is officially out there, and looks at what’s forthcoming in 3.1 and 3.2:

As we told you about last week, Hadoop 3.0 brings two big new features that are compelling in their own right. That includes support for erasure coding, which should boost storage efficiency by 50% thanks to more efficient data replication; and YARN Federation, which should allow Hadoop clusters to scale up to 40,000 nodes.

The delivery of Hadoop 3.0 shows that the open source community is responding to the demands of industry, said Doug Cutting, original co-creator of Apache Hadoop and the chief architect at Cloudera.

“It’s tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today’s release,” he said in a press release. “With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems.”

But some of the new features in Hadoop 3.0 weren’t designed to bring immediate rewards to users. Instead, they pave the way for the Apache Hadoop community to deliver more compelling features with versions 3.1 and 3.2, according to Hortonworks director of engineering Vinod Kumar Vavilapalli, who’s also a committer on the Apache Hadoop project.

“Hadoop 3.0 is actually a building block, a foundation, for more exciting things to come in 3.1 and 3.2,” he said.

Click through to see some of those exciting things.


So You Want To Monitor Kafka…

Ben Summer reviews a Kafka catastrophe and explains how to avoid it:

Here at New Relic, the Edge team is responsible for the pipelines that handle all the data coming into our company. We were an early adopter of Apache Kafka, which we began using to power this data pipeline. Our initial results were outstanding. Our cluster handled any amount of data we threw at it; it showed incredible fault tolerance and scaled horizontally. Our implementation was so stable for so long that we basically forgot about it. Which is to say, we totally neglected it. And then one day we experienced a catastrophic incident.

Our main cluster seized up. All graphs, charts, and dashboards went blank. Suddenly we were totally in the dark — and so were our customers. The incident lasted almost four hours, and in the end, an unsatisfactory number of customers experienced some kind of data loss. It was an epic disaster. Our Kafka infrastructure had been running like a champ for more than a year and suddenly it had ground to a halt.

This happened several years ago, but to this day we still refer to the incident as the “Kafkapocalypse.”

Ben has a couple of interesting stories and some good rules of thumb for maintaining a Kafka cluster.


JDBC Roundtrip Costs

Lukas Eder shows the performance cost of doing row-by-row data retrieval:

The obvious difference between the JDBC benchmark and the PL/SQL one is the fact that the JDBC call has to traverse a vast amount of logic, APIs, “barriers” between the JVM and the Oracle kernel before it can actually invoke the really interesting part. This includes:

  • JVM overhead
  • JDBC logic
  • Network overhead
  • Various “outer” layers inside the Oracle database
  • Oracle’s API layers to get into the SQL and PL/SQL execution engines
  • The actual code running in the PL/SQL engine

In Toon’s talk (which again, you should definitely watch), the examples are running SQL code, not PL/SQL code, but the results are the same. The actual logic is relatively cheap inside of the database (as we’ve seen in the PL/SQL only benchmark), but the overhead is significant when calling database logic from outside the database.

Thus: It is very important to minimise that overhead.

This particular example focuses on Oracle and JDBC, but it certainly applies to other database platforms and distributed architectures.
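To put a SQL Server spin on the same principle: per-call overhead adds up, and the cheapest round trip is the one you never make. The T-SQL sketch below shows only per-statement overhead inside the server, so it understates the network cost Lukas measures, and dbo.Numbers is a hypothetical table, but the contrast between row-at-a-time and set-based work is the same idea:

    -- Row-by-row: one statement execution per row.
    DECLARE @i int = 1;
    WHILE @i <= 100000
    BEGIN
        INSERT INTO dbo.Numbers (n) VALUES (@i);
        SET @i += 1;
    END;

    -- Set-based: a single execution covers all 100,000 rows.
    INSERT INTO dbo.Numbers (n)
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS a
        CROSS JOIN sys.all_columns AS b;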


Considerations When Using HTTPS For TFS

Hamish Watson walks us through what to do when we want to start using a certificate to encrypt Team Foundation Server traffic:

I will assume that you already have TFS set up and are just using HTTP, and that you want to make things a bit more secure with HTTPS. I am also assuming that you will be using port 443 for HTTPS traffic.

To update TFS to use HTTPS you need to do a couple of things:

  1. Have a legitimate certificate installed on the server that you can bind to

  2. Have an IP address on the server and have firewall access setup to that IP address on port 443

But there are a few more steps as well, so click through to see them all.


Running SQL Server On Linux In The Windows Subsystem For Linux

Anthony Nocentino troubleshoots an error when trying to run SQL Server on Linux using the Windows Subsystem For Linux:

The first thing I had to do was reproduce the issue. So on my Windows 10 test VM, I installed the Windows Subsystem for Linux (steps to do so are here) and installed the Ubuntu app.

Then, I fired up a bash shell using WSL and then I installed SQL Server on Linux for Ubuntu as documented here.

Now, I completed the installation of SQL Server on Linux using mssql-conf; when that program completes, it attempts to start SQL Server on Linux. BOOM! I’m able to reproduce the same error.

Looking at the error, I decided to see if I could run SQL Server on Linux from the shell as the user mssql. This would remove systemd and mssql-conf from the picture. Basically I wanted to see if I could get another, more descriptive, error to pop out.

Anthony digs out a very useful Linux debugging tool, strace. Sadly, he’s not able to solve the problem at the moment, but he at least gets us a step in the right direction.
