
Month: December 2017

Multi-Object JSON Arrays In SQL Server

Bert Wagner shows how to build JSON arrays in SQL Server:

When using FOR JSON PATH, ALL rows and columns from that result set will get converted to a single JSON string.

This creates a problem if, for example, you want to have a column for your JSON string and a separate column for something like a foreign key (in our case, HomeId). Or if you want to generate multiple JSON strings filtered on a foreign key.

The way I chose to get around this is to use CROSS APPLY with a join back to our Home table — this way we get our JSON string for either Cars or Toys created but then output it along with some additional columns.

Impedance mismatch?  What impedance mismatch?
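
The shape of that query looks something like this minimal sketch, using a hypothetical Home/Cars schema rather than Bert’s exact tables:

    -- One row per home: the key column plus that home's cars serialized as a JSON array.
    SELECT h.HomeId,
           j.CarsJson
    FROM dbo.Home AS h
    CROSS APPLY
    (
        SELECT
        (
            SELECT c.Make, c.Model      -- hypothetical columns
            FROM dbo.Cars AS c
            WHERE c.HomeId = h.HomeId   -- correlate to the current home
            FOR JSON PATH
        ) AS CarsJson
    ) AS j;

Because the FOR JSON PATH subquery is correlated to the outer Home row, it serializes only that home’s cars rather than the entire result set, leaving HomeId available as an ordinary column alongside the JSON string.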


Parallel CHECKDB

Arun Sirpal shows us when DBCC CHECKDB can go parallel:

DBCC CHECKDB has the ability to perform parallel checking of objects. However, it absolutely depends on the edition of SQL Server; it only happens when using Enterprise Edition.

Let’s see this in action. I propose the following tests for this blog post:

  • Test on a SQL Server Enterprise Edition.
  • Test on a non-enterprise edition of SQL Server.

I don’t have 2017 Enterprise at hand, but I do have 2014 Enterprise and Express handy, so it makes sense to use these versions for my “experiment”.

Read on to see the difference.
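
If you want to check for yourself whether a consistency check goes parallel, one option (a rough sketch of an assumed approach, not necessarily Arun’s method) is to count the worker tasks attached to the CHECKDB request from a second session:

    -- Session 1: run a full consistency check (hypothetical database name).
    DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS;

    -- Session 2, while the check is running: more than one task
    -- for the CHECKDB request suggests parallel checking.
    SELECT r.session_id,
           r.command,
           COUNT(t.task_address) AS task_count
    FROM sys.dm_exec_requests AS r
    JOIN sys.dm_os_tasks AS t
        ON t.session_id = r.session_id
    WHERE r.command LIKE 'DBCC%'
    GROUP BY r.session_id, r.command;

On a non-Enterprise edition you would expect a single task for the request; on Enterprise Edition the count can climb as objects are checked in parallel.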


DataRow To JSON With Powershell

Rob Sewell shows how to convert a .NET DataRow into its JSON form using Powershell:

I wanted to be able to Mock $variable. I wrapped the code above in a function; let’s call it Run-Query.

This meant that I could easily separate it for mocking in my test. I ran the code and investigated the $variable variable to ensure it had what I wanted for my test, and then decided to convert it into JSON using ConvertTo-Json.

Read on to see the fun mess that ConvertTo-Json made and then Rob’s simplification.


Perceiving Forms: Gestalt Principles

Meagan Longoria describes the basics of the Gestalt school of thought:

Gestalt principles can help us highlight patterns and reduce noise in data visualization. They can also help us create a visual hierarchy and employ symmetry in our designs for a more pleasing user experience.

Some Gestalt principles are very similar to our understanding of preattentive attributes. This is demonstrated in the set of 3 charts below.

Meagan gives a good overview of the concepts and uses a Power BI example to apply them.  There’s some really good advice in here.


Alerting On SQL Server Blocking

Andy Mallon has built a process to e-mail him when there’s excessive blocking:

  1. Only alert if there’s actually a problem that is actionable. Please don’t throw email alerts at me if there’s nothing for me to do. Don’t be the boy who cried wolf. You probably already have an email rule that ignores alerts you don’t care about. Don’t make more spam.

  2. Only alert if the problem is critical. In addition to my #1 requirement that it be actionable, I should have to act now. Email alerts are a cry for help, and aren’t appropriate for every problem. For lower-priority problems, I prefer digest reports, where I can set aside time in my day to work on many lower-priority issues and take care of them all at once.

I cannot agree enough with the cited section; this is so important for alerts, but something we tend not to think about.  That’s why I have tens of thousands of unread alerts in my inbox tagged for auto-deletion (most of them alerts I didn’t create and which don’t relate to me at all, but I can’t go breaking somebody else’s workflow).
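
To make the “actionable and critical” bar concrete, here is a minimal sketch (hypothetical threshold, and not Andy’s actual procedure) of the kind of check such a process might run:

    -- Hypothetical threshold: only alert on blocking that has lasted more than a minute.
    DECLARE @ThresholdMs int = 60000;

    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_time      AS wait_time_ms,
           r.wait_type,
           t.[text]         AS blocked_sql
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.blocking_session_id <> 0
      AND r.wait_time > @ThresholdMs;

    -- If this returns rows, an Agent job step could e-mail them via msdb.dbo.sp_send_dbmail;
    -- if it returns nothing, stay silent and don't make more spam.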


So You *Really* Want To Monitor Kafka…

Yeva Byzek walks through Confluent Platform:

Kafka exposes hundreds of metrics. Some of them are per broker, per client, per topic, and per partition, and so the number of metrics scales up as the cluster grows. For an average-size Kafka cluster, the number of metrics very quickly bloats to the thousands.

Warning: I am about to disappoint you. You probably recognize that you realistically cannot monitor every single available metric. So you are probably hoping that in this blog post I will filter down the list of metrics to a dozen of the most critical ones, which you would then push through some generic monitoring tool, and then be done with setting up “monitoring.” However, monitoring distributed systems like Kafka is not that simple, and so there is no such list. Keep reading to understand the problems you should be solving, and how to solve them in a robust monitoring solution specifically designed for Kafka.

A common pitfall of generic monitoring tools is that they import all available metrics from a variety of systems into a metrics swamp. Even with a comprehensive list of metrics, there is a limit to what can be achieved with no Kafka context nor Kafka expertise to determine which metrics are important and which ones are not. A metrics swamp cannot produce valuable insight from the data nor provide answers to the critical business questions we asked earlier.

This is an information-dense post that you’ll want to read if you work with Apache Kafka.


Hadoop 3.0 Ships

Alex Woodie reports that Hadoop 3.0 is officially out there, and looks at what’s forthcoming in 3.1 and 3.2:

As we told you about last week, Hadoop 3.0 brings two big new features that are compelling in their own right. That includes support for erasure coding, which should boost storage efficiency by 50% thanks to more efficient data replication; and YARN Federation, which should allow Hadoop clusters to scale up to 40,000 nodes.

The delivery of Hadoop 3.0 shows that the open source community is responding to the demands of industry, said Doug Cutting, original co-creator of Apache Hadoop and the chief architect at Cloudera.

“It’s tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today’s release,” he said in a press release.  “With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems.”

But some of the new features in Hadoop 3.0 weren’t designed to bring immediate rewards to users. Instead, they pave the way for the Apache Hadoop community to deliver more compelling features with versions 3.1 and 3.2, according to Hortonworks director of engineering Vinod Kumar Vavilapalli, who’s also a committer on the Apache Hadoop project.

“Hadoop 3.0 is actually a building block, a foundation, for more exciting things to come in 3.1 and 3.2,” he said.

Click through to see some of those exciting things.


So You Want To Monitor Kafka…

Ben Summer reviews a Kafka catastrophe and explains how to avoid it:

Here at New Relic, the Edge team is responsible for the pipelines that handle all the data coming into our company. We were an early adopter of Apache Kafka, which we began using to power this data pipeline. Our initial results were outstanding. Our cluster handled any amount of data we threw at it; it showed incredible fault tolerance and scaled horizontally. Our implementation was so stable for so long that we basically forgot about it. Which is to say, we totally neglected it. And then one day we experienced a catastrophic incident.

Our main cluster seized up. All graphs, charts, and dashboards went blank. Suddenly we were totally in the dark — and so were our customers. The incident lasted almost four hours, and in the end, an unsatisfactory number of customers experienced some kind of data loss. It was an epic disaster. Our Kafka infrastructure had been running like a champ for more than a year and suddenly it had ground to a halt.

This happened several years ago, but to this day we still refer to the incident as the “Kafkapocalypse.”

Ben has a couple of interesting stories and some good rules of thumb for maintaining a Kafka cluster.


JDBC Roundtrip Costs

Lukas Eder shows the performance cost of doing row-by-row data retrieval:

The obvious difference between the JDBC benchmark and the PL/SQL one is the fact that the JDBC call has to traverse a vast amount of logic, APIs, “barriers” between the JVM and the Oracle kernel before it can actually invoke the really interesting part. This includes:

  • JVM overhead
  • JDBC logic
  • Network overhead
  • Various “outer” layers inside the Oracle database
  • Oracle’s API layers to get into the SQL and PL/SQL execution engines
  • The actual code running in the PL/SQL engine

In Toon’s talk (which again, you should definitely watch), the examples are running SQL code, not PL/SQL code, but the results are the same. The actual logic is relatively cheap inside of the database (as we’ve seen in the PL/SQL only benchmark), but the overhead is significant when calling database logic from outside the database.

Thus: It is very important to minimise that overhead.

This particular example focuses on Oracle and JDBC, but it certainly applies to other database platforms and distributed architectures.


Considerations When Using HTTPS For TFS

Hamish Watson walks us through what to do when we want to start using a certificate to encrypt Team Foundation Server traffic:

I will assume that you already have TFS setup and are just using HTTP and want to make things a bit more secure with HTTPS. I am also assuming that you will be using port 443 for HTTPS traffic.

To update TFS to use HTTPS you need to do a couple of things:

  1. Have a legitimate certificate installed on the server that you can bind to

  2. Have an IP address on the server and have firewall access setup to that IP address on port 443

But there are a few more steps as well, so click through to see them all.
