Press "Enter" to skip to content

Curated SQL Posts

Kafka Consumer

I build a consumer and aggregator of Kafka data:

From here, I hook into the OnMessage event just like before, and, like before, decode the Kafka payload and turn it into a string.  Unlike before, however, I call Newtonsoft’s DeserializeObject method and return a Flight type, which I’ve defined above.  This is the same definition as in the Producer, so in a production-quality environment, I’d pull that out to a single location rather than duplicating it.

Going back to the main function, I call the consumer.Start() method and let ’er rip.  When I’m ready to aggregate, I’ll hit the Enter key, and that’ll call consumer.Stop().  When that happens, I’m going to have up to 7 million records in a list called flights.  Out of all of this information, I only need two attributes:  the destination state and the arrival delay in minutes.  I get those by using the map function on my sequence of flights, taking advantage of F#’s match syntax to handle all relevant scenarios safely and put the result into a tuple.  The resulting sequence of tuples is called flightTuple.  I pass that into the delaysByState function.

By the time I give this presentation, I’m going to change the way I aggregate just a little bit to cut down on the gigs of RAM necessary to do this operation.  But hey, at least it works…
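
If it helps to see the aggregation in SQL terms, it boils down to a GROUP BY.  Here is a hedged sketch; the post doesn’t spell out the aggregate, so AVG is an assumption, as are the table and column names:

    -- Hypothetical relational rendering of delaysByState:  average
    -- arrival delay in minutes per destination state.
    SELECT f.DestinationState,
           AVG(CAST(f.ArrivalDelayMinutes AS decimal(9, 2))) AS AvgDelayMinutes
    FROM dbo.Flights AS f
    GROUP BY f.DestinationState;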


Computed Columns And Indexes

Derik Hammer looks at computed columns with a bonus section on unique indexes (or maybe vice versa):

What can an index do that a constraint cannot?

  • Set FILL FACTOR.
  • Add non-key (INCLUDED) columns.
  • Enable data compression.
  • Configure lock management settings.

What can a constraint do that an index cannot?

The only potential benefit I was able to find, or think of, is that constraints can be disabled. If you could enable and disable a unique constraint, that would be one feature the unique index lacks. This is counterintuitive, given that the unique constraint is enforced with a unique index behind the scenes.

For me, the big benefit of unique constraints over unique indexes is that they provide a logical separation.  Unique constraints show up in the Keys section in Management Studio and let you say, “Yes, I made this thing because the data model requires uniqueness here.”  By contrast, a unique index can say the same thing, but it could also say, “for the subset of data in a filtered index, I can assume uniqueness” or “for performance reasons, this combination is unique, but its uniqueness is not important to the logical data model.”  I will happily admit that I’m splitting the hair pretty thin on this one and that in practice, there are benefits to using unique indexes across the board.
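
To make the contrast concrete, here is a minimal T-SQL sketch; the table and all object names are hypothetical:

    -- A unique constraint:  shows up under Keys in Management Studio and
    -- documents that the data model requires unique e-mail addresses.
    CREATE TABLE dbo.Employees
    (
        EmployeeID int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        FirstName  nvarchar(50)  NOT NULL,
        LastName   nvarchar(50)  NOT NULL,
        Email      nvarchar(100) NOT NULL,
        -- A computed column built from the two name columns.
        FullName   AS (FirstName + N' ' + LastName),
        CONSTRAINT UQ_Employees_Email UNIQUE (Email)
    );

    -- A unique index enforces the same rule but exposes the index-only
    -- options from the list above (redundant next to the constraint;
    -- shown only for contrast).
    CREATE UNIQUE NONCLUSTERED INDEX IX_Employees_Email
        ON dbo.Employees (Email)
        INCLUDE (FirstName, LastName)                     -- non-key columns
        WITH (FILLFACTOR = 90, DATA_COMPRESSION = PAGE);  -- index-only settings

    -- A deterministic computed column can carry a unique index as well:
    CREATE UNIQUE NONCLUSTERED INDEX IX_Employees_FullName
        ON dbo.Employees (FullName);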


Snippets

Kenneth Fisher has a demonstration of snippets in Management Studio:

A while ago I talked about Templates. This is an easy way to get a, well, template of a piece of code. But a much faster way to get a simple template is to use a snippet. With a simple key command (Ctrl+K, Ctrl+X) you can open up the snippet picker and quickly navigate to the snippet you need.

Note: You can also open the snippet picker by right-clicking and selecting Insert Snippet… or from the menu Edit -> IntelliSense -> Insert Snippet…

Further note: snippet picker sounds really odd, but it was the name in BOL, so we’ll go with it.

Click through for a demo.


Page And Key WaitResources For Deadlocks

Kendra Little explains page and key information in deadlock graphs and blocking chains:

1.4) Can I see the data on the page that was locked?

Well, yes. But … do you really need to?

This is slow even on small tables. But it’s kinda fun, so… since you read this far… let’s talk about %%physloc%%!

%%physloc%% is an undocumented piece of magic that will return the physical record locator for every row. You can use %%physloc%% with sys.fn_PhysLocFormatter in SQL Server 2008 and higher.

This was a very interesting read; check it out.
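
As a quick sketch of what that looks like (the dbo.Flights table and the page numbers here are hypothetical, and both helper functions are undocumented, so keep this out of production code):

    -- Show the physical file:page:slot for every row.  Slow on big tables.
    SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS [file:page:slot],
           f.*
    FROM dbo.Flights AS f;

    -- To list the rows sitting on a page named in a deadlock’s
    -- waitresource (e.g., PAGE: 5:1:7098 means database 5, file 1,
    -- page 7098), crack the locator and filter on it:
    SELECT pl.file_id, pl.page_id, pl.slot_id, f.*
    FROM dbo.Flights AS f
    CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS pl
    WHERE pl.file_id = 1
      AND pl.page_id = 7098;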


Containerized SQL Server 2016 Express Edition

The SQL Server team announces a Docker image for SQL Server 2016 Express Edition:

SQL Server 2016 in a Windows container would be ideal when you want to:

  1. Quickly create and start a set of SQL Server instances for development or testing.

  2. Maximize density in test or production environments, especially in microservice architectures.

  3. Isolate and control applications in a multi-tenant infrastructure.

This works on Windows only.  Given that SQL Server on Linux is going to be a thing, I could see a general release through Docker there as well, but we’ll see.


Rebuild Or Reorganize?

Kendra Little answers the age-old question:

With very large indexes, rebuilds take longer, generate more log, and impact performance more while they’re running.

If you’re using high availability features like Availability Groups or database mirroring that stream the log between replicas, generating a lot of log data very quickly can create problems.

Your replicas/mirrors may fall behind. Depending on the latency between replicas, the size of the indexes rebuilt, and other operations in the database, they may be outside your Recovery Point Objective / Recovery Time Objective for a long time.

In this situation, it’s particularly attractive to drip changes into the log more slowly. One of the ways to do this is to use REORGANIZE for those large indexes.

There’s a lot of nuance here, so give it a read (or watch the video).
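
For reference, the two options are each a single statement; the index and table names below are hypothetical:

    -- REBUILD makes a fresh copy of the index in one operation:  fast
    -- overall, but it generates a large burst of transaction log.
    ALTER INDEX IX_Flights_DestinationState ON dbo.Flights REBUILD;

    -- REORGANIZE compacts and reorders pages in place in many small
    -- transactions, dripping changes into the log; it can also be
    -- stopped mid-run without losing the work completed so far.
    ALTER INDEX IX_Flights_DestinationState ON dbo.Flights REORGANIZE;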


Spark Overview

Jen Underwood provides an overview of the Apache Spark project:

Spark provides a comprehensive framework to manage big data processing with a variety of data set types including text and graph data. It can also handle batch pipelines and real-time streaming data. Using Spark libraries, you can create big data analytics apps in Java, Scala, Clojure, and popular R and Python languages.

Spark brings analytics pros an improved MapReduce-style query capability with more performant data processing in memory or on disk. It can be used with datasets that are larger than the aggregate memory in a cluster. Spark also has savvy lazy evaluation of big data queries, which helps with workflow optimization and reuse of intermediate results in memory. The Spark API is easy to learn.

One of my taglines is that Spark is not the future of Hadoop; Spark is the present of Hadoop.  If you want to get into this space, learn how to work with Spark.
