Press "Enter" to skip to content

Curated SQL Posts

More Collation Comparison Fun

Solomon Rutzky wraps up his collation comparison series:

Just in case you were wondering: No, the test directly above does not prove the documentation (as quoted at the top of Part A) correct. Yes, the documentation did state that characters would be converted to the Code Page specified by the Collation of the Database or Column, which does appear to be what is going on here. But the differences are:

  1. The documentation states that the transformation happens if you reference a Unicode datatype, but what we just saw in the most recent test is the exact opposite:

    • only the NVARCHAR columns of the “Subscript 2” row match because they are still the “Subscript 2” character, while the NVARCHAR columns of the other two rows do not match due to being either “2” or “?”.
    • transformation did occur in the Latin1 and Hebrew VARCHAR columns, which is how it matched both rows with “2” in the Latin1 column and both rows with “?” in the Hebrew column.
  2. In the scenario involving another column where it would help to prefix the string literal with a capital-“N” (i.e. a VARCHAR column using a Collation that specifies a different Code Page than the Database’s Code Page), the Collation of the Database is not used for the transformation; it is only the referenced column’s Collation. The assumption here is that the string literal without the capital-“N” is being used in a Database where the Collation specifies a Code Page that has all of the characters.

  3. In the scenarios where the Database’s Collation, via its specified Code Page, did transform a string literal that was not prefixed with a capital-“N” (the two tests in the previous post), there would have already been unintended behavior no matter how the string literal was used.

I still think it’s a duck.
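If you want to see the shape of the problem without reading the whole series, here is a minimal sketch of the kind of test Solomon is describing. The table, column names, and collations are my own illustration rather than his actual test setup, and the comments simply restate his findings:

-- Illustrative only: the table, columns, and collations are assumptions, not the
-- tables from the original post.
CREATE TABLE dbo.CollationTest
(
    Latin1Col  VARCHAR(10)  COLLATE Latin1_General_100_CI_AS,  -- code page 1252
    HebrewCol  VARCHAR(10)  COLLATE Hebrew_CI_AS,              -- code page 1255
    UnicodeCol NVARCHAR(10) COLLATE Latin1_General_100_CI_AS
);

-- NCHAR(8322) is U+2082, "Subscript Two". Storing it in the VARCHAR columns converts
-- it to each column's code page: best-fit mapping produces '2' on 1252 and '?' on 1255,
-- while the NVARCHAR column keeps the real character.
INSERT INTO dbo.CollationTest
VALUES (NCHAR(8322), NCHAR(8322), NCHAR(8322));

-- The literal below has no N prefix, so it is VARCHAR and must be converted to a code
-- page before the comparison happens. Per the findings above, that conversion uses the
-- referenced column's collation rather than the database's.
SELECT *
FROM dbo.CollationTest
WHERE Latin1Col = '₂';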


Using List.Accumulate

Reza Rad shows off a Power Query function which is not available in the UI:

List.Accumulate is a function that can easily save a number of steps in your Power Query transformations: instead of applying multiple steps, you can simply use List.Accumulate to achieve what you want. The List.Accumulate function loops through the list and accumulates a value as a result. This function takes three parameters: the list itself, the seed, and the accumulator. Here are the parameters explained in detail:

The best way to learn about the seed and accumulator is through some examples. Let's apply some transformations with List.Accumulate and see how these two parameters work.

Read on to see how to use it.


Optimized Bitmaps On Columnstore Indexes

Joe Obbish digs into bitmap filters and clustered columnstore indexes:

The position of the bitmap has changed so that it’s evaluated after the key lookup. That makes sense because the key lookup returns the column to be filtered against. However, the bitmap filter still reduces the estimated number of key lookups from 3000000 to 3000. This is impossible. The filter can only be applied after the key lookup, so it does not make sense for the bitmap to reduce the number of estimated executions of the key lookup.

Performance is significantly worse, with the query now requiring 12199107 logical reads from the rowstore table and 13406 ms of CPU time overall. We can see that the query did three million key lookups.

This is a fairly deep post, so you’ll probably want to check out the Paul White post on bitmaps first.
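For orientation, this is roughly the shape of query being discussed. Every object name below is invented and the join hint is only there to encourage the plan shape, so treat it as a sketch rather than Joe's actual repro:

-- Sketch only: all object names are invented. The nonclustered index on the rowstore
-- table covers FilterCol but not JoinCol, so JoinCol comes back via a key lookup. A
-- bitmap pushed from the hash join would filter on JoinCol, which is exactly the
-- column the key lookup has to fetch first.
SELECT COUNT_BIG(*)
FROM dbo.RowstoreTable AS r             -- rowstore; nonclustered index on (FilterCol)
INNER JOIN dbo.ColumnstoreTable AS c    -- clustered columnstore index
    ON c.JoinCol = r.JoinCol
WHERE r.FilterCol = 1
OPTION (HASH JOIN);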


Units Of Measure In The ShowPlan Schema

Grant Fritchey shows off the ShowPlan Schema:

Because the showplan schema contains notes throughout stating what the units of measure are and what each of the values means. For instance, I can explain why SerialDesiredMemory, DesiredMemory, and RequestedMemory are identical:

…Provide memory grant estimate as well as actual runtime memory grant information. Serial required/desired memory attributes are estimated during query compile time for serial execution. The rest of attributes provide estimates and counters for query execution time considering actual degree of parallelism. SerialRequiredMemory: Required memory in KB if the query runs in serial mode. The query will not start without this memory. SerialDesiredMemory: Memory estimated to fit intermediate results in KB if the query runs in serial mode. RequiredMemory: Required memory in KB for the chosen degree of parallelism. If the query runs in serial mode, this is the same as SerialRequiredMemory. …

That’s taken directly from the 2017 schema. The units of measure are KB.

I’d never seen this before, so that’s going on my to-read list.
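If you want to see those attributes in your own plan cache, a query along these lines will pull them out. Treat it as a sketch: the XQuery paths follow the schema quoted above, but whether the attributes appear depends on the SQL Server version and on whether the plan requested a memory grant at all.

-- Sketch: extract the memory grant attributes (all reported in KB) from cached plans.
-- MemoryGrantInfo only appears for plans that request a grant.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    qp.query_plan.value('(//MemoryGrantInfo/@SerialRequiredMemory)[1]', 'bigint') AS SerialRequiredMemoryKB,
    qp.query_plan.value('(//MemoryGrantInfo/@SerialDesiredMemory)[1]',  'bigint') AS SerialDesiredMemoryKB,
    qp.query_plan.value('(//MemoryGrantInfo/@RequiredMemory)[1]',       'bigint') AS RequiredMemoryKB,
    qp.query_plan.value('(//MemoryGrantInfo/@DesiredMemory)[1]',        'bigint') AS DesiredMemoryKB,
    qs.execution_count
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//MemoryGrantInfo') = 1
ORDER BY qs.total_worker_time DESC;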


Kafka Streams And Time-Based Batching

Vladimir Vajda provides a warning for people using Kafka Streams:

To completely understand the problem, we will first go into detail about how ingestion and processing occur by default in Kafka Streams. For example purposes, the punctuate method is configured to occur every ten seconds, and in the input stream, we have exactly one message per second. The purpose of the job is to parse input messages, collect them, and, in the punctuate method, do a batch insert into the database, then send metrics.

After running the Kafka Streams application, the Processor will be created, followed by the init method. Here is where all the connections are established. Upon successful start, the application will listen to the input topic for incoming messages. It will remain idle until the first message arrives. When the first message arrives, the process method is called; this is where transformations occur and where the result is stored for later use. If no messages are in the input topic, the application will go idle again, waiting for the next message. After each successful process, the application checks whether punctuate should be called. In our case, we will have ten process calls followed by one punctuate call, with this cycle repeating indefinitely as long as there are messages.

A pretty obvious behavior, isn’t it? Then why is one bolded?

Read on for more, including how to handle this edge case.


Kafka And GDPR

Ben Stopford has some ideas for using Kafka in a GDPR world:

The simplest way to remove messages from Kafka is to let them expire. By default, Kafka will keep data for two weeks, and you can tune this to arbitrarily large periods of time as required. There is also an Admin API that lets you delete messages explicitly if they are older than some specified time or offset. But what if we are keeping data in the log for a longer period of time, say for Event Sourcing architectures or as a source of truth? For this, you can make use of compacted topics, which allow messages to be explicitly deleted or replaced by key.

Data isn't removed from compacted topics in the same way as in a relational database. Instead, Kafka uses a mechanism closer to the one used by Cassandra and HBase, where records are marked for removal and then deleted later when the compaction process runs. Deleting a message from a compacted topic is as simple as writing a new message to the topic with the key you want to delete and a null value. When compaction runs, the message will be deleted forever.

Click through for more information.


DevOps And Culture

Alex Yates points out a problem in how many people think of DevOps:

There is so much that is wrong with that conversation.

  • We could talk about the bottlenecks and the large amount of work in progress backed up in test – and the ways that could be fixed,

  • We could talk about the infrequent ‘big bang’ release in three months and the manual, error prone heroics that will probably be required to deliver it – and the ways that could be fixed,

  • We could talk about the misguided approach regarding branching strategies or the shared development database – and the ways they could be fixed,

  • We could talk about testing silos and the likelihood of drift and inconsistencies between different environments – and the ways they could be fixed,

  • We could talk about the word “DevOps-ing” – and why it should be burned along with anyone who uses it un-ironically. (And anyone who uses the word “irony” inappropriately or puts their commas at the end of the line.),

But I’m not going to talk about any of those things. I’m not going to talk about any of the things the customer said. I’m not going to talk about any of the technical issues or the possible solutions to those problems.

I’m going to talk about something much, much more important.

Read on to see what is much, much more important:  culture.


Troubleshooting Client Disconnections

Mike Hays looks at a trace flag that can help you troubleshoot why client connections drop:

About once a month, I get a support ticket regarding SQL Server dropping an application's or user's connection.  The problem is that SQL Server does not just randomly drop a connection and continue to work normally.  Some force outside the control of SQL Server breaks the connection.  By default, SQL Server does not record when this event occurs.

In my history of working with SQL Server, only in extreme situations have I ever seen SQL Server drop its connections.  The most common example is when SQL Server is in the process of shutting down.

Click through for the trace flag and details.


Collations Used In NVarchar To Varchar Conversions

Solomon Rutzky digs into collations:

So, the actual question is a bit more specific than would reasonably fit into a title, and it is:

In a WHERE condition (in Microsoft SQL Server, if that’s not obvious), when a string constant containing Unicode characters but not prefixed with a capital-N is compared to a column, which Collation (and hence Code Page) is used to do the implicit conversion of the Unicode characters into VARCHAR? Is it the Database’s default Collation, or the column’s Collation?

The answer has a lot more nuance than you might first think.  And for the record, it’s a duck.
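To make the question concrete, here is a minimal sketch of the scenario. The database, table, and collation choices are mine, purely for illustration; the point is only that the database default and the column specify different code pages:

-- Sketch only: the database default collation and the column collation point at
-- different code pages, which is exactly the situation the question asks about.
CREATE DATABASE CollationQuestion COLLATE Latin1_General_100_CI_AS;  -- code page 1252
GO
USE CollationQuestion;
GO
CREATE TABLE dbo.Names
(
    Name VARCHAR(50) COLLATE Hebrew_CI_AS    -- code page 1255
);
GO
-- No N prefix, so the literal is VARCHAR and the Hebrew letter has to be converted to
-- some code page before the comparison. The question is whether that conversion uses
-- the database's code page (1252, where the character does not exist) or the
-- column's (1255, where it does).
SELECT Name
FROM dbo.Names
WHERE Name = 'א';   -- U+05D0, Hebrew letter Alef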


Thinking About Scalable Persistent Memory

Ned Otter has a good post thinking about Scalable Persistent Memory:

There were other potential issues when using Persistent Memory, detailed in this blog post. But what's not covered in that post is the fact that deploying NVDIMM-N reduced the memory speed and/or capacity, because they are not compatible with LRDIMM. This causes you to use RDIMM, which reduces capacity, and because NVDIMM-N operates at a slower speed than RDIMM, it also affects total memory speed.

HP has since released Gen10 servers, and they have changed the landscape for those seeking reduced latency by storing larger data sets in memory. For one thing, they raise the bar for what's now referred to as Scalable Persistent Memory, with a total server capacity of 1TB. To be clear, NVDIMM-N is not used in this configuration. Instead, regular DIMMs are used, and they are persisted to flash via a power source (this was also the case for NVDIMM-N, but the flash, the DIMM, and the power source were all located on the NVDIMM-N itself).

Check it out.  I’m happy that things are improving, but it sounds like this won’t be a panacea.
