Curated SQL Posts

Tips For Running Kafka Streams On AWS

Ian Duffy and Nina Hanzlikova have some advice if you’re looking to spin up some EC2 instances to run Kafka Streams:

With upgrades to the underlying Kafka Streams library, the Kafka community introduced many improvements to the stream configuration defaults. In previous, more unstable iterations of the client library, we spent a lot of time tweaking config values such as session.timeout.ms, max.poll.interval.ms, and request.timeout.ms to achieve some level of stability.

With new releases we found ourselves discarding these custom values and achieving better results. However, some timeout issues persisted on some of our services, where a service would frequently get stuck in a rebalancing state. We noticed that reducing the max.poll.records value for the stream configs would sometimes alleviate issues experienced by these services. From partition lag profiles we also saw that the consuming issue seemed to be confined to only a few partitions, while the others would continue processing normally between rebalances.

Ultimately we realised that the processing time for a record in these services could be very long (up to minutes) in some edge cases. Kafka has a fairly large maximum offset commit time before a stream consumer is considered dead (5 minutes), but with larger message batches of data this timeout was still being exceeded. By the time the processing of the record was finished, the stream was already marked as failed and so the offset could not be committed. On rebalance, this same record would once again be fetched from Kafka, would fail to process in a timely manner, and the situation would repeat. Therefore, for any of the affected applications, we introduced a processing timeout, ensuring there was an upper bound on the time taken by any of our edge cases.

There are some interesting tidbits in here.

Testing Cosmos DB Performance With Geospatial Data

Vincent-Philippe Lauzon has done some performance testing of Cosmos DB when querying geospatial data:

Here are the main attributes of the sample set:

  • There are 1 200 000 documents
  • Documents are distributed on 4000 logical partitions with 300 documents per logical partition
  • 33% of documents (i.e. 400 000 documents) have a location node with a geospatial “point” in them
  • Points are scattered uniformly on the geospatial rectangle
  • There is no correlation between the partition key and the geospatial point coordinates

We ran the tests with 4 different Request Unit (RU) configurations:

  • 2500
  • 10000
  • 20000
  • 100000

Read on for the test results and his findings.
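
If you haven’t queried geospatial data in Cosmos DB before, the query side looks roughly like this sketch (the container alias, property names, and coordinates are made up; ST_DISTANCE works in meters):

    -- Hypothetical Cosmos DB SQL query: return documents whose location point
    -- falls within 10 km of a reference point (ST_DISTANCE returns meters).
    SELECT c.id, c.location
    FROM c
    WHERE ST_DISTANCE(c.location, {
              "type": "Point",
              "coordinates": [-73.97, 40.78]
          }) < 10000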

Data Set Robustness

Tomaz Kastrun shows how robust the iris data set is:

Conclusion: the iris dataset is – due to the nature of the measurements and observations – robust and rigid; one can get very good accuracy results on a small training set. Everything beyond 30% for training the model is, for this particular case, just additional overhead.

The general concept here is: how small can you arbitrarily slice the data and still come up with the same result as the overall data set? Or, phrased differently, how much data do you need to collect before predictions stabilize? Read on to see how Tomaz solves the problem.

PowerShell Runtime Prompt

Chrissy LeMaire shows off a really cool PowerShell prompt:

A few months back, dbatools wizard Fred created a prompt that was so awesome, I never had to use Measure-Command again. It was cool enough that a number of us ended up adopting it, so I figured I’d share.

Performance is important to us so that’s what the prompt is all about. Nothing fancy, just the current working directory and how long the command took to run.

The prompt shows how long the previous command took to run.  Click through for the code to do this.

Asynchronous Stats Updates

Monica Rathbun explains why you might want to turn on asynchronous statistics updates in your OLTP environment:

By default, when Auto Update Statistics is set to True, the SQL Server Query Optimizer will automatically update statistics when data has met a threshold of changes (insert, update, delete, or merge) and the estimated rows are now potentially stale. When statistics are stale, execution plans can become suboptimal which can lead to degradation in performance.

This best practice option ensures your statistics stay up to date as much as possible. Each time a cached query plan is executed the Optimizer checks for data changes and potentially generates new statistics. This behavior is exactly what we want, but there is a catch. The caveat to this is that a cached query plan will be “held” while the statistics are updated and will recompile to use the new values before running. This caveat can slow down the execution process dramatically.

The advice Monica provides is generally sound, though there are rare cases when asynchronous statistics updates end up causing more problems than they solve, so test the change first.
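
For reference, flipping the switch and confirming it looks like this minimal sketch (the database name is a placeholder):

    -- "SalesDB" is a placeholder database name.
    ALTER DATABASE [SalesDB] SET AUTO_UPDATE_STATISTICS_ASYNC ON;

    -- Confirm both auto-update settings afterward.
    SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
    FROM sys.databases
    WHERE name = N'SalesDB';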

SQL Server Agent On A Linux Container

Andrew Pruski shows how to get the SQL Agent service running on a Linux container:

Now, one of the benefits of attending SQL Saturdays is that you get to pick the brains of a lot of very clever people and luckily for me, Jan Van Humbeek (blog|twitter) was there.

Jan said that he had gotten the SQL Agent running in Linux containers so I asked if he could send on his code and he very kindly obliged.

So, the disclaimer for this blog post is that I didn’t write the code here, Jan did. All I’ve done is drop it into a dockerfile so that an image can be built. Thank you very much Jan!

Click through for Jan’s code and Andrew’s presentation of the process.

Cleaning Up Foreign Keys

Adrian Buckman looks at how to clean up untrusted foreign key constraints:

SQL Server tracks untrusted foreign keys in sys.foreign_keys with a column called is_not_trusted. There may be a number of reasons why a foreign key may have become untrusted; below are a couple of examples:

  • Foreign key was disabled using the ‘NOCHECK’ option, then re-enabled using ‘CHECK’ (not to be confused with ‘WITH CHECK’)
  • Foreign key was disabled using the ‘NOCHECK’ option, primary key data was deleted, and the foreign key was re-enabled using only ‘CHECK’ (again, not to be confused with ‘WITH CHECK’)

So what happens when you try to enable a foreign key ‘WITH CHECK’ (check existing data for referential integrity)? If the data is consistent, then this is going to succeed; however, if rows have been deleted and the primary key data no longer exists but the foreign key data does, for example, then this is going to fail miserably.

What I like about this post is that he does more than just say “hey, here’s how you get the key constraint to be trusted again”; he goes further and shows you how to figure out beforehand whether it will work.
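
As a quick reference, the moving parts look something like this sketch (table and constraint names are made up):

    -- Find foreign keys SQL Server does not currently trust.
    SELECT OBJECT_NAME(parent_object_id) AS table_name,
           name AS constraint_name
    FROM sys.foreign_keys
    WHERE is_not_trusted = 1;

    -- Re-enable a constraint and validate existing rows ("WITH CHECK CHECK"),
    -- which clears is_not_trusted if every row passes.
    ALTER TABLE dbo.OrderLines
        WITH CHECK CHECK CONSTRAINT FK_OrderLines_Orders;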

Valuable Extended Events Sessions

Derik Hammer has a few useful extended events sessions that he’s sharing with us:

Extended Events are the all-around smart choice. They take a little bit of time to get used to, however. With thousands of new events and data points, it can be difficult to create an event session in a pinch. That is why it is important to have event sessions pre-scripted or pre-implemented on your SQL Server instances. A little bit of up-front work can save you a lot of time when you need information on the spot. Having them pre-scripted also prevents you from jumping back to Profiler, which has a much heavier footprint on your server.

When I create Extended Event sessions, I tend to use the SQL Server Management Studio wizard to find the events and actions (additional fields) that I want. Then, I will script it out and save it for later.

Below are five Extended Events sessions that I have found particularly useful and recommend you add to your toolbox.

Click through for all of those scripts, as well as queries to shred the resulting XML.
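
If you haven’t scripted a session by hand before, the general shape is something like this minimal sketch (session name, duration threshold, and target file are arbitrary, and it isn’t one of Derik’s five):

    -- Capture statements running longer than 5 seconds (duration is in microseconds).
    CREATE EVENT SESSION [LongRunningStatements] ON SERVER
    ADD EVENT sqlserver.sql_statement_completed (
        ACTION (sqlserver.sql_text, sqlserver.database_name)
        WHERE duration > 5000000
    )
    ADD TARGET package0.event_file (SET filename = N'LongRunningStatements.xel')
    WITH (STARTUP_STATE = ON);

    ALTER EVENT SESSION [LongRunningStatements] ON SERVER STATE = START;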

Automating Cache Cleanup

Tracy Boggiano has a process to automate cleaning up different caches in SQL Server:

First, we need to create a table to store our information on the caches we would like to clear on an automated basis and populate it with values.

For example, we clear SQL Plans if 10,000 plans are Adhoc or Prepared plans that take up 5 GB of memory, if the count of single-use plans is greater than 10,000, or if the memory used for Adhoc or Prepared plans is more than 50% of memory.  We clear the Transactions cache if it is more than 2 GB, and Lock Manager : Node 0 if it is more than 2 GB.

Read on for the script.
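
The clearing itself boils down to checking cache store sizes and issuing DBCC FREESYSTEMCACHE against the relevant store; here is a rough sketch of the kind of statements such a job would run (the actual thresholds and cache names live in Tracy’s control table):

    -- Check how much memory the plan cache stores are using.
    SELECT type, SUM(pages_kb) / 1024 AS size_mb
    FROM sys.dm_os_memory_clerks
    WHERE type IN (N'CACHESTORE_SQLCP', N'CACHESTORE_OBJCP')
    GROUP BY type;

    -- Clear ad hoc and prepared plans once a threshold is crossed.
    DBCC FREESYSTEMCACHE ('SQL Plans');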

Gathering Information On Memory-Optimized Objects

Ned Otter has a diagnostic script for learning about how your servers are using memory-optimized objects:

Instance level evaluates the following:

  • which databases are memory-optimized
  • if running Enterprise, whether there are any resource pools defined, and which databases are bound to them
  • version/edition of SQL Server
  • ‘max memory’ setting
  • whether or not instance-level collection of execution statistics has been enabled for all natively compiled stored procedures
  • memory clerks for the buffer pool and In-Memory OLTP
  • the value of the committed_target_kb column from sys.dm_os_sys_info
  • display any event notifications (because they conflict with deploying In-Memory OLTP)

Database level evaluates the following:

For each memory-optimized database:

  • database files, including container names, size, and location
  • memory-optimized tables
  • indexes on all memory-optimized tables
  • count of indexes per memory-optimized table
  • natively compiled stored procedures
  • whether or not the collection of execution statistics is enabled for any natively compiled procedures
  • count of natively compiled procedures
  • if using the temporal feature for memory-optimized tables, the amount of memory consumed by hidden temporal internal tables
  • memory structures for LOB columns (off-row)
  • average chain length for HASH indexes

Ned provides the script on his blog, so click through to get that.  This looks great if you’re trying to build up some basic information on how developers in your environment use memory-optimized objects.
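
As a small taste of the instance-level checks, comparing the memory target against the buffer pool and In-Memory OLTP memory clerks looks something like this sketch (not Ned’s script):

    -- Overall memory target for the instance.
    SELECT committed_target_kb / 1024 AS committed_target_mb
    FROM sys.dm_os_sys_info;

    -- Memory clerks for the buffer pool and In-Memory OLTP (XTP).
    SELECT type, SUM(pages_kb) / 1024 AS pages_mb
    FROM sys.dm_os_memory_clerks
    WHERE type IN (N'MEMORYCLERK_SQLBUFFERPOOL', N'MEMORYCLERK_XTP')
    GROUP BY type;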
