Author: Kevin Feasel

Cosmos DB Cheat Sheet

Published 2017-10-27 by Kevin Feasel

Melody Zacharias shows us a cheat sheet for Cosmos DB:

The Cosmos DB by Microsoft is their globally distributed, horizontally scalable, multi-model database service that is available through Azure. Released in 2014, it is the ideal DB for globally distributed applications. Formerly called DocumentDB Cosmos it now supports querying documents using SQL as a JSON query language. As a Schema-free platform, it provides automatic indexing of JSON documents without requiring an explicit schema or creation of secondary indexes. For those of use not well versed in JSON, this query cheat-sheet, has come to our rescue. It outlines common queries to retrieve information from 2 JSON documents.

Microsoft has put together this cheat-sheet to help you write your queries faster. This quick reference is a single page PDF that you can print, or keep in a handy computer file. This is version 4, so it just keeps getting better!

Click through for the link to the cheat sheet.

Comments closed

Phantom Reads

Published 2017-10-27 by Kevin Feasel

Arun Sirpal sees not-quite-there-yet transactions:

With Halloween around the corner what better topic to discuss than phantom reads. A phantom read occurs when rows have been inserted after a read operation and becomes visible in a follow-up read operation within the same transaction.

I will show you what this looks like with an example. Please note the code will not work for you because I dropped the foreign key constraint within the person.person table just so I could show you the example.

Arun mentions that the serializable isolation level prevents phantom reads. So do the repeatable read and snapshot isolation levels.

Comments closed

Updating Multiple Tables In One Statement

Published 2017-10-27 by Kevin Feasel

Denis Gobo tries different methods of updating two tables in one statement:

Every now and then you will get someone asking how to update two tables with one statement in SQL Server. The answer is usually, no that is not possible… the person then walks away muttering something about how MySQL allows it.

So I decided to try to see if I could update two tables with one statement. I decided to try a couple of different things

view
common table expression
indexed view
instead of trigger

The results are generally negative. I get the benefit of updating multiple tables in a single query, particularly when you have fairly complex logic around determining which records to update, but if you just wrap things in a transaction, at least you get the benefits of atomicity. And if you often have to update a bunch of different tables at the same time, there might be an issue with your data model.

Comments closed

Stress Testing SQL Server

Published 2017-10-27 by Kevin Feasel

Jes Borland shows how to use ostress to perform load testing against a SQL Server instance:

Ostress allows you to specify one file, or a folder that contains multiple files, to run. You can also specify a number of connections to be made to the database, to simulate multiple users or applications running the same query. Each connection can then run the file one or more times.

The next thing you’ll need is one or more .sql files that the tool will run.

To run a load test, you’ll open RML cmd prompt and enter your command.

Ostress isn’t as nice as a replayable trace for generating production loads, but it’s an easy method to stress test a server.

Comments closed

Tips For Running Kafka Streams On AWS

Published 2017-10-26 by Kevin Feasel

Ian Duffy and Nina Hanzlikova have some advice if you’re looking to spin up some EC2 instances to run Kafka Streams:

With upgrades in the underlying Kafka Streams library, the Kafka community introduced many improvements to the underlying stream configuration defaults. Where in previous, more unstable iterations of the client library we spent a lot of time tweaking config values such as session.timeout.ms, max.poll.interval.ms, and request.timeout.ms to achieve some level of stability.

With new releases we found ourselves discarding these custom values and achieving better results. However, some timeout issues persisted on some of our services, where a service would frequently get stuck in a rebalancing state. We noticed that reducing the max.poll.records value for the stream configs would sometimes alleviate issues experienced by these services. From partition lag profiles we also saw that the consuming issue seemed to be confined to only a few partitions, while the others would continue processing normally between re-balances. Ultimately we realised that the processing time for a record in these services could be very long (up to minutes) in some edge cases. Kafka has a fairly large maximum offset commit time before a stream consumer is considered dead (5 minutes) but with larger message batches of data this timeout was still being exceeded. By the time the processing of the record was finished the stream was already marked as failed and so the offset could not be committed. On rebalance, this same record would once again be fetched from Kafka, would fail to process in a timely manner and the situation would repeat. Therefore for any of the affected applications we introduced a processing timeout, ensuring there was an upper bound on the time taken by any of our edge cases.

There are some interesting tidbits in here.

Comments closed

Testing Cosmos DB Performance With Geospatial Data

Published 2017-10-26 by Kevin Feasel

Vincent-Philippe Lauzon has done some performance testing of Cosmos DB when querying geospatial data:

Here are the main attributes of the sample set:

There are 1 200 000 documents

Documents are distributed on 4000 logical partitions with 300 documents per logical partition

%33 of documents (i.e. 400 000 documents) have a location node with a geospatial “point” in there

Points are scattered uniformly on the geospatial rectangle

There are no correlation between the partition key and the geospatial point coordinates

We ran the tests with 4 different Request Units (RUs) configurations:

2500
10000
20000
100000

Read on for the test results and his findings.

Comments closed

Data Set Robustness

Published 2017-10-26 by Kevin Feasel

Tomaz Kastrun shows how robust the iris data set is:

Conclusion, IRIS dataset is – due to the nature of the measurments and observations – robust and rigid; one can get very good accuracy results on a small training set. Everything beyond 30% for training the model, is for this particular case, just additional overload.

The general concept here is, how small can you arbitrarily slice the data and still come up with the same result as the overall data set? Or, phrased differently, how much data do you need to collect before predictions stabilize? Read on to see how Tomaz solves the problem.

Comments closed

Powershell Runtime Prompt

Published 2017-10-26 by Kevin Feasel

Chrissy LeMaire shows off a really cool Powershell prompt:

A few months back, dbatools wizard Fred created a prompt that was so awesome, I never had to use Measure-Commandagain. It was cool enough that a number of us ended up adopting it, so I figured I’d share.

Performance is important to us so that’s what the prompt is all about. Nothing fancy, just the current working directory and how long the command took to run.

The prompt shows how long the previous command took to run. Click through for the code to do this.

Comments closed

Asynchronous Stats Updates

Published 2017-10-26 by Kevin Feasel

Monica Rathbun explains why you might want to turn on asynchronous statistics updates in your OLTP environment:

By default, when Auto Update Statistics is set to True, the SQL Server Query Optimizer will automatically update statistics when data has met a threshold of changes (insert, update, delete, or merge) and the estimated rows are now potentially stale. When statistics are stale, execution plans can become suboptimal which can lead to degradation in performance.

This best practice option ensures your statistics stay up to date as much as possible. Each time a cached query plan is executed the Optimizer checks for data changes and potentially generates new statistics. This behavior is exactly what we want, but there is a catch. The caveat to this is that a cached query plan will be “held” while the statistics are updated and will recompile to use the new values before running. This caveat can slow down the execution process dramatically.

The advice Monica provides is generally sound, though there are rare cases when asynchronous statistics updates end up causing more problems than they solve, so test the change first.

Comments closed

SQL Server Agent On A Linux Container

Published 2017-10-26 by Kevin Feasel

Andrew Pruski shows how to get the SQL Agent service running on a Linux container:

Now, one of the benefits of attending SQL Saturdays is that you get to pick the brains of a lot of very clever people and luckily for me, Jan Van Humbeek (blog|twitter) was there.

Jan said that he had gotten the SQL Agent running in Linux containers so I asked if he could send on his code and he very kindly obliged.

So, the disclaimer for this blog post is that I didn’t write the code here, Jan did. All I’ve done is drop it into a dockerfile so that an image can be built. Thank you very much Jan!

Click through for Jan’s code and Andrew’s presentation of the process.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31