Press "Enter" to skip to content

Month: May 2019

Unique Key Constraints in Cosmos DB

Hasan Savran shows how you can set unique key constraints on Cosmos DB containers:

Unique key names are case-sensitive; I have first-hand experience with this. If your unique key is in lowercase letters but your data has the field in uppercase, CosmosDB will insert a null value into the unique key the first time, and you will get an error the second time when it tries to insert null again. CosmosDB does not support sparse unique keys. If your unique key is /SSN, you can have only one null value in this field.

    If you want to use unique keys in Azure CosmosDB, you have to define them when you create your containers. You cannot add a unique key to an existing container. The only way to add a unique key to an existing container is to create a new container and move your data from the old container to the new one. Also, you cannot update unique keys, just as you cannot update partition keys. Picking a wrong unique key can be an expensive error.

Looks like you’ll need to have a bit of foresight when choosing keys (or choosing not to use keys).
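To make that concrete, here is a minimal sketch of declaring a unique key at container-creation time with the azure-cosmos Python SDK. The endpoint, key, and database/container names are placeholders, and I'm assuming the v4 SDK's unique_key_policy parameter, so treat this as illustrative rather than definitive:

    from azure.cosmos import CosmosClient, PartitionKey

    # Placeholder endpoint and account key
    client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<account-key>")
    database = client.create_database_if_not_exists("HumanResources")

    # The unique key policy must be supplied here; it cannot be added to the container later
    container = database.create_container_if_not_exists(
        id="Employees",
        partition_key=PartitionKey(path="/department"),
        unique_key_policy={"uniqueKeys": [{"paths": ["/SSN"]}]},
    )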


Comparing Sparklines

Lisa Charlotte Rost takes us through sparklines:

Sparklines are curious things. They’re supposed to show a trend, and a trend only. They’re supposed to show when something (like stocks) increases and decreases, where the peaks and the valleys are. But sparklines are not supposed to be comparable with each other.

So when you’re seeing two sparklines with the same height, the ebbs and flows of the first one could play out between 0 and 10 (e.g. US-Dollar), while the other sparkline’s peak is at 10,000.

But that’s odd, no? Doesn’t that invite people to make totally false assumptions?

I like sparklines a lot, but I’m apt to violate this particular rule and make them cross-comparable unless I know people will never care about comparisons between elements. One way to get around the “what if the range is big?” problem is to plot sparkline heights as logs, so that 1000 is a bit bigger than 100, which is a bit bigger than 10. The argument I make for doing that is that you still see size differences, and since sparkline comparisons are imprecise to begin with, magnitudes matter more than exact values.
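Here's a rough sketch of that log-scaling idea in Python; the two series are made up, and the shared axis limit is just to show how the transformed heights stay comparable:

    import numpy as np
    import matplotlib.pyplot as plt

    # Two series on wildly different scales: one peaks near 10, the other near 10,000
    series = {
        "usd":   np.array([3.0, 5.0, 4.0, 8.0, 10.0]),
        "index": np.array([4000.0, 6500.0, 9000.0, 10000.0, 7500.0]),
    }

    fig, axes = plt.subplots(len(series), 1, figsize=(3, 1.5))
    for ax, values in zip(axes, series.values()):
        ax.plot(np.log10(values))   # log heights: 10,000 reads as a bit bigger than 10, not 1,000x
        ax.set_ylim(0, 5)           # shared scale keeps the sparklines cross-comparable
        ax.axis("off")
    plt.show()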


Defining and Setting Deadlock Priority

Dave Bland explains how you can use DEADLOCK_PRIORITY to control which process gets rolled back:

Before getting into how to set the DEADLOCK_PRIORITY, let’s quickly go over what the options are. There are two ways to set the priority of a process. The first option is to use one of the keywords, LOW, NORMAL, or HIGH. The second approach is to use a numeric value between -10 and 10. The lowest value will be chosen as the victim. For example, LOW will be the victim if the other process is HIGH, and -1 will be the victim if the other process is greater than -1.

As I recall, index operations (like rebuilds) are automatically set to a low priority.
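If you want to set it per session from client code, a small pyodbc sketch follows; the connection string and table are placeholders, and SET DEADLOCK_PRIORITY itself accepts LOW, NORMAL, HIGH, or an integer from -10 to 10:

    import pyodbc

    # Placeholder connection string
    conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=Sandbox;Trusted_Connection=yes")
    cursor = conn.cursor()

    # Make this session the preferred deadlock victim.
    # A numeric value works too, e.g. SET DEADLOCK_PRIORITY -5;
    cursor.execute("SET DEADLOCK_PRIORITY LOW;")

    # Any work done afterward on this connection carries that priority
    cursor.execute("UPDATE dbo.SomeTable SET SomeColumn = 1 WHERE Id = 42;")
    conn.commit()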


Getting Version Info From dbatools

Jess Pomfret shows how you can get your operating system and SQL Server versions from the dbatools PowerShell module:

With these dates on the horizon, it’s a good time to look at our estate and make sure we have a good understanding of the versions we currently support. I’m going to show you how to do that easily with a couple of dbatools functions. Then, bonus content, I’ll show you how to present it for your managers with one of my other favourite PowerShell modules, ImportExcel.

Jess gets bonus points for avoiding the dreaded pie chart at the end.


Data Classifications on Azure SQL DW

Meagan Longoria takes us through data classifications on Azure SQL Data Warehouse:

Data classifications in Azure SQL DW entered public preview in March 2019. They allow you to label columns in your data warehouse with their information type and sensitivity level. There are built-in classifications, but you can also add custom classifications. This could be an important feature for auditing your storage and use of sensitive data as well as compliance with data regulations such as GDPR. You can export a report of all labeled columns, and you can see who is querying sensitive columns in your audit logs. The Azure Portal will even recommend classifications based upon your column names and data types. You can add the recommended classifications with a simple click of a button.

But read the whole thing, as Meagan sees a problem with it when you use a popular loading technique.
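If you'd rather pull that report of labeled columns with code than from the portal, here is a hedged pyodbc sketch; the connection string is a placeholder, and I'm assuming the labels surface in the sys.sensitivity_classifications catalog view:

    import pyodbc

    # Placeholder connection string
    conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                          "SERVER=myserver.database.windows.net;DATABASE=MyDW;UID=me;PWD=secret")

    # List every classified column along with its information type and sensitivity label
    rows = conn.execute("""
        SELECT SCHEMA_NAME(o.schema_id) AS schema_name, o.name AS table_name,
               c.name AS column_name, sc.information_type, sc.label
        FROM sys.sensitivity_classifications AS sc
        JOIN sys.objects AS o ON o.object_id = sc.major_id
        JOIN sys.columns AS c ON c.object_id = sc.major_id AND c.column_id = sc.minor_id;
    """)
    for row in rows:
        print(row.schema_name, row.table_name, row.column_name, row.information_type, row.label)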


BimlExpress 2019

Cathrine Wilhelmsen digs into BimlExpress 2019:

This is the first major BimlExpress release this year, called the R1 release. There are no new features in this release, but BimlExpress now supports both Visual Studio 2019 and SSIS 2019!

While there are no new features in this release of BimlExpress, there are two changes to Visual Studio that you should be aware of as a Biml developer.

Read on to see what those two changes are.


Parameter Sniffing in the Wild

Erik Darling is a parameter sniffing anthropologist:

A while back, I put together a pretty good rundown of this on the DBA Stack Exchange site.

In the plan cache, it’s really hard to tell if a query is suffering from parameter sniffing in isolation.

By that I mean, if someone sends you a cached plan that’s slow, how can you tell if it’s because of parameter sniffing?

Read on to see what Erik does to discover parameter sniffing problems.


Minimal Logging with FastLoadContext

Paul White takes us through another way to perform minimally logged bulk loads with SQL Server:

This post provides new information about the preconditions for minimally logged bulk load when using INSERT...SELECT into indexed tables.

The internal facility that enables these cases is called FastLoadContext. It can be activated from SQL Server 2008 to 2014 inclusive using documented trace flag 610. From SQL Server 2016 onward, FastLoadContext is enabled by default; the trace flag is not required.

Without FastLoadContext, the only index inserts that can be minimally logged are those into an empty clustered index without secondary indexes, as covered in part two of this series. The minimal logging conditions for unindexed heap tables were covered in part one.

Click through for a highly informative article.
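For the pre-2016 versions, a rough sketch of the mechanics from client code is below (pyodbc, with invented table names); whether the load is actually minimally logged depends on the preconditions Paul walks through, so this only shows how the trace flag is enabled for the session:

    import pyodbc

    # Placeholder connection; autocommit so DBCC and the load run as sent
    conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=Sandbox;Trusted_Connection=yes",
                          autocommit=True)
    cursor = conn.cursor()

    # SQL Server 2008-2014: enable FastLoadContext for this session (on 2016+ it is on by default)
    cursor.execute("DBCC TRACEON (610);")

    # A plain INSERT...SELECT into an indexed target table
    cursor.execute("""
        INSERT INTO dbo.TargetIndexed (Id, Payload)
        SELECT Id, Payload
        FROM dbo.StagingHeap;
    """)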


Kafka Schema Registry Tips

Yeva Byzek shares 17 tips for managing your Kafka Schema Registry:

Mistake #5: Configuring different names for the schemas topic in different Schema Registry instances

There is a commit log with all the schema information, which gets written to a Kafka topic. All Schema Registry instances should be configured to use the same schemas topic, whose name is set by the configuration parameter kafkastore.topic. This topic is the schema’s source of truth, and the primary instances read the schemas from this topic. The name of this topic defaults to _schemas, but sometimes customers choose to rename it. This has to be the same for all Schema Registry instances, otherwise it may result in different schemas with the same ID.

Read on for sixteen more.
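For the mistake quoted above, a quick (and admittedly hypothetical) Python sanity check would be to parse each instance's properties file and confirm they all agree on the schemas topic; the file paths here are made up:

    # Hypothetical paths to each Schema Registry instance's config
    config_files = [
        "/etc/schema-registry/instance1/schema-registry.properties",
        "/etc/schema-registry/instance2/schema-registry.properties",
    ]

    def schemas_topic(path, default="_schemas"):
        # Read kafkastore.topic from a Java-style properties file; it defaults to _schemas
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line.startswith("kafkastore.topic="):
                    return line.split("=", 1)[1].strip()
        return default

    topics = {path: schemas_topic(path) for path in config_files}
    if len(set(topics.values())) > 1:
        raise SystemExit(f"Instances disagree on kafkastore.topic: {topics}")
    print(f"All instances use the same schemas topic: {next(iter(topics.values()))}")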


Removing Serial Correlation

Vincent Granville has an easy trick for removing serial correlation from a data set:

Here is a simple trick that can solve a lot of problems.

You cannot trust a linear or logistic regression performed on data if the error terms (residuals) are auto-correlated. There are different approaches to de-correlate the observations, but they usually involve introducing a new matrix to take care of the resulting bias. See for instance here.

Click through for the alternative.
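Vincent's specific fix is behind the link, but a sensible first step is confirming you actually have the problem; here's a small sketch using statsmodels' Durbin-Watson statistic on synthetic data with AR(1) errors (everything in it is illustrative):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(42)

    # Synthetic regression data whose errors follow an AR(1) process, so residuals are serially correlated
    n = 500
    x = rng.normal(size=n)
    errors = np.zeros(n)
    for t in range(1, n):
        errors[t] = 0.8 * errors[t - 1] + rng.normal(scale=0.5)
    y = 2.0 + 1.5 * x + errors

    model = sm.OLS(y, sm.add_constant(x)).fit()
    dw = durbin_watson(model.resid)

    # Values near 2 suggest no serial correlation; well below 2 points to positive autocorrelation
    print(f"Durbin-Watson statistic: {dw:.2f}")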
