Press "Enter" to skip to content

Curated SQL Posts

Supplementary Characters In SQL Server 2017

Solomon Rutzky tests SQL Server 2017 collations to determine whether they support supplementary characters:

As you scroll through the list, does anything appear to be missing? Go ahead, take another look. I’ll wait 😉. Figure it out? Yep, that’s right: none of those Collations end in “_SC”. Collations ending in “_SC” were added in SQL Server 2012 and support the full UTF-16 character set. Without the “_SC” ending, built-in functions and comparisons / sorting only fully support the base UCS-2 character set (i.e. the first 65,536 Unicode Code Points). You can, of course, store and view all Unicode Code Points, even Supplementary Characters, in non-“_SC” Collations, but they will be interpreted as being two separate “unknown” characters instead of as a single character.

To illustrate this point, the following query shows: the two surrogate Code Points (not actual characters by themselves, but when used in pairs of any of the assigned combinations, produce a single character), the resulting Supplementary Character, and how the LEN function interprets that sequence in both non-“_SC” and “_SC” Collations.
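
Not Solomon’s exact query, but a minimal sketch along the same lines, assuming a database whose default collation is not an “_SC” collation (the two NCHAR calls build the surrogate pair for U+1F63A, a cat face emoji):

    -- Surrogate pair for U+1F63A ("smiling cat face with open mouth")
    DECLARE @cat nvarchar(4) = NCHAR(0xD83D) + NCHAR(0xDE3A);

    SELECT @cat                                          AS TheCharacter,
           LEN(@cat COLLATE Latin1_General_100_CI_AS)    AS NonSC_Len,  -- 2
           LEN(@cat COLLATE Latin1_General_100_CI_AS_SC) AS SC_Len;     -- 1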

Read on for Solomon’s testing, which includes a cat face emoji.  I know that I, for one, thought there was insufficient cat face emoji representation in SQL Server prior to 2017.


Performance Tuning TVFs With Optional Parameters

Arvind Shyamsundar walks us through a scenario with user-defined functions with optional parameters:

If you look carefully, the above query is an example of ‘optional parameters’, wherein the same query caters to situations where there are specific values for the parameters as well as other cases where there are none. Due to the implementation of the query (specifically the usage of ISNULL(@paramname, ColName)), what ends up happening is that the query plan thus generated will not leverage any indexes on the table. While this query can be refactored into separate versions for cases where the parameter values are supplied and where they are not, another viable option is to use OPTION (RECOMPILE) at the statement level. This is an acceptable solution in most cases because the cost of scanning the table is often far higher than the cost of recompiling this query. So here is how we used OPTION RECOMPILE in this case:
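
The quoted code itself isn’t reproduced in the excerpt; a minimal sketch of the general pattern (the table and parameter are illustrative) looks something like this:

    DECLARE @CustomerID int = NULL;  -- optional filter; NULL means "all rows"

    -- ISNULL(@CustomerID, CustomerID) makes the parameter optional but
    -- defeats index seeks; OPTION (RECOMPILE) lets the optimizer build a
    -- plan for the actual parameter value at each execution.
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = ISNULL(@CustomerID, CustomerID)
    OPTION (RECOMPILE);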

Arvind walks us through three separate solutions. My fourth solution is: don’t use user-defined table-valued functions.


Biml Metadata, With And Without BimlStudio

Ben Weissman has a pair of posts regarding metadata models in Biml.  First up, he gives us the high-roller solution:

If you’re lucky enough to be a BimlStudio user, you have access to the Biml Metadata feature! This feature allows you to build a Metadata model that fits your exact needs, which can then be browsed and used through a Metadata Instance using a dynamic object model.

As you probably still want to maintain your metadata outside of BimlStudio, we’ve built this little piece of code. It will read your meta-schema from a given SQL database and build a Biml Metadata model from it. In a second step, it will also import the contents of your model into an instance:

If your company doesn’t want to shell out the cash to buy a license for BimlStudio, Ben also has a version for people using the free BimlExpress tool:

So maybe you’ve noticed our blog post on deriving metadata from SQL Server into BimlStudio, but you are using BimlExpress and therefore don’t have access to the feature described there? While it’s true that BimlExpress doesn’t support the Metadata features of the Biml language, there are similar ways of achieving a flexible metadata model in Biml.

This post shows you how you can build a model in SQL Server, import it into Biml (including derived relationships and the like), and use it in a very flexible way.

To get started, we need to set up a sample model in SQL Server first. You can either download the scripts from https://solisyon.de/files/SetupDerivedMetadata.sql or scroll to the very end of that page. Although your individual model can and will differ from this one, we suggest you follow along using our example and start tweaking it afterwards!
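
Ben’s sample model is linked above; purely to give a flavor of what such a metadata schema can look like, here is a stripped-down, invented sketch (not his actual script):

    -- Illustrative only; the real script lives at the link above.
    CREATE SCHEMA meta;
    GO

    CREATE TABLE meta.SourceTable
    (
        TableID       int IDENTITY(1, 1) PRIMARY KEY,
        SchemaName    sysname NOT NULL,
        TableName     sysname NOT NULL,
        IsIncremental bit NOT NULL DEFAULT (0)
    );

    CREATE TABLE meta.SourceColumn
    (
        ColumnID   int IDENTITY(1, 1) PRIMARY KEY,
        TableID    int NOT NULL REFERENCES meta.SourceTable (TableID),
        ColumnName sysname NOT NULL,
        DataType   sysname NOT NULL
    );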

Once you really get how Biml converts metadata to packages, life gets so much easier.


Using dbatools To Determine SQL Server Versions

Simone Bizzotto walks us through a new dbatools feature:

You get back in a jiffy:
– the Build
– the Major Release
– the Service Pack
– the Cumulative Update
– the KB related to that version
– when the support for that version ends
– whether all of the above match a verified build
– a warning if you passed a bad build or the JSON needs updating

Getting the build is easy; getting some of this other information is where they add a lot of value.
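
The cmdlet in question appears to be Get-DbaBuildReference; assuming a current dbatools install, usage looks something like this:

    # Assumes dbatools is installed: Install-Module dbatools
    Import-Module dbatools

    # Look up a known build number...
    Get-DbaBuildReference -Build '13.0.4435'

    # ...or point it at a live instance and let it find the build itself
    Get-DbaBuildReference -SqlInstance 'MYSERVER\SQL2016'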


Master Data In Azure

Matt How explains why Master Data Services isn’t a great cloud-based master data management solution and offers up an alternative:

Excel is easy to use, but not user friendly

Excel is on nearly every desktop in any Windows-based organisation, and with the Master Data Services Add-in, it puts the data well within the reach of the users. Whilst it is simple, it is in no way user friendly when compared to other applications that your users may be using. Not to mention that for most users, this will be the only part of the solution they see! Wouldn’t it be great if there were a way to supply the same data but with an intuitive, mobile-ready front end that people enjoy using?

Developers are tightly constrained

Developers like to develop, not choose options from drop-down menus in a web-based portal. With MDS, not only can devs not make use of Visual Studio and the like, but they are very tightly constrained by the business rules engine. At this point we should be able to make use of our preferred IDE so that we can benefit from source control, frameworks, and customised business logic.

Not scalable according to modern expectations

Finally, MDS cannot scale to handle any kind of “big data”. It’s a bit of a buzzword, but as businesses collect more and more data, we need a data management option that can grow with that data. Because MDS must be deployed from a server, there is no easy way to meet those big data requirements.

There are a few pieces to Matt’s solution, making for an interesting read.


Unintentional Data

Eric Hollingsworth describes data science in a world where the cost of collecting data approaches zero:

Thankfully not only have modern data analysis tools made data collection cheap and easy, they have made the process of exploratory data analysis cheaper and easier as well. Yet when we use these tools to explore data and look for anomalies or interesting features, we are implicitly formulating and testing hypotheses after we have observed the outcomes. The ease with which we are now able to collect and explore data makes it very difficult to put into practice even basic concepts of data analysis that we have learned — things such as:

  • Correlation does not imply causation.
  • When we segment our data into subpopulations by characteristics of interest, members are not randomly assigned (rather, they are chosen deliberately) and suffer from selection bias.
  • We must correct for multiple hypothesis tests.
  • We ought not dredge our data.

All of those principles are well known to statisticians, and have been so for many decades. What is newer is just how cheap it is to posit hypotheses. For better and for worse, technology has led to a democratization of data within organizations. More people than ever are using statistical analysis packages and dashboards, explicitly or more often implicitly, to develop and test hypotheses.
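
To make the multiple-testing point concrete, here is a small illustrative simulation (mine, not Eric’s):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # 1,000 comparisons in which the null hypothesis is true by
    # construction: both samples come from the same distribution.
    pvals = [
        stats.ttest_ind(rng.normal(size=100), rng.normal(size=100)).pvalue
        for _ in range(1000)
    ]

    print(sum(p < 0.05 for p in pvals))         # ~50 false "discoveries"
    print(sum(p < 0.05 / 1000 for p in pvals))  # Bonferroni-corrected: ~0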

This is a thoughtful essay well worth reading.


Crossing The Streams With Kafka

Himani Arora shows how to join two Kafka streams together:

KStream-KStream Join

It is a sliding window join; that means all tuples close to each other with regard to time are joined, where “close” means the difference in time is at most the size of the window.

These joins are always windowed joins because otherwise, the size of the internal state store used to perform the join would grow indefinitely.

In the following example, we perform an inner join between two streams. The output of the joined stream will be of type KStream<K, ...>
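
The example itself isn’t reproduced in the excerpt; a minimal sketch of a windowed inner join, assuming default serdes are configured (and noting that the windowing API shifts between Kafka Streams versions), looks roughly like this:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;

    public class JoinExample {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> left = builder.stream("left-topic");
            KStream<String, String> right = builder.stream("right-topic");

            // Inner join: records with the same key whose timestamps lie
            // within five minutes of each other are paired.
            KStream<String, String> joined = left.join(
                    right,
                    (leftValue, rightValue) -> leftValue + "/" + rightValue,
                    JoinWindows.of(Duration.ofMinutes(5)));

            joined.to("joined-topic");
            // Build the Topology with builder.build() and run it via a
            // KafkaStreams instance as usual.
        }
    }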

Read on to learn more about two additional join types.


Measuring Semantic Relatedness

Sandipan Dey re-works a university assignment on semantic relatedness in Python:

Let’s define the semantic relatedness of two WordNet nouns x and y as follows:

  • A = set of synsets in which x appears
  • B = set of synsets in which y appears
  • distance(x, y) = length of shortest ancestral path of sets A and B
  • sca(x, y) = a shortest common ancestor of sets A and B

This is the notion of distance that we need to use to implement the distance() and sca() methods in the WordNet data type.
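
Here is a rough sketch of how those definitions map onto NLTK’s WordNet interface (not Sandipan’s code; NLTK’s shortest_path_distance walks the hypernym hierarchy, which approximates the shortest ancestral path):

    from itertools import product
    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def _pairs(x, y):
        return product(wn.synsets(x, pos=wn.NOUN), wn.synsets(y, pos=wn.NOUN))

    def _dist(a, b):
        d = a.shortest_path_distance(b)
        return d if d is not None else float('inf')

    def distance(x, y):
        # length of the shortest ancestral path over all noun synset pairs
        return min(_dist(a, b) for a, b in _pairs(x, y))

    def sca(x, y):
        # lowest common hypernym(s) of the closest pair of synsets
        a, b = min(_pairs(x, y), key=lambda p: _dist(*p))
        return a.lowest_common_hypernyms(b)

    print(distance('worm', 'bird'))
    print(sca('worm', 'bird'))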

It looks like a helpful assignment for understanding natural language processing a little better.


Trigger Nuance

Denis Gobo offers up some good advice on triggers:

The most common mistake people make when first starting to write triggers is writing them in such a way that they will only work if you insert/update/delete one row at a time. A trigger fires per batch, not per row; you have to take this into consideration, otherwise your DML statements will blow up. How to do this is explained in the post Coding SQL Server triggers for multi-row operations; there is no point recreating that post here.
Another problem that I see is that some people think a trigger is SQL Server’s version of crontab: you will see code that sends email, kicks off jobs, and runs stored procedures. This is the wrong approach. A trigger should be lean and mean; it should execute as fast as possible. If you need to do some additional things, dump some data from the trigger into a processing table and then use that table to do your additional tasks. Don’t use triggers as a messaging system, either; SQL Server comes with Service Broker, so use that instead.
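
A minimal sketch of the set-based pattern Denis describes, using hypothetical tables:

    -- dbo.Orders and dbo.OrderAudit are invented for illustration.
    CREATE TRIGGER dbo.trg_Orders_AfterInsert
    ON dbo.Orders
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- The inserted pseudo-table holds every row in the batch, so this
        -- works whether the INSERT affected one row or one million.
        INSERT INTO dbo.OrderAudit (OrderID, AuditedAt)
        SELECT i.OrderID, SYSUTCDATETIME()
        FROM inserted AS i;
    END;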

Good reading.  There are valid reasons for triggers, and ignoring them altogether is almost as bad as misusing them.
