Press "Enter" to skip to content

Curated SQL Posts

Query Tuning With The APPLY Operator

Daniel Janik walks through using the APPLY operator to tune a couple of queries:

Recently we were doing a project that heavily focused on query tuning and many tables had various outer joins. My co-worker pointed out that many of these could be converted to an apply rather than a join.

Apply gives you both CROSS and OUTER. Think of CROSS APPLY like an INNER JOIN and OUTER APPLY like an OUTER JOIN.

Let’s compare some code to see how APPLY stacks up.

I like the APPLY operator so much that I created an entire presentation on it.  It’s not a cure-all by any means, but if you understand the intent, you can find places where it improves your code significantly.

Comments closed

Finding The Real Character Set: Unicode And SQL Server Identifiers

Solomon Rutzky wraps up his series on Unicode and regular identifiers:

The question that I’m trying to answer is: what are the valid “letters” and “decimal numbers” from other national scripts?

I tried using the online research tool “UnicodeSet”, but that gave slightly different results compared (using the “alphabetic” and “numeric_type = decimal” properties) to what I discovered SQL Server actually accepts.

I then loaded the actual Unicode 3.2 data files only to find that the number of characters having either the “alphabetic” or “numeric_type = decimal” properties was different than both the online search and what SQL Server actually accepts.

And so…..

Click through to find the real Unicode killer.

Comments closed

Wait Stats Related To Columnstore Indexes

Niko Neugebauer has some documentation on important wait stats around columnstore:

I split the known wait types into the following distinct groups:
– Batch Execution Mode (HTBUILD, HTDELETE, HTMEMO, HTREINIT, HTREPARTITION, PWAIT_QRY_BPMEMORY, BPSORT)
– Columnstore Indexes (ROWGROUP_VERSION, ROWGROUP_OP_STATS, SHARED_DELTASTORE_CREATION, COLUMNSTORE_BUILD_THROTTLE, COLUMNSTORE_MIGRATION_BACKGROUND_TASK, COLUMNSTORE_COLUMNDATASET_SESSION_LIST)

With an appearance in the next SQL Server version of the Batch Execution Mode for the RowStore Indexes, the first group of the waits will suddenly be becoming more important for every single SQL Server user and mixing it together with the Wait Types specific to the internal structures and functions of the Columnstore Indexes makes no sense.

Read on to learn more about these important wait types.

Comments closed

Finding System-Generated Constraint Names

Michael J. Swart has a script which helps you find system-generated constraint names:

Names for constraints are optional meaning that if you don’t provide a name when it’s created or cannot afford one, one will be appointed to you by the system.
These system provided names are messy things and I don’t think I have to discourage you from using them. Kenneth Fisher has already done that in Constraint names, Say NO to the default.

But how do you know whether you have any?

Check out the informative comments as well.

Comments closed

Using Hive: Tiered Or Decoupled Storage?

Brandon Wilson and Gopal Vijayaraghavan compare a series of Hive queries against EC2 instances with persistent storage and S3:

There are advantages and disadvantages to each approach. The tiered approach has the most flexibility for an operator to tune the performance of the cluster while trading off size of the hot data zone for better performance or smaller resource footprint. The downside of this approach is that, having data on HDFS, resizing the cluster is a slow and tedious process due to HDFS needing to be rebalanced to achieve performance and fault-tolerance expectations. Thus this architecture is generally only used for statically sized clusters with steady, well-known workloads.

The decoupled architecture, on the other hand, enables maximum flexibility for cluster growth and reduction. For example, a cluster could run at 100 nodes during the day to support analytics and reporting and then shrink to 24 nodes overnight to support smaller ETL workloads. Historically, the disadvantage to decoupling is that cloud storage is not local and therefore could drastically affect runtime of the analytical workloads (hence the hybrid approach of tiered storage). However, the advent of LLAP in Hive 2.0 has limited this overhead making the approach far more attractive. The dynamic cache within LLAP also means that we do not need to statically define what data is hot. It can be inferred at query time (i.e., as users access the data, that data will become hot). We will look closer at how LLAP closes the runtime gap in the next section.

Historically, the argument was that you should avoid S3 in part because it’s relatively flaky compared to disks (in terms of performance and in its eventual consistency model).  It looks like that’s no longer a pressing concern.

Comments closed

Filtering On Kafka Streams

Robin Moffatt has a new series showing how to use Kafka Streams for dealing with syslog data:

syslog is one of those ubiquitous standards on which much of modern computing runs. Built into operating systems such as Linux, it’s also commonplace in networking and IoT devices like IP cameras. It provides a way for streaming log messages, along with metadata such as the source host, severity of the message, and so on. Sometimes the target is simply a local logfile, but more often it’s a centralised syslog server which in turn may log or process the messages further.

As a high-performance, distributed streaming platform, Apache Kafka® is a great tool for centralised ingestion of syslog data. Since Apache Kafka also persists data and supports native stream processing we don’t need to land it elsewhere before we can utilise the data. You can stream syslog data into Kafka in a variety of ways, including through Kafka Connect for which there is a dedicated syslog plugin.

In this post, we’re going to see how KSQL can be used to process syslog messages as they arrive in realtime.

Check it out.

Comments closed

Configuring Azure SQL Analytics

Esat Erkec has a guide showing how to configure and use Azure SQL Analytics on Azure SQL Database:

The most important and challenging responsibility of a database administrator is monitoring performance metrics. Because monitoring performance and troubleshooting performance issues are considered to be difficult. For this reason, we need diagnostic and monitoring tools to measure performance counters and metrics. For Azure SQL there is a tool which is named SQL Analytics. With this tool, we can measure and monitor Azure SQL databases and elastic pools. At the same time, we can create alerts for notifications. SQL Analytics offers performance metrics in graphical form. In this article, we will learn how to enable Azure SQL Analytics.

This is a long and screenshot-filled post, which is helpful if you’re getting started.

Comments closed

Powershell Splats And Quotes

Shane O’Neill shows when you need to use single quotes when splatting in Powershell:

If you’ve checked the examples in that post – and I recomment that you do – then you’ll see that it takes the syntax of Parameter = 'Value'.

Notice the Parameter portion is not in quotes? It also works perfectly well if you have the Parameter name in quotes e.g. 'Parameter' = 'Value' (double quotes works too).

Why would you use one instead of the other?

There is a special circle in the Inferno for people who put spaces in their parameter names.

Comments closed

Managing Central Management Server

Warren Estes explains how he keeps track of his servers using a Central Management Server:

We use a CMS server for each domain and I can’t imagine life without it. Kind of like when I discovered Amazon prime, or bought my first memory foam mattress.

The real magic of a CMS comes from being able to push jobs, or evaluate policies, on any server (targets) you want.

You can also execute T-SQL against all, or a subset of servers with either registered servers or CMS.

There are some caveats to look out for like collation differences and version specific DMVs when running queries across instances. Also security needs to be addressed. However, that is outside the scope of this post. You can find that information in the links in the first section.

I liked CMS when I had to deal with a dozen instances.  With hundreds of instances, I wouldn’t want to administer anything without one.

Comments closed

Powershell: CSVs And NoTypeInformation

Dan Blank lays down some information about saving headerless CSV files in Powershell:

Anyway, a question was posted recently using the #pshelp hashtag on Twitter.  How do you replace the header line of multiple csv files at once?

I saw this and maybe cockily thought to myself, “that’s got to be an easy 2 liner, bet I can boost my ego and quickly write out a solution”.

Spoilers:  it turned out to be slightly more complex.

Comments closed