Press "Enter" to skip to content

Author: Kevin Feasel

Scripting Foreign Key Constraints

Louis Davidson has a process to script out foreign key constraints and includes an example which loads those constraints into a utility table:

As noted in my previous post, I am in the middle of building a utility (for work, and for my next SQLBLOG post), that will help when you need to drop the foreign key constraints on a table, but then replace them without having to save off the script manually. In that post, I showed how to manually create a foreign key constraint in three possible states.

Next, in order to create the utility to script the FOREIGN KEY constraints before dropping them, I need a way to script the constraint. In this post I present a user defined function that will generate a script that does the scripting of a foreign key constraint. The code is largely based on some code from Aaron Bertrand in this blog (with his permission naturally!) with a few edits to script constraints as enabled, disabled or untrusted if the source constraint was in that condition (or you can force the constraints to a certain way if you so desire as well.)

This could be particularly useful for ETL jobs.

Comments closed

Computed Column Performance

Paul White has a great article on when computed columns perform poorly:

A major cause of poor performance is a simple failure to use an indexed or persisted computed column value as expected. I have lost count of the number of questions I have had over the years asking why the optimizer would choose a terrible execution plan when an obviously better plan using an indexed or persisted computed column exists.

The precise cause in each case varies, but is almost always either a faulty cost-based decision (because scalars are assigned a low fixed cost); or a failure to match an expanded expression back to a persisted computed column or index.

The match-back failures are especially interesting to me, because they often involve complex interactions with orthogonal engine features. Equally often, the failure to “match back” leaves an expression (rather than a column) in a position in the internal query tree that prevents an important optimization rule from matching. In either case, the outcome is the same: a sub-optimal execution plan.

Definitely read the whole thing if you’re thinking about setting trace flag 176 on.

Comments closed

Finding Queries Which Generate Waits

Kendra Little explains how to find which queries are causing waits in SQL Server:

I once had an extremely busy system where we had growing CMEMTHREAD waits. This is an unusual wait, and our question was: is this being caused by a single query, just a few queries, or all queries? We were able to answer this by setting up an extended events trace and looking at waits by query, but this had some downsides:

  1. Extended Events has no GUI in 2008R2, and setting up and testing the scripts took a bit of time (more minor issue)
  2. Generating wait information by query on a very busy system generates a lot of output, so we had to be careful to set up sampling and filtering so we didn’t impact performance (more major issue)

But we were able to use this to figure out that the wait was associated with all queries, not a few queries, which helped us down our troubleshooting path.

Kendra answers this (mostly) in the context of SQL Server 2008 R2, as that was the version the questioner had, but she does mention where later versions make life easier.

Comments closed

Waits: External But Not Preemptive

Ewald Cress has some thoughts on external wait types:

Previously I dug into preemptive waits in SQLOS, and to be honest, I equated “preemptive” with “external”. For the most part the two go hand in hand after all.

To recap, a preemptive wait isn’t necessarily a wait at all. What happens is that a worker needs to run some code that can’t be trusted to play by cooperative scheduling rules. And rather than put the SQLOS scheduler (and all its sibling workers) at the mercy of that code, the worker detaches itself from the scheduler and cedes control to a sibling runnable worker.

Read the whole thing.

Comments closed

Built-In SQL Server Functions

Tywan Terrell has a listing of various functions built into SQL Server:

SQL Server starting with 2012 ship with a robust set of functions that can be used to make code perform faster, with fewer lines of code. The functions  can be used in ETL Process to provide better error handling. A example of this would be the Try_Parse function that allows you to check if a value can be converted.

Another example would be using the FIRST_VALUE() and LAST_VALUE() functions which work against a set of data. These functions are very useful when looking for things like month over month averages and when doing calculations. The table below contain a list of function that are supported starting with 2012 along with some examples of how to use them.

He breaks them down into four categories and provides examples.  Functions can bring their own set of problems with query performance, but most of them can be very useful.

Comments closed

Finding Database Growth Events

Arun Sirpal has an Extended Events session which tracks growth events on a database file:

A quick post that is hopefully useful, I wanted a quick way to find the time, size of the database file size change and who caused it.

I went down the extended events route using sqlserver.database_file_size_change event. By the way I am no Extended Events expert, I write a lot via trial and error I am trying to wean off Profiler.

Read on for the script as well as a query which shreds the XML and returns a result set.

Comments closed

Spark Changes In HDP 2.6

Vinay Shukla and Syed Mahmood talk about what’s new with Spark and Zeppelin in the Hortonworks Data Platform 2.6 update:

SPARKR & PYSPARK

Most data scientists use R & Python and with SparkR & PySpark respectively they can continue to leverage their familiarity with the R & Python languages. However, they need to use the Spark API to leverage Machine learning with Spark and to take advantage of distributed computations. Both SparkR & PySpark are evolving rapidly and SparkR now supports a number of machine learning algorithms such as LDA, ALS, RF, GMM GBT etc. Another key improvement in SparkR is the ability to deploy a package interactively. This will help Data Scientists deploy their favorite R package in their own environment without stepping on other users.

PySpark now also supports deploying VirtualEnv and this will allow PySpark users to deploy their libraries in their own individual deployments.

There are several large changes, so check it out.

Comments closed

Custom ggplot2 Subplots

Ilya Kashnitsky shows how to create custom subplots using ggplot2:

Actually, ggplot2 is a very powerful and flexible tool that allows to draw figures with quite a complex layout. Today I want to show the code that aligns six square plots (actually, maps) just as in the figure above. And it’s all about the handy function ggplot2::annotation_custom(). Since I used the layout more than once, I wrapped the code that produced it into a function that takes a list of 6 square plots as an input and yields the arranged figure with arrows as an output. Here is the commented code of the function.

This is the difference between “I’m just going to throw some stuff on there” (which is how I tend to operate) versus well thought out visual layout.

Comments closed

Microsoft R Open 3.4.0

David Smith announces Microsoft R Open 3.4.0:

R 3.4.0 (upon which MRO 3.4.0 is based) is a major update to the R language, with many fixes and improvements. Most notably, R 3.4.0 introduces a just-in-time (JIT) compiler to improve performance of the scripts and functions that you write. There have been a few minor tweaks to the language itself, but in general functions and packages written for R 3.3.x should work the same in R 3.4.0. As usual, MRO points to a fixed CRAN snapshot from May 1 2017, but you can use the built-in checkpoint package to access packages from an earlier date (for compatibility) or a later date (to access new and updated packages).

The version of Microsoft R Server shipping with SQL Server 2017 will still be based on 3.3.3, but I’m going to guess that a new version of Microsoft R Server supporting 3.4.0 will ship in the next several months.

Comments closed

Disabling The Identity Cache

Andrew Pruski looks at a configuration setting in SQL Server 2017 which prevents SQL Server from bumping the next identity value on failover or restart:

Now we can check the data in the table: –

SELECT * FROM [dbo].[TestTable]
GO

As expected, there’s a gap in the IDs. Why it jumps to 1002 is discussed in the connect item.

OK, now let’s try running the same code again but this time we will disable the identity cache.

This doesn’t eliminate gaps altogether; those can still happen upon rollback of a transaction.  This is reason #1 why you should not use identity columns as proper sequences.

Comments closed