
Author: Kevin Feasel

Saving Your Biml Outputs

Tim Cost shows how to save the Biml that gets generated behind the scenes when you build out a set of files:

One of the first things I started wondering about as I got used to reading OPC (other people’s code) is just EXACTLY what BIML is doing at any given point in the code. You can make some educated guesses based on the SSIS packages (in my case I’m exclusively interested in BIML for SSIS, but of course it can do a lot more than that), but it’s easy to get lost, especially when there’s a lot of BIML script and some of it is only used to establish a data model in memory or to create / fill variables that will be used in SSIS. I was delighted to discover the following piece of code that can show you exactly what BIML is doing based on the code you are writing.

If you don’t have BimlStudio, this trick is vital for figuring out what’s going wrong.
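For reference, the trick boils down to a small BimlScript snippet that asks the compiler for the expanded Biml and writes it to disk. A rough sketch of the idea follows (the output path and tier value are my assumptions; Tim’s post has the exact code):

<#@ template tier="10" #>
<#
/* Run after the other BimlScript files have expanded (hence the high tier),
   then dump the flattened Biml that the engine actually sees to a file. */
System.IO.File.WriteAllText(@"C:\Temp\ExpandedBiml.xml", RootNode.GetBiml());
#>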


Dealing With Database Changes

Vladimir Oselsky walks through his database deployment workflow:

When it comes to actual deployment to test and production servers, it is handled by an application update program that runs scripts on the target server one by one in alphabetical order. Since we have clients running different versions, scripts always have to be applied in order; for example, if the customer is on version 1.5, they need 2.0 before they can get 2.5. This ensures that database changes are applied in the correct order, and I don’t have to worry about something breaking.

One last problem that I have to deal with on a regular basis is version drift. This is caused when I manually patch a client for a fix without going through the proper build process. In those cases, I just have to manually merge changes into development to guarantee that they will make it out to other clients. Once in a while, it becomes quite complicated to keep track of different clients running different versions, and to work out whether a fix they need is something that can be delivered through an update or requires manual code changes.

Version drift can be a big pain, but check out Vlad’s workflow.
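As a sketch of how that kind of version sequencing is commonly enforced (the table and version numbers here are illustrative, not Vlad’s actual implementation), each upgrade script can open with a guard that refuses to run out of order:

-- Hypothetical guard at the top of the 2.0-to-2.5 upgrade script.
IF NOT EXISTS (SELECT 1 FROM dbo.SchemaVersion WHERE VersionNumber = '2.0')
BEGIN
    RAISERROR('Database must be at version 2.0 before applying 2.5.', 16, 1);
    RETURN;  -- exit the batch without making changes
END;

-- ... schema changes for 2.5 go here ...

UPDATE dbo.SchemaVersion SET VersionNumber = '2.5';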


47 Incorrect Deployment Assumptions

Brent Ozar has a list of 47 assumptions regarding database deployments that turn out not always to be true:

30. The deployment person wouldn’t dream of only highlighting some of it and running it.

31. The staff who were supposed to work with you during the deployment will be available.

32. The staff, if available at the start of the call, will be available during the entire call.

33. The staff won’t come down with food poisoning halfway through the deployment call, forget to mute their home office phone, step into the bathroom, and leave the bathroom door open.

I’ve never had item #33 happen to me, but that’s a pretty solid list of stuff that can go wrong.


Database Deployment: Growing Up

Ryan Booz uses schooling as an extended metaphor for database deployment:

In general, the biggest issues we hit continue to be client customizations to the database (even ones we sanction) and an ever-growing set of core-pop data that we manage and have to proactively defend against client changes. This is an area we just recently admitted we need to take a long, hard look at so we can figure out a new paradigm.

I should mention that it was also about this time that we were finally able to proactively get our incremental changes into source control. All of our final scripts were in source somewhere, but the ability to use SQL Compare and SQL Source Control allowed our developers to finally be a second set of eyes on the upgrade process. No longer were we weeding through 50K lines of SQL upgrade script just to try to find what changed. Diffing whole scripts doesn’t really provide any good context… especially when we couldn’t guarantee that the actions in the script were in the same order from release to release. This has been another huge win for us.

This is a view from someone in the middle of the process. Ryan’s group isn’t pushing everything automatically yet, but they’re building toward that.


Breakpoint Extended Event

Arun Sirpal is a dangerous man of mystery and danger, but mostly danger:

I did a dangerous thing, and I want to make sure that YOU DO NOT do the same.

I was creating a couple of Extended Events sessions and was playing around with some actions. I ended up with the following code, where I was after a guy called Shane:

The probability that you intend to set a breakpoint in SQL Server via Extended Events is quite low (low enough that if you’re doing it, you should already know what you’re doing), but click through to see exactly what damage you can do.
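For contrast, a typical (and safe) Extended Events session definition looks something like this sketch; the session name, event, and file path are invented for illustration, and Arun’s post shows the dangerous action to avoid:

-- A benign session that captures logins, for illustration only.
CREATE EVENT SESSION [WatchLogins] ON SERVER
ADD EVENT sqlserver.login
    (ACTION (sqlserver.client_hostname, sqlserver.username))
ADD TARGET package0.event_file (SET filename = N'C:\Temp\WatchLogins.xel');
GO

ALTER EVENT SESSION [WatchLogins] ON SERVER STATE = START;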


EF Core Merge Statements

Richie Rump looks at the SQL that Entity Framework Core generates when inserting a batch of records:

If you’re an experienced SQL tuner, you’ll notice some issues with this statement. First off, the query has not one but two table variables. It’s generally better to use temp tables because table variables don’t have good statistics by default. Secondly, the statement uses a MERGE statement. The MERGE statement has had more than its fair share of issues. See Aaron Bertrand’s post “Use Caution with SQL Server’s MERGE Statement” for more details on those issues.

But that got me wondering: why would the EF team use SQL features that perform so poorly? So I decided to take a closer look at the SQL statement. Just so you know, the code that was used to generate the SQL saves three entities (Katana, Kama, and Tessen) to the database in a batch. (Julie used a Samurai theme, so I just continued with it.)

Yeah…I’m not liking the MERGE statement very much here.
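The generated statement follows a recognizable pattern: batch the parameter values, MERGE them into the target with a join condition that never matches, and OUTPUT the generated keys into a table variable so they can be mapped back to the entities. Here is a simplified sketch of that shape (table and parameter names are mine, not EF’s literal output):

-- Simplified sketch of the EF Core batch-insert pattern, not the exact generated SQL.
DECLARE @p0 NVARCHAR(100) = N'Katana',
        @p1 NVARCHAR(100) = N'Kama',
        @p2 NVARCHAR(100) = N'Tessen';

DECLARE @inserted TABLE ([Id] INT, [_Position] INT);

MERGE [Samurais] AS t
USING (VALUES (@p0, 0), (@p1, 1), (@p2, 2)) AS i ([Name], [_Position])
    ON 1 = 0  -- never matches, so every source row becomes an insert
WHEN NOT MATCHED THEN
    INSERT ([Name]) VALUES (i.[Name])
OUTPUT INSERTED.[Id], i.[_Position] INTO @inserted;

-- EF reads this back to map identity values onto each entity in the batch.
SELECT [Id] FROM @inserted ORDER BY [_Position];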


Genomic Analysis In Spark

Tom White and Jonathan Keebler show off Hail, a package that lets you perform genomic analysis in Apache Spark:

One of the most important downstream analyses is finding genetic trait associations. Association studies look for statistical associations between genetic variation and phenotypic traits (that is, observable characteristics of an individual, such as hair color or disease). With the increasing availability of whole-genome sequence data, it’s possible to look for variants from across the whole genome that may be associated with a disease, rather than relying heavily on commonly known variants as in a traditional genome-wide association study (GWAS).

The challenge for downstream processing is scale. Tools that can cope with a few hundred or even a few thousand genomes, such as the well-known 1000 Genomes dataset, can’t handle datasets that are one or more orders of magnitude larger. These datasets are now becoming commonplace, thanks to the multiple sequencing efforts taking place around the world like the 100,000 Genomes Project in the UK and the Precision Medicine Initiative in the US.

Genomic analysis has been right in Hadoop’s wheelhouse for a while.


Grid Features In SQL Prompt

Derik Hammer shows off some of the grid functionality in Red Gate’s SQL Prompt:

Even more often than scripting out INSERT statements, I need to copy a set of values and format them for an IN clause. Normally I would use a text editor such as Notepad++ to reformat the multiple lines of values. SSMS can also be used, but I find Notepad++’s find/replace features better.

Now I do not have to worry about copying/pasting the values and making changes. SQL Prompt delivers a direct conversion from values to IN clause.

Click through for some animated GIFs showing how to use this functionality.
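The end result is the sort of transformation you would otherwise do by hand: a grid column of values becomes a ready-to-paste predicate. With made-up values, something like this:

-- Before: a column of values copied from the results grid
--   1001
--   1002
--   1005
-- After: formatted as an IN clause
SELECT *
FROM dbo.Orders
WHERE CustomerID IN (1001, 1002, 1005);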


Foreign Key Check Options

Louis Davidson shows how to create a foreign key constraint which is enabled or disabled, trusted or untrusted:

I am in the middle of building a utility (for work, and for my next SQLBLOG post) that will help when you need to drop the foreign key constraints on a table (if you want to truncate the tables, say) but keep the script to re-create them stored in a table. The first thing, though, is to make sure I have all of the scripting possibilities understood.

When I started hunting around to remember how to create a disabled constraint, I couldn’t easily find anything, so I figured I would make this a two-parter. (My blogging rule is: if I look for something and find a good article about it, reference it, then tweet the article out. If it is too hard to find, blog about it!) So today I will review how to create a FOREIGN KEY constraint in three ways:

  • Enabled, and Trusted – Just as you would normally create one

  • Enabled, Not Trusted – The “quick” way, not checking data to see if any wrong data already exists, but not allowing new, bad data in

  • Disabled, Not Trusted – The constraint is basically documentation of the relationship, but you are on your own to make sure the data matches the constraint

In an ideal world, all of your constraints are enabled and trusted, but when you’re building a general-purpose script, you can’t always assume that will be the case. Click through for examples on how to create foreign key constraints fitting each of these scenarios.
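As a quick reference, here is roughly what the three variants look like in T-SQL (table, column, and constraint names are hypothetical; Louis’s post walks through the details):

-- 1. Enabled and trusted: existing rows are validated as the constraint is added.
ALTER TABLE dbo.Child WITH CHECK
    ADD CONSTRAINT FK_Child_Parent
    FOREIGN KEY (ParentId) REFERENCES dbo.Parent (ParentId);

-- 2. Enabled but not trusted: existing rows are not validated, but new rows are checked.
ALTER TABLE dbo.Child WITH NOCHECK
    ADD CONSTRAINT FK_Child_Parent
    FOREIGN KEY (ParentId) REFERENCES dbo.Parent (ParentId);

-- 3. Disabled and not trusted: the constraint remains as documentation only.
ALTER TABLE dbo.Child NOCHECK CONSTRAINT FK_Child_Parent;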


Dynamic Markdown YAML

Steph Locke shows how to use the params section of a YAML header to enable parameter reuse:

You may already know the trick of making the date dynamic, so it reflects whatever date the report gets rendered on, by using rmarkdown’s inline R execution to insert a value.

---
title: "My report"
date: "`r Sys.Date()`"
output: pdf_document
---

What you may not already know is that YAML fields get evaluated sequentially, so you can create a value further up, in the params section, and then use it later in the block.

Click through to see how it’s done.
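As a rough illustration of that sequential evaluation (the field names here are invented; Steph’s post has the real pattern), a value defined in params can feed a field that appears below it:

---
params:
  client: "ACME"
title: "`r paste('Report for', params$client)`"
date: "`r Sys.Date()`"
output: pdf_document
---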
