Press "Enter" to skip to content

Month: May 2017

Visualizing Emergency Room Visits

Eugene Joh has a great blog post showing how to parse ICD-9 codes using regular expressions and then visualize the results as a treemap:

It looks like there is a header/title at [1], numeric grouping  at [2] “1.\tINFECTIOUS AND PARASITIC DISEASES”,  subgrouping by ICD-9 code ranges, at [3] “Intestinal infectious diseases (001-009)” and then 3-digit ICD-9 codes followed by a specific diagnosis, at [10] “007\tOther protozoal intestinal diseases”. At the end we want to produce three separate data frames that we’ll categorize as:

  1. Groups: the title which contains the general diagnosis grouping

  2. Subgroups: the range of ICD-9 codes that contain a certain diagnosis subgroup

  3. Classification: the specific 3-digit ICD-9 code that corresponds with a diagnosis

It’s a beefy article full of insight.

Comments closed

Subplots In Maps

Ilya Kashnitsky shows how to embed subplots within a map using ggplot2:

So, with this map I want to show the location of more and less urbanized NUTS-2 regions of Europe. But I also want to show – with subplots – how I defined the three subregions of Europe (Eastern, Southern, and Western) and what is the relative frequency of the three categories of regions (Predominantly Rural, Intermediate, and Predominantly Rural) within each of the subregions. The logic of actions is simple: first prepare all the components, then assemble them in a composite plot. Let’s go!

This is very useful information, well worth the read.

Comments closed

Scripting Foreign Key Constraints

Louis Davidson has a process to script out foreign key constraints and includes an example which loads those constraints into a utility table:

As noted in my previous post, I am in the middle of building a utility (for work, and for my next SQLBLOG post), that will help when you need to drop the foreign key constraints on a table, but then replace them without having to save off the script manually. In that post, I showed how to manually create a foreign key constraint in three possible states.

Next, in order to create the utility to script the FOREIGN KEY constraints before dropping them, I need a way to script the constraint. In this post I present a user defined function that will generate a script that does the scripting of a foreign key constraint. The code is largely based on some code from Aaron Bertrand in this blog (with his permission naturally!) with a few edits to script constraints as enabled, disabled or untrusted if the source constraint was in that condition (or you can force the constraints to a certain way if you so desire as well.)

This could be particularly useful for ETL jobs.

Comments closed

Computed Column Performance

Paul White has a great article on when computed columns perform poorly:

A major cause of poor performance is a simple failure to use an indexed or persisted computed column value as expected. I have lost count of the number of questions I have had over the years asking why the optimizer would choose a terrible execution plan when an obviously better plan using an indexed or persisted computed column exists.

The precise cause in each case varies, but is almost always either a faulty cost-based decision (because scalars are assigned a low fixed cost); or a failure to match an expanded expression back to a persisted computed column or index.

The match-back failures are especially interesting to me, because they often involve complex interactions with orthogonal engine features. Equally often, the failure to “match back” leaves an expression (rather than a column) in a position in the internal query tree that prevents an important optimization rule from matching. In either case, the outcome is the same: a sub-optimal execution plan.

Definitely read the whole thing if you’re thinking about setting trace flag 176 on.

Comments closed

Finding Queries Which Generate Waits

Kendra Little explains how to find which queries are causing waits in SQL Server:

I once had an extremely busy system where we had growing CMEMTHREAD waits. This is an unusual wait, and our question was: is this being caused by a single query, just a few queries, or all queries? We were able to answer this by setting up an extended events trace and looking at waits by query, but this had some downsides:

  1. Extended Events has no GUI in 2008R2, and setting up and testing the scripts took a bit of time (more minor issue)
  2. Generating wait information by query on a very busy system generates a lot of output, so we had to be careful to set up sampling and filtering so we didn’t impact performance (more major issue)

But we were able to use this to figure out that the wait was associated with all queries, not a few queries, which helped us down our troubleshooting path.

Kendra answers this (mostly) in the context of SQL Server 2008 R2, as that was the version the questioner had, but she does mention where later versions make life easier.

Comments closed

Waits: External But Not Preemptive

Ewald Cress has some thoughts on external wait types:

Previously I dug into preemptive waits in SQLOS, and to be honest, I equated “preemptive” with “external”. For the most part the two go hand in hand after all.

To recap, a preemptive wait isn’t necessarily a wait at all. What happens is that a worker needs to run some code that can’t be trusted to play by cooperative scheduling rules. And rather than put the SQLOS scheduler (and all its sibling workers) at the mercy of that code, the worker detaches itself from the scheduler and cedes control to a sibling runnable worker.

Read the whole thing.

Comments closed

Built-In SQL Server Functions

Tywan Terrell has a listing of various functions built into SQL Server:

SQL Server starting with 2012 ship with a robust set of functions that can be used to make code perform faster, with fewer lines of code. The functions  can be used in ETL Process to provide better error handling. A example of this would be the Try_Parse function that allows you to check if a value can be converted.

Another example would be using the FIRST_VALUE() and LAST_VALUE() functions which work against a set of data. These functions are very useful when looking for things like month over month averages and when doing calculations. The table below contain a list of function that are supported starting with 2012 along with some examples of how to use them.

He breaks them down into four categories and provides examples.  Functions can bring their own set of problems with query performance, but most of them can be very useful.

Comments closed

Finding Database Growth Events

Arun Sirpal has an Extended Events session which tracks growth events on a database file:

A quick post that is hopefully useful, I wanted a quick way to find the time, size of the database file size change and who caused it.

I went down the extended events route using sqlserver.database_file_size_change event. By the way I am no Extended Events expert, I write a lot via trial and error I am trying to wean off Profiler.

Read on for the script as well as a query which shreds the XML and returns a result set.

Comments closed

Spark Changes In HDP 2.6

Vinay Shukla and Syed Mahmood talk about what’s new with Spark and Zeppelin in the Hortonworks Data Platform 2.6 update:

SPARKR & PYSPARK

Most data scientists use R & Python and with SparkR & PySpark respectively they can continue to leverage their familiarity with the R & Python languages. However, they need to use the Spark API to leverage Machine learning with Spark and to take advantage of distributed computations. Both SparkR & PySpark are evolving rapidly and SparkR now supports a number of machine learning algorithms such as LDA, ALS, RF, GMM GBT etc. Another key improvement in SparkR is the ability to deploy a package interactively. This will help Data Scientists deploy their favorite R package in their own environment without stepping on other users.

PySpark now also supports deploying VirtualEnv and this will allow PySpark users to deploy their libraries in their own individual deployments.

There are several large changes, so check it out.

Comments closed

Custom ggplot2 Subplots

Ilya Kashnitsky shows how to create custom subplots using ggplot2:

Actually, ggplot2 is a very powerful and flexible tool that allows to draw figures with quite a complex layout. Today I want to show the code that aligns six square plots (actually, maps) just as in the figure above. And it’s all about the handy function ggplot2::annotation_custom(). Since I used the layout more than once, I wrapped the code that produced it into a function that takes a list of 6 square plots as an input and yields the arranged figure with arrows as an output. Here is the commented code of the function.

This is the difference between “I’m just going to throw some stuff on there” (which is how I tend to operate) versus well thought out visual layout.

Comments closed