
Month: August 2018

Configuring An Azure Runbook For Index Maintenance

Jim Donahoe explains how to perform index and statistics maintenance for Azure SQL Database, where you don’t have SQL Agent available:

I had a lot of issues when I created my first one, and after discussing with some folks, I found they had the same issues.  I searched for the best blog posts I could find on the subject, and the one I LOVED the most was here: Arctic DBA.  He broke it down so simply that I finally created my own pseudo installer, and I wanted to share it with all of you.  Please bear in mind, these code snippets may fail at any time due to changes in Azure.

IMPORTANT

These next steps assume the following:

You have created and configured your Azure Automation account, along with the credential used to execute this runbook.

Read on for a reasonably short PowerShell script and a modified version of Ola Hallengren’s index maintenance procedures.


Using The system_health Extended Event Session

Matthew McGiffen walks us through what the system_health Extended Events session gives us:

When Microsoft introduced Extended Events (XE) in 2008, they also gave us a built-in XE session called system_health.

This is a great little tool. I mainly use it for troubleshooting deadlocks as it logs all the information for any deadlocks that occur. No more having to mess about making sure specific trace flags are enabled to ensure deadlock information is captured in the error log.

It also captures the SQL text and Session Id (along with other relevant data) in a number of other scenarios you may need to troubleshoot:

  • Where an error of severity 20 or higher is encountered

  • Where a session has waited on a latch for over 15 seconds

  • Where a session has waited on a lock for over 30 seconds

  • Sessions that have encountered other long waits (the threshold varies by wait type)

Simply knowing what this session includes can give you a leg up on troubleshooting, especially when it’s a machine you haven’t seen before.
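If you want to poke at the data yourself, the session’s ring buffer target can be queried directly (on SQL Server 2012 and later there is also an event_file target). A minimal sketch, leaving out the XML shredding:

SELECT CAST(t.target_data AS xml) AS target_data
FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS t
        ON s.address = t.event_session_address
WHERE s.name = N'system_health'
    AND t.target_name = N'ring_buffer';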


Wanted: Per-Database Wait Stat Collection Built In

Erik Darling wants configurable wait stat collections on a database level built into SQL Server:

I’m hoping that a feature like this could solve some intermediate problems that Query Store doesn’t.

Namely, being lower overhead, not collecting any PII, and not taking up a lot of disk space — after all, we’re not storing any massive stored proc text or query plans, here, just snapshots of wait stats.

This will help even if you’re already logging wait stats on your own. You still don’t have a clear picture of which database the problem is coming from. If you’ve got a server with lots of databases on it, figuring that out can be tough.

Understanding what waits (and perhaps bottlenecks) a single database is experiencing can also help admins figure out what kind of instance size they’d need as part of a migration.
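In the meantime, the closest built-in approximation is session-scoped. A rough sketch using sys.dm_exec_session_wait_stats (SQL Server 2016 and later), which only sees sessions that are still connected; that limitation is exactly the gap Erik wants filled:

SELECT DB_NAME(s.database_id) AS database_name,
    ws.wait_type,
    SUM(ws.wait_time_ms) AS wait_time_ms
FROM sys.dm_exec_session_wait_stats AS ws
    INNER JOIN sys.dm_exec_sessions AS s
        ON s.session_id = ws.session_id
GROUP BY DB_NAME(s.database_id), ws.wait_type
ORDER BY SUM(ws.wait_time_ms) DESC;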

It’s an interesting approach.  If you agree with Erik, go vote.


Performance Test: Loading CSV Versus Loading Excel In Power Query

Chris Webb lays out a performance test which shows how quickly Power Query can read data from a CSV versus from an Excel spreadsheet:

The black line in the graph above is the amount of data read (actually the offset values showing where in the file the data is read from, which is the same thing as a running total when Power Query is reading all the data) from the Excel file; the green line is the amount of data read from the CSV file (the same data shown in the first graph above). A few things to mention:

  • Running Process Monitor while this second query was refreshing had a noticeable impact on its performance – in fact it was almost 20 seconds slower

  • The initial values of 80 million bytes seem to be where data is read from the end of the Excel file. Maybe this is Power Query reading some file metadata? Anyway, it seems as though it takes 5 seconds before it starts to read the data needed by the query.

  • There’s a plateau between the 10 and 20 second mark where not much is happening; this didn’t happen consistently and may have been connected to the fact that Process Monitor was running

The results were remarkable; check them out.


Highlighting Data With gghighlight

Laura Ellis shows off the gghighlight package, which allows you to selectively highlight certain sets of data in a ggplot:

While the above methodology is quite easy, it can be a bit of a pain at times to create and add the new data frame.  Further, you have to tinker more with the labelling to really call out the highlighted data points.

Thanks to Hiroaki Yutani, we now have the gghighlight package which does most of the work for us with a small function call!!   Please note that a lot of this code was created by looking at examples on her introduction document.

The new school way is even simpler:

  1. Using ggplot2, create a plot with your full data set.

  2. Add the gghighlight() function to your plot with the conditions set to identify your subset.

  3. Celebrate! This was one less step AND we got labels!
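As a rough sketch of those three steps (the data frame and the highlight condition here are made up):

library(ggplot2)
library(gghighlight)

# Made-up data: twelve monthly values for each of four groups
set.seed(42)
df <- data.frame(
  month = rep(1:12, times = 4),
  value = as.vector(replicate(4, cumsum(rnorm(12)))),
  group = rep(c("a", "b", "c", "d"), each = 12)
)

# Step 1: plot the full data set; step 2: add gghighlight() with a condition
ggplot(df, aes(month, value, colour = group)) +
  geom_line() +
  gghighlight(max(value) > 2)   # highlights groups whose series ever rises above 2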

That’s a very cool package.  H/T R-Bloggers


Stoppable, Async Shiny Interfaces

Ian at Fells Stats wants to make a long-running Shiny app a bit more user-friendly:

Shiny operates in a reactive programming framework. Fundamentally this means that any time any UI element that affects the result changes, so does the result. This happens automatically, with your analysis code running every time a widget is changed. In a lot of cases, this is exactly what you want and it makes Shiny programs concise and easy to make; however, in the case of long-running processes, this can lead to frozen UI elements and a frustrating user experience.

The easiest solution is to use an Action Button and only run the analysis code when the action button is clicked. Another important component is to provide your user with feedback as to how long the analysis is going to take. Shiny has nice built-in progress indicators that allow you to do this.
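The basic action-button-plus-progress pattern (before any of the async, stoppable machinery in the post) is small. A minimal sketch, with a fake five-second loop standing in for the real analysis:

library(shiny)

ui <- fluidPage(
  actionButton("go", "Run analysis"),
  plotOutput("result")
)

server <- function(input, output) {
  # Runs only when the button is clicked, not on every widget change
  analysis <- eventReactive(input$go, {
    withProgress(message = "Running analysis...", value = 0, {
      for (i in 1:10) {
        Sys.sleep(0.5)        # stand-in for a slow computation step
        incProgress(1 / 10)
      }
    })
    rnorm(100)                # stand-in for the real result
  })

  output$result <- renderPlot(hist(analysis()))
}

shinyApp(ui, server)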

There are a couple of false starts in there, but by the time you reach the third act, the story makes sense.  H/T R-Bloggers


Classes And Vectors In R

Dave Mason continues his journey toward learning R.  He looks next at the class() function:

Note the value assigned to horse_power is a whole number (integer) and the value assigned to miles_per_gallon is a rational number. But R tells us they are both of the “numeric” class. R does have an integer class. A variable’s class will be an integer if the value is followed by a capital “L”. Let’s reassign a value to horse_power to demonstrate:

> horse_power <- 240L
> class(horse_power)
[1] "integer"

Another way to determine the class of a variable is to use one of the is.*() functions. For example, is.integer() and is.numeric() tell us the miles_per_gallon is not an integer, and is a numeric:

> is.integer(miles_per_gallon)
[1] FALSE
> is.numeric(miles_per_gallon)
[1] TRUE

There’s also the typeof() function and the mode() function, and all three can differ under certain circumstances.
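For example, with a plain numeric value:

> x <- 1.5
> class(x)
[1] "numeric"
> typeof(x)
[1] "double"
> mode(x)
[1] "numeric"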

Next up, Dave hits vectors, the simplest of the interesting data types in R:

It’s important to know that the elements of a vector must be of the same class (data type). If the values passed to the c() function are of different classes, some of them will be coerced to a different class to ensure all classes of the vector are the same. Below, the parameter classes passed to the c() function include character, numeric, and integer. The corresponding numeric and integer parameter values are coerced to character within the vector:

> some_data <- c("a", "b", 7.5, 25L)
> some_data
[1] "a"   "b"   "7.5" "25" 
>

Read on for more about vectors.


Configuring SQL Server Management Studio

Brent Ozar shares his configuration settings for SQL Server Management Studio:

Under Query Results, SQL Server, Results to Grid, I change my XML data size to unlimited so that it brings back giant query plans. (Man, does my job suck sometimes.)

A lot of presenters like to check the box for “Display results in a separate tab” and “Switch to results tab after the query executes” because this gives them more screen real estate for the query and results. I’m just really comfortable with Control-R to hide the results pane.

And I just went and removed a bunch of menu bar icons I never use…  Good advice from Brent.


Pivoting And Unpivoting Data In T-SQL

Jeanne Combrinck shows how to use the PIVOT and UNPIVOT operators in SQL Server:

One thing that I still get confused about writing is pivot queries. I find myself needing to look up the syntax every time. Basically, you use PIVOT and UNPIVOT to change the output of a table. If you would like rows turned into columns, you can use PIVOT, and for the opposite you can use UNPIVOT.

One thing to note is the column identifiers in the unpivot clause follow the catalog collation. For SQL Database, the collation is always SQL_Latin_General_CP1_CI_AS. For SQL Server partially contained databases, the collation is always Latin1_General_100_CI_AS_KS_WS_SC. If the column is combined with other columns, then a collate clause (COLLATE DATABASE_DEFAULT) is required to avoid conflicts.
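As a quick syntax refresher, here is a minimal round trip against a throwaway table (the #Sales table and its columns are made up for illustration):

CREATE TABLE #Sales (Product varchar(20), Quarter char(2), Amount int);
INSERT INTO #Sales VALUES
    ('Widget', 'Q1', 100), ('Widget', 'Q2', 150),
    ('Gadget', 'Q1', 80),  ('Gadget', 'Q2', 90);

-- PIVOT: one row per product, quarters become columns
SELECT Product, [Q1], [Q2]
FROM #Sales
PIVOT (SUM(Amount) FOR Quarter IN ([Q1], [Q2])) AS p;

-- UNPIVOT: turn those columns back into (Quarter, Amount) rows
SELECT Product, Quarter, Amount
FROM (
    SELECT Product, [Q1], [Q2]
    FROM #Sales
    PIVOT (SUM(Amount) FOR Quarter IN ([Q1], [Q2])) AS p
) AS wide
UNPIVOT (Amount FOR Quarter IN ([Q1], [Q2])) AS u;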

Click through for an example of each.


Blocking A Truncate Statement

Arun Sirpal shows that the TRUNCATE command needs to take locks like any other data modification command:

The truncate option is fast and efficient but did you know that it takes a certain lock where you could actually be blocked?

What am I talking about? When you issue a truncate it takes a Sch-M lock and it uses this when it is moving the allocation units to the deferred drop queue. So if it takes this lock and you look at the locking compatibility matrix below you will see what can cause a conflict (C).

Arun includes an image which shows what can block what, and also shows us an example.
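A quick way to reproduce the blocking yourself (dbo.DemoTable is a made-up name): hold any lock on the table from one session, then try the truncate from another.

-- Session 1: an open transaction holding a shared lock on the table
BEGIN TRAN;
SELECT COUNT(*) FROM dbo.DemoTable WITH (TABLOCK, HOLDLOCK);

-- Session 2: TRUNCATE needs Sch-M, which conflicts with the S lock, so it waits
TRUNCATE TABLE dbo.DemoTable;

-- Session 3: watch the GRANT vs. WAIT lock requests
SELECT request_session_id, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
    AND resource_associated_entity_id = OBJECT_ID('dbo.DemoTable');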
