Press "Enter" to skip to content

Curated SQL Posts

Waits: External But Not Preemptive

Ewald Cress has some thoughts on external wait types:

Previously I dug into preemptive waits in SQLOS, and to be honest, I equated “preemptive” with “external”. For the most part the two go hand in hand after all.

To recap, a preemptive wait isn’t necessarily a wait at all. What happens is that a worker needs to run some code that can’t be trusted to play by cooperative scheduling rules. And rather than put the SQLOS scheduler (and all its sibling workers) at the mercy of that code, the worker detaches itself from the scheduler and cedes control to a sibling runnable worker.

Read the whole thing.

Comments closed

Built-In SQL Server Functions

Tywan Terrell has a listing of various functions built into SQL Server:

SQL Server starting with 2012 ship with a robust set of functions that can be used to make code perform faster, with fewer lines of code. The functions  can be used in ETL Process to provide better error handling. A example of this would be the Try_Parse function that allows you to check if a value can be converted.

Another example would be using the FIRST_VALUE() and LAST_VALUE() functions which work against a set of data. These functions are very useful when looking for things like month over month averages and when doing calculations. The table below contain a list of function that are supported starting with 2012 along with some examples of how to use them.

He breaks them down into four categories and provides examples.  Functions can bring their own set of problems with query performance, but most of them can be very useful.

Comments closed

Finding Database Growth Events

Arun Sirpal has an Extended Events session which tracks growth events on a database file:

A quick post that is hopefully useful, I wanted a quick way to find the time, size of the database file size change and who caused it.

I went down the extended events route using sqlserver.database_file_size_change event. By the way I am no Extended Events expert, I write a lot via trial and error I am trying to wean off Profiler.

Read on for the script as well as a query which shreds the XML and returns a result set.

Comments closed

Spark Changes In HDP 2.6

Vinay Shukla and Syed Mahmood talk about what’s new with Spark and Zeppelin in the Hortonworks Data Platform 2.6 update:

SPARKR & PYSPARK

Most data scientists use R & Python and with SparkR & PySpark respectively they can continue to leverage their familiarity with the R & Python languages. However, they need to use the Spark API to leverage Machine learning with Spark and to take advantage of distributed computations. Both SparkR & PySpark are evolving rapidly and SparkR now supports a number of machine learning algorithms such as LDA, ALS, RF, GMM GBT etc. Another key improvement in SparkR is the ability to deploy a package interactively. This will help Data Scientists deploy their favorite R package in their own environment without stepping on other users.

PySpark now also supports deploying VirtualEnv and this will allow PySpark users to deploy their libraries in their own individual deployments.

There are several large changes, so check it out.

Comments closed

Custom ggplot2 Subplots

Ilya Kashnitsky shows how to create custom subplots using ggplot2:

Actually, ggplot2 is a very powerful and flexible tool that allows to draw figures with quite a complex layout. Today I want to show the code that aligns six square plots (actually, maps) just as in the figure above. And it’s all about the handy function ggplot2::annotation_custom(). Since I used the layout more than once, I wrapped the code that produced it into a function that takes a list of 6 square plots as an input and yields the arranged figure with arrows as an output. Here is the commented code of the function.

This is the difference between “I’m just going to throw some stuff on there” (which is how I tend to operate) versus well thought out visual layout.

Comments closed

Microsoft R Open 3.4.0

David Smith announces Microsoft R Open 3.4.0:

R 3.4.0 (upon which MRO 3.4.0 is based) is a major update to the R language, with many fixes and improvements. Most notably, R 3.4.0 introduces a just-in-time (JIT) compiler to improve performance of the scripts and functions that you write. There have been a few minor tweaks to the language itself, but in general functions and packages written for R 3.3.x should work the same in R 3.4.0. As usual, MRO points to a fixed CRAN snapshot from May 1 2017, but you can use the built-in checkpoint package to access packages from an earlier date (for compatibility) or a later date (to access new and updated packages).

The version of Microsoft R Server shipping with SQL Server 2017 will still be based on 3.3.3, but I’m going to guess that a new version of Microsoft R Server supporting 3.4.0 will ship in the next several months.

Comments closed

Disabling The Identity Cache

Andrew Pruski looks at a configuration setting in SQL Server 2017 which prevents SQL Server from bumping the next identity value on failover or restart:

Now we can check the data in the table: –

SELECT * FROM [dbo].[TestTable]
GO

As expected, there’s a gap in the IDs. Why it jumps to 1002 is discussed in the connect item.

OK, now let’s try running the same code again but this time we will disable the identity cache.

This doesn’t eliminate gaps altogether; those can still happen upon rollback of a transaction.  This is reason #1 why you should not use identity columns as proper sequences.

Comments closed

Building A Concatenated Tooltip In Power BI

Devin Knight has started a new series, walking through problems his clients have faced implementing Power BI solutions.  In this edition, Devin wants to build a comma-delimited list to display on a tooltip:

This works perfectly for Stock because it automatically summarizes the value but, you’ll notice above that the tooltip for Subcategory has an interesting behavior. Rather than displaying the list of the values in Subcategory it actually just show the very first value. This happens because the Tooltip field requires that any column used in it be able to aggregate or roll up the values into what’s shown on the chart. Since Subcategory is just a text field Power BI automatically applies the FIRST function to return back the first value that appears. You could optionally change this from FIRST to either LAST, COUNT, or COUNTDISTINCT.

So the real problem I want to solve here is rather than only showing the first subcategory how do I list all the subcategories in a comma separated list in the tooltip? Let’s walk through a couple possible designs to this solution.

Read on for two different designs, including the code to implement the solutions.

Comments closed

Fisher’s Exact Test

Mala Mahadevan explains Fisher’s Exact Test and provides examples in T-SQL and R:

The decision rule in two sample tests of hypothesis depends on three factors :
1 Whether the test is upper, lower or two tailed (meaning the comparison is greater, lesser or both sides of gender and speaker count)
2 The level of significance or degree of accuracy needed,
3 The form of test statistic.
Our test here is to just find out if gender and speaker count are related so it is a two tailed test. The level of significance we can use is the most commonly used 95% which is also the default in R for Fischer’s Test. The form of the test statistic is P value. So our decision rule would be that gender and speaker category are related if P value is less than 0.05.

Click through for the R code followed by a code sample which should explain why you don’t want to do it in T-SQL.

1 Comment

Generating Comma-Delimited Strings

Shane O’Neill has a good reason to upgrade to SQL Server 2017, which is not having to deal with FOR XML PATH hacks anymore:

Commas are all the rage nowadays:

There are a fair number of questions nowadays about returning data from a database in a comma separated string. Sure the application should probably do that but hey, database servers are expensive, why not get some bang for your bucks!

Protip:  I use the colon as a separator because sometimes commas will get stuck in the Ethernet cable.

Comments closed