Press "Enter" to skip to content

Month: May 2017

Microsoft R Open 3.4.0

David Smith announces Microsoft R Open 3.4.0:

R 3.4.0 (upon which MRO 3.4.0 is based) is a major update to the R language, with many fixes and improvements. Most notably, R 3.4.0 introduces a just-in-time (JIT) compiler to improve performance of the scripts and functions that you write. There have been a few minor tweaks to the language itself, but in general functions and packages written for R 3.3.x should work the same in R 3.4.0. As usual, MRO points to a fixed CRAN snapshot from May 1 2017, but you can use the built-in checkpoint package to access packages from an earlier date (for compatibility) or a later date (to access new and updated packages).

The version of Microsoft R Server shipping with SQL Server 2017 will still be based on 3.3.3, but I’m going to guess that a new version of Microsoft R Server supporting 3.4.0 will ship in the next several months.

Comments closed

Disabling The Identity Cache

Andrew Pruski looks at a configuration setting in SQL Server 2017 which prevents SQL Server from bumping the next identity value on failover or restart:

Now we can check the data in the table: –

SELECT * FROM [dbo].[TestTable]
GO

As expected, there’s a gap in the IDs. Why it jumps to 1002 is discussed in the connect item.

OK, now let’s try running the same code again but this time we will disable the identity cache.

This doesn’t eliminate gaps altogether; those can still happen upon rollback of a transaction.  This is reason #1 why you should not use identity columns as proper sequences.

Comments closed

Building A Concatenated Tooltip In Power BI

Devin Knight has started a new series, walking through problems his clients have faced implementing Power BI solutions.  In this edition, Devin wants to build a comma-delimited list to display on a tooltip:

This works perfectly for Stock because it automatically summarizes the value but, you’ll notice above that the tooltip for Subcategory has an interesting behavior. Rather than displaying the list of the values in Subcategory it actually just show the very first value. This happens because the Tooltip field requires that any column used in it be able to aggregate or roll up the values into what’s shown on the chart. Since Subcategory is just a text field Power BI automatically applies the FIRST function to return back the first value that appears. You could optionally change this from FIRST to either LAST, COUNT, or COUNTDISTINCT.

So the real problem I want to solve here is rather than only showing the first subcategory how do I list all the subcategories in a comma separated list in the tooltip? Let’s walk through a couple possible designs to this solution.

Read on for two different designs, including the code to implement the solutions.

Comments closed

Fisher’s Exact Test

Mala Mahadevan explains Fisher’s Exact Test and provides examples in T-SQL and R:

The decision rule in two sample tests of hypothesis depends on three factors :
1 Whether the test is upper, lower or two tailed (meaning the comparison is greater, lesser or both sides of gender and speaker count)
2 The level of significance or degree of accuracy needed,
3 The form of test statistic.
Our test here is to just find out if gender and speaker count are related so it is a two tailed test. The level of significance we can use is the most commonly used 95% which is also the default in R for Fischer’s Test. The form of the test statistic is P value. So our decision rule would be that gender and speaker category are related if P value is less than 0.05.

Click through for the R code followed by a code sample which should explain why you don’t want to do it in T-SQL.

1 Comment

Generating Comma-Delimited Strings

Shane O’Neill has a good reason to upgrade to SQL Server 2017, which is not having to deal with FOR XML PATH hacks anymore:

Commas are all the rage nowadays:

There are a fair number of questions nowadays about returning data from a database in a comma separated string. Sure the application should probably do that but hey, database servers are expensive, why not get some bang for your bucks!

Protip:  I use the colon as a separator because sometimes commas will get stuck in the Ethernet cable.

Comments closed

Understanding CROSS APPLY

Andy Levy has a T-SQL programming breakthrough:

Finally, this week I had a breakthrough. I was working on updating a bunch of data but it was breaking on a small subset of that data. In this case, I was attempting to JOIN two tables on fields that should have been INTs, but in a very small number of cases one side was using a comma-delimited string. The user told me that someone else had done these updates in the past and didn’t encounter the problem I was having (so I knew that it was something i was doing “wrong”), but given that it was only a handful of broken updates she was OK with manually doing the updates (we were scripting it because we were updating potentially tens of thousands of records).

I am not OK with manually fixing this in the future. I wanted to know how the other DBA had done it before. I dug into some history and found CROSS APPLY. My nemesis. I was determined to figure out how to use it this time.

The APPLY operator is extremely powerful in the right set of circumstances.  Andy shows the “classic” use case, but there are a number of other uses for the operator.

Comments closed

Figuring Out Virtual Sockets And Cores

Denny Cherry looks at a few considerations regarding virtual sockets and cores for a VM running SQL Server:

Standard Edition

You wants 1×6 (one socket, 6 cores) because standard edition will only use the first 4 sockets in a server (up to 16 cores combined). There’s no getting around that.

From a NUMA perspective as long a vNUMA at the Hypervisor is disabled then it doesn’t matter as SQL Server standard edition isn’t NUMA aware (NUMA awareness is an Enterprise Edition feature).

Read on for a more nuanced answer when it comes to Enterprise Edition.

Comments closed

Using mssql-scripter

David Alcock looks at a new Python-based command line tool, mssql-scripter:

Earlier today two new command line tools were announced for SQL Server, one an experimental Linux tools DBFS which enables access to live DMVs without using a UI like SSMS and secondly a tool that enables script generation of objects within SQL rather like the Generate SQL Scripts option in SSMS.

In this post I’m going to run through the installation of the script generator tool and provide a very quick demo. The reason I’m going through this is because in order to install the tool we need to use something called PIP. PIP is a package management system that enables us install and use packages written in Python. Yeah, Python again!

I’m pretty interested in DBFS, as it seems well-placed to make crusty Linux sysadmins happier with SQL Server, and that’s a big positive in my book.

Comments closed

Collecting Rows Written From SSISDB

Andy Leonard has a script showing how to get the number of records written to an OLEDB destination, as reflected in the SSIS Catalog:

I prefer to passively gather performance metrics – even if it’s a little convoluted and more work (for me). I don’t often need these metrics immediately available, so I execute queries similar to the query below and store the performance metrics in a table. I can (and do) build dashboards to track SSIS performance (perhaps I should blog about them…) using this passively-collected performance metrics.

As with Lookup Transformation messages, OLE DB Destinations in Data Flow Tasks record specific messages using the same format every time. Because of this (hard-coded) consistency, you and I can passively collect the number of rows written while executing packages in the SSIS Catalog using the (default) Basic logging level. We can use the following Transact-SQL query to collect this execution metadata post-execution:

Click through for the script.

Comments closed