R 3.4.0 Now Available

Kevin Feasel

2017-04-24

R

A new version of R is now available:

  • Accumulating vectors in a loop is faster – Assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred. The implementation is subject to change and not intended to be used in packages at this time.

There’s a big list of changes, so check it out and think about upgrading.

Building An Online R Training Environment

Kevin Feasel

2017-04-24

R

Steph Locke has shared how she put together a training lab for her R workshop:

This starts with the tidyverse & Rstudio then:

  • adds the requisite programs for dependencies in my package and whois for mkpasswd to be able to work

  • installs packages from github, notably the one designed to facilitate the day of text analysis

  • get the shell script and the csv from the gist

  • make the shell script executable and then run it

I loved the business card touch.  It’s easy enough to print out little strips of paper with the username and password, but this has a bit more staying power.

Getting Off Of Profiler, A Twelve-Step Program

Wayne Sheffield has a blast from the past, repeating an old T-SQL Tuesday to show how to use Extended Events:

Now that you have this XE session scripted out, it can be easily installed on multiple servers. If you encounter a deadlock problem, you can easily start the XE session and let it run to trap your deadlocks. They will be persisted to a file dedicated for the deadlocks. You can use my Deadlock Shredder script at http://bit.ly/ShredDL to read the deadlocks from the file and shred the deadlock XML into a tabular output.

Note that the default system_health XE session also captures deadlocks. I like to have a dedicated session for just deadlocks. As lightweight as XE is, sometimes it may benefit a server to turn off the system_health session. Additionally, Jonathan Kehayias has a script that will take a running trace and completely script out an XE session for it. This script can be found at https://www.sqlskills.com/blogs/jonathan/converting-sql-trace-to-extended-events-in-sql-server-2012/. Even though this script is available, I like to figure things out for myself so that I can learn what is actually going on.

Extended Events are extremely useful for administrators, typically with a fraction of the overhead cost of  server-side (much less Profiler) traces.

Secret Mysteries Of SQL Server 2017

Erik Darling goes spelunking for new and modified internal objects in SQL Server 2017:

sys.dm_db_missing_index_group_stats

sys.dm_db_missing_index_group_stats
What have we here?

 

Of particular interest are last_sql_handle, query_hash, and query_plan_hash. It appears that we’ll finally be able to easily tie missing index requests to their queries, without doing a lot of painful XML processing. I had planned on adding something like this, but couldn’t find a good fit between 1) adding XML processing to sp_BlitzIndex, or adding more DMV queries and rather unpleasant XML processing to sp_BlitzCache. This will make implementing it far easier, assuming it works the way it looks like it will work.

Erik has three examples of interesting additions in CTP 2.0.

Diagnosing Database Restore Wait Times

Bob Ward notes that the “100 percent processed” message doesn’t mean everything is quite finished yet in a database restoration:

Notice the “100 percent…” message has detailed about “bytes processed”. Since my data is around 13Mb this tells me that the progress indicators are all about the data transferred step of RESTORE. Notice the time gap in the messages to “Waiting for Log zeroing…” and “Log Zeroing is complete”. That gap in time is around 2 minutes. Exactly the time it took between the the 100% Complete message in the SSMS Window and the final restore message.

From this evidence I can conclude that having a transaction log file size that takes a long time to initialize can be a possible cause of seeing the behavior of getting the 100% Complete Message quickly but the overall RESTORE takes longer to complete.

There’s a lot worth reading packed into this post, as you’d expect from Bob.  Read the whole thing.

Fitness In Modeling

Leila Etaati notes the Scylla and Charybdis of models:

However, in the most machine learning experiences, we will face two risks :Over fitting and under fitting.
I will explain these two concepts via an example below.
imagine that we have collected information about the number of coffees that have been purchased in a café from 8am to 5pm.

Overfitting tends to be a bigger problem in my experience, but they’re both dangerous.

Dealing With NULL

Kevin Feasel

2017-04-24

Syntax

Jeff Mlakar has a pair of comparisons for NULL handling, with ISNULL vs COALESCE and CONCAT vs + for concatenation:

We expect this much from IsNull. However, coalesce is a little different. will take the data type from the first non-null value passed and use that for the table definition. This might not always be what you want because if you pass bits you might get integers. If you pass an array of integers and floats you will get numeric. Be aware if this isn’t what you wanted.

Read the whole thing.

Resumable Online Index Rebuilds

Niko Neugebauer introduces a new feature in SQL Server 2017:

After multiple executions, the first process (Resumable Online Index Rebuild) on the average took 65.8 seconds, while the second one (a simple online) took only 60.8 seconds, representing 8% of the improvement of the overall performance. I can’t say if it looks acceptable to you or not, but for me this is something I will be definitely considering to be as an advantage for the cases where the resumable process is needed.

I decided to run a test on much bigger table, the lineitem which for 10GB TPCH database contains 60 Million Rows. My expectation here was to see if the percentage would stay the same or will jump to a whole new level (please make sure that you do execute the following script at least a couple of times, to get the real results and not the results of your disk-drive prefetching :)):

The big table example result was somewhat surprising.  Niko is his normal, informative self, so definitely read the whole thing.

Graph Data In SQL Server

Terry McCann has a first look at SQL Server 2017’s graph data capabilities:

SQL Graph is a similar concept to what is described above, but built in to the core SQL Server engine. This means 2 new table types NODE and EDGE and a few new TSQL functions in particular MATCH(). SQL Graph at the time of writing is only available in SQL 2017 ctp 2.0. You can read more and download ctp2.0 here https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/19/sql-server-2017-community-technology-preview-2-0-now-available/. Once ctp 2.0 is installed there is nothing else you need to do to enable the new graph syntax and storage.

There is an example you can download from Microsoft which is a similar set up to the example in the image above. However I have used some real data shredded from IMDB the internet movie database. This data is available to download from Kaggle https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset

Click through for a video demonstration as well.

Categories

April 2017
MTWTFSS
« Mar May »
 12
3456789
10111213141516
17181920212223
24252627282930