Press "Enter" to skip to content

Author: Kevin Feasel

Microsoft R Open 3.3.3

David Smith reports that Microsoft R Open 3.3.3 is now available:

Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.3.3, and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to R 3.3.3, upgrades the installer, and updates the bundled packages.

R 3.3.3 makes just a few minor fixes compared to R 3.3.2 (see the full list of changes here), so you shouldn’t encounter any compatibility issues when upgrading from MRO 3.3.2. For CRAN packages, MRO 3.3.3 points to CRAN snapshot taken on March 15, 2017 but as always, you can use the built-in checkpoint package to access packages from an earlier date (for compatibility) or a later date (to access new and updated packages).

Click through for more details.  As a side note, CRAN R 3.4 is scheduled for release this month, so given their recent cadence, I’d guess MRO 3.4 to be out late this year.

Comments closed

Network Navaigator Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Network Navigator Power BI Custom Visual.  You may find the need to use the Network Navigator when you’re trying to find links between different attributes in a dataset. It does this by visualizing each attribute as a node and the strength of activity between those nodes can be represented in multiple ways.

Click through to get to Devin’s video.  This visual looks interesting for graphical analysis, like trying to tease out common connections or discovering dependencies.

Comments closed

Improving Read-HostSpecial

Jana Sattainathan makes Jeff Hicks’s Read-HostSpecial better:

Some shortcomings of Read-HostSpecial

While it does a lot already, I did find some very minor shortcomings that I wanted to address

  1. Bad inputs killed the program – If you are prompting for a series of inputs (like a wizard) and the user mis-keyed one input by mistake, the error recovery is very hard and the user has to start-over from the beginning. This called for a RepromptOnError switch which issues a gentle warning and then allows the user to input a valid value upon encountering validation errors.

  2. I needed a couple of more canned validations like ValidateFolder and ValidateFile.

  3. Too bad, there is no Write-HostSpecial – I wanted Read-HostSpecial to display some pretty text and not wait for input (like Write-Host) using the same nomenclature for fonts/look/feel/usability as Read-HostSpecial. So, I needed a NoWait switch.

Read on for Jana’s script and usage examples.

Comments closed

Finding Identity Columns Close To Overflow

Rob Sewell demos another dbatools feature:

It would be very useful to be able to quickly see what the current values of the identity columns are and how close they are to being full so that we can plan for and be able to take action before we end up with shouty smart suits at our desk. If we could do it with just one line of code that would be even easier.
Step forward dbatools.  This PowerShell module is a community based project written by excellent, brilliant people in their own time and available to you free. To find out more and how to use and install it visit https://dbatools.io
There is a command called Test-DbaIdentityUsage This command was created by Brandon Abshire. You can find Brandon blogging at netnerds.net. Thank you Brandon

That’s quite useful for taking a quick look at identity columns across a database or instance.

Comments closed

DataRows Everywhere

Shane O’Neill shows how to deal with .NET DataRows in Powershell:

Now I don’t know about you but when I query stuff in a SQL database, it’s to do something to/with the results. They could be a list of servers that I monitor, they could be a list of databases that I want to check the recovery model of, or it could be a list of tables that I want to see how much space they are using. The main point is that I want to do something with the results.

But for this simple case, I just want to list out the customer name from this table. Simple? Yes, but this is just a test case to prove a point.

So let’s PowerShell this!

The moral of the story is to Get-Member early and Get-Member often.

Comments closed

Connection Retry Intervals

Arvind Shyamsundar explains a recent Management Studio change:

One such change SSMS got for free is the connection resiliency logic within the SqlConnection.Open() method. To improve the default experience for clients which connect to Azure SQL Database, the above method will (in the case of initial connection errors / timeouts) now retry 1 time after sleeping for 10 seconds. These numbers are configurable by properties called ConnectRetryCount (default value 1) and ConnectRetryInterval (default value 10 seconds.) The previous versions of the SqlConnection class would not automatically retry in cases of connection failure.

In general, because of these changes, transient errors (across slow networks or when working with Azure SQL Database) are less frequent. However, when you consider that a lot of SSMS users still use it with ‘regular’ SQL Server – either in a VM in the cloud or on-premises, there is a subtle but distinct impact of these changes which may affect administrators of ‘regular’ SQL Server databases.

Read on for details on why Management Studio hangs for 10-second periods, and see how to disable connection retry yourself.

Comments closed

Keeping Up To Date

Daniel Janik shows that query-level wait stats depend on the version of Management Studio:

The waitstats don’t appear at all in my older Surface which has a newer version of SQL. So what is 4202.2?  It’s a refresh for Master Data Services and R. Could that really have broken my query plan waitstats?

I doubt it but maybe. I updated to make the two equal. Did the waitstats go away?

No.

When reading the solution, it seems obvious, but this is a good reminder that there are a lot of moving parts here, and one of the early troubleshooting steps for “It works here, so why not over here?” types of issues is to make sure software is at the same version number.

Comments closed

Re-Incrementing Identity Columns

Kenneth Fisher explains how to change the increment value of a table which already has an identity column:

The first thing to do is remember that all tables are partitioned. Sort of. What we think of as non-partitioned tables are really just tables with a single partition. Every table is listed in sys.partitions and in fact you can use it to quickly see how many rows there are in a table. Since there is no partition scheme/function we can’t do splits or merges, but we can do a SWITCH.

What we are going to do is create a new, virtually identical table, then switch the data from the old table (partition) to the new table (partition). The trick is that while in order to do the switch almost everything has to be identical, the properties of the identity column are part of that almost.

I love these types of solutions:  hacks in the most positive connotation of the term.

Comments closed

Understanding Boosted Trees

Maria Jesus Alonso explains decision trees and their subsequent improvements:

Bagging (or Bootsrap Aggregating), the second prediction technique brought to the BigML Dashboard and API, uses a collection of trees (rather than a single one), each tree built with a different random subset of the original dataset for each model in the ensemble. Specifically, BigML defaults to a sampling rate of 100% (with replacement) for each model. This means some of the original instances will be repeated and others will be left out. Bagging performs well when a dataset has many noisy features and only one or two are relevant. In those cases, Bagging will be the best option.

Random Decision Forests extend the Bagging technique by only considering a random subset of the input fields at each split of the tree. By adding randomness in this process, Random Decision Forests help avoid overfitting. When there are many useful fields in your dataset, Random Decision Forests are a strong choice.

Click through for how boosted trees change this model a bit.

Comments closed

Columnstore Partitioning In SQL Server 2016

Niko Neugebauer demonstrates some performance improvements to partitioned columnstore indexes in SQL Server 2016:

266ms was the partitioned table under SQL Server 2016 (compatibility level 120) while 353ms of the total elapsed time was obtained on SQL Server 2014! This represents a solid 25% improvement

All execution plans will have the same iterators, but will differ on the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as the distribution of the estimated costs within the execution plan, but as for the rest – it will be quite similar, like the one shown on the image below:

These improvements were swamped by the aggregate predicate pushdown improvements in 2016, at least in Niko’s example, but I’ll take a free 25%-33% performance improvement.

Comments closed