Press "Enter" to skip to content

Curated SQL Posts

Collations

Robert Sheldon has an article on collations:

The ideal solution is to choose a collation when setting up SQL Server that can be used for all your user databases and character columns. Using one collation removes any issues you might encounter when querying the data in different ways. It can also be the best approach in terms of performance if multiple collations impact your queries. However, this approach works only if the same language and collation settings are appropriate for all your users and applications—or at least a good majority of them.

If you support multi-cultural environments, you’ll need to take into account a number of considerations. To begin with, you should pick collations that support the most users, and you should use Unicode data types where possible because they can help avoid code page conversion issues. Just keep in mind the storage requirements that come with Unicode’s two bytes per character.

My inclination is to say Unicode everywhere possible.  There are cases in which Unicode doesn’t fit, but it’s easy to do and if you have enough data to worry about the extra bytes Unicode characters take up, Unicode compression is available.

Comments closed

Microsoft R Open 3.3.3

David Smith reports that Microsoft R Open 3.3.3 is now available:

Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.3.3, and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to R 3.3.3, upgrades the installer, and updates the bundled packages.

R 3.3.3 makes just a few minor fixes compared to R 3.3.2 (see the full list of changes here), so you shouldn’t encounter any compatibility issues when upgrading from MRO 3.3.2. For CRAN packages, MRO 3.3.3 points to CRAN snapshot taken on March 15, 2017 but as always, you can use the built-in checkpoint package to access packages from an earlier date (for compatibility) or a later date (to access new and updated packages).

Click through for more details.  As a side note, CRAN R 3.4 is scheduled for release this month, so given their recent cadence, I’d guess MRO 3.4 to be out late this year.

Comments closed

Network Navaigator Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Network Navigator Power BI Custom Visual.  You may find the need to use the Network Navigator when you’re trying to find links between different attributes in a dataset. It does this by visualizing each attribute as a node and the strength of activity between those nodes can be represented in multiple ways.

Click through to get to Devin’s video.  This visual looks interesting for graphical analysis, like trying to tease out common connections or discovering dependencies.

Comments closed

Improving Read-HostSpecial

Jana Sattainathan makes Jeff Hicks’s Read-HostSpecial better:

Some shortcomings of Read-HostSpecial

While it does a lot already, I did find some very minor shortcomings that I wanted to address

  1. Bad inputs killed the program – If you are prompting for a series of inputs (like a wizard) and the user mis-keyed one input by mistake, the error recovery is very hard and the user has to start-over from the beginning. This called for a RepromptOnError switch which issues a gentle warning and then allows the user to input a valid value upon encountering validation errors.

  2. I needed a couple of more canned validations like ValidateFolder and ValidateFile.

  3. Too bad, there is no Write-HostSpecial – I wanted Read-HostSpecial to display some pretty text and not wait for input (like Write-Host) using the same nomenclature for fonts/look/feel/usability as Read-HostSpecial. So, I needed a NoWait switch.

Read on for Jana’s script and usage examples.

Comments closed

Finding Identity Columns Close To Overflow

Rob Sewell demos another dbatools feature:

It would be very useful to be able to quickly see what the current values of the identity columns are and how close they are to being full so that we can plan for and be able to take action before we end up with shouty smart suits at our desk. If we could do it with just one line of code that would be even easier.
Step forward dbatools.  This PowerShell module is a community based project written by excellent, brilliant people in their own time and available to you free. To find out more and how to use and install it visit https://dbatools.io
There is a command called Test-DbaIdentityUsage This command was created by Brandon Abshire. You can find Brandon blogging at netnerds.net. Thank you Brandon

That’s quite useful for taking a quick look at identity columns across a database or instance.

Comments closed

DataRows Everywhere

Shane O’Neill shows how to deal with .NET DataRows in Powershell:

Now I don’t know about you but when I query stuff in a SQL database, it’s to do something to/with the results. They could be a list of servers that I monitor, they could be a list of databases that I want to check the recovery model of, or it could be a list of tables that I want to see how much space they are using. The main point is that I want to do something with the results.

But for this simple case, I just want to list out the customer name from this table. Simple? Yes, but this is just a test case to prove a point.

So let’s PowerShell this!

The moral of the story is to Get-Member early and Get-Member often.

Comments closed

Connection Retry Intervals

Arvind Shyamsundar explains a recent Management Studio change:

One such change SSMS got for free is the connection resiliency logic within the SqlConnection.Open() method. To improve the default experience for clients which connect to Azure SQL Database, the above method will (in the case of initial connection errors / timeouts) now retry 1 time after sleeping for 10 seconds. These numbers are configurable by properties called ConnectRetryCount (default value 1) and ConnectRetryInterval (default value 10 seconds.) The previous versions of the SqlConnection class would not automatically retry in cases of connection failure.

In general, because of these changes, transient errors (across slow networks or when working with Azure SQL Database) are less frequent. However, when you consider that a lot of SSMS users still use it with ‘regular’ SQL Server – either in a VM in the cloud or on-premises, there is a subtle but distinct impact of these changes which may affect administrators of ‘regular’ SQL Server databases.

Read on for details on why Management Studio hangs for 10-second periods, and see how to disable connection retry yourself.

Comments closed

Keeping Up To Date

Daniel Janik shows that query-level wait stats depend on the version of Management Studio:

The waitstats don’t appear at all in my older Surface which has a newer version of SQL. So what is 4202.2?  It’s a refresh for Master Data Services and R. Could that really have broken my query plan waitstats?

I doubt it but maybe. I updated to make the two equal. Did the waitstats go away?

No.

When reading the solution, it seems obvious, but this is a good reminder that there are a lot of moving parts here, and one of the early troubleshooting steps for “It works here, so why not over here?” types of issues is to make sure software is at the same version number.

Comments closed

Re-Incrementing Identity Columns

Kenneth Fisher explains how to change the increment value of a table which already has an identity column:

The first thing to do is remember that all tables are partitioned. Sort of. What we think of as non-partitioned tables are really just tables with a single partition. Every table is listed in sys.partitions and in fact you can use it to quickly see how many rows there are in a table. Since there is no partition scheme/function we can’t do splits or merges, but we can do a SWITCH.

What we are going to do is create a new, virtually identical table, then switch the data from the old table (partition) to the new table (partition). The trick is that while in order to do the switch almost everything has to be identical, the properties of the identity column are part of that almost.

I love these types of solutions:  hacks in the most positive connotation of the term.

Comments closed

Understanding Boosted Trees

Maria Jesus Alonso explains decision trees and their subsequent improvements:

Bagging (or Bootsrap Aggregating), the second prediction technique brought to the BigML Dashboard and API, uses a collection of trees (rather than a single one), each tree built with a different random subset of the original dataset for each model in the ensemble. Specifically, BigML defaults to a sampling rate of 100% (with replacement) for each model. This means some of the original instances will be repeated and others will be left out. Bagging performs well when a dataset has many noisy features and only one or two are relevant. In those cases, Bagging will be the best option.

Random Decision Forests extend the Bagging technique by only considering a random subset of the input fields at each split of the tree. By adding randomness in this process, Random Decision Forests help avoid overfitting. When there are many useful fields in your dataset, Random Decision Forests are a strong choice.

Click through for how boosted trees change this model a bit.

Comments closed