U-SQL Deprecation Notices

Michael Rys has a couple pieces of U-SQL syntax which will be deprecated.  First is partition by bucket:

In the upcoming refresh, we are removing the deprecated syntax PARTITION BY BUCKET and will raise an error.

Thus, if you have not yet updated your table definitions with the previously announced new syntax, please do so now or your scripts will fail starting some day in February!

The second post involves credentials:

Back in October, we announced that we simplified the U-SQL Credentials by merging the password secrets that are being created in Powershell and the other parts of the credential object into credentials that are being completely created with a Powershell command. This reduces one statement from the creation process.

During the initial phase, we did provide support for both kinds of credential objects, and still supported the old syntax.

In the upcoming February refresh, we are now automatically migrating the existing old credentials into the new format and remove the CREATE CREDENTIAL, ALTER CREDENTIAL and DROP CREDENTIAL statements.

If you’re writing U-SQL code, you’ll want to read up on the ramifications and alternatives here.

Principal Component Analysis Using R

Francisco Lima explains what principal component analysis is and shows how to do it in R:

Three lines of code and we see a clear separation among grape vine cultivars. In addition, the data points are evenly scattered over relatively narrow ranges in both PCs. We could next investigate which parameters contribute the most to this separation and how much variance is explained by each PC, but I will leave it for pcaMethods. We will now repeat the procedure after introducing an outlier in place of the 10th observation.

PCA is extremely useful when you have dozens of contributing factors, as it lets you narrow in on the big contributors quickly.

Project Normalization In UDFs

Dmitry Pilugin looks into how the optimizer (using the 2014-and-on cardinality estimator) processes user-defined functions:

If we remember, for the CE 120 it was a one row estimate, and in this case server decided, that it is cheaper to use a non-clustered index and then make a lookup into clustered. Not very effective if we remember that our predicate returns all rows.

In CE 130 there was a 365 rows estimate, which is too expensive for key lookup and server decided to make a clustered index scan.

But, wait, what we see is that in the second plan the estimate is also 1 row!

That fact seemed to me very curious and that’s why I’m writing this post. To find the answer, let’s look in more deep details at how the optimization process goes.

This was an interesting look at how the optimizer looks at scalar user-defined functions.

Querying A Named Instance From Powershell

Steve Jones explains how to query a named instance in Powershell:

I was looking at some sample code the other day and it looked like this.

cd sqlserver:\sql\localhost\default\databases

This allows you to browse the list of databases on your local instance. However, this is for a default instance, which I don’t have on this host. How can I get to a named instance? Usually I connect as .\SQL2016, so where does that fit in PowerShell?

Read on for the answer.

Why Logins Are Failing

Kenneth Fisher looks at various error log messages to show how to fix different login failures:

Error Displayed
Login failed for user ‘Kenneth_Test’. (Microsoft SQL Server, Error: 18456)

How do I fix it
Simple enough. Change the password. However, there are a few warnings here. If you change the password and it’s being used by someone (for example this is an application id) you may be breaking the application. If this is production, that could be a no-no. Just in case I like to back up the password hash before changing it.

I’ve also gone back to the well several times over the years with this blog post showing what the various error states mean when you get error 18456.

Getting Ahead Of Corruption

Mike Walsh has some recommendations before you have corrupt databases:

So. Always. Always. ALWAYS choose to be proactive and prepared. Don’t wait for corruption to catch you! When we do our SQL Server health assessments, seeing the findings that together mean you aren’t prepared for corruption is a huge red flag. Partially it is because as a consultant, I end up seeing corruption a lot – and it is always “after the fact” and usually from clients who either chose or, more likely didn’t realize they were choosing, the option with less preparation.

So this post won’t really talk about recovering from corruption. It will focus on prevention and preparedness. A follow on post will talk about some initial steps to do if you get a report of corruption.

If you already know how you’ll solve the problem (and ideally, have a step-by-step runbook so you don’t miss anything), corruption is more of an annoyance than a catastrophe.

Restoring A Database To A Different Location

Mike Fal shows how to restore a database to a different location using Powershell:

The cmdlet is straightforward in its use. Fundamentally, all we need to declare is an instance, database name, and backup file. However, if we don’t declare anything else, the cmdlet will try and restore the database files to their original locations. Keep in mind this is no different than how a normal RESTORE DATABASE command works.

This is where we make our lives easier with PowerShell. First off, to move files using Restore-SqlDatabase, we need to create a collection of RelocateFile objects. Don’t let the .Net-ness of this freak you out. All we’re doing is creating something that has the logical file name and the new physical file name. In other words, it’s just an abstraction of the MOVE statement in RESTORE DATABASE.

Read the whole thing.

Backing Up To Azure Storage

Neil Gelder shows how to back up directly to Azure blob storage:

The URL is the one from the container we made a note of and the credential is the one we created in the last step.

Now if we return to the container screen in the Azure Console and refresh the screen you’ll see your backup file like below

My personal preference here would be to back up locally and then have a job migrate backups to Azure or S3.  That storage is 1-3 cents per GB per month (and even cheaper if you’re willing to store the data in Glacier), so for more small to mid-sized organizations running databases in the tens of gigs, it’s a great way of getting around only being able to store a week or two worth of backups on-site.

Understanding DATEADD And DATEDIFF

Matan Yungman and Guy Glantser take a hack at DATEDIFF versus DATEADD for date calculations.  First up is Matan:

Pretty simple right?

Well, it is, and since this problem is pretty common, I used this solution in many performance tuning sessions I performed over the years.

There’s a slight problem though: This solution isn’t 100% accurate.

When carefully looking at the results, I find out that for the first query, I get 5859 rows, and for the second query, I get 5988 rows. Where does this difference come from?

Then, Guy gives his take on the problem:

I tested both queries on a sample table, which has millions of rows, and only around 500 rows in the last 90 days. The first query produced a table scan, while the second query produced an index seek. Of course, the execution time of the second query was much lower than the first query.

Both queries were supposed to return the orders in the last 90 days, but the first query returned 523 rows, and the second query returned 497 rows. So what’s going on?

The answer has to do with the way DATEDIFF works. This function returns the number of date parts (days, years, seconds, etc.) between two date & time values. It does that by first rounding down each one of the date & time values to the nearest date part value, and then counting the number of date parts between them.

They both start from the same base problem, but end up with slightly different formulations of a solution.

Connecting Azure Regions

Derik Hammer shows how to use a pair of virtual network gateways to connect two Azure regions:

Refer to Microsoft documentation for the difference between gateway types, VPN types, and SKUs. All of them will work but with different costs and benefits. For this demonstration, I am sticking with a basic VPN.

Check it out; this is a step-by-step guide and Derik makes it easy.


January 2017
« Dec Feb »