
Author: Kevin Feasel

The Benefits Of Polybase

I take a look at running a Hadoop query against a big(gish) data set:

Nearly 12 minutes doesn’t sound fantastic, but let’s remember that this is running on a single-node sandbox hosted on my laptop.  That’s hardly a fair setup for a distributed processing system.  Also, I have done nothing to optimize the files; I’m using compressed, comma-separated text files, have not partitioned the data in any meaningful way, and have taken the easy way out whenever possible.  This means that an optimized file structure running on a real cluster with powerful servers behind it could return the data set a lot faster…but for our purposes, that’s not very important.  I’m using the same hardware in all three cases, so in that sense this is a fair comp.

Despite my hemming and hawing, Polybase still performed as well as Hive and kicked sand in the linked server’s face.  I have several ideas for how to tune and want to continue down this track, showing various ways to optimize Polybase and Hive queries.

Comments closed

Tabular Image For Azure VMs

Mark Vaillancourt has a new Connect item:

Currently, when utilizing the SQL Server images in the VM Gallery in Azure, any installations of SQL Server Analysis Services default to Multidimensional. Thus, if you want SSAS Tabular, you have additional work to perform.

I was just chatting with a Senior Program Manager on the SQL Server Analysis Services product team. They currently don’t have anything in their plans for providing SQL Server Gallery Images with SSAS Tabular instead of Multidimensional. We agreed that it is a good idea for that to happen. We also agreed that a Connect suggestion would be a great way to gauge broader community support/appetite for providing Gallery images with Tabular installed.

Here’s the Connect item.


Casting Oracle Number To Numeric

Jon Morisi points out a rounding issue when casting Oracle’s Number data type to SQL Server’s Numeric:

Perhaps the 2014 SQL Server is implicitly converting to float, using the nearest even prior to the explicit cast to Numeric.  However, how the scale (number of decimal digits that will be stored to the right of the decimal point) would be determined in such a scenario is a conundrum.   Either way, although the mapping is defined the same, the behavior demonstrated between the two versions of SQL Server is inconsistent.
Research into the ANSI and IEEE standards boils down to the same answer: truncation and/or rounding is implementation-defined.

It’s an interesting issue.  Read on for more details.


Private Clouds

James Serra argues that virtualization does not by itself make for a private cloud:

Since virtualization only solves #3, a lot more should be done to create a private cloud.  A cloud should also support Platform-as-a-service (PaaS) to allow for application innovation.  Fortunately there are products to add the other characteristics to give you a private cloud, such as Microsoft’s Azure Stack.  And of course you can always use a public cloud.

Read James’s post to get the full listing of what makes for a “cloud” offering.


Catalog Compare For Migration

Andy Leonard shows how to use Catalog Compare to migrate SSIS 2014 to 2016:

I recently tried to use the SSISDB Upgrade Wizard to upgrade a restored SSISDB (backed up in an earlier version) to SQL Server 2016. It didn’t go well.

I decided to use SSIS Catalog Compare to generate the scripts and ISPAC files from the previous instance, and deploy them to the SSIS 2016 Catalog.

“You Can Do That?”

Yes. Yes you can. Here’s how…

This is a paid product, but if you want to perform this upgrade, it sounds like a good tool for the job.


DATEFROMPARTS

Aaron Bertrand looks at the DATEFROMPARTS function in SQL Server 2012 and later:

The point of these functions is to make it easier to construct a date, or datetime, or datetime2 variable, when you know the individual parts. DATEFROMPARTS() takes three arguments: year, month, and day, and returns a date value. So, for example, SELECT DATEFROMPARTS(2016,7,6); would yield the date 2016-07-06.

Read on for a comparison of this function against about a dozen other methods of building dates from components.
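
As a quick sketch of the idea, here is DATEFROMPARTS next to one of the older string-based approaches.  The variable names are mine, and the string-building alternative is just one common pattern, not necessarily one of the methods Aaron tests.

```sql
-- Hypothetical variables holding the individual date parts.
DECLARE @Year int = 2016, @Month int = 7, @Day int = 6;

-- SQL Server 2012 and later: build the date directly from its parts.
SELECT DATEFROMPARTS(@Year, @Month, @Day) AS BuiltFromParts;

-- An older pattern: assemble a yyyymmdd string (zero-padding month and day)
-- and convert it using style 112.
SELECT CONVERT(date,
           CONCAT(@Year,
                  RIGHT('0' + CAST(@Month AS varchar(2)), 2),
                  RIGHT('0' + CAST(@Day AS varchar(2)), 2)),
           112) AS BuiltFromString;
```

Both statements should return the same date; the first is simply easier to read and does not depend on getting the string format right.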


Parallel Insertion

Sanjay Mishra and Arvind Shyamsundar show that you can use parallelism with the INSERT INTO [Table] SELECT [Values] construct:

Two important criteria must be met to allow parallel execution of an INSERT … SELECT statement.

  1. The database compatibility level must be 130. Execute “SELECT name, compatibility_level FROM sys.databases” to determine the compatibility level of your database, and if it is not 130, execute “ALTER DATABASE <MyDB> SET COMPATIBILITY_LEVEL = 130” to set it to 130. Changing the compatibility level of a database influences some behavior changes. You should test and ensure that your overall application works well with the new compatibility level.

  2. Must use the TABLOCK hint with the INSERT … SELECT statement. For example: INSERT INTO table_1 WITH (TABLOCK) SELECT * FROM table_2.

This is a limited use case, but it does sound very useful for large staging table loads or backfills when you can control table access.
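
Putting the two criteria together, a staging load might look something like the sketch below.  The database, table, and column names (StagingDB, dbo.SalesStaging, dbo.SalesSource, and so on) are hypothetical, and meeting both criteria only makes a parallel insert possible; the optimizer still decides whether a parallel plan is worthwhile.

```sql
-- Hypothetical database name: check the current compatibility level.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'StagingDB';

-- Criterion 1: move to compatibility level 130 (test your application first).
ALTER DATABASE StagingDB SET COMPATIBILITY_LEVEL = 130;

-- Criterion 2: use the TABLOCK hint on the target of the INSERT ... SELECT.
-- Table and column names here are made up for illustration.
INSERT INTO dbo.SalesStaging WITH (TABLOCK)
    (SaleID, CustomerID, SaleDate, Amount)
SELECT SaleID, CustomerID, SaleDate, Amount
FROM dbo.SalesSource;
```

Keep in mind that TABLOCK takes a table-level lock for the duration of the insert, which is why this fits best when nothing else needs the target table during the load.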


Diagnosing Virtual Machine Cloning Issues

Jack Li walks through a few common problems when creating Azure VMs based off of captured images:

When you create a VM from a captured image, the drive letters for data disks may not be preserved.  For example, if you have system database files on the E: drive, it may get swapped to the H: drive.  If this is the case, SQL Server can’t find system database files and will not start.  If the drive letter mismatch occurs on user database files, then the user databases will not recover.   After the VM is created, you just need to go to disk management to change the drive letters to match your original configuration.

Read the whole thing if you’re thinking about copying your on-premises server to an Azure VM.


Comments As Documentation

Chris Webb shows that comments in the Advanced Editor become step property tooltips:

The June release of Power BI Desktop has what seems to be a fairly unremarkable new feature in that it allows you to add descriptions to each step in a query in the Query Editor window. However, the implementation turns out to be a lot more interesting than you might expect: the step descriptions become comments in the M code, and even better, if you write M code in the Advanced Editor window, your comments appear as descriptions in the Applied Steps pane.

I think this is a smart move, although it does mean that you have to keep those comments up to date…


Building Extension Methods With Biml

Ben Weissman shows how to write extension methods in Biml:

An AstTableNode requires a schema to be valid, which is the only information that we can’t get from the AstFlatFileFormatNode, so we’re defining a variable called UseSchema and passing it to our ToAstTableNode extension method.

But… how does that extension method work? MUCH easier than you might think.

Writing an extension method in C# isn’t tough either.
