2018-07-24 – Curated SQL

With the ADLA Task in the Azure Feature Pack, you can now create and orchestrate U-SQL jobs as part of an SSIS workflow to process big data in the cloud. Because ADLA is a serverless analytics service, you don’t need to worry about cluster creation and initialization; all you need is an ADLA account to start your analytics.

You can get the U-SQL script from different places by using SSIS built-in functions. You can:

Edit the inline U-SQL script in the ADLA Task to call table-valued functions and stored procedures in your U-SQL databases.
Use U-SQL files stored in ADLS or Azure Blob Storage by leveraging the Azure Data Lake Store File System Task and the Azure Blob Download Task.
Use U-SQL files from the local file system directly through the SSIS File Connection Manager.
Use an SSIS variable that contains the U-SQL statements. You can also use an SSIS expression to generate the U-SQL statements dynamically.

Read on for more information and a link to download the pack.

Generating Basic Features From Text Data In R With textfeatures

Published 2018-07-24 by Kevin Feasel

Abdul Majed Raja demonstrates the textfeatures package in R:

Michael Kearney, Assistant Professor at the University of Missouri and well known in the R community for the modern Twitter package rtweet, has come up with a new R package called textfeatures that generates a bunch of features for any text data that you supply. Before you dream of a deep-learning-based package for automated text feature engineering: this isn’t that. It uses very simple text analysis principles and generates features like the number of uppercase letters and the number of punctuation marks. It is plain, simple stuff, nothing fancy, but pretty useful.

It’s a start for text analysis, though there’s a lot more after this.
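
To make the idea concrete, here is a minimal Python sketch of the kind of surface-level counts described above. It is not the textfeatures package or its API; the function name and feature names are hypothetical and chosen only for illustration.

```python
import string


def basic_text_features(text: str) -> dict:
    """Count a few simple surface features of a string.

    Hypothetical helper: the feature names are illustrative and are
    not the column names produced by the textfeatures R package.
    """
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        "n_uppercase": sum(ch.isupper() for ch in text),
        "n_digits": sum(ch.isdigit() for ch in text),
        # string.punctuation covers ASCII punctuation only.
        "n_punctuation": sum(ch in string.punctuation for ch in text),
    }


features = basic_text_features("Hello, World! R is FUN in 2018.")
print(features)
```

Counts like these are cheap to compute and can be fed to a model alongside richer features.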

SSIS: Target Server Version Leads To Script Task “Corruption”

Published 2018-07-24 by Kevin Feasel

Slava Murygin walks us through an ugly-looking error with an easy fix:

There are probably a lot of SSIS corruption errors, but this is one that is very easy to solve.
The full error message reads:

Script Task:Error: The Script Task is corrupted.

Script Task:Error: There were errors during task validation.

Script Task:Error: There was an exception while loading Script Task from XML: System.Exception: The Script Task “ST_74aca886806a416fa34ae89cac6237c2” uses version 15.0 script that is not supported in this release of Integration Services. To run the package, use the Script Task to create a new VSTA script. In most cases, scripts are converted automatically to use a supported version, when you open a SQL Server Integration Services package in %SQL_PRODUCT_SHORT_NAME% Integration Services. at Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptTask.LoadFromXML(XmlElement elemProj, IDTSInfoEvents events)

That error came from executing an SSIS package, and it indicates that a particular Script Task is corrupted.
It is very confusing, because if you run the same package in Visual Studio, everything works fine.

I’d consider the initial error, “The Script Task is corrupted,” to be a bad error message. The longer description is helpful and explains the problem, but the word “corruption” has a certain scary connotation to DBAs, and tossing it around when you really mean to say “Fix your target version” is unhelpful.

Getting A Random Row From A Small Data Set

Published 2018-07-24 by Kevin Feasel

David Fowler has a quick script to get a random row from a relatively small table:

A couple of times recently I’ve seen the question asked, ‘How can I select a single row at random from a table?’.

There are often a few ways of doing this suggested; most seem to rely on CTEs or temp tables. I thought I’d share, in a quick post, a very simple and easy way of doing it that I’ve used a couple of times.

The script David provides requires a table scan, so it doesn’t scale very well. But depending on your hardware, that can still be pretty efficient into the tens of thousands of rows.
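
David’s script is not reproduced here. As a rough illustration of why this kind of approach needs a scan, the sketch below uses Python’s built-in sqlite3 module and SQLite’s `ORDER BY RANDOM() LIMIT 1`. It assumes the general pattern of sorting by a random value and taking one row; in SQL Server, a common equivalent is `SELECT TOP (1) ... ORDER BY NEWID()`. The table and column names are hypothetical.

```python
import sqlite3

# In-memory database with a small, hypothetical lookup table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colours (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO colours (name) VALUES (?)",
    [("red",), ("green",), ("blue",), ("yellow",)],
)


def random_row(connection):
    """Return one row chosen at random.

    Every row gets a random sort key, so the engine has to read the
    whole table before it can hand back a single row.
    """
    return connection.execute(
        "SELECT id, name FROM colours ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


row = random_row(conn)
print(row)
```

The cost grows with the table because every row is read and assigned a sort key, which is why this fits small tables best.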

Using DBCC OPTIMIZER_WHATIF To Mimic Production Hardware

Published 2018-07-24 by Kevin Feasel

Max Vernon has a technique for mimicking production hardware layouts when testing queries in development:

Attempting to debug production performance problems in your development environment can be problematic in many ways, leading to a frustrating troubleshooting experience. One very common situation is that the resources in the development environment are substantially less robust than those on the production system; for instance, prod has 128 GB of RAM while dev has only 16 GB, or prod has 16 cores while dev has only 4. Unintuitively, this disparity can result in queries running faster in development than in production.

SQL Server has a little-known (and undocumented and unsupported) troubleshooting-related DBCC command that can be used to mimic production resource levels in your development environment. As with all undocumented features, do not try this in production.

Read on to learn how DBCC OPTIMIZER_WHATIF can lead the optimizer to choose different plans. I almost never use this command, but it is helpful to have it in your back pocket.

Modifying Data In Temporal Tables

Published 2018-07-24 by Kevin Feasel

Jeanne Combrinck shows us how we can insert, update, and delete data in temporal tables:

You delete data in the current table with a regular DELETE statement. The end period column for deleted rows will be populated with the begin time of the underlying transaction.
You cannot directly delete rows from the history table while SYSTEM_VERSIONING = ON.

Set SYSTEM_VERSIONING = OFF and delete rows from the current and history tables, but keep in mind that this way the system will not preserve a history of changes. TRUNCATE, SWITCH PARTITION OUT of the current table, and SWITCH PARTITION IN to the history table are not supported while SYSTEM_VERSIONING = ON.

Data modification is reasonably straightforward with temporal tables. Read on for examples.
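
The bookkeeping described above can be pictured with a small toy model. The Python sketch below is not SQL Server code and is not taken from Jeanne’s post; the class and its methods are hypothetical. It only mimics the rule that an updated or deleted row moves to history with its end period set to the begin time of the modifying transaction.

```python
from datetime import datetime, timezone


class ToyTemporalTable:
    """Toy model of a system-versioned table (illustration only)."""

    def __init__(self):
        self.current = {}  # key -> (value, valid_from)
        self.history = []  # (key, value, valid_from, valid_to)

    def insert(self, key, value, tx_begin):
        self.current[key] = (value, tx_begin)

    def update(self, key, value, tx_begin):
        # The old version is closed at the transaction's begin time.
        old_value, valid_from = self.current[key]
        self.history.append((key, old_value, valid_from, tx_begin))
        self.current[key] = (value, tx_begin)

    def delete(self, key, tx_begin):
        # The row leaves the current table; its end period is the
        # begin time of the deleting transaction.
        old_value, valid_from = self.current.pop(key)
        self.history.append((key, old_value, valid_from, tx_begin))


t1 = datetime(2018, 7, 24, 9, 0, tzinfo=timezone.utc)
t2 = datetime(2018, 7, 24, 10, 0, tzinfo=timezone.utc)
t3 = datetime(2018, 7, 24, 11, 0, tzinfo=timezone.utc)

table = ToyTemporalTable()
table.insert(1, "draft", t1)
table.update(1, "final", t2)
table.delete(1, t3)

for version in table.history:
    print(version)
```

After the delete, the current table is empty and the history holds both earlier versions, each closed at the begin time of the transaction that replaced or removed it.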

TANSTAAQRC (Query Result Cache)

Published 2018-07-24 by Kevin Feasel

Andy Mallon explains that a query result cache does not exist in SQL Server:

I was recently doing a training session when a developer commented that it was OK to run an expensive query twice because on the second execution, SQL Server would use the “results cache” and be “practically free”. It’s not the first time I’ve heard someone refer to a “results cache” in SQL Server. This is one of those myths that is almost true, which makes it that much more believable. If you don’t know better, you might think SQL Server has a “results cache” because the second execution of a query is often faster.

SQL Server does not have a “results cache” and the second execution is not “practically free.”
SQL Server does have a “buffer cache” and the second execution is “faster, but not free.”

The SQL Server buffer cache holds data pages in memory, in the exact form that they reside on disk. The second execution will not have to perform physical I/O operations to satisfy the query, because it can use the buffer cache. However, it does have to perform all other operations. Think of it like this: the second execution still executes the entire execution plan, including all the expensive operations. It is faster, but not “practically free.”

Read the comments for Erik Darling’s plot twist.
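
The distinction Andy draws can be sketched with a toy model. The Python below is a simplified illustration, not how SQL Server is implemented; the class, function, and page layout are hypothetical. Pages are cached after the first read, yet the query still processes every row on every execution.

```python
class ToyBufferCache:
    """Caches pages, not query results (illustration only)."""

    def __init__(self, disk_pages):
        self.disk_pages = disk_pages  # page_id -> list of row values
        self.cached = {}
        self.physical_reads = 0

    def get_page(self, page_id):
        if page_id not in self.cached:
            # Simulated physical I/O: only on the first touch of a page.
            self.physical_reads += 1
            self.cached[page_id] = self.disk_pages[page_id]
        return self.cached[page_id]


def run_query(cache):
    """Sum every row; the work happens on every execution."""
    rows_processed = 0
    total = 0
    for page_id in sorted(cache.disk_pages):
        for value in cache.get_page(page_id):
            rows_processed += 1
            total += value
    return total, rows_processed


# Ten hypothetical pages of 100 rows each.
disk = {p: list(range(p * 100, p * 100 + 100)) for p in range(10)}
cache = ToyBufferCache(disk)

first = run_query(cache)
reads_during_first = cache.physical_reads
second = run_query(cache)
reads_during_second = cache.physical_reads - reads_during_first

print(first, reads_during_first)
print(second, reads_during_second)
```

The second run does no simulated physical reads but still touches all 1,000 rows, which is the “faster, but not free” behavior described in the excerpt.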

Tracking Latency To Azure With PsPing

Published 2018-07-24 by Kevin Feasel

Arun Sirpal shows us how to use PsPing (part of the Sysinternals tool set) to determine latency between your computer and a VM in an Azure data center:

This is the tool of choice when you want to find the latency to your Azure SQL Server. In addition to standard ICMP ping functionality, it can report the latency of connecting to TCP ports and the latency of TCP round-trip communication.

I use this to find the latency from my location to various Azure SQL Servers in different Azure regions. I am based in the heart of England, so let’s compare a couple of locations (just out of curiosity). Once you have downloaded the tool, you will need to cd to its directory and run the following command.

Read on to see how to use PsPing.
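
For a sense of what a TCP connect latency measurement involves, here is a minimal Python sketch using only the standard library. It is not PsPing and does not reproduce Arun’s command; the function name and defaults are hypothetical. It times how long a TCP handshake to a host and port takes.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, attempts=4, timeout=2.0):
    """Return a list of TCP connect times in milliseconds.

    Hypothetical helper: opens and closes one connection per attempt
    and times only the connection setup.
    """
    latencies = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        latencies.append(elapsed * 1000.0)
    return latencies
```

Calling `tcp_connect_latency_ms("yourserver.database.windows.net", 1433)` with your own server name in place of that placeholder would time connections to the default SQL Server port. The result depends on your network path, so compare several attempts and regions.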
