
Month: July 2018

The Value Of CROSS APPLY

Kendra Little talks about one of my favorite T-SQL operators:

Here’s my top 3 favorite uses for CROSS APPLY and OUTER APPLY:

  1. APPLY is fantastic for calling table valued functions. I didn’t include questions about those in the quiz, simply for the purposes of keeping the code simple, and because I wanted the quiz to be about thinking through how apply works — but it’s still the #1 use.

  2. Another thing I might use it for is when a query needs a correlated subquery — somewhat like an inline function.

  3. And I also like it for queries that have a calculation that needs to be done and which is referenced in multiple columns in the query, or perhaps also in a predicate and the select. You can perform the computation once in the apply and then reference it multiple times. That way, if you have to change the formula later on, you only have to change it in one place, plus I find it’s easier to read in some cases.

Item number three is something I learned from an older blog post Kendra wrote, and I regularly use APPLY for this purpose.
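
As a rough illustration of how APPLY behaves, here is a minimal Python sketch rather than T-SQL. The helpers `cross_apply` and `outer_apply`, the sample data, and the column names are all hypothetical; they only model the row-by-row semantics (run a function per outer row, combine the results) and say nothing about how SQL Server actually executes the operator.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def cross_apply(outer: Iterable[Row], fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """For each outer row, call fn and emit one combined row per result.

    Outer rows for which fn returns nothing are dropped (like CROSS APPLY).
    """
    for row in outer:
        for extra in fn(row):
            yield {**row, **extra}


def outer_apply(outer: Iterable[Row], fn: Callable[[Row], Iterable[Row]],
                null_columns: List[str]) -> Iterator[Row]:
    """Like cross_apply, but keeps outer rows with no results,
    filling the applied columns with None (like OUTER APPLY)."""
    for row in outer:
        results = list(fn(row))
        if results:
            for extra in results:
                yield {**row, **extra}
        else:
            yield {**row, **{col: None for col in null_columns}}


# Hypothetical data: customer B has no orders.
customers = [{"customer": "A"}, {"customer": "B"}]
orders = {"A": [10, 25, 40]}


def top_two_orders(row: Row) -> List[Row]:
    """Stand-in for a table-valued function or correlated subquery."""
    amounts = sorted(orders.get(row["customer"], []), reverse=True)[:2]
    return [{"amount": amount} for amount in amounts]


cross_rows = list(cross_apply(customers, top_two_orders))
outer_rows = list(outer_apply(customers, top_two_orders, ["amount"]))

# Use number three: compute a value once, then reference it in both the
# filter and the output instead of repeating the formula.
order_lines = [{"qty": 2, "price": 5.0}, {"qty": 1, "price": 3.0}]
with_total = cross_apply(order_lines, lambda r: [{"line_total": r["qty"] * r["price"]}])
big_lines = [r for r in with_total if r["line_total"] >= 10]
```

In this sketch, customer B disappears from `cross_rows` but survives in `outer_rows` with a `None` amount, and the `line_total` formula lives in exactly one place.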

Comments closed

Using Azure Data Lake Analytics With Integration Services

Yanan Cai announces that Azure Data Lake Analytics has a new task in the Azure Feature Pack for SQL Server Integration Services:

With ADLA Task in Azure Feature Pack, you can now orchestrate and create U-SQL jobs as a part of the SSIS workflow to process big data in the cloud. As ADLA is a serverless analytics service, you don’t need to worry about cluster creation and initialization, all you need is an ADLA account to start your analytics.

You can get the U-SQL script from different places by using SSIS built-in functions. You can:

  • Edit the inline U-SQL script in ADLA Task to call table valued functions and stored procedures in your U-SQL databases.

  • Use the U-SQL files stored in ADLS or Azure Blob Storage by leveraging Azure Data Lake Store File System Task and Azure Blob Download Task.

  • Use the U-SQL files from a local file directly using the SSIS File Connection Manager.

  • Use an SSIS variable that contains the U-SQL statements. You can also use an SSIS expression to generate the U-SQL statements dynamically.

Read on for more information and a link to download the pack.


Generating Basic Features From Text Data In R With textfeatures

Abdul Majed Raja demonstrates the textfeatures package in R:

Michael Kearney, Assistant Professor at the University of Missouri, well known in the R community for the modern Twitter package rtweet, has come up with a new R package called textfeatures that generates a bunch of features for any text data that you supply. Before you dream of a deep learning based package for automated text feature engineering: this isn’t that. It uses very simple text analysis principles and generates features like the number of upper case letters and the number of punctuation marks. Plain, simple stuff, nothing fancy, but pretty useful.

It’s a start for text analysis, though there’s a lot more after this.
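
To give a sense of what "plain simple" text features look like, here is a minimal Python sketch. It is not the textfeatures package and does not reproduce its output; the function name `basic_text_features` and the feature names are hypothetical, chosen only to mirror the kinds of counts described above.

```python
import string


def basic_text_features(text: str) -> dict:
    """Return a few simple count-based features for one piece of text."""
    return {
        "n_chars": len(text),
        "n_uppercase": sum(ch.isupper() for ch in text),
        # string.punctuation covers ASCII punctuation only.
        "n_punctuation": sum(ch in string.punctuation for ch in text),
        "n_words": len(text.split()),
    }


features = basic_text_features("Hello, World! R is FUN.")
```

Counts like these are cheap to compute and can be fed to a model alongside richer features later on.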


SSIS: Target Server Version Leads To Script Task “Corruption”

Slava Murygin walks us through an ugly-looking error with an easy fix:

There are probably a lot of SSIS corruption errors, but this is one that is very easy to solve.
The full error message reads like this:

Script Task:Error: The Script Task is corrupted.
Script Task:Error: There were errors during task validation.
Script Task:Error: There was an exception while loading Script Task from XML: System.Exception: The Script Task “ST_74aca886806a416fa34ae89cac6237c2” uses version 15.0 script that is not supported in this release of Integration Services. To run the package, use the Script Task to create a new VSTA script. In most cases, scripts are converted automatically to use a supported version, when you open a SQL Server Integration Services package in %SQL_PRODUCT_SHORT_NAME% Integration Services. at Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptTask.LoadFromXML(XmlElement elemProj, IDTSInfoEvents events)

That error came from executing an SSIS package, and it claims that a particular Script Task is corrupted.
It is very confusing, because if you try to run the same package in Visual Studio, everything will be fine:

I’d consider the initial error that “The Script Task is corrupted.” to be a bad error message.  The longer description is helpful and explains the problem, but the word “corruption” has a certain scary connotation to DBAs and tossing it around when you really mean to say “Fix your target version” is unhelpful.


Getting A Random Row From A Small Data Set

David Fowler has a quick script to get a random row from a relatively small table:

A couple of times recently I’ve seen the question asked, ‘How can I select a single row at random from a table?’.

There are often a few ways of doing this suggested, and most seem to rely on using CTEs or temp tables.  I thought I’d share, in a quick post, a very simple and easy way of doing it that I’ve used a couple of times.

The script David provides requires a table scan, so it doesn’t scale very well.  But depending on your hardware, that can still be pretty efficient into the tens of thousands of rows.
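
David’s script is not reproduced here, so the following is only a sketch of one common way to pick a random row: sort by a random key and keep the first row. It uses Python with an in-memory SQLite database purely so that it is self-contained; the table `small_table` and its columns are hypothetical. In T-SQL, the analogous pattern is usually written with TOP (1) and ORDER BY NEWID(), which may or may not be what David uses.

```python
import sqlite3

# Build a small hypothetical table with 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE small_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO small_table (name) VALUES (?)",
    [(f"row {i}",) for i in range(1, 101)],
)


def random_row(connection: sqlite3.Connection):
    """Assign every row a random sort key and return the first one.

    Every row has to be read and sorted, which is why this pattern
    suits small tables only.
    """
    return connection.execute(
        "SELECT id, name FROM small_table ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


picked = random_row(conn)
```

Because every row gets a sort key, the cost grows with the size of the table, which matches the table-scan caveat above.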


Using DBCC OPTIMIZER_WHATIF To Mimic Production Hardware

Max Vernon has a technique for mimicking production hardware layouts when testing queries in development:

Attempting to debug production performance problems in your development environment can be problematic in many ways, leading to a frustrating troubleshooting experience. One very common situation is the resources on the development environment are substantially less robust than on the production system; for instance prod has 128 GB of RAM, while dev only has 16 GB, prod has 16 cores, while dev only has 4 cores. Unintuitively, this disparity can result in queries running faster in development than in production.

SQL Server has a little-known (and undocumented and unsupported) troubleshooting-related DBCC command that can be used to mimic production resource levels in your development environment. As with all undocumented features, do not try this in production.

Read on to learn how DBCC OPTIMIZER_WHATIF can lead the optimizer to choose different plans.  I almost never use this command, but it is helpful to have it in your back pocket.


Modifying Data In Temporal Tables

Jeanne Combrinck shows us how we can insert, update, and delete data in temporal tables:

You delete data in the current table with a regular DELETE statement. The end period column for deleted rows will be populated with the begin time of the underlying transaction.
You cannot directly delete rows from the history table while SYSTEM_VERSIONING = ON.

Set SYSTEM_VERSIONING = OFF and delete rows from the current and history tables, but keep in mind that, this way, the system will not preserve the history of changes. TRUNCATE, SWITCH PARTITION OUT of the current table, and SWITCH PARTITION IN to the history table are not supported while SYSTEM_VERSIONING = ON.

Data modification is reasonably straightforward with temporal tables.  Read on for examples.


TANSTAAQRC (Query Result Cache)

Andy Mallon explains that a query result cache does not exist in SQL Server:

I was recently doing a training session when a developer commented that it was OK to run an expensive query twice because on the second execution, SQL Server would use the “results cache” and be “practically free”. It’s not the first time I’ve heard someone refer to a “results cache” in SQL Server. This is one of those myths that is almost true, which makes it that much more believable. If you don’t know better, you might think SQL Server has a “results cache” because the second execution of a query is often faster.

SQL Server does not have a “results cache” and the second execution is not “practically free.”
SQL Server does have a “buffer cache” and the second execution is “faster, but not free.”

The SQL Server buffer cache holds data pages in memory, in the exact form that they reside on disk. The second execution will not have to perform physical I/O operations to satisfy the query, because it can use the buffer cache. However, it does have to perform all other operations. Think of it like this: the second execution still executes the entire execution plan, including all the expensive operations. It is faster, but not “practically free.”

Read the comments for Erik Darling’s plot twist.
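
The difference between a buffer cache and a results cache can be sketched with a toy model in Python. Everything here is hypothetical (the `ToyBufferCache` class, the page layout, the "query"), and it is not how SQL Server is implemented; it only shows that a warm cache removes physical reads while the query still touches every page and row.

```python
class ToyBufferCache:
    """Caches pages, not query results."""

    def __init__(self, disk_pages):
        self.disk_pages = disk_pages  # page_id -> list of row values "on disk"
        self.cached = {}              # pages currently held in memory
        self.physical_reads = 0       # reads that had to go to "disk"
        self.logical_reads = 0        # every page request, cached or not

    def read_page(self, page_id):
        self.logical_reads += 1
        if page_id not in self.cached:
            self.physical_reads += 1
            self.cached[page_id] = self.disk_pages[page_id]
        return self.cached[page_id]


def run_query(cache):
    """Scan every page and sum the values; returns (total, rows_processed)."""
    total = 0
    rows_processed = 0
    for page_id in sorted(cache.disk_pages):
        for value in cache.read_page(page_id):
            total += value
            rows_processed += 1
    return total, rows_processed


# Five hypothetical pages of ten rows each.
disk = {page_id: list(range(page_id * 10, page_id * 10 + 10)) for page_id in range(5)}
cache = ToyBufferCache(disk)

first = run_query(cache)
reads_after_first = (cache.physical_reads, cache.logical_reads)

second = run_query(cache)
reads_after_second = (cache.physical_reads, cache.logical_reads)
```

In this model, the second run adds no physical reads, but it still performs five more logical reads and processes all fifty rows again: faster, but not free.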


Tracking Latency To Azure With PsPing

Arun Sirpal shows us how to use PsPing (part of the Sysinternals tool set) to determine latency between your computer and a VM in an Azure data center:

This is the tool of choice when you want to find out the latency to your Azure SQL Server. In addition to standard ICMP ping functionality, it can report the latency of connecting to TCP ports and the latency of TCP round-trip communication.

I use this to find the latency from my location to various Azure SQL Servers which are in different Azure regions. I am based in the heart of England so let’s look and compare a couple of locations (just out of curiosity). Once you have downloaded the tool you will need to CD to the directory and call the following command.

Read on to see how to use PsPing.
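
For a sense of what "latency of connecting to TCP ports" means, here is a minimal Python sketch that times TCP connection setup. It is not PsPing and does not reproduce its output or options; the function name `tcp_connect_latency_ms` is hypothetical, and the demo connects to a throwaway listener on the local machine so that it runs without any network access. To try it against a real server, you would pass that server’s host name and port instead.

```python
import socket
import time


def tcp_connect_latency_ms(host: str, port: int, attempts: int = 4,
                           timeout: float = 2.0) -> list:
    """Time how long each TCP connection to host:port takes, in milliseconds."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


# Local demo only: a listener on an ephemeral loopback port stands in
# for the remote server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(8)
demo_host, demo_port = listener.getsockname()

samples = tcp_connect_latency_ms(demo_host, demo_port)
listener.close()
```

Loopback timings will be tiny; the numbers only become interesting when the target sits in another region.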


Real-Time Data Visualization With R And SQL Server

Tomaz Kastrun shows how simple it can be to plot real(ish)-time data from SQL Server using R:

In the previous post, I showed how to visualize near real-time data using Python and the Dash module.  Now it is time to see one of the many ways to do it in R. This time, I will not use any additional frameworks for visualization, like shiny, plotly or any others, but will simply use base R functions and the RODBC package to extract data from SQL Server.

Extracting data from SQL Server while simulating inserts into a SQL Server table will simulate the near real-time data. If you have followed the previous post, you will notice that I am using the same T-SQL table and query to extract real-time data.

Tomaz is using the base plot library, but if you want something nicer, there are several good alternatives.
