
Curated SQL Posts

See The Pernicious Effects Of Your UDFs

Pedro Lopes announces an improvement to SQL Server execution plan results in 2017 CU3:

As I mentioned on yesterday’s post, with the recent release of SQL Server 2017 CU3, we released yet more showplan enhancements: you can see other posts related to showplan enhancements here.

In this article I’ll talk about the second showplan improvement we worked on, to assist in the discoverability of UDF usage impact on query execution.

The scenario is that if a query uses one or more user-defined scalar functions (such as T-SQL functions or CLR functions), a significant portion of query execution time may be spent inside those functions, depending on the function definition. And this may not be immediately obvious by looking at the query execution plan.

Recently, we added information on overall query CPU and elapsed time tracking for statistics showplan xml (both in ms), found in the root node of an actual plan (which I blogged about here). We have now added two new attributes: UdfCpuTime and UdfElapsedTime. These provide the total CPU and elapsed time (again, both in ms) that is spent inside all scalar user-defined functions, during the execution of a query.

I love it. UDFs have historically been silent query killers, as the execution plan gleefully treats the function call as practically free because it only shows a single iteration.
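
To see the new attributes in action, here is a minimal sketch (the table and function names are hypothetical): a scalar UDF applied per row barely registers in the graphical plan, but the actual plan XML now reports how much time the query spent inside it.

-- Hypothetical scalar UDF that does per-row work.
CREATE OR ALTER FUNCTION dbo.fn_TaxAmount (@Amount money)
RETURNS money
AS
BEGIN
    RETURN @Amount * 0.0825;
END;
GO

-- Capture the actual execution plan as XML.
SET STATISTICS XML ON;

SELECT OrderID,
       dbo.fn_TaxAmount(TotalDue) AS TaxAmount  -- invoked once per row
FROM dbo.Orders;

SET STATISTICS XML OFF;

In the returned showplan XML, the root-level query time statistics now carry UdfCpuTime and UdfElapsedTime (both in milliseconds), so you can compare time spent inside the function against the query's overall CPU and elapsed time.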


When UNION ALL Can Beat OR

Bert Wagner compares a couple methods for writing a query:

Suddenly those key-lookups become too expensive for SQL Server and the query optimizer thinks it’ll be faster to just scan the entire clustered index.

In general this makes sense; SQL Server tries to pick plans that are good enough in most scenarios, and in general I think it chooses wisely.

However, sometimes SQL Server doesn’t pick great plans. Sometimes the plans it picks are downright terrible.

If that particular topic interests you, I have a blog post from a few years back in a similar vein.
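
As a rough illustration of the rewrite Bert describes (table, column, and index names are invented), splitting an OR across two selective predicates into two seekable branches can look like this:

-- Original query: once the estimated key lookups get too expensive,
-- the optimizer may fall back to scanning the clustered index.
SELECT OrderID, CustomerID, SalespersonID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42
   OR SalespersonID = 7;

-- Rewrite: each branch can use its own nonclustered index seek.
SELECT OrderID, CustomerID, SalespersonID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42

UNION ALL

SELECT OrderID, CustomerID, SalespersonID, OrderDate
FROM dbo.Orders
WHERE SalespersonID = 7
  AND (CustomerID <> 42 OR CustomerID IS NULL);  -- skip rows the first branch already returned

Whether the rewrite actually wins depends on having usable indexes on both predicates, so check the actual plans rather than assuming.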


Data Discovery And Classification In SQL Server

Gilad Mittelman explains how the SQL Information Protection (aka Data Discovery and Classification) process works in SQL Server and Azure SQL Database:

SQL Information Protection (SQL IP) introduces a set of advanced services and new SQL capabilities, forming a new information protection paradigm in SQL aimed at protecting the data, not just the database:

  • Discovery & recommendations – The classification engine scans your database and identifies columns containing potentially sensitive data. It then provides you an easy way to review and apply the appropriate classification recommendations via the Azure portal.

  • Labeling – Sensitivity classification labels can be persistently tagged on columns using new classification metadata attributes introduced into the SQL Engine. This metadata can then be utilized for advanced sensitivity-based auditing and protection scenarios.

  • Monitoring/Auditing – Sensitivity of the query result set is calculated in real time and used for auditing access to sensitive data (currently in Azure SQL DB only).

  • Visibility – The database classification state can be viewed in a detailed dashboard in the portal. Additionally, you can download a report (in Excel format) to be used for compliance & auditing purposes, as well as other needs.

Check it out, especially with GDPR breathing down our necks.
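
To make the labeling bullet concrete: when this was written, labels were applied through SSMS and the Azure portal and stored as classification metadata on the columns, but the same concept later got first-class T-SQL syntax in Azure SQL Database and SQL Server 2019. A hedged sketch against a hypothetical table:

-- Tag a column with an information type and a sensitivity label.
ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.Email
WITH (INFORMATION_TYPE = 'Contact Info', LABEL = 'Confidential');

-- Review the classification state of the database.
SELECT sc.major_id, sc.minor_id, sc.information_type, sc.label
FROM sys.sensitivity_classifications AS sc;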


SSMS 17.5 Released

Alan Yu announces SQL Server Management Studio 17.5:

SSMS 17.5 provides support for almost all feature areas on SQL Server 2008 through the latest SQL Server 2017, which is now generally available.

In addition to enhancements and bug fixes, SSMS 17.5 comes with several exciting new features:

  • Data Discovery and Classification is a new feature for discovering, classifying, labeling and reporting sensitive data in your databases.
  • Query Editor now has support for a SkipsRows option for the Delimited Text External File Format for Azure SQL Data Warehouse.
  • Showplan has enabled the display of the estimated plan button for SQL Data Warehouse.

SSMS 17.5 also includes key bug fixes to Showplan, Query Editor, SMO, and Templates, which can be found in the Release Notes.

Aside from data discovery and classification, it’s probably not a major update for most people, but this does look like an interesting tool.
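
On the skip-rows item: if I am reading the release notes correctly, this surfaces the FIRST_ROW format option for delimited-text external file formats in Azure SQL Data Warehouse, along these lines (the format name is made up):

-- Skip a one-line header when loading delimited text through PolyBase;
-- FIRST_ROW = 2 starts reading each file at its second line.
CREATE EXTERNAL FILE FORMAT csv_skip_header
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        FIRST_ROW = 2
    )
);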


Visual Studio Code In Anaconda 5.1

George Leopold reports that Anaconda 5.1 will now include Visual Studio Code as an optional IDE:

Microsoft and Python data science platform vendor Anaconda have extended their partnership by adding the software giant’s code editor to the latest Anaconda distribution.

The addition of Microsoft’s Visual Studio Code (VS Code) expands its support for the latest release of the Python data science platform, Anaconda 5.1. The Python platform has attracted more than 4.5 million users running the programming language on Windows, Mac and Linux.

Along with editing and debugging features, the partners said the cross-platform code editor includes custom features for Anaconda users. For example, a Python extension customizes VS Code for the Python development environment.

Read on for more information.


Streaming ETL In Practice Using KSQL

Robin Moffatt builds an example of streaming ETL using Oracle, GoldenGate, and Kafka:

So in this post I’m going to show an example of what streaming ETL looks like in practice. I’m replacing batch extracts with event streams, and batch transformation with in-flight transformation of these event streams. We’ll take a stream of data from a transactional system built on Oracle, transform it, and stream it into Elasticsearch to land the results, but your choice of datastore is up to you—with Kafka’s Connect API you can stream the data to almost anywhere! Using KSQL we’ll see how to filter streams of events in real-time from a database, how to join between events from two database tables, and how to create rolling aggregates on this data.

It’s a very useful example.
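
For a flavor of the KSQL involved, here is a rough sketch rather than Robin’s actual pipeline (topic, stream, and column names are invented): register the change-data topics, filter and join them in flight, then build a windowed aggregate.

-- Register the change-data topics as a stream and a table (Avro schemas
-- come from the Schema Registry; all names here are hypothetical).
CREATE STREAM orders_src
  WITH (KAFKA_TOPIC='ora_orders', VALUE_FORMAT='AVRO');

CREATE TABLE customers
  WITH (KAFKA_TOPIC='ora_customers', VALUE_FORMAT='AVRO', KEY='customer_id');

-- Filter and enrich the order stream in flight with a stream-table join.
CREATE STREAM orders_enriched AS
  SELECT o.order_id, o.customer_id, o.amount, c.customer_name
  FROM orders_src o
  LEFT JOIN customers c ON o.customer_id = c.customer_id
  WHERE o.amount > 100;

-- Rolling aggregate: order totals per customer over hourly tumbling windows.
CREATE TABLE order_totals AS
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders_enriched
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY customer_id;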


Automating HDF Cluster Deployment

Ali Bajwa has a how-to guide for automating HDF 3.1 cluster deployment on AWS:

The release of HDF 3.1 brings about a significant number of improvements in HDF: Apache NiFi 1.5, Kafka 1.0, plus the new NiFi Registry. In addition, there were improvements to the Storm, Streaming Analytics Manager, and Schema Registry components. This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy HDF clusters to either single-node or multi-node development/demo environments in five easy steps. To quickly stand up a single node, a prebuilt AMI is available for AWS, as well as a script that automates these steps, so you can deploy the cluster in a few commands.

Click through for the installation guide.


SSAS Query Analyzer

Chris Webb reviews Analysis Services Query Analyzer:

Last week a new, free tool for analysing the performance of MDX queries on SSAS Multidimensional was released: Analysis Services Query Analyzer. You can get all the details and download it here:

https://ssasqueryanalyzer.github.io/

…and here’s a post on LinkedIn by one of the authors, Francesco De Chirico, explaining why he decided to build it:

https://www.linkedin.com/pulse/asqa-10-released-francesco-de-chirico/

I’ve played around with it a bit and I’m very impressed – it’s a really sophisticated and powerful tool, and one I’m going to spend some time learning because I’m sure it will be very useful to me.

Read on for the rest of Chris’s review, including product screenshots.


Installing Jupyter Notebook Kernels

Nigel Meakins continues his Jupyter series by showing how to install various kernels:

Jupyter-Scala

This can be downloaded from here. Unzip and run the jupyter-scala.ps1 script on Windows using elevated permissions in order to install.

The kernel files will end up in <UserProfileDir>\AppData\Roaming\jupyter\kernels\scala-develop and the kernel will appear in Jupyter with the default name of ‘Scala (develop)’. You can of course change this in the respective kernel.json file.
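
For reference, the display name lives in that kernel.json next to the command Jupyter uses to launch the kernel; a trimmed sketch (the argv entries are placeholders and will differ on your machine) looks something like this:

{
  "display_name": "Scala 2.11",
  "language": "scala",
  "argv": [
    "C:\\path\\to\\jupyter-scala\\launcher",
    "--connection-file",
    "{connection_file}"
  ]
}

Editing display_name there changes how the kernel appears in the notebook list.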

Click through to see how to install a few other kernels with various levels of configuration.


Changing Int To Bigint

Danny Kruge shows one way to change a table’s identity column from INT to BIGINT:

The table was around 500GB with over 900 million rows. Based on the average number of inserts a day on that table, I estimated that we had eight months before inserts on that table would grind to a halt. This was an order entry table, subject to round-the-clock inserts due to customer activity. Any downtime to make the conversion to BIGINT was going to have to be minimal.

This article describes how I planned and executed a change from an INT to a BIGINT data type, replicating the process I used in a step by step guide for the AdventureWorks database. The technique creates a new copy of the table, with a BIGINT datatype, on a separate SQL Server instance, then uses object level recovery to move it into the production database.

There’s a way to do this without any downtime, though the trigger logic gets a little more complex and it does take longer.
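
For the curious, a rough sketch of that zero-downtime pattern (the schema is hypothetical, and this glosses over updates, deletes, foreign keys, and reseeding the identity after the swap): build a BIGINT shadow table, keep it in sync with a trigger, backfill in batches, then swap names in one short transaction.

-- 1. Shadow table with the BIGINT key.
CREATE TABLE dbo.Orders_New (
    OrderID   bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate datetime2 NOT NULL,
    Amount    money NOT NULL
);
GO

-- 2. Trigger on the live table mirrors new rows into the shadow copy
--    (a production version also has to handle UPDATE and DELETE).
CREATE TRIGGER trg_Orders_Sync ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    SET IDENTITY_INSERT dbo.Orders_New ON;
    INSERT INTO dbo.Orders_New (OrderID, OrderDate, Amount)
    SELECT OrderID, OrderDate, Amount FROM inserted;
    SET IDENTITY_INSERT dbo.Orders_New OFF;
END;
GO

-- 3. Backfill historical rows in batches (looping and throttling omitted).
SET IDENTITY_INSERT dbo.Orders_New ON;
INSERT INTO dbo.Orders_New (OrderID, OrderDate, Amount)
SELECT TOP (100000) o.OrderID, o.OrderDate, o.Amount
FROM dbo.Orders AS o
WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders_New AS n WHERE n.OrderID = o.OrderID)
ORDER BY o.OrderID;
SET IDENTITY_INSERT dbo.Orders_New OFF;
GO

-- 4. Once the two tables match, swap names inside a short transaction.
BEGIN TRAN;
    EXEC sp_rename 'dbo.Orders', 'Orders_Old';
    EXEC sp_rename 'dbo.Orders_New', 'Orders';
COMMIT;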
