SQL Agent’s 5 Second Rule

Ewald Cress uncovers a change in the way the SQL Agent scheduler works in SQL Server 2016 compared to prior versions:

Upon completion of a job, the next run time is calculated based on the last scheduled time plus the schedule interval. However, allowance is made for the edge cases where the completed invocation overruns into the next start time. In such a situation, there isn’t a “catch-up” run; instead, the schedule is advanced iteratively until it reaches a future point in time.

However, 2016 introduces a new twist. When applying the “is the proposed next schedule time after Now()?” check, it adds five seconds to Now(). In other words, the question becomes “Is the proposed next schedule time more than five seconds in the future?”

Ewald jumps into the debugger to understand this better, so click through for that.

When To Add That Index

Kenneth Fisher shares his rules of thumb with regard to indexing:

Here are my general rules of thumb, although of course, you should always use your best judgment. Also, this is for OLTP systems (ex: data entry systems) not OLAP systems (ex: data warehouses).

  • No Clustered Index: You really should add a clustered index. Clustered indexes are important for a number of reasons, so if you don’t have one spend some time, figure one out and create it. When possible I like dates as a clustered index but of course, it’s highly specific to the table you are working on. There are a few types of tables, some load tables, for example, were not having a clustered index is appropriate but they are few and far between.

Indexes are like cookies:  you can’t have too many.  No, wait, that doesn’t sound right…

Learn SQL Server Security Via E-mails

Chris Bell has announced a free e-mail course for learning the basics of SQL Server security:

Today I am very excited to announce that I have (finally!) launched my email course covering the basics of SQL Server Security.

This has been a lot of work to get a new system in place to make the learning experience a little different. It is like a normal email course, but at the same time it isn’t.

I have been waiting for this for months ever since hearing Chris first talk about it.

Trying Out Interleaved Execution

Dmitry Pilugin has a test of interleaved execution of a multi-statement table-valued function:

First of all, note that Estimated Number of Rows is 10 000 now, which is correct and equals Actual Number of Rows. Due to the correct estimate the optimizer decided that there are enough rows to benefit from a partial (local/global) aggregation and introduced a partial Hash aggregate before the join.

If you take a Profiler, enable events SP:StmtStarting, SP:StmtCompleted, SQL:StmtStarting, SQL:StmtCompleted and run the query without and with a TF, you’ll see what does it actually mean “interleaved” in terms of the execution sequence.

During the regular execution the query starts executing, then the function is executed, the query continues execution and finishes the execution. We see the following event sequence:

This is a very thorough post, but if you have multi-statement TVFs, you absolutely want to read it.

Whither CLR?

Joey D’Antoni is shaking his head about a CLR announcement:

With this is mind, Microsoft has made some big changes to CLR in SQL Server 2017. SQL CLR has always been an interesting area of the engine—it allows for the use of .NET code in stored procedures and user defined types. For certain tasks , it’s an extremely powerful tool—things like RegEx and geo functions can be much faster in native CLR than trying to do the equivalent operation in T-SQL. It’s always been a little bit of a security risk, since under certain configurations, CLR had access to resources outside of the context of the database engine. This was protected by boundaries defined in the CLR host policy. We had SAFE, EXTERNAL_ACCESS, and UNSAFE levels that we could set. SAFE simply limited access of the assembly to internal computation and local data access. For the purposes of this post, we will skip UNSAFE and EXTERNAL_ACCESS, but it is sufficed to say, these levels allow much deeper access to the rest of the server.

Code Access Security in .NET (which is used to managed these levels) has been marked obsolete. What does this mean? The boundaries that are marked SAFE, may not be guaranteed to provide security. So “SAFE” CLR may be able to access external resources, call unmanaged code, and acquire sysadmin privileges. This is really bad.

It’s not the end of the world for CLR, but this is a breaking change.  Read on for more details.

Feature Improvements In Microsoft R Server 9.1

David Smith gives us a nice roundup of feature improvements in Microsoft R Server 9.1:

Interoperability between Microsoft R Server and sparklyr. You can now use RStudio’s sparklyr package in tandem with Microsoft R Server in a single Spark session

New machine learning models in Hadoop and Spark. The new machine learning functions introduced with Version 9.0 (such as FastRank gradient-boosted trees and GPU-accelerated deep neural networks) are now available in the Hadoop and Spark contexts in addition to standalone servers and within SQL Server.

I have been looking forward to these.

Data Amp Reactions

Forthwith some Data Amp reactions.  Steve Hughes hits some of the highlights:

Migration Project for Azure SQL DB

Whether you have SQL Server, Oracle, or MySQL, you should be able to migrate your database to Azure SQL DB in “five simple steps”. While a great tool, I am interested in exploring this more with Oracle in particular. You can create a project in Azure that let’s you choose the source database and platform and target a Azure SQL DB then move the schema and load the database. While I am skeptical on the full capability of this solution, I look forward to exploring it more.

Garland MacNeill uses the announcement to think about how the role of the DBA is changing:

Which brings me to my ultimate realization after today’s presentation. Career wise, DBAs have to evolve. I know it’s been said a thousand times, but the days of running backups, granting permissions, and the other daily dba tasks are riding off into the sunset for all but those who work in data centers such as Azure, Google, or AWS.

That means for the rest of the DBAs to continue to be employed as data professionals they have to learn new skills. Unless I am missing something, that means focusing on development skills. Not just SQL either. They are going to have to know things like R, Python, PowerShell, and probably some C#, including .NET framework. That might not be the only thing, as Thomas LaRock (b/t) thinks there might be other areas to invest in as well.

And Victoria Holt has a link-filled wrapup:

SQL Graph

Storing and analyzing graph data relationships. This includes full CRUD support to create nodes and edges and T-SQL query language extensions to provide multi-hop navigation using join-free pattern matching.  The SQL Server engine integration enables querying across SQL tables and graph data.

Good reading all.

Adaptive Query Processing In CTP 2.0

Joe Sack has a couple blog posts on adaptive query processing enhancements in SQL Server 2017 CTP 2.0.  First, Batch Mode Adaptive Joins:

We have seen numerous cases where providing a specific join hint solved query performance issues for our customers.  However, the drawback of adding a hint is that we remove join algorithm decisions from the optimizer for that statement. While fixing a short-term issue, the hard-coded hint may not be the optimal decision as data distributions shift over time.

Another scenario is where we do not know up front what the optimal join should be, for example, with a parameter sensitive query where a low or high number of rows may flow through the plan based on the actual parameter value.

With these scenarios in mind, the Query Processing team introduced the ability to sense a bad join choice in a plan and then dynamically switch to a better join strategy during execution.

That one’s really cool.  Joe also talks about interleaved execution for multi-statement TVFs:

SQL Server has historically used a unidirectional “pipeline” for optimizing and executing queries.  During optimization, the cardinality estimation process is responsible for providing row count estimates for operators in order to derive estimated costs.  The estimated costs help determine which plan gets selected for use in execution.  If cardinality estimates are incorrect, we will still end up using the original plan despite the poor original assumptions.

Interleaved execution changes the unidirectional boundary between the optimization and execution phases for a single-query execution and enables plans to adapt based on the revised estimates. During optimization if we encounter a candidate for interleaved execution, which for this first version will be multi-statement table valued functions (MSTVFs), we will pause optimization, execute the applicable subtree, capture accurate cardinality estimates and then resume optimization for downstream operations.

The goal here is to make Table-Valued Functions viable from a performance perspective.  The team has a long way to go there, but they’re working on it.  Also, Joe gives a shout out to Arun Sirpal, frequent curatee.

SQL Server ML Services

SQL Server R Services is now SQL Server Machine Learning Services and supports Python.  First, Nagesh Pabbisetty and Sumit Kumar talk about Python support:

The addition of Python builds on the foundation laid for R Services in SQL Server 2016 and extends that mechanism to include Python support for in-database analytics and machine learning. We are renaming R Services to Machine Learning Services, and R and Python are two options under this feature.

The Python integration in SQL Server provides several advantages:

  • Elimination of data movement: You no longer need to move data from the database to your Python application or model. Instead, you can build Python applications in the database. This eliminates barriers of security, compliance, governance, integrity, and a host of similar issues related to moving vast amounts of data around. This new capability brings Python to the data and runs code inside secure SQL Server using the proven extensibility mechanism built in SQL Server 2016.

  • Easy deployment: Once you have the Python model ready, deploying it in production is now as easy as embedding it in a T-SQL script, and then any SQL client application can take advantage of Python-based models and intelligence by a simple stored procedure call.

  • Enterprise-grade performance and scale: You can use SQL Server’s advanced capabilities like in-memory table and column store indexes with the high-performance scalable APIs in RevoScalePy package. RevoScalePy is modeled after RevoScaleR package in SQL Server R Services. Using these with the latest innovations in the open source Python world allows you to bring unparalleled selection, performance, and scale to your SQL Python applications.

  • Rich extensibility: You can install and run any of the latest open source Python packages in SQL Server to build deep learning and AI applications on huge amounts of data in SQL Server. Installing a Python package in SQL Server is as simple as installing a Python package on your local machine.

  • Wide availability at no additional costs: Python integration is available in all editions of SQL Server 2017, including the Express edition.

Nagesh Pabbisetty also announces Microsoft R Server 9.1:

We took the first step with Microsoft R Server 9.0, and this follow on release includes significant innovations such as:

  • New machine learning enhancements and inclusion of pre-trained cognitive models such as sentiment analysis & image featurizers

  • SQL Server Machine Learning Services with integrated Python in Preview

  • Enterprise grade operationalization with real-time scoring and dynamic scaling of VMs

  • Deep customer & ISV partnerships to deliver the right solutions to customers

  • A panoply of sources to help you get started with ease

And Joseph Sirosh indicates that AI is where the money is:

So today it’s my pleasure to announce the first RDBMS with built-in AIa production-quality Community Technology Preview (CTP 2.0) of SQL Server 2017. In this preview release, we are introducing in-database support for a rich library of machine learning functions, and now for the first time Python support (in addition to R). SQL Server can also leverage NVIDIA GPU-accelerated computing through the Python/R interface to power even the most intensive deep-learning jobs on images, text, and other unstructured data. Developers can implement NVIDIA GPU-accelerated analytics and very sophisticated AI directly in the database server as stored procedures and gain orders of magnitude higher throughput. In addition, developers can use all the rich features of the database management system for concurrency, high-availability, encryption, security, and compliance to build and deploy robust enterprise-grade AI applications.

There’s a lot to digest here.

SQL Server 2017 CTP 2.0

The SQL Server team announces CTP 2.0 of SQL Server 2017:

Microsoft is excited to announce a new preview for the next version of SQL Server!  We disclosed a name for this next release, SQL Server 2017, today at the Microsoft Data Amp event. Community Technology Preview (CTP) 2.0 is the first production-quality preview of SQL Server 2017, and it is available on both Windows and Linux.  In this preview, we added a number of new capabilities, including the ability to run advanced analytics using Python in a parallelized and highly scalable way, the ability to store and analyze graph data, and other capabilities that help you manage SQL Server for high performance and uptime, including the Adaptive Query Processing family of intelligent database features and resumable online indexing.

I can finally call it “SQL Server 2017” instead of “SQL Server vNext.”  I don’t know why there was such a hubbub about the name 2017, but there you go.  Anyhow, I’ve grabbed the CTP and am raring to go.


April 2017
« Mar