Press "Enter" to skip to content

Author: Kevin Feasel

Machine Learning Services Updates

Umachandar Jayachandran and team have been busy.  First, they announced a preview of SQL Server ML Services in Azure SQL Database:

In-database Machine Learning support was added in SQL Server 2016 and we are now bringing the same functionality to Azure SQL Database. You can now train and score machine learning models in Azure SQL Database and the predictions can be exposed to any application using your database, easily and seamlessly.

The preview functionality allows you to train and score machine learning models using data that fits in memory (in an R data frame). Please note that the amount of memory available for R script execution depends on the edition of the Azure SQL database and cannot be modified.
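
Under the covers this uses the same sp_execute_external_script mechanism as SQL Server 2016.  A minimal training sketch, with a made-up source table (dbo.TaxiTrips) standing in for your data:

    DECLARE @model VARBINARY(MAX);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # Train a simple linear model on the input data frame.
            model <- lm(tip_amount ~ trip_distance, data = InputDataSet)
            # Serialize it so the caller can store it in a table.
            trained_model <- serialize(model, connection = NULL)
        ',
        @input_data_1 = N'SELECT tip_amount, trip_distance FROM dbo.TaxiTrips',
        @params = N'@trained_model VARBINARY(MAX) OUTPUT',
        @trained_model = @model OUTPUT;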

No Python support there yet, but it’s upcoming.  Second, we can use the PREDICT function in Azure SQL Database:

Today we are announcing the general availability of the native PREDICT Transact-SQL function in Azure SQL Database. The PREDICT function allows you to perform scoring in real-time using certain RevoScaleR or revoscalepy models in a SQL query without invoking the R or Python runtime.

The PREDICT function support was added in SQL Server 2017. It is a table-valued function that takes a RevoScaleR or revoscalepy model & data (in the form of a table or view or query) as inputs and generates predictions based on the machine learning model. More details of the PREDICT function can be found here.

There are a limited number of models which support PREDICT—things like linear and logistic regression, RevoScaleR’s fast decision trees, etc.  If you have this type of model, however, the predictions stay within SQL Server and end up being much faster than going out to R.
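
To make that concrete, native scoring looks roughly like this; the model table and scoring table names here are invented:

    -- Assumes dbo.Models stores serialized RevoScaleR models and
    -- dbo.NewLoans holds the rows we want to score.
    DECLARE @model VARBINARY(MAX) =
        (SELECT model_object FROM dbo.Models WHERE model_name = N'loan_logit');

    SELECT d.loan_id, p.Score
    FROM PREDICT(MODEL = @model, DATA = dbo.NewLoans AS d)
    WITH (Score FLOAT) AS p;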

Forced Plans And Query Text

Erin Stellato warns us about what happens to Query Store’s forced plans when we alter the objects behind them:

There are different query_id values in sys.query_store_query, as well as different entries for each query in sys.query_store_query_text. Both queries are still tied to the stored procedure (object_id 1483152329), which can be confusing if you don’t understand what occurred. In order to confirm which query has executed most recently, you can use the query below, which joins both system views (replace the query_ids as appropriate).

This is a good thing to keep in mind if you perform formatting changes on procedures.
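
The shape of that confirmation query is easy to sketch; the query_ids below are placeholders:

    SELECT q.query_id,
           q.object_id,
           qt.query_sql_text,
           q.last_execution_time
    FROM sys.query_store_query AS q
    JOIN sys.query_store_query_text AS qt
        ON q.query_text_id = qt.query_text_id
    WHERE q.query_id IN (3, 8)  -- replace with the query_ids in question
    ORDER BY q.last_execution_time DESC;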

Messing With Views

Daniel Janik shows what happens when you add a column to the middle of a table while a SELECT * view exists on that table:

Let’s say you get a request or maybe the developer adds a column in the middle of the table. What happens to the view? Was it created with SELECT *? Could the title of this post just as easily have been “Don’t SELECT * ever again!”? Sure…

This is exactly what had happened. The table was altered and the view didn’t change. The view was actually throwing a date-from-string conversion error.

Let’s take a peek at the aftermath of adding a column without rebuilding the view.

To Daniel’s two takeaways I would add a third:  don’t use the GUI to insert columns in the middle of a table.  The order of columns in a relational table is ultimately irrelevant, so add new columns at the end.  That avoids this problem altogether.
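
If you do end up in this situation, sp_refreshview re-binds the view’s metadata to the table’s current shape.  A quick sketch with made-up names:

    CREATE TABLE dbo.Orders (OrderID INT, OrderDate DATE, Amount MONEY);
    GO
    CREATE VIEW dbo.vOrders AS SELECT * FROM dbo.Orders;
    GO
    -- If a designer rebuilds dbo.Orders with a new column in the middle,
    -- dbo.vOrders keeps its stale column list and can return values under
    -- the wrong names and types until the view is refreshed.
    EXEC sp_refreshview N'dbo.vOrders';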

CosmosDB Via Linked Server

Rolf Tesmer shows us how to connect to Azure CosmosDB using a linked server:

Recently I had a requirement to combine data that I already had in SQL Server (2016) with JSON document data already stored in Azure CosmosDB.  Both databases were operational and continuously accepting data, so I didn’t want to go to the trouble of doing the delta load thing between them; instead I just wanted to be able to query directly on demand.

And so – the purpose of this article is to outline the method to connect directly to Azure CosmosDB from SQL Server using a SQL Linked Server.

Click through for the step-by-step details.  Ultimately, it’s a linked server connecting via ODBC, so nothing magical—but it is nice to see interoperability.
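
The core of the setup is small.  A sketch, assuming a system ODBC DSN named CosmosDB created with the Cosmos DB ODBC driver (all names illustrative):

    EXEC master.dbo.sp_addlinkedserver
        @server = N'COSMOSDB',
        @srvproduct = N'',
        @provider = N'MSDASQL',   -- OLE DB provider for ODBC
        @datasrc = N'CosmosDB';   -- the ODBC DSN name

    -- Pass a query through to Cosmos DB and treat the result as a rowset.
    SELECT * FROM OPENQUERY(COSMOSDB, 'SELECT c.id, c.orderTotal FROM c');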

dplyr Mutate Quirks

John Mount explains a quirk in dplyr’s mutate function:

It is hard for experts to understand how frustrating the above is to a new R user or to a part time R user. It feels like any variation on the original code causes it to fail. None of the rules they have been taught anticipate this, or tell them how to get out of this situation.

This quickly leads to strong feelings of learned helplessness and anxiety.

Our rule for dplyr::mutate() has been for some time:

Each column name used in a single mutate must appear only on the left-hand-side of a single assignment, or otherwise on the right-hand-side of any number of assignments (but never both sides, even if it is different assignments).

If you do data analysis with R, you’ve probably run into this before.  I certainly have, and it’s nice to understand why this is the case.
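
A tiny illustration of the rule, in R (dplyr assumed loaded; historically the first form has misbehaved on some backends):

    library(dplyr)

    d <- data.frame(x = 1:3)

    # Breaks the rule: x is assigned in one step and read in another
    # within the same mutate(); on some backends this has failed or
    # produced surprising results.
    d %>% mutate(x = x + 1, y = x * 2)

    # Follows the rule: split the dependent assignments apart.
    d %>% mutate(x = x + 1) %>% mutate(y = x * 2)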

Distributed Database Writes

James Serra provides a number of options around distributed writes:

In SQL Server, scaling out reads (i.e. using Active secondary replicas via AlwaysOn Availability Groups) is a lot easier than scaling out writes.  So what are your options when you have a tremendous amount of writes that scaling up will not handle, no matter how big your server is?  There are a number of options that allow you to write to many servers (instead of writing to one master server) that I’ll call distributed writes.  Here are some ideas:

Read on for more options and some additional thoughts around Cosmos DB.  My first inclination would be to put Kafka in front of a distributed write system, but that’s my bias.

Instant File Initialization On Linux

Anthony Nocentino explains how instant file initialization works on SQL Server on Linux:

With strace up and running, let’s turn on the trace flags to enable output for Instant File Initialization and create a database that has a 100MB data file and a 100MB log file. Check out this post from Microsoft for more details on the trace flags. This database create code is straight from their post. I changed the model database’s data and log file sizes to 100MB each. Also, it’s important to note that Instant File Initialization is only for data files; log files are zeroed out due to requirements for crash recovery. We’re going to see that in action in a bit…

Read the whole thing.
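
The setup itself is short; trace flags 3004 and 3605 make SQL Server write file-initialization messages to the error log (database name and file paths here are illustrative, using the Linux defaults):

    DBCC TRACEON(3004, -1);  -- report file creation/zeroing activity
    DBCC TRACEON(3605, -1);  -- route that output to the error log

    CREATE DATABASE TestFileZero
    ON PRIMARY
        (NAME = N'TestFileZero',
         FILENAME = N'/var/opt/mssql/data/TestFileZero.mdf', SIZE = 100MB)
    LOG ON
        (NAME = N'TestFileZero_log',
         FILENAME = N'/var/opt/mssql/data/TestFileZero_log.ldf', SIZE = 100MB);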

What Prevents Columnstore Row Group Merge

Niko Neugebauer shows two major scenarios in which the Row Group Merge operator will not work as expected, leaving you with an unexpectedly large number of row groups:

Let’s take a look at the most evil and uncorrectable pressure within SQL Server for the Columnstore Indexes – the Dictionary Pressure. For any final user without a bit of the internal knowledge of the Row Group sizes, Dictionaries & Pressures, it is always a huge surprise to find out that they can’t have those perfectly sized Row Groups with 1,048,576 rows.

Let us set up a table producing a good Dictionary Pressure and load 1.1 Million Rows into it, expecting to get just 2 Row Groups, while perfectly being aware that the result shall be quite different:

Niko’s dictionary pressure example is a good reason not to include textual columns in columnstore indexes.
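
If you want to see whether dictionary pressure is what trimmed your row groups, the physical stats DMV will tell you (table name is a placeholder):

    SELECT OBJECT_NAME(object_id) AS table_name,
           row_group_id,
           total_rows,
           trim_reason_desc  -- DICTIONARY_SIZE indicates dictionary pressure
    FROM sys.dm_db_column_store_row_group_physical_stats
    WHERE object_id = OBJECT_ID(N'dbo.FactSample');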

Automated Edition Downgrade For SQL Server

Jana Sattainathan shows how to automate downgrade of SQL Server Enterprise to Standard:

Standard Edition is limited to the lesser of 4 sockets or 24 cores, with a maximum of 128 GB of memory, and a few truly Enterprise-level features like Compression, Availability Groups, and Partitioning are off limits. I would say most places would fall under this threshold for “Standard” but feel inferior to say they run “Standard”! I don’t, especially when money matters.

But, all kidding aside, most shops don’t even realize that they do not use any Enterprise features on 90% of their instances but pay the Enterprise price anyway! If you don’t trust me, go check for yourself at your place – we did, on hundreds of SQL Server instances! I painfully built the infrastructure to do this type of thing using PowerShell; it can scan hundreds of servers/instances in seconds, if not a few minutes.
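
The per-database check Jana alludes to is a single DMV query; an empty result set means nothing Enterprise-only is persisted in that database:

    -- Run in each database; any rows returned name the Enterprise-only
    -- features (Compression, Partitioning, etc.) that are in use.
    SELECT feature_name, feature_id
    FROM sys.dm_db_persisted_sku_features;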

There’s a lot here, so if you’re thinking about downgrading in a post-2016 SP1 world, Jana’s post is a must-read.  But even with the features that moved down to Standard Edition, there are still quite a few Enterprise-level features I wouldn’t want to live without.
