Press "Enter" to skip to content

Curated SQL Posts

Query Store And Plan Cache Flushes

Erin Stellato explains why enabling Query Store requires the plan cache be flushed:

I’ve had two comments recently on my blog about Query Store causing the plan cache to be flushed. There was a known issue related to the plan cache flushing after Query Store was enabled, but this was fixed in CU2 for SQL Server 2016 SP1. So I did some testing and here is what I think is causing the confusion:

When you enable Query Store, which is done with an ALTER DATABASE SET statement, the plan cache for the database is flushed.

Now, before anyone writes up a UserVoice item, understand that there are several ALTER DATABASE SET commands that cause the plan cache for a database to be flushed. For example, taking a database OFFLINE causes the database plan cache to be flushed. That one seems intuitive, right? So why is the plan cache cleared when you enable Query Store, or change one of the settings? To ensure that new Query Store data is not lost. This relates to the internals of how Query Store works, which aren’t essential to dig into; the point is that this behavior is known to Microsoft and expected.
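To make that concrete, here is a minimal sketch (with a placeholder database name); both statements below are ALTER DATABASE SET commands, so per the post, each one clears the plan cache for that database:

-- Enabling Query Store for the first time:
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;

-- Changing a Query Store setting goes through the same mechanism:
ALTER DATABASE [YourDatabase]
	SET QUERY_STORE (OPERATION_MODE = READ_WRITE, MAX_STORAGE_SIZE_MB = 1024);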

There is what I’d consider a documentation bug around describing what happens when you enable Query Store, but the fact that the plan cache gets cleared makes sense.


wrapr 1.5.0 Now On CRAN

John Mount announces wrapr 1.5.0:

wrapr includes a lot of tools for writing better R code:

John also includes an example using the coalesce operator %?%.


Using The Azure Data Science VM With GPUs

Jennifer Marsman has some tips and tricks around using the Azure Data Science Virtual Machine on an instance running with GPU support:

To get GPU support, you need both hardware with GPUs in a datacenter and the right software – namely, a virtual machine image that includes GPU drivers so you can use the GPU.

The biggest tip is to use the Deep Learning Virtual Machine! The provisioning experience has been optimized to filter to the options that support GPU (the NC series – see below), which makes it easier to set it up correctly.

Read on for the rest of the advice.


Running Hive LLAP As A YARN Service

Gour Saha, et al., demonstrate running Apache Hive LLAP as a YARN service:

Making LLAP a first-class YARN service also enables us to use some of the other powerful features in YARN that were added in Apache Hadoop 3.0 / 3.1; some of them are noted below.

  1. Advanced container placement scheduling such as affinity and anti-affinity. What Slider used to handle in a custom way is now a core first-class feature (YARN-6592).

  2. Rich APIs for users to fetch/query application details using Timeline Service V2 (YARN-2928 and YARN-5355).

  3. New and improved Services UI in YARN UI2 improving debuggability and log access.

  4. Continuous rolling log aggregation of long running containers (YARN-2443).

  5. Auto-restart of containers by NodeManagers (YARN-4725).

  6. Windowing and threshold based container health monitor (YARN-8122).

  7. In the future, we can also leverage YARN level rolling upgrades for containers and the service as a whole (YARN-7512 and YARN-4726).

Looks like it’s been a fruitful transition.


Preventing Server Manager From Loading

Steve Stedman shows how to prevent the Server Manager app from loading whenever you RDP into a Windows Server machine:

If you frequently connect to many different SQL Servers as I do, you are probably used to the Server Manager loading slowly when you log in with Remote Desktop.

The Server Manager has a bad reputation for taking up lots of CPU over time and possibly even bogging down a SQL Server when left open for days on end.

To prevent this from automatically loading, you can do the following to quickly disable it for your user session and your future user sessions.

Read on for the steps.


Auditing Options With Azure SQL Data Warehouse

Janusz Rokicki explores what is available in Azure SQL Data Warehouse when it comes to auditing:

Auditing is disabled by default and the UI experience depends on the region to which the logical server is deployed. For instance, in UK South, the portal offers no options to manage auditing:

In North Europe, the portal allows Table Auditing (table-storage based) to be enabled on the SQL Data Warehouse scope, but it isn’t possible to enable Blob Auditing:

On top of that, Blob Auditing behaves differently when enabled on a logical server level in different regions. In locations that support Table Auditing, turning on Blob Auditing automatically enables it in all databases, including SQL Data Warehouses—and that’s expected. In other regions, Blob Auditing is not automatically enabled and has to be turned on programmatically by calling the ARM REST API.

I imagine the plan is to support this across the board but it’s rolling out region by region.


User-Defined Restore Points In Azure SQL DW

Kevin Ngo announces a new feature in Azure SQL Data Warehouse:

Previously, SQL DW supported only automated snapshots guaranteeing an eight-hour recovery point objective (RPO). While this snapshot policy provided high levels of protection, customers asked for more control over restore points to enable more efficient data warehouse management capabilities leading to quicker times of recovery in the event of any workload interruptions or user errors.

Now, with user-defined restore points, in addition to the automated snapshots, you can initiate snapshots before and after significant operations on your data warehouse. With more granular restore points, you ensure that each restore point is logically consistent, limiting the impact and reducing the recovery time of restoring the data warehouse should it be needed. User-defined restore points can also be labeled so they are easy to identify afterwards.

Creating a user-defined restore point is a one-liner in PowerShell, and it’s something you could do after each warehouse load, for example.


Identity Columns And Linked Servers

Kenneth Fisher points out an oddity when inserting data across a linked server into a table with an identity column:

So far so good. Now let’s throw in a twist. Let’s call it through a linked server.

INSERT INTO [(local)\sql2014cs].Test.dbo.IdentTest 
	VALUES ('Col1','Col2');

Msg 213, Level 16, State 1, Line 4
Column name or number of supplied values does not match table definition.

Well that’s a bit odd, right? I mean I used that exact command in the previous test. Turns out that when you do an insert across a linked server, that identity column is not ignored. Which means we just need to include the identity value, right? Nope.

INSERT INTO [(local)\sql2014cs].Test.dbo.IdentTest 
	VALUES (1,'Col1','Col2');

Msg 7344, Level 16, State 1, Line 4
The OLE DB provider “SQLNCLI11” for linked server “(local)\sql2014cs” could not INSERT INTO table “[(local)\sql2014cs].[Test].[dbo].[IdentTest]” because of column “Id”. The user did not have permission to write to the column.

Click through to see how to do this.
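As a sketch of one common workaround (not necessarily the exact approach in the post, and assuming the remote table's non-identity columns are named Col1 and Col2), listing the target columns explicitly leaves the identity column for the remote server to populate:

-- Name the non-identity columns so the remote table generates Id itself:
INSERT INTO [(local)\sql2014cs].Test.dbo.IdentTest (Col1, Col2)
	VALUES ('Col1','Col2');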


Visualizing Model Input Effects

Ilknur Kaynar Kabul shows us how to use partial dependence plots and individual conditional expectation plots to view the specific effect of an input variable on a model:

A partial dependence (PD) plot depicts the functional relationship between a small number of input variables and predictions. It shows how the predictions partially depend on values of the input variables of interest. For example, a PD plot can show whether the probability of flu increases linearly with fever. It can show whether a high energy level will decrease the probability of having the flu. PD can also show the type of relationship, such as a step function, curvilinear, linear, and so on.

The simplest PD plots are 1-way plots, which show how a model’s predictions depend on a single input. The plot below shows the relationship (according to the model that we trained) between price (target) and number of bathrooms. Here, we see that house prices increase as we increase the number of bathrooms up to 4. After that, it does not change the house price.
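For reference (a standard definition rather than something spelled out in the excerpt), the 1-way partial dependence function averages the trained model's predictions over the observed values of all the other inputs:

\hat{f}_S(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\left(x_S, x_C^{(i)}\right)

where x_S is the input of interest (here, the number of bathrooms), x_C^{(i)} holds the i-th training example's values for the remaining inputs, and \hat{f} is the model.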

These types of plots are helpful for understanding the mechanics behind a model.


Methods For Detecting Anomalies In Business Metrics

Sergey Bryl’ gives us four methods for detecting anomalies in business data:

In this article, by business metrics, we mean numerical indicators we regularly measure and use to track and assess the performance of a specific business process. There is a huge variety of business metrics in the industry: from conventional to unique ones. The latter are specifically developed for and used in one company or even just by one of its teams. I want to note that usually, business metrics have dimensions, which imply the possibility of drilling down into the structure of the metric. For instance, the number of sessions on the website can have dimensions: types of browsers, channels, countries, advertising campaigns, etc. where the sessions took place. The presence of a large number of dimensions per metric, on the one hand, enables a comprehensive, detailed analysis, and, on the other, makes that analysis more complex.

Anomalies are abnormal values of business indicators. We cannot claim anomalies are something bad or good for business. Rather, we should see them as a signal that there have been some events that significantly influenced a business process and our goal is to determine the causes and potential consequences of such events and react immediately. Of course, from the business point of view, it is better to find such events than ignore them.

It was interesting comparing the results of the four methods. H/T R-bloggers.
