Month: March 2018

Query Store Deadlock When Creating Databases

Published 2018-03-21 by Kevin Feasel

Andy Mallon ran into a weird issue with Query Store:

I tried the configuration a couple of times just to make sure it wasn’t a one-off problem. I installed the latest Cumulative Update (CU). I made sure nothing else was connected to the instance. I rebooted my machine. I restarted services. I banged my head against the wall. I asked a friend if I was insane or stupid. After confirming that I was both, my friend Aaron Bertrand (blog|twitter) confirmed it wasn’t a problem for him.

I discovered I could reproduce the problem simply by running the same simple statement that SSRS used when creating the ReportServer database. SSRS uses a non-standard collation, and specifying that collation seems to be the difference in causing the deadlock. Then I discovered that specifying ANY non-standard collation was causing the deadlock. This had nothing to do with SSRS, and everything to do with non-default collations.

Vote for his User Voice item too.

Comments closed

Adding Instance Name To The AlwaysON_health Session

Published 2018-03-21 by Kevin Feasel

Jonathan Kehayias shows how to add server_instance_name to the AlwaysON_health event session to make Availability Group troubleshooting easier:

The AlwaysOn_health event session in Extended Events is intended to make analyzing problems with Availability Groups possible after they have occurred. While this event session goes a long way towards making it possible to piece together the puzzle of what went wrong in a lot of situations, it can still be a difficult task. One of the things I wish Microsoft had included in the AlwaysON_health event session definition is the sqlserver.server_instance_name action for every event, and this is something that I usually recommend clients add to each of their AG servers using a script after I work with them the first time. Why would this be useful, since if the files come from a specific server we should know the events are for that server right? Well, when we are working with AG configurations with more than two servers and trying to see the big picture of what is happening across the servers, it can be difficult to follow timelines with multiple files from multiple servers open. It’s not impossible, but it does make things more difficult.

Click through to see how to do this through the UI or via T-SQL.

Comments closed

Performance Concern: Anti-Join And Top

Published 2018-03-21 by Kevin Feasel

Paul White explains a scenario in which an innocent-looking execution plan can hide something sinister:

Not every execution plan containing an apply anti join with a Top (1) operator on its inner side will be problematic. Nevertheless, it is a pattern to recognise and one which almost always requires further investigation.

The four main elements to look out for are:

A correlated nested loops (apply) anti join
A Top (1) operator immediately on the inner side
A significant number of rows on the outer input (so the inner side will be run many times)
A potentially expensive subtree below the Top

Read the whole thing. This is a great way to wrap up the series.

Comments closed

R Data Frames And stringsAsFactors

Published 2018-03-20 by Kevin Feasel

John Mount recommends setting stringsAsFactors = FALSE for data frames in R:

R often uses a concept of factors to re-encode strings. This can be too early and too aggressive. Sometimes a string is just a string.

Tibbles have this set by default. For an explanation as to why it defaults to TRUE for data frames, Roger Peng has the story.

Comments closed

The Microsoft Team Data Science Process Lifecycle Versus CRISP-DM

Published 2018-03-20 by Kevin Feasel

Melody Zacharias compares Microsoft’s Team Data Science Process lifecycle with the CRISP-DM process:

As I pointed out in my previous blog, the TDSP lifecycle is made up of five iterative stages:

Business Understanding

Data Acquisition and Understanding

Modeling

Deployment

Customer Acceptance

This is not very different from the six major phases used by the Cross Industry Standard Process for Data Mining (“CRISP-DM”).

This is part of a series on data science that Melody is putting together, so check it out.

Comments closed

Corrupting Managed Instances

Published 2018-03-20 by Kevin Feasel

Brent Ozar has found a bug with Azure SQL Database Managed Instances:

Corruption happens. It’s just a fact of life – storage is gonna fail. Microsoft’s SLAs for storage only give you 3-4 9’s, and there’s nothing in there about never losing your data. Nothing against Azure, either – I’ve lost entire VMs in AWS due to storage corruption.

So let’s demo it. Normally, this kind of thing might be hard to do, but at the moment, DBCC WRITEPAGE is enabled (although I expect that to change before MIs hit General Availability.) I used Erik’s notorious sp_GoAheadAndFireMe to purposely corrupt the master database (not TempDB. I modified it to work with a user database instead, ran it, and in less than ten seconds, the entire instance went unresponsive.

It’s a good post, so check it out.

Comments closed

Collecting PRINT Outputs From Powershell

Published 2018-03-20 by Kevin Feasel

Jana Sattainathan shows how to query a number of SQL Server instances in parallel using Powershell and collecting the PRINT outputs from each:

As an example, you may have a block of SQL that PRINTs out the current privileges in the databasethat can then be saved off and used as an independent script.

In my case today, I need to collect information on the Service Pack and Cumulative Updates that need to be installed/applied to 200+ SQL Server instances. MS provides the SQL script to identify the right SP and CU to install. Please make sure you download the zip file and unzip the long SQL. I need to run this SQL against the instances to get the info. However, the information returned is completely with PRINT statements!

Click through to see how Jana did it.

Comments closed

For GDPR, Don’t Forget Query Monitoring Tools

Published 2018-03-20 by Kevin Feasel

Grant Fritchey points out another spot that might store personal information:

When you capture query metrics through trace events or extended events, either using rpc_completed or sql_batch_completed, you not only get the query. You also get any parameter values associated with that query. Article 17 of the GDPR is extremely clear:

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay…

While there are a list of exceptions to the definitions of Article 17 listed at the link, none of those is because the data isn’t in the database or is stored in some separate information store such as your monitoring of queries. Instead, the GDPR pretty much says that any place (SharePoint, Excel, etc.) that the data resides, must be documented as part of your processing and is subject to control through the Regulation.

Read the whole thing.

Comments closed

Quantifier {x,y} Following Nothing

Published 2018-03-20 by Kevin Feasel

Shane O’Neill reminds us that reading is fundamental:

Glancing at the error message, the first things that stick out are the bits “{x,y}” so I change my regex to be anywhere from 1 to digits "*\d{1,6}$"

Why are you glancing, read the error message!

That doesn’t work, so I again quickly scan the error message and see the bit “following nothing”

“Quickly scan”?! No, actually read the error message!!

This wasn’t a great error message, but at least it does make sense after the fact and it’s a case where actually reading the error message does clarify things. The ones I really hate are “error 0x4849f8f8” types which aren’t even intended to make any sense to us.

Comments closed

Data Lake Permissions

Published 2018-03-20 by Kevin Feasel

Melissa Coates has started a multi-part series on Azure Data Lake permissions. She’s put up the first three parts already. Part 1 covers the types of permissions available as well as some official documentation:

(1) RBAC permissions to the ADLS account itself, for the purpose of managing the resource.
RBAC = Role-based access control. RBAC are the familiar Azure roles such as reader, contributor, or owner. Granting a role on the service allows someone to view or manage the configuration and settings for that particular Azure service (ADLS in this case). See Part 2 for info about setting up RBAC.

Part 2 looks at permissions for the Azure Data Lake Store service itself:

Setting permissions for the service + the data stored in ADLS is always two separate processes, with one exception: when you define an owner for the ADLS service in Azure, that owner is automatically granted ‘superuser’ (full) access to manage the ADLS resource in Azure *AND* full access to the data. Any other RBAC role other than owner needs the data access specifically assigned via ACLs. This is a good thing because not all system administrators need to see the data, and not all data access users/groups/service principals need access to the service itself. This type of separation is true for certain other services too, such as Azure SQL Database.

Try to use groups whenever you can to grant access, rather than individual accounts. This is a consistent best practice for managing security across many types of systems.

Part 3 covers using ACLs to grant rights to specific files or folders in Azure Data Lake Storage:

There are two types of ACLs: Access ACLs and Default ACLs.

An Access ACL is the read/write/execute permissions specified for a folder or file. Every single folder or file has its security explicitly defined — so that means the ADLS security model is not an ‘inheritance’ model. That is an important concept to remember.

A Default ACL is like a ‘template’ setting at a folder level (the concept of a default doesn’t apply at the file level). Any new child item placed in that folder will automatically obtain that default security setting. The default ACLs are absolutely critical, given that data permissions aren’t an inheritance model. You want to avoid a situation where a user has permission to read a folder, but is unable to see any of the files within the folder — that situation will happen if a new file gets added to a folder which has an access ACL set at the folder level, but not a default ACL to apply to new child objects.

There’s a lot of good information here and I’m looking forward to parts 4 and 5.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31