2019-02-20 – Curated SQL

This data doesn’t arrive all at once, in reality. It arrives in a stream, and so it’s natural to run these kind of queries continuously. This is simple with Apache Spark’s Structured Streaming, and proceeds almost identically.
Of course, on the first day this streaming analysis is rolled out, it starts from nothing. Even after two quarters of data here, there’s still significant uncertainty about failure rates, because failures are rare.
An organization that’s transitioning this kind of offline data science to an online streaming context probably does have plenty of historical data. This is just the kind of prior belief about failure rates that can be injected as a prior distribution on failure rates!

Bayesian approaches work really well with streaming data if you think of the streams as sampling events used to update your priors to a new posterior distribution.

Comments closed

Handling Definitional Changes In Predictive Variables

Published 2019-02-20 by Kevin Feasel

Vincent Granville explains how you can blend two different definitions of a variable of interest together:

The reasons why scores can become meaningless over time is because data evolves. New features (variables) are added that were not available before, the definition of a metric is suddenly changed (for instance, the way income is measured) resulting in new data not compatible with prior data, and faulty scores. Also, when external data is gathered across multiple sources, each source may compute it differently, resulting in incompatibilities: for instance, when comparing individual credit scores from two people that are costumers at two different banks, each bank computes base metrics (income, recency, net worth, and so on) used to build the score, in a different way. Sometimes the issue is caused by missing data, especially when users with missing data are very different from those with full data attached to them.

Click through for a description of the approach and links showing how it works in practice.

Comments closed

Access Violation Error In SQL Server 2016 SP2 CU4

Published 2019-02-20 by Kevin Feasel

Lonny Niederstadt tracked down an ugly bug in SQL Server 2016 SP2 CU4:

When I started investigating, the error was known only as an access violation, preventing some operations related to data cleansing or fact table versioning.

It occurred deep within a series of stored procedures. The execution environment included cross-database DELETE statements, cross-database synonyms, lots of SELECT statements against system views, scalar UDFs, and lots and lots of dynamic SQL.

And… I don’t have access to the four systems where the access violation occurred.

I was able to have some investigation performed on those systems – we learned that by disabling ‘auto stats update’ for the duration of the sensitive operations, the error was avoided. We also learned that reverting a system from SQL Server 2016 SP2 CU4 to SP2 CU2 avoided the errors. On those systems, reverting to SP2 CU2 or temporarily disabling ‘auto stats update’ were sufficient temporary mitigations.

Very interesting sleuthing work. It also appears the issue might have been limited to SP2 CU4, as SP2 CU3 and SP2 CU5 return different results in Lonny’s repro.

Comments closed

COUNT And NULL

Published 2019-02-20 by Kevin Feasel

Bert Wagner explains some of the trickiness of COUNT and NULL values in SQL Server:

One thing I see fairly often (and am occasionally guilty of myself) is using COUNT(DISTINCT) and DISTINCT interchangeably to get an idea of the number of unique values in a column.
While they will sometimes give you the same results, depending on the data the two methods may not always be interchangeable.

Click through for examples.

Comments closed

Attempted To Read Or Write Protected Memory

Published 2019-02-20 by Kevin Feasel

Kenneth Fisher explains a nasty-looking error to us:

So, are you seeing this error?
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
If you read the error it might freak you out a bit. The key words memory and corrupt can be a bit .. concerning. Fortunately in this case they are also rather misleading.

Click through to understand what’s going on and how you can fix the problem if you see this error.

Comments closed

Tooling For SQL Server Automation With Powershell

Published 2019-02-20 by Kevin Feasel

Max Trinidad shares some tools you can use to automate SQL Server processes with Powershell:

For script automation we could install either or both version of PowerShell Core: (As of February 19th, 2019)
PowerShell Core GA version 6.1.3
PowerShell Core Preview 6.2.0 Preview 4
Here are some important PowerShell Modules to use for SQL Server management scripting:
*SQLServer – This module currently can be use on SQL Server 2017 and greater.
*DBATools – This a community supported module that will work with SQL Server 2000 and greater.
DBAReports – Supports for Windows SQL Server.
DBCheck – Support for Windows SQL Server.

Automation is a great DBA’s best weapon. Knowing the tools which help you automate your tasks is critical.

Comments closed

Fun With QUOTENAME

Published 2019-02-20 by Kevin Feasel

Louis Davidson shares some tips on using the QUOTENAME function:

Or if you are Rob Volk (@sql_r on Twitter), and you want to create an annoying database on your best frenemy’s SQL Server that includes brackets in the name, like:
This [database] Is Awesome
You will need to do:
CREATE DATABASE [This [database]] Is Awesome];

I’m not saying you should do that, but I’m also not saying you shouldn’t.

Comments closed

Looking At Compressed Pages

Published 2019-02-20 by Kevin Feasel

Jess Pomfret shows us what compressed data looks like in SQL Server:

We first need to switch on trace flag 3604: this will write the output of our DBCC PAGE command to the messages tab instead of the event log.
There are 4 parameters for DBCC PAGE: we will need to pass in the database name (or id), the file number, the page id and the print option. Using a print option of 0 will give us just the page header. In these examples I’m going to use option 3 which gives us more details on the rows stored on the page. For more information on using DBCC PAGE I’d recommend Paul Randal’s post “How to use DBCC PAGE“.

This kind of investigation lets you see how compression really works.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28

Day: February 20, 2019

Bayesian Modeling Of Hardware Failure Rates

Handling Definitional Changes In Predictive Variables

Access Violation Error In SQL Server 2016 SP2 CU4

COUNT And NULL

Attempted To Read Or Write Protected Memory

Tooling For SQL Server Automation With Powershell

Fun With QUOTENAME

Looking At Compressed Pages