Press "Enter" to skip to content

Curated SQL Posts

Bayesian Modeling Of Hardware Failure Rates

Sean Owen shows how you can use Bayesian statistical approaches with Spark Streaming, using the example of hard drive failure rates:

This data doesn’t arrive all at once, in reality. It arrives in a stream, and so it’s natural to run these kinds of queries continuously. This is simple with Apache Spark’s Structured Streaming, and proceeds almost identically.

Of course, on the first day this streaming analysis is rolled out, it starts from nothing. Even after two quarters of data here, there’s still significant uncertainty about failure rates, because failures are rare.

An organization that’s transitioning this kind of offline data science to an online streaming context probably does have plenty of historical data. This is just the kind of prior belief about failure rates that can be injected as a prior distribution on failure rates!

Bayesian approaches work really well with streaming data if you think of the streams as sampling events used to update your priors to a new posterior distribution.
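
As a minimal sketch of that idea (plain Python rather than Owen's Spark code, with made-up counts), here is the conjugate Beta-Binomial update: seed the prior from historical failure data, then fold in each micro-batch of (failures, survivals) counts as it arrives.

    from dataclasses import dataclass

    @dataclass
    class BetaBelief:
        """Beta(alpha, beta) belief about a drive's failure probability."""
        alpha: float  # pseudo-count of failures
        beta: float   # pseudo-count of survivals

        def update(self, failures: int, survivals: int) -> None:
            # Conjugate Beta-Binomial update: observed counts simply add
            # to the pseudo-counts, so streaming updates are trivial.
            self.alpha += failures
            self.beta += survivals

        @property
        def mean(self) -> float:
            return self.alpha / (self.alpha + self.beta)

    # Prior injected from historical data: roughly a 2% failure rate,
    # held with the weight of about 100 prior observations.
    belief = BetaBelief(alpha=2.0, beta=98.0)

    # Each streaming micro-batch contributes new (failures, survivals) counts.
    for failures, survivals in [(1, 499), (0, 520), (2, 478)]:
        belief.update(failures, survivals)
        print(f"posterior mean failure rate: {belief.mean:.4f}")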


Handling Definitional Changes In Predictive Variables

Vincent Granville explains how you can blend two different definitions of a variable of interest together:

The reason scores can become meaningless over time is that data evolves. New features (variables) are added that were not available before, or the definition of a metric is suddenly changed (for instance, the way income is measured), resulting in new data that is not compatible with prior data, and in faulty scores. Also, when external data is gathered across multiple sources, each source may compute it differently, resulting in incompatibilities: for instance, when comparing individual credit scores from two people who are customers at two different banks, each bank computes the base metrics (income, recency, net worth, and so on) used to build the score in a different way. Sometimes the issue is caused by missing data, especially when users with missing data are very different from those with full data attached to them.

Click through for a description of the approach and links showing how it works in practice.
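
To make the blending idea concrete (a toy illustration only, not necessarily Granville's exact method; the function name and the 180-day window are invented for the example), one simple approach is to compute the metric under both the old and the new definition during a transition window and shift the weight linearly from one to the other:

    from datetime import date

    def blended_income(old_income: float, new_income: float,
                       observed: date, cutover: date,
                       window_days: int = 180) -> float:
        """Blend two definitions of 'income' around a definition change.

        Before the cutover we trust the old definition entirely; over the
        transition window the weight ramps linearly to the new definition.
        """
        elapsed = (observed - cutover).days
        weight = min(max(elapsed / window_days, 0.0), 1.0)
        return (1.0 - weight) * old_income + weight * new_income

    # Halfway through the window, both definitions get equal weight.
    print(blended_income(52_000.0, 61_000.0,
                         observed=date(2019, 4, 1), cutover=date(2019, 1, 1)))
    # -> 56500.0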


Access Violation Error In SQL Server 2016 SP2 CU4

Lonny Niederstadt tracked down an ugly bug in SQL Server 2016 SP2 CU4:

When I started investigating, the error was known only as an access violation, preventing some operations related to data cleansing or fact table versioning.

It occurred deep within a series of stored procedures.  The execution environment included cross-database DELETE statements, cross-database synonyms, lots of SELECT statements against system views, scalar UDFs, and lots and lots of dynamic SQL.

And… I don’t have access to the four systems where the access violation occurred.

I was able to have some investigation performed on those systems – we learned that by disabling ‘auto stats update’ for the duration of the sensitive operations, the error was avoided.  We also learned that reverting a system from SQL Server 2016 SP2 CU4 to SP2 CU2 avoided the errors.  On those systems, reverting to SP2 CU2 or temporarily disabling ‘auto stats update’ were sufficient temporary mitigations.

Very interesting sleuthing work. It also appears the issue might have been limited to SP2 CU4, as SP2 CU3 and SP2 CU5 return different results in Lonny’s repro.
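
For reference, the temporary mitigation is a one-line database setting. Here is a sketch of wrapping a sensitive operation with it, assuming pyodbc, a database named StackOverflow, and a hypothetical stored procedure standing in for the real workload:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=StackOverflow;Trusted_Connection=yes;",
        autocommit=True,  # ALTER DATABASE shouldn't run inside a transaction
    )
    cur = conn.cursor()
    try:
        # The mitigation from the post: keep auto stats updates off only
        # for the duration of the sensitive operations.
        cur.execute("ALTER DATABASE [StackOverflow] SET AUTO_UPDATE_STATISTICS OFF;")
        cur.execute("EXEC dbo.RunSensitiveOperations;")  # hypothetical procedure
    finally:
        cur.execute("ALTER DATABASE [StackOverflow] SET AUTO_UPDATE_STATISTICS ON;")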


Attempted To Read Or Write Protected Memory

Kenneth Fisher explains a nasty-looking error to us:

So, are you seeing this error?

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

If you read the error it might freak you out a bit. The key words memory and corrupt can be a bit... concerning. Fortunately, in this case, they are also rather misleading.

Click through to understand what’s going on and how you can fix the problem if you see this error.


Tooling For SQL Server Automation With PowerShell

Max Trinidad shares some tools you can use to automate SQL Server processes with PowerShell:

For script automation we could install either or both versions of PowerShell Core (as of February 19th, 2019):
PowerShell Core GA version 6.1.3
PowerShell Core Preview 6.2.0 Preview 4

Here are some important PowerShell modules to use for SQL Server management scripting:
SQLServer – This module can currently be used on SQL Server 2017 and greater.
DBATools – This is a community-supported module that will work with SQL Server 2000 and greater.
DBAReports – Supports Windows SQL Server.
DBCheck – Supports Windows SQL Server.

Automation is a great DBA’s best weapon. Knowing the tools which help you automate your tasks is critical.


Looking At Compressed Pages

Jess Pomfret shows us what compressed data looks like in SQL Server:

We first need to switch on trace flag 3604: this will write the output of our DBCC PAGE command to the messages tab instead of the error log.

There are 4 parameters for DBCC PAGE: we will need to pass in the database name (or id), the file number, the page id and the print option.  Using a print option of 0 will give us just the page header. In these examples I’m going to use option 3 which gives us more details on the rows stored on the page. For more information on using DBCC PAGE I’d recommend Paul Randal’s post “How to use DBCC PAGE“.

This kind of investigation lets you see how compression really works.
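
If you want to script that inspection, here's a sketch using pyodbc; the database name and page reference (file 1, page 312) are placeholders, and you'd find a real page id via DBCC IND or sys.dm_db_database_page_allocations. The WITH TABLERESULTS option returns the dump as a rowset, which is easier to consume from a client than the messages output:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=StackOverflow;Trusted_Connection=yes;",
        autocommit=True,
    )
    cur = conn.cursor()

    # Trace flag 3604 redirects DBCC output to the client; TABLERESULTS
    # additionally shapes it as (ParentObject, Object, Field, VALUE) rows.
    cur.execute("DBCC TRACEON(3604);")
    cur.execute("DBCC PAGE('StackOverflow', 1, 312, 3) WITH TABLERESULTS;")

    for parent, obj, field, value in cur.fetchall():
        print(parent, obj, field, value, sep=" | ")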


Improving Plots With ggformula

Sebastian Sauer shows how you can use the ggformula package combined with ggplot2 to enhance your R visuals:

For some time now, there has been a wrapper for ggplot2 available, bundled in the package ggformula. One nice thing is that it plays nicely with the popular R package mosaic; mosaic provides some useful functions for modeling, along with a tamed and consistent syntax. In this post, we will discuss some “ornaments”, that is, some details of beautification of a plot. I confess that not everyone will deem it central, but in some cases it comes in handy to know how to “refine” a plot using ggformula.

Note that this “refinement” is primarily controlled via the functions gf_refine() (most things), gf_lab() (for labels), and gf_lims() (for axis limits). Themes can be adjusted using gf_theme().

Click through for several examples.


Using Convolutional Neural Networks To Recognize Features In Images

Michael Grogan shows how you can use Keras to perform image recognition with a convolutional neural network:

VGG16 is a built-in neural network in Keras that is pre-trained for image recognition.

Technically, it is possible to gather training and test data independently to build the classifier. However, this would necessitate at least 1,000 images, with 10,000 or greater being preferable.

In this regard, it is much easier to use a pre-trained neural network that has already been designed for image classification purposes.

This is probably the best generally available technique for image classification.
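
For reference, using the pre-trained model really is only a few lines in Keras. A sketch assuming TensorFlow 2.x and a local image file named elephant.jpg:

    import numpy as np
    from tensorflow.keras.applications.vgg16 import (
        VGG16, decode_predictions, preprocess_input)
    from tensorflow.keras.preprocessing import image

    # Downloads VGG16 with ImageNet weights on first use; no training needed.
    model = VGG16(weights="imagenet")

    # VGG16 expects 224x224 RGB input, preprocessed as it was during training.
    img = image.load_img("elephant.jpg", target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    # Top-3 ImageNet classes with confidence scores.
    for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
        print(f"{label}: {score:.3f}")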
