In reality, this data doesn’t arrive all at once. It arrives as a stream, so it’s natural to run these kinds of queries continuously. This is simple with Apache Spark’s Structured Streaming, and proceeds almost identically.
Of course, on the first day this streaming analysis is rolled out, it starts from nothing. Even after two quarters of data here, there’s still significant uncertainty about failure rates, because failures are rare.
An organization that’s transitioning this kind of offline data science to an online streaming context probably does have plenty of historical data. This is just the kind of prior belief about failure rates that can be injected as a prior distribution on failure rates!
Bayesian approaches work really well with streaming data if you think of the stream as a series of sampling events, each used to update your prior into a new posterior distribution.
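As a minimal sketch of that idea (the prior, batch sizes, and counts below are made up for illustration, not taken from the article), a conjugate Beta prior on the failure rate can absorb each micro-batch of the stream as it arrives:

```python
# Hypothetical sketch of Bayesian updating on a stream: a Beta prior on the
# failure rate is updated to a new posterior as each micro-batch arrives.
# The prior and the batch contents are illustrative, not real data.

def update_beta(alpha, beta, failures, successes):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial observations
    yields a Beta(alpha + failures, beta + successes) posterior."""
    return alpha + failures, beta + successes

# Prior encoding historical belief from offline data:
# roughly 2 failures per 1,000 observations.
alpha, beta = 2.0, 998.0

# Each tuple is one streaming micro-batch: (failures, non-failures).
stream = [(0, 500), (1, 700), (0, 300)]

for failures, successes in stream:
    alpha, beta = update_beta(alpha, beta, failures, successes)

posterior_mean = alpha / (alpha + beta)  # current estimate of the failure rate
print(posterior_mean)
```

In a Structured Streaming job, the same arithmetic could run inside a `foreachBatch` step, so the posterior simply keeps absorbing new batches while the prior supplies the historical signal that a cold-started stream lacks.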
The reason scores can become meaningless over time is that data evolves. New features (variables) are added that were not available before, or the definition of a metric suddenly changes (for instance, the way income is measured), making new data incompatible with prior data and producing faulty scores. When external data is gathered across multiple sources, each source may compute a metric differently, which also creates incompatibilities: for instance, when comparing the credit scores of two people who are customers at two different banks, each bank computes the base metrics used to build the score (income, recency, net worth, and so on) in a different way. Sometimes the issue is caused by missing data, especially when users with missing data are very different from those with complete data.
Click through for a description of the approach and links showing how it works in practice.
When I started investigating, the error was known only as an access violation, preventing some operations related to data cleansing or fact table versioning.
It occurred deep within a series of stored procedures. The execution environment included cross-database DELETE statements, cross-database synonyms, lots of SELECT statements against system views, scalar UDFs, and lots and lots of dynamic SQL.
And… I don’t have access to the four systems where the access violation occurred.
I was able to have some investigation performed on those systems – we learned that by disabling ‘auto stats update’ for the duration of the sensitive operations, the error was avoided. We also learned that reverting a system from SQL Server 2016 SP2 CU4 to SP2 CU2 avoided the errors. On those systems, reverting to SP2 CU2 or temporarily disabling ‘auto stats update’ were sufficient temporary mitigations.
Very interesting sleuthing work. It also appears the issue might have been limited to SP2 CU4, as SP2 CU3 and SP2 CU5 return different results in Lonny’s repro.
One thing I see fairly often (and am occasionally guilty of myself) is using COUNT(DISTINCT) and DISTINCT interchangeably to get an idea of the number of unique values in a column.
While they will sometimes give you the same results, depending on the data, the two methods are not always interchangeable.
Click through for examples.
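One flavor of the discrepancy can be shown in a few lines of Python with SQLite (the table and data here are made up, not the linked post's examples): `COUNT(DISTINCT col)` ignores NULLs, while `SELECT DISTINCT` returns NULL as its own row, so the two counts disagree on nullable columns:

```python
# Illustration: COUNT(DISTINCT col) excludes NULLs, but SELECT DISTINCT
# keeps NULL as a distinct value, so the counts can differ.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",), ("a",), (None,)])

count_distinct = conn.execute("SELECT COUNT(DISTINCT val) FROM t").fetchone()[0]
distinct_rows = conn.execute("SELECT DISTINCT val FROM t").fetchall()

print(count_distinct)      # 2 -- NULL is not counted
print(len(distinct_rows))  # 3 -- 'a', 'b', and NULL
```

SQL Server behaves the same way on this point, so a nullable column is the first thing to check when the two numbers don't match.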
So, are you seeing this error?
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
If you read the error, it might freak you out a bit. The key words, memory and corrupt, can be a bit… concerning. Fortunately, in this case they are also rather misleading.
Click through to understand what’s going on and how you can fix the problem if you see this error.
Here are some important PowerShell Modules to use for SQL Server management scripting:
*SQLServer – This module can currently be used with SQL Server 2017 and greater.
*DBATools – This is a community-supported module that works with SQL Server 2000 and greater.
*DBAReports – Support for Windows SQL Server.
*DBCheck – Support for Windows SQL Server.
Automation is a great DBA’s best weapon. Knowing the tools that help you automate your tasks is critical.
Or if you are Rob Volk (@sql_r on Twitter), and you want to create an annoying database on your best frenemy’s SQL Server that includes brackets in the name, like:
This [database] Is Awesome
You will need to do:
CREATE DATABASE [This [database]] Is Awesome];
I’m not saying you should do that, but I’m also not saying you shouldn’t.
We first need to switch on trace flag 3604: this will write the output of our DBCC PAGE command to the messages tab instead of the error log.

There are four parameters for DBCC PAGE: we need to pass in the database name (or ID), the file number, the page ID, and the print option. Using a print option of 0 gives us just the page header. In these examples I’m going to use option 3, which gives us more details on the rows stored on the page. For more information on using DBCC PAGE, I’d recommend Paul Randal’s post “How to use DBCC PAGE“.
This kind of investigation lets you see how compression really works.