Curated SQL – Page 1515 – A Fine Slice Of SQL Server

There are 3 important factors that I use for the determination of the column that is really good for the Segment Elimination and hence the Segment Alignment:
– The support for the Segment Elimination (and then for the Predicate Pushdown). If the data type does not support Segment Elimination, than why would someone optimise for it ?
– The frequency with which the column is used in the predicates (not in the joins, because this is where generally the Segment Elimination/Predicate Pushdown does not function)
– The number of the distinct values within a table/partition (if we have more Segments than distinct values, it is not a very good sign generally: example – 10 million rows with 5 distinct values)

Read on for more details.

Comments closed

Waits With Outsized Importance

Published 2017-08-28 by Kevin Feasel

Brent Ozar has a few wait types whose appearance in your “uh-oh” list is disproportionate to the wait type’s relative percentage:

RESOURCE_SEMAPHORE_QUERY_COMPILE – this means a query came in, and SQL Server didn’t have an execution plan cached for it. In order to build an execution plan, SQL Server needs a little memory – not a lot, just a little – but that memory wasn’t available. SQL Server had to wait for memory to become available before it could even build an execution plan. For more details and a reproduction script, check out my Bad Idea Jeans: Dynamically Generating Ugly Queries post. In this scenario, cached query plans (and small ones) may be able to proceed just fine (depending on how much pressure the server is under), but the ugly ones will feel frozen.

Read on for a couple more examples of poison wait types.

Comments closed

In-Memory OLTP In SQL Server 2017

Published 2017-08-28 by Kevin Feasel

Ned Otter walks us through all of the changes to memory-optimized objects coming in SQL Server 2017:

Up to and including SQL 2016, the maximum number of nonclustered indexes on a memory-optimized table was eight, but that limitation has been removed for SQL 2017. I’ve tested this with almost 300 indexes, and it worked. With this many supported indexes, it’s no wonder they had to….
Enhance the index rebuild performance for nonclustered indexes during database recovery. I confirmed with Microsoft that the database does not have be in SQL 2017 compatibility mode (140) to benefit from the index rebuild enhancement. This type of rebuild happens not only for database restore and failover, but also for other “recovery events” – see my blog post here.

Read on for several more improvements, as well as a couple of things which Ned would like to see but aren’t there yet.

Comments closed

When CHECKDB Snapshots Run Out Of Disk Space

Published 2017-08-28 by Kevin Feasel

Andy Galbraith walks through an error message in DBCC CHECKDB when the snapshot runs out of disk space:

Looking in the SQL Error Log there were hundreds of these combinations in the minutes immediately preceding the job failure:

—

The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x000048a123e000 in file ‘E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17‘. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

—

Error: 17053, Severity: 16, State: 1.

—

E:\SQL_Data\VLDB01.mdf:MSSQL_DBCC17: Operating system error 665(The requested operation could not be completed due to a file system limitation) encountered.

Read on for more information, including a rough idea of how much space the snapshot requires as well as a few workarounds and hints.

Comments closed

Classifying Time Series Data With TensorFlow

Published 2017-08-25 by Kevin Feasel

Burak Himmetoglu applies a time series data set to two different types of neural networks using TensorFlow:

In this blog post, I will discuss the use of deep leaning methods to classify time-series data, without the need to manually engineer features. The example I will consider is the classic Human Activity Recognition (HAR) dataset from the UCI repository. The dataset contains the raw time-series data, as well as a pre-processed one with 561 engineered features. I will compare the performance of typical machine learning algorithms which use engineered features with two deep learning methods (convolutional and recurrent neural networks) and show that deep learning can surpass the performance of the former.

I have used Tensorflow for the implementation and training of the models discussed in this post. In the discussion below, code snippets are provided to explain the implementation. For the complete code, please see my Github repository.

Click through for the samples, or check out the repo, linked above.

Comments closed

Evaluating A Data Science Project

Published 2017-08-25 by Kevin Feasel

Tom Fawcett gives us an interesting evaluation of a data science case study:

The model is a fully connected neural network with three hidden layers, with a ReLU as the activation function. They state that data from Google Compute Engine was used to train the model (implemented in TensorFlow), and Cloud Machine Learning Engine’s HyperTune feature was used to tune hyperparameters.

I have no reason to doubt their representation choices or network design, but one thing looks odd. Their output is two ReLU (rectifier) units, each emitting the network’s accuracy (technically: recall) on that class. I would’ve chosen a single Softmax unit representing the probability of Large Loss driver, from which I could get a ROC or Precision-Recall curve. I could then threshold the output to get any achievable performance on the curve. (I explain the advantages of scoring over hard classification in this post.)

But I’m not a neural network expert, and the purpose here isn’t to critique their network design, just their general approach. I assume they experimented and are reporting the best performance they found.

Read the whole thing.

Comments closed

Reversing Dynamic Data Masking

Published 2017-08-25 by Kevin Feasel

Joe Obbish shows how easy it is to reverse Dynamic Data Masking:

Armed with our new knowledge, we can create a single SQL query that decodes all of the SSNs. The strategy is to define a single CTE with all ten digits and to use one CROSS APPLY for each digit in the SSN. Each CROSS APPLY only references the SSN column in the WHERE clause and returns the matching prefix of the SSN that we’ve found so far. Here’s a snippet of the code:

Click through for progressively faster solutions. This is the main reason I do not care for DDM as a feature. Its main benefit seems to be preventing shoulder-surfing on reports; any concerted attacker with a little bit of access to writing queries can subvert it.

Comments closed

Substrings In SQL Server Versus Oracle

Published 2017-08-25 by Kevin Feasel

Daniel Janik continues his SQL Server versus Oracle syntax comparison series:

Parsing strings is a feature that is often needed in the database world and SUBSTRING/SUBSTR are designed to do just that. I find it interesting how these two platforms approached the functions differently and that’s definitely shows how you can do many things to get to the same answer.

It’s a short post, but Daniel does show one big difference between the Oracle and SQL Server substring functions.

Comments closed

When To Use Always Encrypted

Published 2017-08-25 by Kevin Feasel

Brent Ozar gives us some good pointers on when to use Always Encrypted:

But that comes with a few big drawbacks. They’re really well-documented, but here’s the highlights:

Do you need to query that data from other apps? Do you have a data warehouse, reporting tools, PowerBI, Analysis Services cubes, etc? If so, those apps will also need to be equipped with the latest database drivers and your decryption certificates. For example, here’s how you access Always Encrypted data with PowerBI. Any app that expects to read the encrypted data is going to need work, and that’s especially problematic if you’re replicating the data to other SQL Servers.

Click through to read the rest. Always Encrypted was designed to encrypt a few columns, not everything in a database.

Comments closed

Investigating The OS Workers DMV

Published 2017-08-25 by Kevin Feasel

Ewald Cress continues his DMV internals series:

wait_started_ms_ticks is set in SOS_Task::PreWait(), i.e. just before actually suspending, and again cleared in SOS_Task::PostWait(). For more about the choreography of suspending, see here.

wait_resumed_ms_ticks is set in SOS_Scheduler::PrepareWorkerForResume(), itself called by the mysteriously named but highly popular SOS_Scheduler::ResumeNoCuzz().

start_quantum is set for the Resuming and InstantResuming case within SOS_Scheduler::TaskTransition(), called by SOS_Scheduler::Switch() as the worker is woken up after a wait.

Ewald intends this post as an extension of the official documentation, so it’s best to read that documentation in conjunction with this post.

Comments closed

Curated SQL Posts