Press "Enter" to skip to content

Curated SQL Posts

NOLOCK Bug

Brent Ozar reports on a NOLOCK bug in SQL Server 2014 SP1 CU6:

  • While one transaction is holding an exclusive lock on an object (Ex. ongoing table update), another transaction is executing parallelized SELECT (…) FROM SourceTable, using the NOLOCK hint, under the default SQL Server lock-based isolation level or higher. In this scenario, the SELECT query trying to access SourceTable will be blocked.

  • Executing a parallelized SELECT (…) INTO Table FROM SourceTable statement, specifically using the NOLOCK hint, under the default SQL Server lock-based isolation level or higher. In this scenario, other queries trying to access SourceTable will be blocked.

The current recommendation is not to install CU6 until the issue is fixed.

Comments closed

Fallacies of Data Science

Adnan Masood and David Lazar have a list of fallacies in the world of data science:

Extrapolating beyond the range of training data, especially in the case of time series data, is fine providing the data-set is large enough.

Strong Evidence is same as a Proof! Prediction intervals and confidence intervals are the same thing, just like statistical significance and practical significance.

These are some good things to think about if you’re getting into analytics.

Comments closed

Daily Briefings

Brent Ozar shows a way to see high-priority sp_Blitz items across an environment:

Step 3: run sp_Blitz @IgnorePrioritiesAbove = 50, @CheckUserDatabaseObjects = 0. This gets you the fast headline news, especially when used with the improved priorities in the latest version in our SQL Server download kit

It’s important to have one or two high-level systems you can glance at to see top-level health of a system.  This could be one of those systems.

Comments closed

Spring Cleaning

Tara Kizer gives some advice on proactive administrative measures:

If a database has a high number of Virtual Log Files (VLFs), it can impact the the speed of transaction log backups and database recovery. I once had a database with 75,000 VLFs. I didn’t even know what a VLF was at the time (hence having so many). After rebooting the server, a mission critical database with extremely high SLAs took 45 minutes to complete recovery. We were in the process of opening a support case with Microsoft when the database finally came online. The next day, I contacted a Microsoft engineer and learned about VLFs.

There are a number of helpful tidbits here, so check it out.

Comments closed

Crime Analysis

Raghavan Madabusi combines Zeppelin, R, and Spark to perform crime analysis:

Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. Zeppelin interpreter concept allows any language/data-processing-backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with existing Spark eco-system and share SparkContext across Scala, Python, and R.

This links to a rather long post on how to set up and use all of these pieces.  I’m more familiar with Jupyter than Zeppelin, but regardless of the notebook you choose, this is a good exercise to become familiar with the process.

Comments closed

Predictive Maintenance

David Smith shows off a predictive maintenance gallery for dealing with aircraft engines:

In each case, a number of different models are trained in R (decision forests, boosted decision trees, multinomial models, neural networks and poisson regression) and compared for performance; the best model is automatically selected for predictions.

On a related note, Microsoft recently teamed up with aircraft engine manufacturer Rolls-Royceto help airlines get the most out of their engines. Rolls-Royce is turning to Microsoft’s Azure cloud-based services — Stream Analytics, Machine Learning and Power BI — to make recommendations to airline executives on the most efficient way to use their engines in flight and on the ground. This short video gives an overview.

Check out the data set and play around a bit.

Comments closed

Blocking Merge Statement

Kendra Little walks through the MERGE command and potential blocking issues with it:

The holdlock hint is a way to get serializable isolation level in SQL Server for a specific table, without having to change the isolation level for your entire session. Serializable is the highest isolation level in SQL Server using pessimistic locking.

When you “HOLDLOCK”, you tell SQL Server to protect any rows you read with a range lock– just in case someone comes along and tries to change one or sneak  one in.

That means that even when you’re just reading ParentTable and not inserting a row, you’re taking out a key range lock. You’re willing to fight other users over those rows to protect your statement.

Kendra’s final advice is to avoid the MERGE command in most cases; read on to find out why.

Comments closed

ALTER DATABASE

Dave Mason notes that ALTER DATABASE needs to be in an autocommitted transaction and will not roll back:

If you create that trigger and subsequently change the recovery model, you’ll get an error. But the recovery model doesn’t get changed back to its original setting. As the TSql comment notes, ROLLBACK doesn’t work in this context. And that’s a shame too. Sometimes the recovery model gets changed on accident, by a third-party vendor’s software installation, or by someone that isn’t familiar with the ramifications. I was trying to prevent that. <Grumble> At least I knew it wasn’t allowed. I just didn’t know why. After seeing this Connect item, “why” didn’t matter.

Dave goes on to point out that under certain circumstances, ALTER DATABASE will simply fail, so there are preventative checks; we just can’t build our own.

Comments closed

Getting Current File Name In SSIS

Bill Fellows shows how to get the currently processing file name in SSIS:

So, as much as I like the built in solution, my pattern is to use a Derived Column to inject the file name into the Data Flow. I have a variable called CurrentFileName in all my packages. That contains the design-time path for my Flat File Connection Manager (or Excel). My Connection Manager will then have the ConnectionString/ExcelFilePath property assigned to be @[User::CurrentFileName]. This positions me for success because all I need to do is ensure that whatever mechanism I am using to determine my source file correctly populates that variable. In this post, a ForEach File Enumerator will handle that.

Within my Data Flow Task, I will add a Derived Column Transformation that adds my package variable into the data flow as a new column. Here, I am specifying it will be of data type DT_STR with a length of 130.

He follows up with some Biml to drive home the point.

Comments closed