Curated SQL – Page 1431 – A Fine Slice Of SQL Server

Minimize Updates

Published 2017-06-29 by Kevin Feasel

Lukas Eder shows the importance of minimizing the scope of update statements:

Optionally, just as with JPA, you can turn on optimistic locking on this statement. The important thing here is that the clicks and purchases columns are left untouched, because they were not changed by the client code. This is different from JPA, which either sends all the values by default, or if you specify @DynamicUpdate in Hibernate, it would send only the last_name column, because while first_name was changed it was not modified.

My definition:

changed: The value is “touched”, its state is “dirty” and the state needs to be synched to the database, regardless of modification.

modified: The value is different from its previously known value. By necessity, a modified value is always changed.

As you can see, these are different things, and it is quite hard for a JPA-based API like Hibernate to implement changed semantics because of the annotation-based declarative nature of how entities are defined. We’d need some sophisticated instrumentation to intercept all data changes even when the values have not been modified (I didn’t make those attributes public by accident).

I found this an interesting walkthrough of data layer-level mechanisms that directly affect database performance.

Comments closed

Winnowing Down WhoIsActive Data

Published 2017-06-29 by Kevin Feasel

Kendra Little shows how to use temporary objects to pare down the results of sp_whoisactive for later storage:

I used the @schema parameter to have sp_WhoIsActive generate the schema for the table itself. Full instructions on doing this by Adam are here.

Since I care about tempdb in the case of this example, I used @output_column_list to specify that those columns should come first, followed by the rest of the columns.

I also elected to set @get_plans to 1 to get query execution plans if they’re available. That’s not free, and they can take up a lot of room, but they can contain a lot of helpful info.

This is a very useful guide, and also read the linked documentation for sp_whoisactive; there’s a huge amount of goodness in that one procedure.

Comments closed

BimlExpress’s Preview Pane

Published 2017-06-29 by Kevin Feasel

Ben Weissman shows off the preview pane in BimlExpress 2017:

Once you’ve installed BimlExpress 2017 and open your first Biml file, you will probably notice immediately, that the screen is split horizontally – that is exactly for the preview pane.

If you need more real estate for your actual code, just click the „Hide“ button at the lower left corner.
To actually get a preview, click the „Update“ button on the lower right:

That’s a pretty nice feature. It can be hard sometimes to debug Biml issues because you’re often writing code to write code to write code.

Comments closed

Creating Docker Volumes

Published 2017-06-29 by Kevin Feasel

Andrew Pruski continues his series on long-term data storage and Docker:

Awesome stuff! We’ve got a database that was created in another container successfully attached into another one.

So at this point you may be wondering what the advantage is of doing this over mounting folders from the host? Well, to be honest, I really can’t see what the advantages are.

The volume is completely contained within the docker ecosystem so if anything happens to the docker install, we’ve lost the data. OK, OK, I know it’s in C:\ProgramData\docker\volumes\ on the host but still I’d prefer to have more control over its location.

It’s worth reading the whole thing, even though this isn’t the best way to keep data long-term. It’s important to know about this strategy even if only to keep it from accidentally ruining your day later.

Comments closed

Neural Nets On Spark

Published 2017-06-28 by Kevin Feasel

Nisha Muktewar and Seth Hendrickson show how to use Deeplearning4j to build deep learning models on Hadoop and Spark:

Modern convolutional networks can have several hundred million parameters. One of the top-performing neural networks in the Large Scale Visual Recognition Challenge (also known as “ImageNet”), has 140 million parameters to train! These networks not only take a lot of compute and storage resources (even with a cluster of GPUs, they can take weeks to train), but also require a lot of data. With only 30000 images, it is not practical to train such a complex model on Caltech-256 as there are not enough examples to adequately learn so many parameters. Instead, it is better to employ a method called transfer learning, which involves taking a pre-trained model and repurposing it for other use cases. Transfer learning can also greatly reduce the computational burden and remove the need for large swaths of specialized compute resources like GPUs.

It is possible to repurpose these models because convolutional neural networks tend to learn very general features when trained on image datasets, and this type of feature learning is often useful on other image datasets. For example, a network trained on ImageNet is likely to have learned how to recognize shapes, facial features, patterns, text, and so on, which will no doubt be useful for the Caltech-256 dataset.

This is a longer post, but on an extremely interesting topic.

Comments closed

Running H2O In R On Azure HDInsight

Published 2017-06-28 by Kevin Feasel

Daisy Deng shows how to configure HDInsight to be able to run the H2O package in R rather than Python or Scala:

We provide a few script actions for installing rsparkling on Azure HDInsight. When creating the HDInsight cluster, you can run the following script action for header node:

https://bostoncaqs.blob.core.windows.net/scriptaction/scriptaction-head.sh

And run the following action for the worker node:

https://bostoncaqs.blob.core.windows.net/scriptaction/scriptaction-worker.sh

Please consult Customize Linux-based HDInsight clusters using Script Action for more details.

Click through for the full process.

Comments closed

Using Power Query’s Combine Feature

Published 2017-06-28 by Kevin Feasel

Matt Allington explains how to use the Combine feature in Power Query:

Now the good news is that somehow the combine button has done a reasonable job at combining the files (see 1 below). Actually I wanted to unpivot the month columns, but I will come back to that later.

The bad news is all of the activity over on the left hand side. What the…? There are 5 additional “Queries” on the left (numbered 2 – 6) that do all sorts of things. Let me tell you what each of these are and then come back and explain how to use/interpret these things.

2. This is a parameter that can be used to change the sample file

3. This is the link to the sample file that was selected originally (I selected the first file in the folder and this is the link to that file).

4. This is the “by example query” – the most important query to know about.

5. This is an auto generated function that goes with 4 above.

6. This is the final output query (it is the query that is displayed in 1 above).

Matt clearly explains the whole process, so read on if you need to combine files of Power Query.

Comments closed

Indirect Checkpoint And Non-Yielding Scheduler Problems

Published 2017-06-28 by Kevin Feasel

Parikshit Savjani has a post describing an issue you might experience with indirect checkpoint post SQL Server-2012:

One of the scenarios where skewed distribution of dirty pages in the DPList is common is tempdb. Starting SQL Server 2016, indirect checkpoint is turned ON by default with target_recovery_time set to 60 for model database. Since tempdb database is derived from model during startup, it inherits this property from model database and has indirect checkpoint enabled by default. As a result of the skewed DPList distribution in tempdb, depending on the workload, you may experience excessive spinlock contention and exponential backoffs on DPList on tempdb. In scenarios when the DPList has grown very long, the recovery writer may produce a non-yielding scheduler dump as it iterates through the long list (20k-30k) and tries to acquire spinlock and waits with exponential backoff if spinlock is taken by multiple IOC routines for removal of pages.

This is worth taking a close read.

Comments closed

Bacpacing In Azure

Published 2017-06-28 by Kevin Feasel

Derik Hammer shows how to use a bacpac file to deploy an existing database to Azure SQL Database:

The recommended method for working with Azure is always PowerShell. The Azure portal and SSMS are tools there for your convenience but they do not scale well. If you have multiple databases to migrate, potentially from multiple servers, using PowerShell will be much more efficient. Scripting your Azure work makes it repeatable and works towards the Infrastructure as Code concept.

In this demonstration, the below steps will be used.

Export the bacpac file to a local directory with sqlpackage.exe.
Copy the bacpac to Azure Blob Storage with AzCopy.exe
Use the PowerShell AzureRM module and cmdlets to create an Azure SQL Database from the bacpac file.

Derik shows the point-and-click way as well as the Powershell way.

Comments closed

Remote Installation Of Powershell On SQL Server 2017

Published 2017-06-28 by Kevin Feasel

Tracy Boggiano has a script to install the SQL Server 2017 Powershell module via Powershell remoting:

Once the file is copied to the server locally you can use run the below script from your local machine to install SQL Server PowerShell 2017 for all user to use on the server. The Get-CMSHosts function can be found on my blog here. You will need to download PsExec from here and extract to location on your local computer and provide the path.

NOTE: This is a hotfix that requires a RESTART. Be careful in PRODUCTION.

Click through for the script.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts