Press "Enter" to skip to content

Month: July 2018

Finding Low-Hanging Fruit When Tuning SQL Server

Kevin Hill has a couple scripts which help him find easy performance gains:

Recently, I’ve been getting a lot of performance tuning work, much of which is basically “things are slow…can you fix them?” type of requests. Most experienced DBAs know that there are a few gazillion factors that can lead to this request, and I won’t re-hash them here.

Let’s assume that we’ve optimized or eliminated the server, disks, network, etc., and we’re now looking at SQL code and everyone’s favorite – indexes.

I have two scripts I use that give me a quick overview of the type of work SQL Server is doing.   These complement each other, and are used AS A STARTING POINT to locate the low-hanging fruit that can be causing excessive work for the disks and memory of the server.

Click through for those scripts.

Comments closed

Building Recurrent Neural Networks Using TensorFlow

Ahmet Taspinar walks us through creating a recurrent neural network topology using TensorFlow:

As we have also seen in the previous blog posts, our Neural Network consists of a tf.Graph() and a tf.Session(). The tf.Graph() contains all of the computational steps required for the Neural Network, and the tf.Session is used to execute these steps.

The computational steps defined in the tf.Graph can be divided into four main parts:

  1. We initialize placeholders which are filled with batches of training data during the run.
  2. We define the RNN model and use it to calculate the output values (logits).
  3. The logits are used to calculate a loss value, which then
  4. is used in an Optimizer to optimize the weights of the RNN.
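
For orientation, here is a minimal TensorFlow 1.x-style sketch of those four parts. The cell type, layer sizes, and variable names are illustrative choices, not taken from Ahmet's post:

import tensorflow as tf

num_steps, num_features, num_classes, state_size = 28, 28, 10, 64

graph = tf.Graph()
with graph.as_default():
    # 1. Placeholders, filled with batches of training data during the run.
    X = tf.placeholder(tf.float32, [None, num_steps, num_features])
    y = tf.placeholder(tf.int32, [None])

    # 2. The RNN model, used to calculate the output values (logits).
    cell = tf.nn.rnn_cell.BasicRNNCell(state_size)
    outputs, _ = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
    logits = tf.layers.dense(outputs[:, -1, :], num_classes)

    # 3. The logits are used to calculate a loss value...
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

    # 4. ...which an optimizer uses to adjust the RNN's weights.
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# The tf.Session executes the steps defined in the tf.Graph, e.g.
# (batch_x and batch_y stand in for whatever feeds your training loop):
#   with tf.Session(graph=graph) as sess:
#       sess.run(tf.global_variables_initializer())
#       sess.run(train_op, feed_dict={X: batch_x, y: batch_y})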

As a lazy casual, I’ll probably stick with letting Keras do most of the heavy lifting.

Comments closed

Extractors In Scala

Jyoti Sachdeva explains what extractors are in Scala and why they’re useful:

An extractor is an object that has an unapply method. It takes an object as input and gives back arguments. Custom extractors are created using the unapply method. The unapply method is called an extractor because it takes an element of the set and extracts some of its parts; the apply method, also called an injection, acts as a constructor: it takes some arguments and yields an element of a given set.

Click through for explanatory examples.

Comments closed

The Power Of Predicate Pushdown

Pedro Lopes explains how predicate pushdown helps improve performance on queries:

First, let’s define a few terms, so we can see how to detect whether we’re making good use of our indexes, as they relate to the queries running in our SQL Server.

  1. Whenever you submit a query to SQL Server, if it includes a JOIN and/or WHERE clause, that constitutes a row filtering pattern known as a predicate.
  2. The query optimizer can use that to estimate how to best retrieve only the intended rows after that predicate has been applied; this surfaces in the query plan as the Estimated Number of Rows.
  3. When that estimated plan is executed, and you look at the actual execution plan, this surfaces as the Actual Number of Rows. Usually, a big difference between Estimated and Actual number of rows indicates a misestimation that may need to be addressed to improve performance: maybe you don’t have the right indexes in place?

These are the two properties related to rows you had on every SQL Server plan up to SQL Server 2014.
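
To make the idea concrete, here is a rough pandas analogue of pushing a predicate down before a join. It only illustrates the principle, not SQL Server's storage-engine behavior, and the table and column names are made up:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
orders = pd.DataFrame({
    "order_id": np.arange(1_000_000),
    "customer_id": rng.integers(0, 50_000, 1_000_000),
    "order_date": pd.to_datetime("2018-01-01")
                  + pd.to_timedelta(rng.integers(0, 180, 1_000_000), unit="D"),
})
customers = pd.DataFrame({"customer_id": np.arange(50_000)})

# Predicate applied after the join: every order row participates in the join.
late = orders.merge(customers, on="customer_id")
late = late[late["order_date"] >= "2018-06-01"]

# Predicate "pushed down" before the join: only matching rows reach the join.
early = orders[orders["order_date"] >= "2018-06-01"].merge(customers, on="customer_id")

assert len(late) == len(early)   # same result, far less intermediate work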

Read on to learn how predicate pushdown can make queries faster.

Comments closed

Generating Index Drop And Create Statements

Drew Furgiuele says “Game over, man, game over!” to indexes:

The premise is simple: it will generate a series of DROP and then CREATE INDEX commands for every index. The process is a little more complex in practice, but at a high level it:

  1. Creates a special schema to house a temporary object,
  2. Creates a special stored procedure to run the code,
  3. Calls said stored procedure,
  4. Generates a bunch of PRINT statements that serve as the output (along with new line support for readability),
  5. Cleans up the stored procedure it generated,
  6. And finally deletes the schema it created.

Nifty.

Click through for the script, as well as a bonus PowerShell script. Because hey, it’s only six lines of code.

Comments closed

Instance-To-Instance Migrations With Start-DbaMigration

Chrissy LeMaire touts one of the best parts of dbatools:

dbatools is such a fun toolset to work on, but specifically, I can no longer live without Start-DbaMigration. Even in smaller shops, migrations are often required and they are always a lot of work.

At least they used to be, before I built the command that started it all: Start-DbaMigration. Start-DbaMigration is an instance to instance migration command that migrates just about everything. It’s really a wrapper that simplifies nearly 30 other copy commands, including Copy-DbaDatabase, Copy-DbaLogin, and Copy-DbaSqlServerAgent.

Also a bonus shout out to dbachecks.

Comments closed

The Impact Of Auto-Growth Settings For Log Files

Jamie Wick has started a series on log growth, beginning with a look at auto-growth settings:

For the data file, the impact can be illustrated in the following chain of events:

  1. A new 1MB data file is created that contains no information. (ie. a 1MB data file containing 0MB of data)
  2. Data is written to the data file until it reaches the file size. (ie. the 1MB data file now contains 1MB of data)
  3. The SQL server suspends normal operations to the database while the data file is grown by 1MB. (ie. the data file is now 2MB and contains 1MB of data) If Instant File Initialization (IFI) is enabled, the file is expanded and database operations resume. If IFI is not enabled, the expanded part of the data file must be zeroed before db operations resume, resulting in an additional delay.
  4.  Once the data file has been grown successfully, the server resumes normal database processing. At this point the server loops back to Step 2.

The server will continue this run-pause-run-pause processing until the data file reaches its Maxsize, or the disk becomes full. If the disk that the data file resides on has other files on it (ie. the C drive, or a disk that is shared by several databases), there will be other disk write events happening between the data file growth events. This may cause the data file expansion segments to be non-contiguous, increasing the file fragmentation and further decreasing the database performance.
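
As a back-of-the-envelope illustration of why the growth increment matters (the numbers here are hypothetical, not from Jamie's post), a small fixed increment means that pause-and-grow cycle fires thousands of times:

def growth_events(start_mb, target_mb, increment_mb):
    # Count how many auto-growth pauses it takes to reach the target size.
    events, size = 0, start_mb
    while size < target_mb:
        size += increment_mb
        events += 1
    return events

print(growth_events(1, 10_240, 1))    # 1 MB increments up to 10 GB: 10239 growth events
print(growth_events(1, 10_240, 256))  # 256 MB increments: 40 growth events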

This is all to answer the question, “What’s the problem with missing a few log backups?”

Comments closed

Gathering Info On Tables

Raul Gonzales has a script which provides useful information for tables and columns:

Useful information it provides at table level:

  • tableType, to identify HEAP tables
  • row_count, to identify tables with plenty of rows or no rows at all
  • TotalSpaceMB, to identify big tables in size
  • LastUserAccess, to identify tables that are not used
  • TotalUserAccess, to identify tables that are heavily used
  • TableTriggers, to identify tables that have triggers

Useful information it provides at column level:

  • DataType-Size, to identify supersized, incorrect or deprecated data types
  • Identity, to identify identity columns
  • Mandatory-DefaultValue, to identify NULL/NOT NULL columns or with default constraints
  • PrimaryKey, to identify primary key columns
  • Collation, to identify columns that might have different collation from the database
  • ForeignKey-ReferencedColumn, to identify foreign keys and the table.column they reference

Click through for the script.

Comments closed

Installing SQL Server On Ubuntu 18.04

Max Trinidad shows us how to install SQL Server on Ubuntu 18.04, though he leads off with a warning:

This has been an issue for some time until now. I found the following link that helped me install SQL Server on the latest Ubuntu 18.04:

https://askubuntu.com/questions/1032532/how-do-i-install-ms-sql-for-ubuntu-18-04-lts

But there are a few missing steps which can help ease the burden of errors. At the same time, the information is a little outdated.

But, it works with the following adjustments.

Please Understand!!  This is NOT approved by Microsoft.  Use this method for Test Only!!

I’m waiting somewhat impatiently for Microsoft and Hortonworks to support Ubuntu 18.04.

Comments closed

Partitioning Data For Performance Improvement In R

John Mount shares a few examples of partitioning and parallelizing data operations in R:

In this note we will show how to speed up work in R by partitioning data and process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages.

For each of the above packages we speed up work by using wrapr::execute_parallel which in turn uses wrapr::partition_tables to partition un-related data.frame rows and then distributes them to different processors to be executed. rqdatatable::ex_data_table_parallel conveniently bundles all of these steps together when working with rquery pipelines.
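
The post's code is R, but the underlying pattern carries over to other stacks. Here is a rough Python sketch of the same partition-then-parallelize idea using pandas and concurrent.futures; it is illustrative only and not the wrapr/rqdatatable API:

import pandas as pd
from concurrent.futures import ProcessPoolExecutor

def summarize(part):
    # Per-partition work; rows in one partition never depend on rows in another.
    return part.groupby("key", as_index=False)["value"].sum()

def parallel_apply(df, n_parts=4):
    # Partition unrelated rows by key so each chunk can be processed independently.
    parts = [chunk for _, chunk in df.groupby(df["key"] % n_parts)]
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(summarize, parts)
    return pd.concat(results, ignore_index=True)

if __name__ == "__main__":
    df = pd.DataFrame({"key": range(100_000), "value": 1.0})
    df["key"] = df["key"] % 1_000     # 1,000 independent groups
    print(parallel_apply(df).head())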

There were some interesting results.  I expected data.table to be fast, but did not expect dplyr to parallelize so well.

Comments closed