Press "Enter" to skip to content

Author: Kevin Feasel

Executing PowerShell Against Multiple Servers

Stuart Moore shows an easy way to execute a PowerShell script against multiple servers:

We set up new PSSessions using New-PSSession; I set ErrorAction to SilentlyContinue just in case a host isn't available for some reason (if I was being good I'd try/catch here).

As we're just using standard PowerShell functionality with Get-Service, there's no need to build a new function; we can call it directly. By calling Invoke-Command against a session pointed at numerous hosts, we let PowerShell handle all the connection management and just assume the command will be run against each host. If we were running against a lot of hosts, we would want to look into using the -ThrottleLimit parameter to limit the number of concurrent hosts we're hitting. The one little trick here is using the using scope modifier so PS pulls in the variable defined in our main scope (gory details on scoping here).
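To make the pattern concrete, here is a minimal sketch of the approach Stuart describes; the host names and service name are placeholders, not his actual script:

```powershell
# Hypothetical host list; substitute your own servers
$servers = 'SQL01', 'SQL02', 'SQL03'

# Open a session per host, quietly skipping any that are unreachable
$sessions = New-PSSession -ComputerName $servers -ErrorAction SilentlyContinue

# A variable defined in the local scope...
$serviceName = 'MSSQLSERVER'

# ...pulled into the remote scope with the $using: modifier.
# -ThrottleLimit caps how many hosts run concurrently.
Invoke-Command -Session $sessions -ThrottleLimit 16 -ScriptBlock {
    Get-Service -Name $using:serviceName
}

# Clean up the sessions when done
$sessions | Remove-PSSession
```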

Click through for the script, and do check out the comments, where Stuart gives a bit of advice when you’re trying to execute against a large number of servers.

Comments closed

GROUP BY vs DISTINCT

Rob Farley looks at how GROUP BY and DISTINCT can lead you down different execution plan paths:

What I want to explore in this post is the particular example that we both used… to bring an important point that could be missed because of the similarity of our examples.

You see, we both happened to use a FOR XML concatenation query, looking back at the same table. We did this to simulate a practical GROUP BY – somewhere that you might feel like GROUP BY is useful, but you know that you're not using an aggregate function like SUM or MAX, because there isn't one available. Ok, for Aaron he could've used the really new STRING_AGG, but for an old-timer like me, having to use SQL Server 2005, that wasn't available.
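As a rough sketch of the query shape both posts work with (the dbo.Orders and dbo.OrderDetails tables here are invented for illustration), note that only the grouping mechanism differs:

```sql
-- DISTINCT: the correlated concatenation can run once per row, then de-duplicate
SELECT DISTINCT
    o.CustomerID,
    (SELECT od.ProductName + N','
       FROM dbo.OrderDetails AS od
      WHERE od.CustomerID = o.CustomerID
        FOR XML PATH('')) AS Products
FROM dbo.Orders AS o;

-- GROUP BY: rows are grouped first, so the concatenation runs once per group
SELECT
    o.CustomerID,
    (SELECT od.ProductName + N','
       FROM dbo.OrderDetails AS od
      WHERE od.CustomerID = o.CustomerID
        FOR XML PATH('')) AS Products
FROM dbo.Orders AS o
GROUP BY o.CustomerID;
```

Comparing the execution plans of the two versions shows where the aggregation happens relative to the subquery, which is the crux of the difference.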

In this post, Rob looks at a different sort of example and sees a more complicated scenario unfold.

Comments closed

Creating Modal Dialogues In Shiny

Dean Attali announces a new Shiny package:

shinyalert uses the sweetalert JavaScript library to create simple and elegant modals in Shiny. Modals can contain text, images, OK/Cancel buttons, an input to get a response from the user, and many more customizable options. A modal can also have a timer to close automatically.

Simply call shinyalert() with the desired arguments, such as a title and text, and a modal will show up. In order to be able to call shinyalert() in a Shiny app, you must first call useShinyalert() anywhere in the app’s UI.
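Here's a minimal sketch of that usage; the app structure is my own, while the shinyalert calls follow the quoted instructions:

```r
library(shiny)
library(shinyalert)

ui <- fluidPage(
  useShinyalert(),                      # must appear somewhere in the UI
  actionButton("show", "Show a modal")
)

server <- function(input, output, session) {
  observeEvent(input$show, {
    shinyalert(title = "Hello", text = "A simple, elegant modal", type = "success")
  })
}

shinyApp(ui, server)
```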

It does look nice.  Check out Dean’s GitHub repo for more information. H/T R-Bloggers

Comments closed

Visualizing Cholesterol Data With ggplot2

Anisa Dhana uses the National Health and Nutrition Examination Survey and visualizes results with ggplot2:

From the plots above, I find that regardless of the different levels of diastolic and systolic blood pressure, there is no substantial correlation between cholesterol and blood pressure. However, it is better to build the correlation line with geom_smooth or to calculate the Spearman correlation, although in this post we focus only on the visualization.

Let's build the correlation line.
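A sketch of the kind of plot the post builds; the data frame and column names (nhanes, chol, bp_systolic) are stand-ins for the NHANES variables, not Anisa's actual code:

```r
library(ggplot2)

# nhanes, chol, and bp_systolic are hypothetical stand-ins here
ggplot(nhanes, aes(x = bp_systolic, y = chol)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm")  # the correlation line the quote mentions

# Spearman correlation, the alternative the quote suggests
cor.test(nhanes$bp_systolic, nhanes$chol, method = "spearman")
```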

Click through for several examples of visuals.

Comments closed

Microsoft + R

David Smith points out a bunch of the ways that Microsoft integrates R into products:

You can call R from within some data oriented Microsoft products, and apply R functions (from base R, from packages, or R functions you’ve written) to the data they contain.

  • SQL Server (the database) allows you to call R from SQL, or publish R functions to a SQL Server for database administrators to use from SQL.

  • Power BI (the reporting and visualization tool) allows you to call R functions to process data, create graphics, or apply statistical models to data.

  • Visual Studio (the integrated development environment) includes R as a fully-supported language with syntax highlighting, debugging, etc.

  • R is supported in various cloud-based services in Azure, including the Data Science Virtual Machine and Azure Machine Learning Studio. You can also publish R functions to Azure with the AzureML package, and then call those R functions from applications like Excel or apps you write yourself.
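As a sketch of the first bullet, calling R from T-SQL goes through sp_execute_external_script; the table and column names here are invented:

```sql
-- Requires SQL Server 2016+ with R Services installed and external scripts enabled
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- data.frame(avg_val = mean(InputDataSet$val))',
    @input_data_1 = N'SELECT val FROM dbo.Measurements'  -- hypothetical table
WITH RESULT SETS ((avg_val FLOAT));
```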

They’re pretty well invested in both R and Python, which is a good thing.

Comments closed

Using cowplot With ggplot2

I have a post on extending ggplot2’s functionality with cowplot:

Notice that I used geom_path().  This is a geom I did not cover earlier in the series.  It’s not a common geom, though it does show up in charts like this where we want to display data for three variables.  The geom_line() geom follows the basic rules for a line:  that the variable on the y axis is a function of the variable on the x axis, which means that for each element of the domain, there is one and only one corresponding element of the range (and I have a middle school algebra teacher who would be very happy right now that I still remember the definition she drilled into our heads all those years ago).

But when you have two variables which change over time, there’s no guarantee that this will be the case, and that’s where geom_path() comes in.  The geom_path() geom does not plot y based on sequential x values, but instead plots values according to a third variable.  The trick is, though, that we don’t define this third variable—it’s implicit in the data set order.  In our case, our data frame comes in ordered by year, but we could decide to order by, for example, life expectancy by setting data = arrange(global_avg, m_lifeExp).  Note that in a scenario like these global numbers, geom_line() and geom_path() produce the same output because we’ve seen consistent improvements in both GDP per capita and life expectancy over the 55-year data set.  So let’s look at a place where that’s not true.

The cowplot library lets you link together different plots of different sizes in a couple of lines of code, which is much easier than using ggplot2 by itself.
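To show the idea, here is a minimal sketch; global_avg, m_lifeExp, and year come from the quoted passage, while m_gdpPercap and the layout are my assumptions:

```r
library(ggplot2)
library(cowplot)

# geom_path() connects points in data-frame order (the implicit third variable);
# geom_line() would instead order points by their x value
p1 <- ggplot(global_avg, aes(x = m_gdpPercap, y = m_lifeExp)) + geom_path()
p2 <- ggplot(global_avg, aes(x = year, y = m_lifeExp)) + geom_line()

# cowplot stitches separate plots together with relative sizing
plot_grid(p1, p2, labels = c("A", "B"), rel_widths = c(2, 1))
```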

Comments closed

Web Scraping With Power BI

Imke Feldmann shows how to use Power BI to scrape multiple tables from a webpage:

I will present 2 methods here:

  1. Append-method: This is the obvious one and is fast for just a few tables.
  2. Add-Column-method: A bit more complicated but will be faster for a large number of tables and is also suitable for a dynamic number of tables.

You will also find 2 options at the end of this article:

  1. Use custom functions for multi-step table transformations

  2. Use dynamic filters to select the desired tables
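Both methods start from the same place in Power Query's M; this rough sketch follows the general shape rather than Imke's exact steps, and the URL and expanded column names are invented:

```m
let
    // Hypothetical URL; Web.Page returns one row per element found on the page
    Source = Web.Page(Web.Contents("https://example.com/stats")),
    // Keep only actual HTML tables
    OnlyTables = Table.SelectRows(Source, each [Source] = "Table"),
    // Add-Column style: transform each nested table in place...
    Promoted = Table.TransformColumns(OnlyTables, {"Data", Table.PromoteHeaders}),
    // ...then expand them into one combined table (column names are assumptions)
    Expanded = Table.ExpandTableColumn(Promoted, "Data", {"Country", "Value"})
in
    Expanded
```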

Read the whole thing.

Comments closed

Using Date Types In Warehouses

Koen Verbeeck argues that date keys in warehouses should be actual date types:

The worst is by far the string representation, as there is no actual check on the contents. It can literally contain anything. And is ’01/02/2018′ the first of February 2018 (as any sane person would read it, because days come before months), or the 2nd of January? So if you have to store dates in your data warehouse, avoid strings at all costs. No excuses.

The integer representation – e.g. 20171208 – is really popular. If I recall Kimball correctly, he said it’s the one exception where you can use smart keys, aka surrogate keys that have a meaning embedded into them. I used them for quite some time, but I believe I have found a better alternative: using the actual date data type.
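To make the alternative concrete, here's a sketch of a date dimension keyed on DATE; the table and column names are invented:

```sql
-- A date dimension keyed on an actual DATE instead of an INT like 20171208
CREATE TABLE dbo.DimDate
(
    DateKey      DATE NOT NULL PRIMARY KEY,    -- 3 bytes, and always a valid date
    CalendarYear AS YEAR(DateKey)  PERSISTED,
    MonthOfYear  AS MONTH(DateKey) PERSISTED
);

-- Fact tables then join on the DATE key directly (hypothetical fact table)
SELECT d.CalendarYear, SUM(f.SalesAmount) AS Sales
FROM dbo.FactSales AS f
JOIN dbo.DimDate   AS d ON d.DateKey = f.OrderDateKey
GROUP BY d.CalendarYear;
```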

I bounce back and forth, but I’m sympathetic to Koen’s argument, which you can read by clicking through.

Comments closed

An Introduction To Splunk

Victoria Holt has some basics on Splunk:

Splunk, a software platform, has the capability to leverage machine data for data management and analytics.  It can be used for:

  • Making data-driven decisions
  • Alerting on network security threats
  • Reporting on system failures
  • Analysing and improving functionality

It enables performance analysis, dashboard creation, monitoring, troubleshooting and investigation of the real-time data collected. An Edureka learning video showed the Splunk components.

Advanced Splunk queries are still a bit like magic to me, but this is a very powerful service once you get a handle on how it works.

Comments closed

Appropriate Data Types And Unicode

Raul Gonzalez on (in)appropriate use of National character strings:

Yes, you have read it… I see dates stored as NVARCHAR(10) and NCHAR(10) on a daily basis; please don’t ask me why.

This case is even worse, because DATE takes 3 bytes where NCHAR(10) takes 20 bytes. Yes, ladies and gentlemen, that’s more than 6 times the space to store the same data.

But wait! How can you be certain that those ten characters are actually a valid date? You can’t, unless you reinvent the wheel and validate that those values are indeed valid dates, paying the performance penalty of doing it.
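A quick demonstration of both points; the nonsense value is deliberate:

```sql
-- An NCHAR(10) accepts anything; only a real DATE validates itself
DECLARE @stored NCHAR(10) = N'2018-13-45';            -- stored without complaint

SELECT TRY_CONVERT(DATE, @stored)            AS parsed_date,  -- NULL: not a valid date
       DATALENGTH(@stored)                   AS nchar_bytes,  -- 20 bytes
       DATALENGTH(CONVERT(DATE, '20180301')) AS date_bytes;   -- 3 bytes
```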

You’d think that picking the right data type for something would be fairly easy and then you find a table with a few dozen NVARCHAR(MAX) columns.

Comments closed