Press "Enter" to skip to content

Author: Kevin Feasel

Restoration Options

Richie Lee covers the WITH RECOVERY, WITH NORECOVERY, and WITH STANDBY database restoration options:

Similar to NORECOVERY, except that the database will accept read-only connections. To do this, any uncommitted transactions in the backup will be rolled back and stored in a transaction undo file (.tuf). Whilst users are running queries against the database, no further restores can continue until all queries are complete (though this is not the case with log shipping). When the next restore occurs, those uncommitted transactions in the .tuf file will be rolled forward and the next log is restored.

Diving into STANDBY mode was quite helpful.  I’ve never needed to restore a database into standby mode, but could see it being useful for bringing back deleted records.
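
To make that concrete, here's a minimal sketch of a STANDBY restore chain.  The database name, backup paths, and undo file location are all hypothetical:

```sql
-- Restore the full backup, leaving the database able to accept more logs.
RESTORE DATABASE SalesDB
    FROM DISK = N'D:\Backups\SalesDB_Full.bak'
    WITH NORECOVERY;

-- Restore a log backup in STANDBY mode: uncommitted transactions get
-- rolled back into the undo (.tuf) file, and the database accepts
-- read-only connections between restores.
RESTORE LOG SalesDB
    FROM DISK = N'D:\Backups\SalesDB_Log.trn'
    WITH STANDBY = N'D:\Backups\SalesDB.tuf';

-- Once you're done restoring logs, bring the database fully online.
RESTORE DATABASE SalesDB WITH RECOVERY;
```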


Migrating TFS

Dave Mason has notes on migrating TFS from one server to another:

If you are migrating, but want to keep the databases on SQL 2012 Express, then you can skip this part. I wanted them moved to my SQL 2014 instance. So I did a traditional backup/restore from SQL 2012 Express to SQL 2014. I took new backups of the SQL 2014 databases, and then uninstalled SQL 2012 Express. Then I had to configure TFS to connect to a different SQL instance. Within the web.config file (%ProgramFiles%\Microsoft Team Foundation Server 12.0\Application Tier\Web Services\web.config), I found an application setting named “applicationDatabase”. I made a backup copy of web.config first, then I changed the “applicationDatabase” value. It should be in a recognizable format if you’re familiar with SQL Server connection strings. You can also make this change within IIS. It was there that I noticed a few other settings that contained SQL connection strings. Check out the following in IIS and change settings as needed:

Dave has lots of screenshots to make the process easier to understand, but my main takeaway is that, for the most part, migrating TFS is a huge pain…
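
The backup-and-restore leg of the move is standard fare.  Here's a sketch using the default TFS configuration database name, with hypothetical file paths and logical file names:

```sql
-- On the SQL 2012 Express instance: back up the TFS database.
BACKUP DATABASE Tfs_Configuration
    TO DISK = N'D:\Backups\Tfs_Configuration.bak'
    WITH INIT, CHECKSUM;

-- On the SQL 2014 instance: restore, relocating files as needed.
-- The logical file names below are guesses; check them first with
-- RESTORE FILELISTONLY FROM DISK = N'D:\Backups\Tfs_Configuration.bak'.
RESTORE DATABASE Tfs_Configuration
    FROM DISK = N'D:\Backups\Tfs_Configuration.bak'
    WITH MOVE N'Tfs_Configuration' TO N'E:\Data\Tfs_Configuration.mdf',
         MOVE N'Tfs_Configuration_log' TO N'F:\Logs\Tfs_Configuration_log.ldf',
         RECOVERY;
```

Repeat for each collection database (Tfs_DefaultCollection and friends) before touching web.config.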


Spark + R Webinar

David Smith points out a recent webinar on combining Microsoft R Server with HDInsight:

As Mario Inchiosa and Roni Burd demonstrate in this recorded webinar, Microsoft R Server can now run within HDInsight Hadoop nodes running on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR (pdf) take advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data. And if your data grows or you just need more power, you can dynamically add nodes to the HDInsight cluster using the Azure portal.

I don’t normally link to webinars (because they tend to violate my “should be viewable in a coffee break” rule of thumb) but I have a soft spot in my heart for these technologies.  If you want to dig into more “mainstream” (off the Microsoft platform) Spark + R fun, check out SparkR.


Stopping SQL Agent Jobs

Chris Shaw shows how to stop SQL Agent jobs programmatically:

SQL Server has a number of system stored procedures that you can use to perform tasks that you might be doing in the user interface, for example… If you want to stop a job, you can open SQL Server Management Studio, navigate to the job, right-click, and stop the job.  Here is where the system-supplied stored procedure comes into play.  What if your busy time of the day is at 6 AM, and you want to make sure that the indexing has finished by 5:00 AM so that the system is ready to take on the day?  Do you really want to wake up at 5:00 AM just to right-click and stop the job, on the chance that it is running?

The answer to Chris’s question is no, I’d much rather not wake up at 5 AM to stop a job if it’s running.  This is why we have computers, to do that sort of thing for us.
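
The workhorse here is msdb.dbo.sp_stop_job.  A minimal sketch, assuming a hypothetical job name; schedule something like this as its own 5:00 AM Agent job and go back to sleep:

```sql
-- If the index maintenance job is still running, stop it.
IF EXISTS (
    SELECT 1
    FROM msdb.dbo.sysjobactivity AS ja
    JOIN msdb.dbo.sysjobs AS j
        ON j.job_id = ja.job_id
    WHERE j.name = N'Nightly Index Maintenance'   -- hypothetical job name
      -- Only look at the current Agent session, not historical rows.
      AND ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
      AND ja.start_execution_date IS NOT NULL
      AND ja.stop_execution_date IS NULL
)
BEGIN
    EXEC msdb.dbo.sp_stop_job @job_name = N'Nightly Index Maintenance';
END;
```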


Exploring Taxi Data

David Smith ties together two of my favorite technologies in R and Hadoop to analyze New York City taxi data:

Debraj GuhaThakurta, Senior Data Scientist, and Shauheen Zahirazami, Senior Machine Learning Engineer at Microsoft, demonstrate some of these capabilities in their analysis of 170M taxi trips in New York City in 2013 (about 40 GB). Their goal was to show the use of Microsoft R Server on an HDInsight Hadoop cluster, and to that end, they created machine learning models using distributed R functions to predict (1) whether a tip was given for a taxi ride (binary classification problem), and (2) the amount of tip given (regression problem). The analyses involved building and testing different kinds of predictive models. Debraj and Shauheen uploaded the NYC Taxi data to HDFS on Azure blob storage, provisioned an HDInsight Hadoop Cluster with 2 head nodes (D12), 4 worker nodes (D12), and 1 R-server node (D4), and installed RStudio Server on the HDInsight cluster to conveniently communicate with the cluster and drive the computations from R.

To predict the tip amount, Debraj and Shauheen used linear regression on the training set (75% of the full dataset, about 127M rows). Boosted Decision Trees were used to predict whether or not a tip was paid. On the held-out test data, both models did fairly well. The linear regression model was able to predict the actual tip amount with a correlation of 0.78 (see figure below). Also, the boosted decision tree performed well on the test data with an AUC of 0.98.

If you’re looking for a data set for exploration, this is certainly a good one.


Stats Terminology

Erik Darling fills in gaps on statistics terminology in his unique style:

SELECTIVITY

This tells you how special your snowflakes are. When a column is called “highly selective” that usually means values aren’t repeating all that often, if at all. Think about order numbers, identity or sequence values, GUIDs, etc.

DENSITY

This is sort of the anti-matter to selectivity. Highly dense columns aren’t very unique. They’ll return a lot of rows for a given value. Think about Zip Codes, Gender, Marital Status, etc. If you were to select all the people in 10002, a densely (there’s that word again) populated zip code in Chinatown, you’d probably wait a while, kill the query, and add another filter.

Combine that with Kendra Little’s statistics FAQ for additional learning.
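
To put rough numbers on those definitions: density is simply 1 divided by the number of distinct values, so selective columns have low density.  A sketch against a hypothetical dbo.Person table and statistics object:

```sql
-- Density = 1 / (number of distinct values); lower means more selective.
SELECT
    1.0 / COUNT(DISTINCT PostalCode) AS PostalCodeDensity, -- dense-ish
    1.0 / COUNT(DISTINCT PersonID)   AS PersonIDDensity    -- highly selective
FROM dbo.Person;

-- Or read the density SQL Server itself has recorded for a statistics object.
DBCC SHOW_STATISTICS (N'dbo.Person', N'IX_Person_PostalCode')
    WITH DENSITY_VECTOR;
```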


Error Handling In Service Broker

Colleen Morrow shows how to handle poison messages and other errors in Service Broker:

This type of situation, a message that can never be processed successfully, is known as a poison message.  The name kind of makes it sound like there’s a problem with the message itself.  And there might be.  Perhaps the message format is wrong for what the receiving code was expecting.  But maybe the problem is with the receiving code itself.  Regardless of what causes the poison message, it has to be dealt with.

SQL Server has a built-in mechanism for handling poison messages.  If a transaction that receives a message rolls back 5 times, SQL Server will disable the queue.  So that means that all processing that depends on that queue will cease.  Nice, huh?  Because of this, it behooves you to make sure you include proper error handling in your message processing code.  And how exactly you handle errors will depend on several factors:

Handling errors safely is a huge part of asynchronous programming.
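
One common pattern is to catch the error, sideline the message, and commit anyway, so you never approach the five-rollback limit.  A sketch with hypothetical queue and table names (doomed transactions, where XACT_STATE() returns -1, would need extra handling):

```sql
DECLARE @handle UNIQUEIDENTIFIER,
        @type   SYSNAME,
        @body   VARBINARY(MAX);

BEGIN TRANSACTION;

-- Pull one message off the queue, waiting up to five seconds.
WAITFOR (
    RECEIVE TOP (1)
        @handle = conversation_handle,
        @type   = message_type_name,
        @body   = message_body
    FROM dbo.TargetQueue   -- hypothetical queue
), TIMEOUT 5000;

IF @handle IS NOT NULL
BEGIN
    BEGIN TRY
        -- ... process @body here ...
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Sideline the message and commit the RECEIVE anyway, rather
        -- than rolling back: five consecutive rollbacks and SQL Server
        -- disables the entire queue.
        INSERT INTO dbo.PoisonMessages   -- hypothetical logging table
            (conversation_handle, message_type, message_body, error_message)
        VALUES
            (@handle, @type, @body, ERROR_MESSAGE());
        COMMIT TRANSACTION;
    END CATCH;
END
ELSE
    COMMIT TRANSACTION;   -- timed out with nothing to do
```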


Statistics FAQ

Kendra Little has a great FAQ on statistics, from the standpoint of developers as well as administrators:

I’ve been asked a lot of questions about updating statistics in SQL Server over the years. And I’ve asked a lot of questions myself! Here’s a rundown of all the practical questions that I tend to get about how to maintain these in SQL Server.

I don’t dig into the internals of statistics and optimization in this post. If you’re interested in that, head on over and read the fahhhbulous white paper, Statistics Used by the Query Optimizer in SQL Server 2008. Then wait a couple days and chase it with its charming cousin, Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.

I’m also not talking about statistics for memory optimized tables in this article. If you’re interested in those, head over here.

This is a great distillation of important and oft-misunderstood content.
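
For a taste of the kind of maintenance the FAQ covers, here's a sketch of the usual manual options; the object names are hypothetical:

```sql
-- Update every statistic on a single table, scanning all rows.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Update one statistic by name instead.
UPDATE STATISTICS dbo.Orders IX_Orders_OrderDate WITH FULLSCAN;

-- Or sweep the whole database, updating only statistics with modifications.
EXEC sys.sp_updatestats;
```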


Configuring PolyBase

David Benoit walks through how to configure a PolyBase cluster:

When you get to the PolyBase configuration screen, you have the option to run this as a standalone instance, or to use the SQL Server as part of a scale-out group. Personally, I don’t think that anyone should ever choose the standalone instance option. You can always run a SQL Server configured for a scale-out group as a standalone instance, BUT you can’t change (at least not today) a SQL Server configured as a standalone PolyBase instance to run as part of a PolyBase cluster once you have completed the install.

This note sounds like the argument for installing all SQL Server instances as clustered instances: the more flexible option costs little up front, and you can't switch to it after the fact.
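
After the install, pointing PolyBase at your Hadoop distribution is an sp_configure setting plus a service restart.  A sketch; the connectivity value is version- and distribution-specific, so treat the 7 below as a placeholder rather than gospel:

```sql
-- Set the Hadoop connectivity level, then restart SQL Server and the
-- PolyBase services for the change to take effect.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;
```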
