ENCRYPTION_SCAN Locks

Suresh Kandoth explains ENCRYPTION_SCAN in a non-TDE scenario:

There are three types of operations that acquire lock with the resource_type of DATABASE and resource_subtype of ENCRYPTION_SCAN:

– Encryption scan performed during TDE enable/disable

– Bulk Allocations that happen as part of bcp/bulk insert/select-into/index operations, etc

– Sort spills that are done as part of sort operators in the query plan

These locks are taken to serialize operations like bulk allocations and sorts with encryption scan.

Read the whole thing.  It looks like this isn’t by itself a significant issue, but it is interesting to see this lock type against a database without TDE.
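
If you want to see whether any of these locks are currently held on an instance, a minimal sketch along the following lines should work. It filters sys.dm_tran_locks on the resource type and subtype named above. It assumes you have VIEW SERVER STATE permission, and the column list is just my pick of the ones likely to be useful.

```sql
-- Show locks currently held or requested with the ENCRYPTION_SCAN subtype.
-- Filter values come from the description above; the column choice is illustrative.
SELECT
    tl.request_session_id,
    DB_NAME(tl.resource_database_id) AS database_name,
    tl.resource_type,
    tl.resource_subtype,
    tl.request_mode,
    tl.request_status
FROM sys.dm_tran_locks AS tl
WHERE tl.resource_type = N'DATABASE'
  AND tl.resource_subtype = N'ENCRYPTION_SCAN';
```

These locks can be short-lived, so an empty result does not mean they never occur; you may need to run the query while a bulk load or a spilling sort is in progress.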

Field And Record

Kevin Feasel

2016-04-21

Naming

Michael Swart introduces Field & Record magazine:

I’m definitely a descriptivist. Language is always changing and if a word or phrase gets adopted widely enough, it is no longer “wrong” (whatever that means).

So when I hear “Field” and “Record” they’re acceptable to me. But if I’m explaining something, I don’t want to distract from the thing I’m saying. And from that point of view, I try to use “Row” and “Column” because I don’t know anyone who blinks at those terms.

Entity and Attribute or bust.  That’s my philosophy.

Scientific Notation

Andy Mallon digs into one scenario in which you shouldn’t assume how ISNUMERIC behaves:

Someone posted to #sqlhelp on Twitter, asking the following: “Wondered if anyone could enlighten me as to why ISNUMERIC(‘7d8’) returns 1?”

Sure enough, SELECT ISNUMERIC('7d8') returns a 1.

Great answer and explanation, and his advice to use TRY_CONVERT() on SQL Server 2012 and up is spot-on.
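
To try this yourself, here is a small sketch that puts ISNUMERIC next to TRY_CONVERT for a handful of strings. The sample values other than '7d8' are my own additions, and it assumes SQL Server 2012 or later.

```sql
-- Compare ISNUMERIC with an attempted conversion to a specific type.
-- TRY_CONVERT returns NULL instead of raising an error when the conversion fails.
SELECT
    v.val,
    ISNUMERIC(v.val)          AS is_numeric,
    TRY_CONVERT(int, v.val)   AS as_int,
    TRY_CONVERT(float, v.val) AS as_float
FROM (VALUES ('7d8'), ('7e8'), ('78'), ('abc')) AS v(val);
```

The point of the comparison is that ISNUMERIC only says a string could be converted to some numeric type, not the one you need. For '7d8', as_int comes back NULL even though is_numeric is 1, so testing against the actual target type is the safer check.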

Installing ODBC Drivers

Kevin Feasel

2016-04-21

Linux

Steph Locke shows how to install the SQL Server ODBC drivers in Ubuntu:

Did you know you can now get SQL Server ODBC drivers for Ubuntu? Yes, no, maybe? It’s ok even if you haven’t since it’s pretty new! Anyway, this presents me with an ideal opportunity to standardise my SQL Server ODBC connections across the operating systems I use R on i.e. Windows and Ubuntu. My first trial was to get it working on Travis-CI since that’s where all my training magic happens and if it can’t work on a clean build like Travis, then where can it work?

Now I can create R functionality that can reliably depend on SQL Server without having to fallback to JDBC. A definite woohoo moment!

Thanks to Steph for putting together this script.

Restoration Options

Richie Lee covers the WITH RECOVERY, WITH NORECOVERY, and WITH STANDBY database restoration options:

Similar to NORECOVERY except that the database will accept read only connections. To do this any uncommitted transactions in the backup will be rolled back and stored in a transaction undo file (tuf.) Whilst users are running queries against the database no further restores can continue until all queries are complete (though this is not the case with log shipping.) When the next restore occurs, those uncommitted transactions in the tuf file will be rolled forward and the next log is restored.

Diving into STANDBY mode was quite helpful.  I’ve never needed to restore a database into standby mode, but could see it being useful for bringing back deleted records.
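
For reference, a standby restore sequence might look roughly like the sketch below. The database name and every file path are hypothetical, and it assumes the restore target can reuse the file locations recorded in the backup (otherwise you would also need MOVE clauses).

```sql
-- Hypothetical database name and paths; substitute your own.
-- Restore the full backup and leave the database readable (read-only).
RESTORE DATABASE Sales
FROM DISK = N'D:\Backups\Sales_full.bak'
WITH STANDBY = N'D:\Backups\Sales_undo.tuf';

-- Apply the next log backup and stay in standby so users can keep querying.
RESTORE LOG Sales
FROM DISK = N'D:\Backups\Sales_log_1.trn'
WITH STANDBY = N'D:\Backups\Sales_undo.tuf';

-- When no more backups need to be applied, bring the database fully online.
RESTORE DATABASE Sales WITH RECOVERY;
```

The file named in STANDBY is the transaction undo file the quote describes. Once the database has been recovered with the last statement, no further log backups can be applied to it.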

Migrating TFS

Dave Mason has notes on migrating TFS from one server to another:

If you are migrating, but want to keep the databases on SQL 2012 Express, then you can skip this part. I wanted them moved to my SQL 2014 instance. So I did a traditional backup/restore from SQL 2012 Express to SQL 2014. I took new backups of the SQL 2014 databases, and then uninstalled SQL 2012 Express. Then I had to configure TFS to connect to a different SQL instance. Within the web.config file (%ProgramFiles%\Microsoft Team Foundation Server 12.0\Application Tier\Web Services\web.config), I found an application setting named “applicationDatabase”. I made a backup copy of web.config first, then I changed the “applicationDatabase” value. It should be in a recognizable format if you’re familiar with SQL Server connection strings. You can also make this change within IIS. It was there that I noticed a few other settings that contained SQL connection strings. Check out the following in IIS and change settings as needed:

Dave has lots of screenshots to make the process easier to understand, but my main takeaway is that for the most part, migrating TFS is a huge pain…

Spark + R Webinar

Kevin Feasel

2016-04-20

Hadoop, R, Spark

David Smith points out a recent webinar on combining Microsoft R Server with HDInsight:

As Mario Inchiosa and Roni Burd demonstrate in this recorded webinar, Microsoft R Server can now run within HDInsight Hadoop nodes running on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR (pdf) take advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data. And if your data grows or you just need more power, you can dynamically add nodes to the HDInsight cluster using the Azure portal.

I don’t normally link to webinars (because they tend to violate my “should be viewable in a coffee break” rule of thumb) but I have a soft spot in my heart for these technologies.  If you want to dig into more “mainstream” (off the Microsoft platform) Spark + R fun, check out SparkR.

Stopping SQL Agent Jobs

Chris Shaw shows how to stop SQL Agent jobs programmatically:

SQL Server has a number of system stored procedures that you can use to perform tasks that you might be doing in the user interface, for example… If you want to stop a job you can open SQL Server Management Studio, navigate to the job, right click and stop the job.  Here is where the system supplied stored procedure comes into play.  What if your busy time of the day is at 6 AM, and you want to make sure that the indexing has finished by 5:00 AM so that the system is ready to take on the day.  Do you really want to wake up at 5:00 AM just to right click and stop job, in the chance that it is running?

The answer to Chris’s question is no, I’d much rather not wake up at 5 AM to stop a job if it’s running.  This is why we have computers, to do that sort of thing for us.
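
The excerpt above doesn't name the procedure, but the documented one for this is msdb.dbo.sp_stop_job. Here is a minimal sketch of the kind of step you could schedule for 5:00 AM. The job name is hypothetical, and checking sysjobactivity first is my own addition so that the step doesn't fail when the job isn't running.

```sql
-- Hypothetical job name; replace with the job you want to stop.
DECLARE @job_name sysname = N'Nightly Index Maintenance';

-- Only call sp_stop_job if the job has started and not yet finished
-- in the current SQL Server Agent session.
IF EXISTS (
    SELECT 1
    FROM msdb.dbo.sysjobactivity AS ja
    JOIN msdb.dbo.sysjobs AS j
        ON j.job_id = ja.job_id
    WHERE j.name = @job_name
      AND ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
      AND ja.start_execution_date IS NOT NULL
      AND ja.stop_execution_date IS NULL
)
BEGIN
    EXEC msdb.dbo.sp_stop_job @job_name = @job_name;
END;
```

Put this in its own Agent job scheduled at the cutoff time and the computer does the 5 AM right-click for you.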

Exploring Taxi Data

Kevin Feasel

2016-04-20

Hadoop, R

David Smith ties together two of my favorite technologies in R and Hadoop to analyze New York City taxi data:

Debraj GuhaThakurta, Senior Data Scientist, and Shauheen Zahirazami, Senior Machine Learning Engineer at Microsoft, demonstrate some of these capabilities in their analysis of 170M taxi trips in New York City in 2013 (about 40 Gb). Their goal was to show the use of Microsoft R Server on an HDInsight Hadoop cluster, and to that end, they created machine learning models using distributed R functions to predict (1) whether a tip was given for a taxi ride (binary classification problem), and (2) the amount of tip given (regression problem). The analyses involved building and testing different kinds of predictive models. Debraj and Shauheen uploaded the NYC Taxi data to HDFS on Azure blob storage, provisioned an HDInsight Hadoop Cluster with 2 head nodes (D12), 4 worker nodes (D12), and 1 R-server node (D4), and installed R Studio Server on the HDInsight cluster to conveniently communicate with the cluster and drive the computations from R.

To predict the tip amount, Debraj and Shauheen used linear regression on the training set (75% of the full dataset, about 127M rows). Boosted Decision Trees were used to predict whether or not a tip was paid. On the held-out test data, both models did fairly well. The linear regression model was able to predict the actual tip amount with a correlation of 0.78 (see figure below). Also, the boosted decision tree performed well on the test data with an AUC of 0.98.

If you’re looking for a data set for exploration, this is certainly a good one.

Stats Terminology

Erik Darling fills in gaps on statistics terminology in his unique style:

SELECTIVITY

This tells you how special your snowflakes are. When a column is called “highly selective” that usually means values aren’t repeating all that often, if at all. Think about order numbers, identity or sequence values, GUIDs, etc.

DENSITY

This is sort of the anti-matter to selectivity. Highly dense columns aren’t very unique. They’ll return a lot of rows for a given value. Think about Zip Codes, Gender, Marital Status, etc. If you were to select all the people in 10002, a densely (there’s that word again) populated zip code in Chinatown, you’d probably wait a while, kill the query, and add another filter.

Combine that with Kendra Little’s statistics FAQ for additional learning.
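
To put a number on these ideas, here is a rough sketch that computes density for a single column as 1 divided by the count of distinct values, which is how the density vector in SQL Server statistics defines it. The table dbo.Customers, the column ZipCode, and the statistics name IX_Customers_ZipCode are all hypothetical.

```sql
-- Hypothetical table and column names.
-- Density = 1 / (number of distinct values); lower density means higher selectivity.
SELECT
    COUNT(*)                                     AS total_rows,
    COUNT(DISTINCT c.ZipCode)                    AS distinct_values,
    1.0 / NULLIF(COUNT(DISTINCT c.ZipCode), 0)   AS density
FROM dbo.Customers AS c;

-- Compare with what the optimizer has stored (hypothetical statistics name).
DBCC SHOW_STATISTICS ('dbo.Customers', IX_Customers_ZipCode) WITH DENSITY_VECTOR;
```

The two numbers may differ slightly: COUNT(DISTINCT ...) ignores NULLs, and statistics can be sampled or out of date.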
