Press "Enter" to skip to content

Author: Kevin Feasel

Learning R

Grant Fritchey is learning R:

Awesome. Fixed that algorithm problem, right?

Wrong.

That’s because algorithms are not the problem… the only problem. The real problem is data preparation. A lot of the examples you’ll read online are very straight forward with nice neat data sets. That’s because they were carefully groomed and prepared. Here I am looking at the wooly wild real data and I’m utterly lost in how to properly prepare this so that it’s appropriately set up as a continuous distribution(or a distribution at all). WOOF! The reason this is so hard is because I actually don’t understand the data fundamentals of the problem I’m trying to solve in exactly the way needed to solve the problem. More cogitation is necessary.

Just because you can write R code doesn’t mean you are a data scientist.  Grant has the right mindset, but this post is fair warning that R’s complexity isn’t so much in its being a DSL, but rather in the domain itself.

Comments closed

Type 2 SCDs With Biml

Meagan Longoria has a great post on Type 2 Slowly Changing Dimensions:

The most common mistake I see in SCD 2 packages, whether using the built-in transformation or creating your own data flow, is that people use OLEDB commands to perform updates one row at a time rather than writing updates to a staging table and performing a set-based update on all rows.  If your dimension is small, the performance from row by row updates may be acceptable, but the overhead associated with using a staging table and performing set-based update will probably be negligible. So why not keep a consistent pattern for all type 2 dimensions and require no changes if the dimension grows?

Spot on.

Comments closed

SQL Server And Mercurial

Joshua Feierman talks Mercurial:

While the choice of version control system is a very subjective and personal one, there are a number of reasons why I think Mercurial is an excellent choice, especially for someone just venturing into the world of version control. Here are a number of those reasons.

I prefer Mercurial to Git because I think the Mercurial commands are a bit easier to understand (although the proliferation of non-terrible Git GUIs has mitigated this somewhat).  Nevertheless, if you love Git, use Git; if you love Mercurial, use Mercurial; if you love SVN, use SVN; if you love Visual Source Safe…use Mercurial or Git…

Comments closed

SSIS And Always Encrypted

Jakub Szymaszek links to two articles on using SSIS with an Always Encrypted database.

Using Always Encrypted:

The SQL Server 2016 Always-Encrypted feature is only supported by the ADO.NET  provider currently. It is not supported by the OleDB provider and therefore any OleDB-provider-related transformation tasks such as Fuzzy Lookup will not support Always Encrypted feature.

In the “Execute SQL Task”, parameter binding for some encrypted SQL types is not supported, because of data type conversion limitations in Always Encrypted. The
unsupported types are money, smallmoney, smalldatetime, UniqueIndentifier, DatatimeOffset, time and date.

Lookup Transformations

Add an ADO NET source connect to the table “Customers” (please ref to here get more detail about how to use ADO NET Source to connect encrypted table).

Then create a cache connection manager “Customer Cache” and set the column information as below:

Based on article #2, it looks like you can’t simply use a Lookup transformation on an Always Encrypted column; you need to pull the results into cache first and then query the cache.  That’s not exactly difficult, but if you have an encrypted column, make sure you’re not writing those columns out in plaintext because of the cache option you selected.

Comments closed

Optimizing Update Queries

Paul White has an article.  Read it:

The point is that there is an awful lot more going on inside SQL Server than is exposed in execution plans. Hopefully some of the details discussed in this rather long article will be interesting or even useful to some people.

It is good to have expectations of performance, and to know what plan shapes and properties are generally beneficial. That sort of experience and knowledge will serve you well for 99% or more of the queries you will ever be asked to tune. Sometimes, though, it is good try something a little weird or unusual just to see what happens, and to validate those expectations.

Optimizing update queries seems trivial at first, but as Paul shows, we have a few more tools at our disposal than is apparent at first glance.

Comments closed

Identifying Blocked Processes

Priyanka Chouhan talks about identifying and handling blocked processes:

In order to maintain data integrity within the database, locks are used on resources like tables, rows, pages etc. by any process that wishes to use them. This is done to ensure multiple process don’t alter the same resources at one time leading to data inconsistency. When a process wishes to lock a resource, it sends a request to the server and the server grants it. However, when a process requests lock on a resource that has already been locked by another process, the request is denied. The requesting process is thus placed on “hold” until the resource it is requesting for isn’t released. In this situation, the requesting process is called a blocked process, and such a process could put a halt on other subsequent processes and activities scheduled on the server.

Thus identifying a blocked process and releasing it requires a DBA team to check the application database blocking. Additionally, here are some other techniques that may be used to find out which processes are creating a block on the server:

My favorite method, not mentioned, is Adam Machanic’s sp_whoisactive.

Comments closed

Get Backup History

David Alcook grabs backup history:

It’s a very common task that we have to query backup timings and other bits of info from msdb.

Now its pretty straight forward to select this data for a particular database or use the MAX function for example to return the last backup but how do we get the last 5 FULL backups per database?

MSDB has a lot of useful backup information, so if you’ve never dug into it, I recommend taking your time and seeing what is available.

Comments closed

Git Support In SQL Source Control

Mickey Stuewe looks at Git support in Red Gate’s SQL Source Control tool:

At the beginning of October 2015, everything changed. That was when Redgate announced SQL Source Control users could now push to and pull from remote Git repositories.

And not just from within SQL Source Control – from within SQL Server Management Studio (SSMS).

Your developers are probably using Git, so you should too.  If your developers are using Mercurial, I applaud them.  If they’re using SVN, CVS, TFVC, Vault, or anything else (Visual SourceSafe?), flip a coin and decide to use Git/Mercurial or the generally accepted tool…

Comments closed

WMF 5 RTM

Windows Management Framework 5.0 is available, says Max Trinidad:

Finally! The Windows Management Framework version 5.0 RTM is available for download for all down level Operating systems: Windows 7, Windows 8.1, Windows Servers 2008 R2, Windows Server 2012, and Windows Servers 2012 R2.

There are several interesting features here.  My favorite one is “Just Enough Administration (JEA)”; after all, who wants too much or too little administration?

Comments closed