Press "Enter" to skip to content

Month: December 2015

Master Data Services Workflow

Reza Rad shows us how to introduce workflow concepts into MDS 2016:

This feature was introduced in SQL Server 2016 CTP 3.0. You can save your pending changes in a set called a Change Set. This set can then be modified, deleted, or applied. You can also require approval for a change set, so that an entity administrator must approve or reject it. This is a great feature for MDS and would help with the workflow process of making changes to master data. Let’s now have a closer look at how this feature works in action.

I don’t use Master Data Services, but I could see this being very useful at a large company with a lot of people modifying master data.

Check Bulk Insert Errors

Tom Staab points out bulk insert allows up to 10 errors by default:

The issue was that the last row in a text file contained the row count, so he needed to bulk import all but that last row.

My solution was to set maxerrors to 1 so the import would ignore the last row due to the error. Any other row with an error would still fail the import. This reminded me of one of my least favorite defaults in SQL Server, so I decided to write about it here as well. A lot of people don’t realize this, but by default a bulk insert will only fail after 10 errors. Why not 0? I’ve never understood that. If you don’t change the default and then bulk insert 20 rows of data from a file, it will only fail if over half of the rows cause an error.
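
A hedged sketch of what that looks like (the table name and file path here are hypothetical):

  -- MAXERRORS = 1 tolerates the single error from the trailing row-count
  -- line; a second bad row still fails the import.
  BULK INSERT dbo.ImportTarget
  FROM 'C:\Data\feed.txt'
  WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', MAXERRORS = 1);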

Keep track of error incidence and what that means for your data.  The default of 10 errors does seem rather strange.

CHECKSUM And BINARY_CHECKSUM Collisions

Jovan Popovic shows us how fragile CHECKSUM and BINARY_CHECKSUM are:

You can find patterns of inputs that return the same values. This is similar to the modulo operator, which returns the same remainder if you find the right pattern of numbers.

As an example, if you run the following queries, you will find that these functions return 0 for all strings containing only the letter ‘a’ that have lengths of 32, 64, 96, or n * 32 characters:
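
Jovan’s exact queries aren’t reproduced in this excerpt, but a minimal sketch of the test might look like this:

  -- Per the claim above, each of these should return 0 for both functions.
  SELECT CHECKSUM(REPLICATE('a', 32)), BINARY_CHECKSUM(REPLICATE('a', 32));
  SELECT CHECKSUM(REPLICATE('a', 64)), BINARY_CHECKSUM(REPLICATE('a', 64));
  SELECT CHECKSUM(REPLICATE('a', 96)), BINARY_CHECKSUM(REPLICATE('a', 96));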

This isn’t the only place where you can get data collisions with these two functions; because they return 4-byte ints, they’re limited to 32 bits of output. Jeff Preshing has a great chart on hash collisions, showing that with a 32-bit hash, if you have 77,163 separate values, there’s a 50% chance two will collide.

Use HASHBYTES instead. The likelihood of MD5 collisions is extremely low (by the birthday bound for a 128-bit hash, on the order of 2.6 quintillion records before a 1% chance of collision), SHA1 even lower (1.71 x 10^23 records before a 1% chance of collision), and SHA2-256 or SHA2-512 ridiculously low.
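
A quick hedged sketch of the swap (the input string is arbitrary):

  -- HASHBYTES returns varbinary: 16 bytes for MD5, 20 for SHA1,
  -- 32 for SHA2_256, and 64 for SHA2_512.
  SELECT HASHBYTES('SHA2_256', N'some input string') AS hash_value;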

Making Charts Work

Meagan Longoria shows us a non-working chart and how to make it work:

First, I added some explanatory text for context to help communicate my message. If you don’t know about the Citizen Work Sessions, this chart might be confusing. The note about the shift in spending to public safety came from the FY 2016 – 2021 Citywide Business Plan. I thought it was important context to note as city council members and government officials have hard decisions to make when it comes to the budget.

This is a fantastic post for people without much background in visualization (like me) who are winging it (like me) and probably creating ugly charts (…like me).  And it’s all in Excel, meaning you don’t need to learn new tools to make charts convey useful information.

SSIS Catalog Reports

Andy Leonard announces SSIS Catalog Reports:

I’ve released a (very early) version of SSIS Catalog Reports – v0.1 – on GitHub. You can access the project here.

My main goal is to provide some visibility into the SSIS Catalog without the need for SQL Server Management Studio.

These are Reporting Services reports, so they should work anywhere SSRS is hosted.  This looks like a helpful project for companies with ops people who don’t want or need SQL Server Management Studio but do need information on the SSIS catalog.

Beware Certain Extended Events

Jonathan Kehayias warns us about using query_post_execution_showplan in production:

During a recent client engagement to look at the performance problems of a production SQL Server I ran into something I hoped to never see in real life. For a long time I’ve been a proponent of Extended Events in SQL Server, and I’ve really looked forward to seeing them used more and more inside of SQL Server for diagnostics data collection, especially by third party software vendors. I’ve done a lot of performance tests comparing Extended Events to SQL Trace and Extended Events generally has lower performance impact to SQL Server performance. However, I have also found and demonstrated how specific events like collecting the actual execution plan with Extended Events can severely impact a server’s performance.

Any time Erin or I talk about Extended Events, whether it is at a conference or in our IEPT02 – Performance Tuning class, one of the key things we both repeat to people is that if you see a completely unexplainable drop in server performance and throughput, check for Extended Events sessions and Traces running on the server and look at what events are being collected. For Extended Events we always demo the impact of the query_post_execution_showplan event live to demonstrate why this event should not be used in production environments, ever. Yesterday I was incredibly surprised to not only find the event in an Event Session on the affected server, but also to find that it came from Idera Diagnostic Manager’s Query Monitor feature.

If you’re using Diagnostic Manager version 9, check to make sure this event is not turned on, as it’s a performance killer.
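
A hedged sketch of how you might check, querying the Extended Events DMVs for any running session that collects this event:

  -- Lists running XE sessions that include query_post_execution_showplan.
  SELECT s.name AS session_name, e.event_name
  FROM sys.dm_xe_sessions AS s
  INNER JOIN sys.dm_xe_session_events AS e
      ON e.event_session_address = s.address
  WHERE e.event_name = N'query_post_execution_showplan';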

SQL Injection Blacklists Are Bad

Eli Leiba created a function to try to generate a blacklist against SQL injection:

The suggested solution presented here involves creating a user-defined T-SQL scalar function that checks the input string for any suspicious keywords that might indicate SQL injection intent.

The function checks the input string against a set of pre-defined keywords that are known to be used in SQL injection cases.

I get the intent here, but blacklists don’t work.

The first line of defense that many developers come up with is a blacklist: we know that keywords like “select,” “insert,” and “drop” are necessary to perform a SQL injection attack, so if we just ban those keywords, everything should be fine, right? Alas, life is not so simple; this leads to a number of problems with blacklists in general, as well as in this particular case.

The second-biggest problem with blacklists is that they can block people from performing legitimate requests. For example, a user at a paint company’s website may wish to search for “drop cloths,” so a naïve blacklist outlawing use of the word “drop” in a search would lead to false positives.

The biggest problem is that, unless extreme care is taken, the blacklist will still let through malicious code. One of the big failures with SQL injection blacklists is that there are a number of different white-space characters: hex 0x20 (space), 0x09 (tab), 0x0A, 0x0B, 0x0C, 0x0D, and 0xA0 are all legitimate white-space as far as a SQL Server query is concerned. If the blacklist is looking for “drop table,” it is looking for the word “drop,” followed by a 0x20 character, followed by the word “table.” If we replace the 0x20 with a 0x09, it sails right by the blacklist.
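
As a hedged illustration (the table name is hypothetical), a tab in place of the space slips past a naive check while remaining perfectly valid T-SQL:

  -- The injected string uses 0x09 (tab) between the keywords.
  DECLARE @input nvarchar(100) = N'DROP' + NCHAR(0x09) + N'TABLE dbo.Victims;';
  SELECT CHARINDEX(N'drop table', @input) AS blacklist_hit;  -- 0: no match
  -- ...yet EXEC (@input); would run the DROP TABLE just fine.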

With this particular blacklist, you have a pretty high probability of false positives: the list includes dashes, “tran,” “update,” “while,” “grant,” and even “go.” These are tokens used in SQL injection attempts, but they’re also very common words or word segments in English. This means that if you’re trying to blacklist a publicly-accessible search box which reads common English phrases, the incidence of false positives is going to be high enough that you’ll end up relaxing the blacklist. But even if you don’t, a dedicated attacker can still get around your blacklist; as the old saying goes, the attacker only needs to be right once.

Improve Backup Performance

Brent Ozar on backup tuning:

QUESTION 1: HOW FAST CAN SQL SERVER READ DATA FROM DISK?

You can measure this by doing a full backup to disk, but use the NUL: option:

  BACKUP DATABASE MyDb TO DISK = 'NUL:';

This does a backup, but doesn’t actually write it anywhere, so it’s a measure of how fast the data files can be read from storage. If you run this, know that you’re taking a real backup: this backup can break your differential backups.

Vital follow-up:  Sean McCown’s talk on performance tuning for backups.  SQL Server backups have a few knobs you can turn, like buffercount, maxtransfersize, and number of files.
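
A hedged sketch of turning those knobs (the paths and values are illustrative, not recommendations):

  -- Stripe across two files and set explicit buffer options; COPY_ONLY
  -- keeps a test backup from resetting your differential base.
  BACKUP DATABASE MyDb
  TO DISK = 'D:\Backup\MyDb_1.bak', DISK = 'E:\Backup\MyDb_2.bak'
  WITH COPY_ONLY, BUFFERCOUNT = 50, MAXTRANSFERSIZE = 4194304;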

Schedule Powershell Tasks

Steve Jones shows us how to schedule Powershell tasks:

One of the things I needed to do with my SQL Saturday download process was run this automatically. I don’t want to remember to do this, so I decided to set this up on a schedule.

I ran a quick search and it seemed to be a simple process. Essentially I run the Task Scheduler application and then call my script as a parameter to the PowerShell exe.
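
A hedged sketch of what the scheduled task’s action might look like (the script path is hypothetical, not Steve’s exact setup):

  powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\Get-SQLSatDownloads.ps1"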

Windows Task Scheduler isn’t fantastic (and they seem to mess with it in every version), but it does let you run Powershell scripts easily. Bonus points for those of you still using the old “at” command for scheduling.
