Press "Enter" to skip to content

Curated SQL Posts

Dynamic Data Masking

Andrea Allred has been checking out Dynamic Data Masking in SQL Server 2016:

This is a great time to talk about the different masking functions and what they do.  The four types in 2016 are Default, Email, Random and Custom String.

Default – For numeric and binary it will show a “0” For a date it will show 01/01/1900 and for strings it will show xxxx’s (more or less depending on the size of the field).

Email – It will expose the first letter of the email address and the suffix at the end of the email (.com, .net, .edu etc.) For example Batgirl@DC.com  would now be bxxx@xxxx.com.

Random – Number randomly generated between a set range. Kind of like the game, “Pick a number between 1 and 10” but for SQL.

Custom String – Lets you get creative with how much you show or cover and what you use to cover (not stuck with just xxxx’s).

It’s not really a security feature, but it could be useful for protecting sensitive data from snoopers glancing over the shoulder.

Comments closed

Missing JRE, Or Maybe C++

Meagan Longoria went through a frustrating scenario:

On a recent project I used Azure Data Factory (ADF) to retrieve data from an on premises SQL Server 2014 instance and land them in Azure Data Lake Store (ADLS) as ORC files. This required the use of the Data Management Gateway (DMG). Setup was quick and easy in our development environment. We installed the DMG for development on a separate server in the client’s network, where we also installed SQL Server Management Studio (SSMS) for query development and data validation. We set up resource groups in Azure for development and production, and made sure the settings for development and production were the same.  Then we set up a separate server for the production DMG.

Deployment and execution went well in the dev environment. Testing was completed, so we deployed to prod. Deployment went fine, but the pipelines failed execution and returned the following error on the output data sets.

Weird solution, but I’m going to guess that it makes perfect sense if you are able to look at the code.

Comments closed

Availability Group Tips

Derik Hammer has some tips to help you learn about Availability Groups:

3. Use MultiSubnetFailover=true

The Availability Group Listener is technically an optional component of an Availability Group. However, in my opinion it is necessary. By default, your listener will register all IP addresses as DNS A records and it will have multiple IP addresses when your cluster crosses subnets, most commonly when you have disaster recovery between data centers. Using the MultiSubnetFailover=true parameter in your client connection strings will attempt to connect to all IP addresses and completes the connection on the first thread to succeed. The listener ensures that only one IP address is online at a time, therefore you always connect to correct node.

This feature effectively bypasses the limitations of your DNS cache. Traditionally, you would cache the IP address for a DNS record. When you needed the client to connect to a different IP address using the same virtual network name, you would have to wait for the time to live setting to expire. This would delay your recovery time. With the MultiSubnetFailover setting, you can still cache your IP addresses but without the delay that they could induce.

There’s some good reading here.

Comments closed

SQL Data Warehouse Distribution Keys

Simon Whiteley explains the different distribution key options available in Azure SQL Data Warehouse and SQL Server APS:

Each record that is inserted goes onto the next available distribution. This guarantees that you will have a smooth, even distribution of data, but it means you have no way of telling which data is on which distribution. This isn’t always a problem!

If I wanted to perform a count of records, grouped by a particular field, I can perform this on a round-robin table. Each distribution will run the query in parallel and return it’s grouped results. The results can be simply added together as a second part of the query, and adding together 60 smaller datasets shouldn’t be a large overhead. For this kind of single-table aggregation, round-robin distribution is perfectly adequate!

However, the issues arise when we have multiple tables in our query. In order to join two tables. Let’s take a very simple join between a fact table and a dimension. I’ve shown 6 distributions for simplicity, but this would be happening across all 60.

Figuring out which distribution key to use can make a huge difference in performance.

Comments closed

Full-Text Indexing Issues

Cody Konior looks at a couple errors in full-text indexing:

The first error message holds the key. It’s not enough to just enable the service, you need to run that command also (which will also start the service for you as well as doing something internally) or restart SQL Server after the service is enabled.

Here’s where we go to the next level. You now know that full-text indexing can have a problem on startup that isn’t visible externally but surely this could never happen to you, right? But I guarantee you that if you have 500 servers many of them currently have this issue. When a developer or application support person encounters it that’s going to take half an hour or an hour of investigation, then logging incidents, then passing it on to you, for another half hour or hour of investigation, and then more time, and more time, and then doing a root cause analysis, and then sharing it with your team, and then…

Wouldn’t it be so much easier to just write a validation test for this now so that if this ever occurs you can just proactively go fix it? Well you can!

It’s a good read.

Comments closed

Multiple Flat Files SSIS Connector

John Morehouse shows that there are more SSIS connection managers than meets the eye:

While recently examining an existing SSIS package  before implementing some needed changes I noticed the connection manager for multiple flat files.

Normally, when processing multiple files, I use a sequence container and loop through them.  It’s usually a cumbersome process to setup and you must configure variables.  I was not aware the multiple flat file type of connection manager existed. No wonder because it’s notwithin the normal list.  This handy thing could only be found when I looked under “New Connection”.

Click through for more details.

1 Comment

Perfmon Counters For Monitoring AGs

Tracy Boggiano has a set of Perfmon counters she uses to monitor Availability Groups:

Monitoring Availability Groups can be tricky.  The DMVs don’t update the log_send_rate and redo_rate all the time especially if nothing is happening so if you try to use those for calculations when monitoring you could false results worse yet pages in the middle of the night.  In order to calculate the log_send_rate and redo_rate you need to capture the perfmon counters ‘Log Bytes Flushed/sec’, ‘Redone Bytes/sec’, and ‘Log Bytes Received/sec’ into temporary tables WAITFOR 1 second then capture them again.  Below is query that captures this along side what you see in the DMV for when Microsoft might fix the issue.

Click through for a script as well as a Powershell cmdlet wrapper for running against a set of hosts in your Central Management Server.

Comments closed