Press "Enter" to skip to content

Author: Kevin Feasel

Mixing VARCHAR And NVARCHAR

Solomon Rutzky walks through some of the nuance of mixing VARCHAR and NVARCHAR data types with respect to your collation:

There are two types of Collations in SQL Server: SQL Server Collations and Windows Collations:

  • SQL Server Collations (those starting with SQL_) are older Collations that were the only ones available prior to SQL Server 2000. These Collations use simplistic sort orders, and do not handle the great variety of linguistic rules defined by Unicode. In fact, they do not have any Unicode rules defined at all, so N[VAR]CHAR data in these Collations will actually use OS-level Collation rules.

  • Windows Collations (those not starting with SQL_) were introduced in SQL Server 2000. These Collations not only have the Unicode rules defined, but they also apply those same linguistic rules to [VAR]CHAR data. While this does come at a slight cost to performance, it also allows for consistency of behavior. And it is this consistency that helps out greatly when there is a mismatch of datatypes (as we will see in a moment).

The simple advice to avoid mixing NVARCHAR and VARCHAR data types is still sound, but do read the whole thing.

Comments closed

Reading Drive Files Recursively From T-SQL

Slava Murygin shows how to use a common table expression to read a recursive drive listing:

In this blog I’ll use undocumented in SQL Server extended stored procedure “xp_dirtree“.
That procedure is well described in Patrick Keisler’s blog.

Briefly, “xp_dirtree” extended procedure returns tree of sub-directories from given folder and has three parameters:
1. Starting or root folder
2. Depth level – determines how deep SQL Server will dig into a sub-directory structure. If it set to zero or omitted whole tree will be returned.
3. If omitted or set to zero will return only folders .If set to “1” will return files as well.

If you want to be really mean to somebody, create a shortcut to the parent directory within a directory.

Comments closed

Managing Drives On Linux

David Klee walks through some basics of Linux administration with respect to drives and mountpoints:

We see that all four of the drives show up in the list. Because of the nature of how I presented the disks to the VM, the bootable drives (sdd) show up at the end of the chain instead of the beginning, but that’s OK. It doesn’t change how the disks are bootable.

We are going to use the Logical Volume Manager (LVM) to manage the disks for us instead of using regular partitions. On Windows-based servers, Windows has the ability to expand partitions without incurring any downtime, and we want to have the same flexibility from a Linux standpoint. With elevated priviledges (for the rest of the commands), let’s scan the drives to look at what it sees.

Read the whole thing.

Comments closed

Environmental Considerations With SQL Server

Jeff Mlakar wraps up his environmental diagnostics series:

Let’s start by defining terms. By environmental I mean factors that affect database performance outside of the code. The remediation of issues can be accomplished without altering SQL, no code change request, no release management necessary. Poor performance can often be addressed by doing a quick review of the overall system and ensuring there are not horrible defaults in use.

This is the low hanging fruit we can get before deep diving into the code

Before you start banging your head against the hood of the car take a walk around and look at the big picture first. It is a common thing to see junior DBAs go right for the most complex area they can identify and ignore the obvious issues which are plain to the eyes of a veteran DBA.

Read on for some resources to help you learn more.

Comments closed

R For Apache Impala

Ian Cook describes implyr, an R interface for Apache Impala:

dplyr provides a grammar of data manipulation, consisting of set of verbs (including mutate()select()filter()summarise(), and arrange()) that can be used together to perform common data manipulation tasks. The implyr package helps dplyr translate this grammar into Impala-compatible SQL commands. This gives R users access to Impala’s scale and speed on large distributed datasets while using the same familiar dplyr syntax that they are accustomed to using on local data frames and other data sources. R users can also choose to directly write SQL commands and execute them on Impala using implyr.

implyr builds upon recent work from RStudio and other contributors, including major updates to the packages dplyr and DBI, and new packages dbplyr and odbc. implyr together with these packages enables data scientists and data engineers to more easily interact with Impala through self-service data science tools like Cloudera Data Science Workbench.

It looks like this project is off to a good start already.

Comments closed

Data Cleanup Using Drools

Rathnadevi Manivannan gives an example of using Drools to create rule-based data cleansing processes:

The oil well drilling datasets contain raw information about wells and their formation details, drill types, and production dates. The Arkansas dataset has 6,040 records and the Oklahoma dataset has 2,559 records.

The raw data contains invalid values such as null, invalid date, invalid drill type, and duplicate well and invalid well information with modified dates.

This raw data from the source is transformed to MS SQL for further filtering and normalization. To download raw data, look at the Reference section.

This is an example of applying several constraints and rules to a single data set.  Each individual rule would probably be easier to do in T-SQL, but the whole bunch becomes easier to understand with a procedural language.

Comments closed

Whitelisting SQL CLR Assemblies

Niels Berglund walks through the process of whitelisting a CLR assembly in SQL Server 2017:

What Microsoft introduces in SQL Server 2017 RC1, is something I refer to as whitelisting. It is somewhat similar to the TRUSTWORTHY setting, where you indicate that a database is to be trusted. But instead of doing it on the database level, you do it per assembly.

To whitelist in SQL Server 2017 RC1, you use the system stored procedure sys.sp_add_trusted_assembly. As the name implies the procedure adds an assembly to a list of “trusted” assemblies. By marking an assembly as trusted, SQL Server will allow it to be loaded when clr strict security is on (on by default), even if:

  • the assembly is not signed, and

  • the database where you want to deploy it to is not TRUSTWORTHY.

With the elimination of the CAS model finally hitting CLR, this is probably going to be one of the easier ways for DBAs to move forward with CLR in the future.

Comments closed

Finding The Largest Tables

Andy Galbraith has a script to find the largest tables in a database:

In the previous post in this series “Toolbox – Where Did All My Space Go?” I shared a script for finding which database files consumed the most space and which of those files had free space in them.  The next step after finding out which databases are using the space is determining which tables in those databases are occupying that space to consider purge/archive opportunities.  In many cases you will find a single large table (often called a “mother table”) taking up most of the space in your database.

There are cases when you wouldn’t want to use the Disk Usage By Table report, so here is a viable alternative.

Comments closed

Narrowing Down Deprecated Feature Usage

Dave Mason now skips msdb when he looks for deprecated feature usage:

In my previous post, I took a stab at monitoring deprecation events for SQL Server. It didn’t go so well. A deprecation event occurred more than 5,000 times in a very short period of time, and I got one email for every occurrence. Not good. Here’s what I kept seeing over and over:

USER_ID will be removed from a future version of SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use the feature. Use DATABASE_PRINCIPAL_ID instead.

It turns out the system stored proc msdb.dbo.sp_send_dbmail has a USER_ID() reference. I suspect an unrelated alert/email happened once, which executed sp_send_dbmail, which generated a DEPRECATION_FINAL_SUPPORT event, which ultimately led to another execution of sp_send_dbmail, which generated yet another DEPRECATION_FINAL_SUPPORT event, and round and round we go.

Click through for examples of deprecated features that various Microsoft products, including Reporting Services and Team Foundation Server, still use.

Comments closed