Press "Enter" to skip to content

Month: July 2017

Managing Drives On Linux

David Klee walks through some basics of Linux administration with respect to drives and mountpoints:

We see that all four of the drives show up in the list. Because of the nature of how I presented the disks to the VM, the bootable drives (sdd) show up at the end of the chain instead of the beginning, but that’s OK. It doesn’t change how the disks are bootable.

We are going to use the Logical Volume Manager (LVM) to manage the disks for us instead of using regular partitions. On Windows-based servers, Windows has the ability to expand partitions without incurring any downtime, and we want to have the same flexibility from a Linux standpoint. With elevated priviledges (for the rest of the commands), let’s scan the drives to look at what it sees.

Read the whole thing.

Comments closed

Environmental Considerations With SQL Server

Jeff Mlakar wraps up his environmental diagnostics series:

Let’s start by defining terms. By environmental I mean factors that affect database performance outside of the code. The remediation of issues can be accomplished without altering SQL, no code change request, no release management necessary. Poor performance can often be addressed by doing a quick review of the overall system and ensuring there are not horrible defaults in use.

This is the low hanging fruit we can get before deep diving into the code

Before you start banging your head against the hood of the car take a walk around and look at the big picture first. It is a common thing to see junior DBAs go right for the most complex area they can identify and ignore the obvious issues which are plain to the eyes of a veteran DBA.

Read on for some resources to help you learn more.

Comments closed

R For Apache Impala

Ian Cook describes implyr, an R interface for Apache Impala:

dplyr provides a grammar of data manipulation, consisting of set of verbs (including mutate()select()filter()summarise(), and arrange()) that can be used together to perform common data manipulation tasks. The implyr package helps dplyr translate this grammar into Impala-compatible SQL commands. This gives R users access to Impala’s scale and speed on large distributed datasets while using the same familiar dplyr syntax that they are accustomed to using on local data frames and other data sources. R users can also choose to directly write SQL commands and execute them on Impala using implyr.

implyr builds upon recent work from RStudio and other contributors, including major updates to the packages dplyr and DBI, and new packages dbplyr and odbc. implyr together with these packages enables data scientists and data engineers to more easily interact with Impala through self-service data science tools like Cloudera Data Science Workbench.

It looks like this project is off to a good start already.

Comments closed

Data Cleanup Using Drools

Rathnadevi Manivannan gives an example of using Drools to create rule-based data cleansing processes:

The oil well drilling datasets contain raw information about wells and their formation details, drill types, and production dates. The Arkansas dataset has 6,040 records and the Oklahoma dataset has 2,559 records.

The raw data contains invalid values such as null, invalid date, invalid drill type, and duplicate well and invalid well information with modified dates.

This raw data from the source is transformed to MS SQL for further filtering and normalization. To download raw data, look at the Reference section.

This is an example of applying several constraints and rules to a single data set.  Each individual rule would probably be easier to do in T-SQL, but the whole bunch becomes easier to understand with a procedural language.

Comments closed

Whitelisting SQL CLR Assemblies

Niels Berglund walks through the process of whitelisting a CLR assembly in SQL Server 2017:

What Microsoft introduces in SQL Server 2017 RC1, is something I refer to as whitelisting. It is somewhat similar to the TRUSTWORTHY setting, where you indicate that a database is to be trusted. But instead of doing it on the database level, you do it per assembly.

To whitelist in SQL Server 2017 RC1, you use the system stored procedure sys.sp_add_trusted_assembly. As the name implies the procedure adds an assembly to a list of “trusted” assemblies. By marking an assembly as trusted, SQL Server will allow it to be loaded when clr strict security is on (on by default), even if:

  • the assembly is not signed, and

  • the database where you want to deploy it to is not TRUSTWORTHY.

With the elimination of the CAS model finally hitting CLR, this is probably going to be one of the easier ways for DBAs to move forward with CLR in the future.

Comments closed

Finding The Largest Tables

Andy Galbraith has a script to find the largest tables in a database:

In the previous post in this series “Toolbox – Where Did All My Space Go?” I shared a script for finding which database files consumed the most space and which of those files had free space in them.  The next step after finding out which databases are using the space is determining which tables in those databases are occupying that space to consider purge/archive opportunities.  In many cases you will find a single large table (often called a “mother table”) taking up most of the space in your database.

There are cases when you wouldn’t want to use the Disk Usage By Table report, so here is a viable alternative.

Comments closed

Narrowing Down Deprecated Feature Usage

Dave Mason now skips msdb when he looks for deprecated feature usage:

In my previous post, I took a stab at monitoring deprecation events for SQL Server. It didn’t go so well. A deprecation event occurred more than 5,000 times in a very short period of time, and I got one email for every occurrence. Not good. Here’s what I kept seeing over and over:

USER_ID will be removed from a future version of SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use the feature. Use DATABASE_PRINCIPAL_ID instead.

It turns out the system stored proc msdb.dbo.sp_send_dbmail has a USER_ID() reference. I suspect an unrelated alert/email happened once, which executed sp_send_dbmail, which generated a DEPRECATION_FINAL_SUPPORT event, which ultimately led to another execution of sp_send_dbmail, which generated yet another DEPRECATION_FINAL_SUPPORT event, and round and round we go.

Click through for examples of deprecated features that various Microsoft products, including Reporting Services and Team Foundation Server, still use.

Comments closed

How Long Was That Event?

Taiob Ali shows that some extended event times are in milliseconds, some are in microseconds, and some are in unknownaseconds:

There was a question in dba.stackexchange.com titled “module_end extended events duration in microseconds?” that I answered and it was Microsecond in that instance.
Later on I questioned myself if the duration is always in Microseconds for extended events. I found, it is a mix of Millisecond, Microsecond and some are unknown meaning famous NULL. I wrote below tsql code to determine which is what.

Click through for the script that Taiob used to determine the answer.

Comments closed

Message Queues For The DBA

Drew Furgiuele explains message queueing theory and puts together a nice demo:

Please note, there’s one thing I need to make super abundantly clear for this demo: You’d never, ever configure these components like this for production. There’s so much more to consider, like setting up RabbitMQ to use SSL, writing actual applications and services instead of PowerShell to handle message listeners, and putting more effort into Service Broker. This should serve fine as a proof of concept, but if you want to actually implement something like this, make sure you do you research and follow best practices for each component.

This is the warning label on the chainsaw that says “normally you’d grab it by the other end.”  This is a great post, giving an introduction to tying Service Broker to RabbitMQ.  My biases lead me to Apache Kafka over RabbitMQ in most cases, but that’s just me.

Comments closed