Author: Kevin Feasel

Int To BigInt

Kendra Little walks through the process of expanding an int column into a bigint:

Sometimes you just can’t take the outage. In that case, you’ve got to proceed with your own wits, and your own code. This is tricky because changes are occurring to the table.

The solution typically looks like this:

  • Set up a way to track changes to the table – either triggers that duplicate off modifications or Change Data Capture (Enterprise Edition)

  • Create the new table with the new data type, set identity_insert on if needed

  • Insert data into the new table. This is typically done in small batches, so that you don’t overwhelm the log or impact performance too much. You may use a snapshot from the point at which you started tracking changes.

  • Start applying changed data to the new table

  • Make sure you’re cleaning up from the changed data you’re catching and not running out of space

  • Write scripts to compare data between the old and new tables to make sure you’re really in sync (possibly use a snapshot or a restored backup to compare a still point in time)

  • Cut over in a quick downtime at some point using renames, schema transfer, etc. If it’s an identity column, don’t forget to fix that up properly.

This method matches what I’ve done in zero downtime situations.
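
For the batched-copy step, here is a minimal T-SQL sketch of the pattern, assuming a hypothetical dbo.Orders table whose OrderID is an int identity column; the names, types, and batch size are illustrative rather than anything from either post.

    -- New copy of the table with the widened key
    CREATE TABLE dbo.Orders_new
    (
        OrderID   bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
        OrderDate datetime2 NOT NULL,
        Amount    decimal(19,4) NOT NULL
    );

    -- Copy existing rows in small ranges so the log and concurrent activity aren't hammered
    DECLARE @BatchSize bigint = 50000,
            @LastID    bigint = 0,
            @MaxID     bigint = (SELECT MAX(OrderID) FROM dbo.Orders);

    SET IDENTITY_INSERT dbo.Orders_new ON;

    WHILE @LastID < @MaxID
    BEGIN
        INSERT INTO dbo.Orders_new (OrderID, OrderDate, Amount)
        SELECT OrderID, OrderDate, Amount
        FROM dbo.Orders
        WHERE OrderID >  @LastID
          AND OrderID <= @LastID + @BatchSize;

        SET @LastID += @BatchSize;
    END

    SET IDENTITY_INSERT dbo.Orders_new OFF;

    -- At cutover: apply the last tracked changes, swap names in a short outage,
    -- and make sure the identity seed continues past the old maximum.
    -- EXEC sp_rename 'dbo.Orders', 'Orders_old';
    -- EXEC sp_rename 'dbo.Orders_new', 'Orders';
    -- DBCC CHECKIDENT ('dbo.Orders', RESEED);

In a real migration you would also throttle between batches (WAITFOR DELAY), wrap each batch in error handling, and keep draining the trigger- or CDC-captured changes right up until the rename.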

Also see Aaron Bertrand’s article on the same topic:

In part 3 of this series, I showed two workarounds to avoid widening an IDENTITY column – one that simply buys you time, and another that abandons IDENTITY altogether. The former prevents you from having to deal with external dependencies such as foreign keys, but the latter still doesn’t address that issue. In this post, I wanted to detail the approach I would take if I absolutely needed to move to bigint, needed to minimize downtime, and had plenty of time for planning.

Because of all of the potential blockers and the need for minimal disruption, the approach might be seen as a little complex, and it only becomes more so if additional exotic features are being used (say, partitioning, In-Memory OLTP, or replication).

At a very high level, the approach is to create a set of shadow tables, where all the inserts are directed to a new copy of the table (with the larger data type), and the existence of the two sets of tables is as transparent as possible to the application and its users.

Those are two good posts on this topic.

Billing Migration: Choosing A Database Product

Jyoti Shandil, et al., explain how they chose a database product for Netflix’s billing system:

AWS RDS MySQL: Ideally we would have gone with MySQL RDS as our backend, considering Amazon does a great job in managing and upgrading relational database as a service, providing multi-AZ support for high availability. However, the main drawback to RDS was the storage limit of 6TB. Our requirement at the time, was closer to 10TB.

AWS Aurora: AWS Aurora would have met the storage needs, but it was in beta at that time.

PostgreSQL: PostgreSQL is a powerful open source, object-relational database system, but we did not have much in house expertise using PostgreSQL. In the DC, our primary backend databases were Oracle and MySQL. Moreover, choosing PostgreSQL would have eliminated the option of a seamless migration to Aurora in future, as Aurora is based on the MySQL engine.

From there, they also explain some technical issues they found in migrating data.  Read the whole thing.  If you’re coming into this series blind, they also have part 1 and part 2 of the series, giving more of an architectural overview of their billing system.

DacFx Wrapper

Ed Elliott has a new Powershell module named DacFxed:

There is a solution? Well yes of course otherwise I wouldn’t have been writing this! DacFxed is a powershell module which:

  1. References the DacFx nuget package so updating to the latest version is simple
  2. Implements a hack (ooh) to allow contributors to be loaded from anywhere
  3. Is published to the powershell gallery so to use it you just do “Install-Module -Name DacFxed -Scope User -Force”
  4. Has Publish-Database, New-PublishProfile and Get-DatabaseChanges cmdlets

Cool right, now a couple of things to mention. Firstly this is of course open source and available: https://github.com/GoEddie/DacFxed

This is a nice tool to deploy dacpac files using Powershell.  Check out the GitHub repo for more details.

Spatial Functions In Hive

Constantin Stanca has a couple of posts on using Hive to implement geospatial queries.  First, an overview:

The Esri Geometry API for Java includes geometry objects (e.g. points, lines, and polygons), spatial operations (e.g. intersects, buffer), and spatial indexing. By deploying the library (as a jar) within Hadoop, you are able to build custom MapReduce applications using Java to complete analysis on your spatial data. This can be used as a standalone library, or combined with Spatial Framework for Hadoop to create a SQL like experience.

The Spatial Framework for Hadoop includes among others, the Hive Spatial library with User-Defined Functions and SerDes for spatial analysis in Hive. By enabling this library in Hive, you are able to construct queries using Hive Query Language (HQL), which is very similar to SQL. This allows you to avoid complicated MapReduce algorithms and stick to a more familiar workflow. The API used by the Hive UDF’s could be used by developers building geometry functions for 3rd-party applications using Storm, Spark, HBase etc.

He follows that up with some pieces Hive is missing compared to SQL Server, Oracle, etc.:

As discussed with ESRI recently, there are no plans to open source all spatial functions currently available for traditional RDBMS like Oracle, SQL Server, or Netezza, as those are commercially licensed packages. The best option to compensate for the 5-10% missing functions is to contribute to ESRI’s open source repository: https://github.com/Esri/spatial-framework-for-hadoop. ESRI does not provide a commercial library for Hive including all spatial functions.

Be sure to check out that second link to get an understanding of exactly what’s missing.  Via Mark Herring.
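
To give a flavor of that HQL workflow, here is a hedged sketch of registering a few of the Esri UDFs and running a containment query; the jar paths, the events table, and the coordinates are assumptions for illustration, not something from Constantin’s posts.

    -- Register the UDFs after adding the framework and geometry API jars
    -- (paths and table names below are illustrative)
    ADD JAR /tmp/esri-geometry-api.jar;
    ADD JAR /tmp/spatial-sdk-hive.jar;

    CREATE TEMPORARY FUNCTION ST_Point    AS 'com.esri.hadoop.hive.ST_Point';
    CREATE TEMPORARY FUNCTION ST_Polygon  AS 'com.esri.hadoop.hive.ST_Polygon';
    CREATE TEMPORARY FUNCTION ST_Contains AS 'com.esri.hadoop.hive.ST_Contains';

    -- Count events that fall inside a bounding polygon, plain SQL style
    SELECT COUNT(*)
    FROM   events
    WHERE  ST_Contains(
               ST_Polygon(-90.0, 30.0, -80.0, 30.0, -80.0, 40.0, -90.0, 40.0),
               ST_Point(longitude, latitude)
           );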

Deploying SSAS Cubes Using Powershell

Richie Lee has a Powershell script to deploy an Analysis Services cube:

Recently someone got in touch with me about one of my earliest posts. I realised I had made a mistake on that post and had updated. But I also realised that the script itself is actually part of a PowerShell module. So wanting to make things as easy as possible, and being the helpful guy that I am, I have altered that example into a script which I am posting here.

It’s worth stressing that the errors that output to the console are fairly generic, and that there is a log file that will contain all the information if there are any errors.

Sadly, you still need to write/tweak the big glob of XML, it seems.

DNS Aliases

Drew Furgiuele shows us how to use CNAME records to give us easy aliases for servers hosting SQL Server:

When you connect to a SQL Server instance, you’re most likely connecting directly to the host name of the server running that instance. So for example, if the host name of my instance is SQLSERVER-A, then in my SSMS connection (or application) I probably type in a host name or fully qualified domain name (FQDN) of a server. If I want to move a database from one server to another, or stand up a new server and move everything over to it, from now on I’ll need to make sure I type in the new host name. And for a DBA, this is fine. We mostly identify our servers by the hosts they run on.

Developers and users, on the other hand, don’t always think like that. They each probably only care about one or two databases on a given instance, and depending on their release cycle even something as simple as changing a connection string might need to be a scheduled change. So when databases move or you build a new server you not only need to ensure as little downtime as possible from a system perspective but with as little impact to a user’s system too. And you can do that with the help of your network team and your local domain name system: DNS.

I’ve had great experiences with CNAME records masking actual server names.  This was most relevant in an environment in which devs just couldn’t remember which X-Men character was the production SQL Server and which was QA.
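
One quick sanity check after repointing an alias: connect with the CNAME (say, sqlprod.yourdomain.local, a made-up name) and ask the instance what it is actually running on.

    SELECT  @@SERVERNAME                            AS instance_name,
            SERVERPROPERTY('MachineName')           AS current_host,
            CONNECTIONPROPERTY('local_net_address') AS server_ip;
    -- These report the underlying host and instance regardless of which
    -- DNS name you used to connect, so you can confirm the alias landed
    -- where you think it did.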

Reverse Engineering Databases Using Biml

Bill Fellows shows how to get a database schema using Biml:

I’m at a new client and I needed an offline version of their operation data store, ODS, database schema. I don’t know what I was expecting, but it wasn’t 11,500 tables. :O That’s a lot. First up to bat was Visual Studio Database Projects. I clicked Import and you really have no options to winnow the list of items you’re importing down via Import. Ten minutes later, the import timed out on spatial indexes. Which wouldn’t be so bad except it’s an all or nothing operation with import.

Fair enough, I’ll use the Schema Comparison and only compare tables, that should make it less ugly. And I suppose it did but still, the operation timed out. Now what?

SSMS to the rescue. I right click on my database and select Generate Scripts and first off, I script everything but the tables. Which is amusing when you have 11.5k tables, checking and unchecking the table box causes it to spin for a bit. I generated a file for each object with the hope that if the operation goes belly up, I’ll at least have some progress. Round 1, Winner! I had all my views, procedures, functions, data types (don’t hate), all scripted out nice and neat. Round 2, I just selected tables. And failed.

As you’d expect, there aren’t that many lines of Biml code, and yet it does the job.

Installing R Packages In SQL Server

Tomaz Kastrun shows how to install packages in SQL Server R Services:

Julie Koesmarno made a great post on installing R packages. Please follow this post. Also Microsoft suggests the following way to install R packages on MSDN.

Since I wanted to be able to have packages installed directly from SQL Server Management Studio (SSMS) here is yet another way to do it. I have used xp_cmdshell to install any additional package for my R (optionally you can set EXECUTE AS USER).

This is a bit of a backdoor method, but it does work.
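
Roughly, the xp_cmdshell route looks like the snippet below; the R.exe path is the typical SQL Server 2016 default-instance location and the package is just an example, so adjust both (and remember the service account needs write access to the R library folder).

    -- xp_cmdshell is off by default; enabling it requires sysadmin
    EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 1; RECONFIGURE;

    -- Shell out to the instance's R and install a package from CRAN
    EXEC xp_cmdshell '"C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin\R.exe" -e "install.packages(''forecast'', repos = ''https://cran.r-project.org'')"';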

Access Control Basics

Robert Sheldon gives an introductory-level overview of the basics of logins, users, roles, and permissions:

You can think of a role as a type of container for holding one or more logins, users, or other roles, similar to how a Windows group can hold multiple individual and group accounts. This can make managing multiple principals easier when those principals require the same type of access to SQL Server. You can configure each role with permissions to specific resources, adding or removing logins and users from these roles as needed.

SQL Server supports three types of roles: server, database, and application. Server roles share the same scope as logins, which means they operate at the server level and pertain to the database engine as a whole. As a result, you can add only server-level principals to the roles, and you can configure the roles with permissions only to server-level securables, not database-level securables.

These help form the foundation of a secure instance, so it’s vital to understand these concepts.
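
As a quick illustration of how logins, users, roles, and permissions hang together, here is a minimal T-SQL sketch using made-up names (a SalesDB database, a Sales schema, and a ReportingUser principal).

    -- Server-level principal (login); the password is just a placeholder
    CREATE LOGIN ReportingUser WITH PASSWORD = N'S3cure!Example#2016';

    -- Database-level principal (user) mapped to that login
    USE SalesDB;
    CREATE USER ReportingUser FOR LOGIN ReportingUser;

    -- A custom database role is the container that actually holds the permissions
    CREATE ROLE ReportReaders;
    ALTER ROLE ReportReaders ADD MEMBER ReportingUser;

    -- Grant to the role, not to individual users, then manage membership over time
    GRANT SELECT ON SCHEMA::Sales TO ReportReaders;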
