Billing Migration: Choosing A Database Product

Jyoti Shandil, et al, explain how they chose a database product for Netflix’s billing system:

AWS RDS MySQL: Ideally we would have gone with MySQL RDS as our backend, considering Amazon does a great job in managing and upgrading relational database as a service, providing multi-AZ support for high availability. However, the main drawback to RDS was the storage limit of 6TB. Our requirement at the time, was closer to 10TB.

AWS Aurora: AWS Aurora would have met the storage needs, but it was in beta at that time.

PostgreSQL: PostgreSQL is a powerful open source, object-relational database system, but we did not have much in house expertise using PostgreSQL. In the DC, our primary backend databases were Oracle and MySQL. Moreover, choosing PostgreSQL would have eliminated the option of a seamless migration to Aurora in future, as Aurora is based on the MySQL engine.

From there, they also explain some technical issues they found in migrating data.  Read the whole thing.  If you’re coming into this series blind, they also have part 1 and part 2 of the series, giving more of an architectural overview of their billing system.

DacFx Wrapper

Ed Elliott has a new Powershell module named DacFxed:

There is a solution? Well yes of course otherwise I wouldn’t have been writing this! DacFxed is a powershell module which:

  • 1. References the DacFx nuget package so updating to the latest version is simple
  • 2. Implements a hack (ooh) to allow contributors to be loaded from anywhere
  • 3. Is published to the powershell gallery so to use it you just do “Install-Module -Name DacFxed -Scope User -Force”
  • 4. Has a Publish-Database, New-PublishProfile and Get-DatabaseChanges CmdLets

Cool right, now a couple of things to mention. Firstly this is of course open source and available:

This is a nice tool to deploy dacpac files using Powershell.  Check out the GitHub repo for more details.

Spatial Functions In Hive

Constantin Stanca has a couple of posts on using Hive to implement geospatial queries.  First, an overview:

The Esri Geometry API for Java includes geometry objects (e.g. points, lines, and polygons), spatial operations (e.g. intersects, buffer), and spatial indexing. By deploying the library (as a jar) within Hadoop, you are able to build custom MapReduce applications using Java to complete analysis on your spatial data. This can be used as a standalone library, or combined with Spatial Framework for Hadoop to create a SQL like experience.

The Spatial Framework for Hadoop includes among others, the Hive Spatial library with User-Defined Functions and SerDes for spatial analysis in Hive. By enabling this library in Hive, you are able to construct queries using Hive Query Language (HQL), which is very similar to SQL. This allows you to avoid complicated MapReduce algorithms and stick to a more familiar workflow. The API used by the Hive UDF’s could be used by developers building geometry functions for 3rd-party applications using Storm, Spark, HBase etc.

He follows that up with some pieces hive misses compared to SQL Server, Oracle, etc.:

As discussed with ESRI recently, there are no plans to open source all spatial functions currently available for traditional RDBMS like Oracle, SQL Server, or Netezza, as those are commercially licensed packages. The best option to compensate for the 5-10% missing functions is to contribute to ESRI’s open source repository: ESRI does not provide a commercial library for Hive including all spatial functions.

Be sure to check out that second link to get an understanding of exactly what’s missing.  Via Mark Herring.

U-SQL Updates

Michael Rys shows off what has been added to the August release for Azure Data Lake Analytics and U-SQL:

As part of the Azure Data Lake Analytics and U-SQL August 1st refresh, we released a couple of new, long awaited features, are fixing some confusing syntax and fixed several bugs

Read on for more details.

Deploying SSAS Cubes Using Powershell

Richie Lee has a Powershell script to deploy an Analysis Services cube:

Recently someone got in touch with me about one of my earliest posts. I realised I had made a mistake on that post and had updated. But I also realised that the script itself is actually part of a PowerShell module. So wanting to make things as easy as possible, and being the helpful guy that I am, I have altered that example into a script which I am posting here.

It’s worth stressing that the errors that output to the console are fairly generic, and that there is a log file that will contain all the information if there are any errors.

Sadly, you still need to write/tweak the big glob of XML, it seems.

DNS Aliases

Drew Furgiuele shows us how to use CNAME records to give us easy aliases for servers hosting SQL Server:

When you connect to a SQL Server instance, you’re most likely connecting directly to the host name of the server running that instance. So for example, if the host name of my instance is SQLSERVER-A, then in my SSMS connection (or application) I probably type in a host name offully qualified domain name (FQDN) of a server. If I want to move a database from one server to another, or stand up a new server and move everything over to it, from now on I’ll need to make sure I type in the new host name. And for a DBA, this is fine. We mostly identify our servers by the hosts they run on.

Developers and users, on the other hand, don’t always think like that. They each probably only care about one or two databases on a given instance, and depending on their release cycle even something as simple as changing a connection string might need to be a scheduled change. So when databases move or you build a new server you not only need to ensure as little downtime as possible from a system perspective but with as little impact to a user’s system too. And you can do that with the help of your network team and your local domain name system: DNS.

I’ve had great experiences with CNAME records masking actual server names.  This was most relevant in an environment in which devs just couldn’t remember which X-Men character was the production SQL Server and  which was QA.

Reverse Engineering Databases Using Biml

Kevin Feasel



Bill Fellows shows how to get a database schema using Biml:

I’m at a new client and I needed an offline version of their operation data store, ODS, database schema. I don’t know what I was expecting, but it wasn’t 11,500 tables. :O That’s a lot. First up to bat was Visual Studio Database Projects. I clicked Import and you really have no options to winnow the list of items your importing down via Import. Ten minutes later, the import timed out on spatial indexes. Which wouldn’t be so bad except it’s an all or nothing operation with import.

Fair enough, I’ll use the Schema Comparison and only compare tables, that should make it less ugly. And I suppose it did but still, the operation timed out. Now what?

SSMS to the rescue. I right click on my database and select Generate Scripts and first off, I script everything but the tables. Which is amusing when you have 11.5k tables, checking and unchecking the table box causes it to spin for a bit. I generated a file for each object with the hope that if the operation goes belly up, I’ll at least have some progress. Round 1, Winner! I had all my views, procedures, functions, data types (don’t hate), all scripted out nice and neat. Round 2, I just selected tables. And failed.

As you’d expect, there’s not that many lines of Biml code, and yet it does the job.

Installing R Packages In SQL Server

Kevin Feasel



Tomaz Kastrun shows how to install packages in SQL Server R Services:

Julie Koesmarno made a great post on installing R packages. Please follow this post. Also Microsoft suggests the following way to install R packages on MSDN.

Since I wanted to be able to have packages installed directly from SQL Server Management Studio (SSMS) here is yet another way to do it. I have used xp_cmdshell to install any additional package for my R (optionally you can setEXECUTE AS USER).

This is a bit of a backdoor method, but it does work.

Access Control Basics

Robert Sheldon gives an introductory-level overview of the basics of logins, users, roles, and permissions:

You can think of a role as a type of container for holding one or more logins, users, or other roles, similar to how a Windows group can hold multiple individual and group accounts. This can make managing multiple principals easier when those principals require the same type of access to SQL Server. You can configure each role with permissions to specific resources, adding or removing logins and users from these roles as needed.

SQL Server supports three types of roles: server, database, and application. Server roles share the same scope as logins, which means they operate at the server level and pertain to the database engine as a whole. As a result, you can add only server-level principals to the roles, and you can configure the roles with permissions only to server-level securables, not database-level securables.

These help form the foundation of a secure instance, so it’s vital to understand these concepts.

Basic Powershell Snippets

Michael Bourgon has a few Powershell snippets he wrote this week:

I needed to parse out a Unix filename from a file.  The original “ls” looked like this:

-rw-rw-r– 1 psabcderfg data 646621 Jul 19 16:25 myfile16071901.Z

The unix command to parse it would
cat $fl |awk ‘{print $9}’

And the powershell equivalent is:
$newfiles = get-content $fl | foreach {$_.split(” “)[8]}
(the number is 8 instead of 9 because powershell arrays start at 0)

Check all of them out.


August 2016
« Jul Sep »