Press "Enter" to skip to content

Month: July 2016

Netflix Billing

Subir Parulekar and Rahul Pilani describe how they moved their billing data out of a data center and into AWS:

Now the only (and most important) thing remaining in the Data Center was the Oracle database. The dataset that remained in Oracle was highly relational and we did not feel it to be a good idea to model it to a NoSQL-esque paradigm. It was not possible to structure this data as a single column family as we had done with the customer-facing subscription data. So we evaluated Oracle and Aurora RDS as possible options. Licensing costs for Oracle as a Cloud database and Aurora still being in Beta didn’t help make the case for either of them.
 
While the Billing team was busy in the first two acts, our Cloud Database Engineering team was working on creating the infrastructure to migrate billing data to MySQL instances on EC2. By the time we started Act III, the database infrastructure pieces were ready, thanks to their help. We had to convert our batch application code base to be MySQL-compliant since some of the applications used plain jdbc without any ORM. We also got rid of a lot of the legacy pl-sql code and rewrote that logic in the application, stripping off dead code when possible.
Our database architecture now consists of a MySQL master database deployed on EC2 instances in one of the AWS regions. We have a Disaster Recovery DB that gets replicated from the master and will be promoted to master if the master goes down. And we have slaves in the other AWS regions for read only access to applications.
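
To make the replica piece of that architecture concrete, here’s a rough sketch of how a read-only MySQL slave of that era gets pointed at a master. The hostnames and credentials are made up, and Netflix’s actual tooling is certainly more involved:

-- On the replica: point replication at the master (hypothetical host and credentials).
CHANGE MASTER TO
    MASTER_HOST = 'billing-master.us-east-1.example.com',
    MASTER_USER = 'repl_user',
    MASTER_PASSWORD = 'repl_password',
    MASTER_AUTO_POSITION = 1;  -- GTID-based positioning

START SLAVE;

-- Keep application writes off of the read replicas.
SET GLOBAL read_only = ON;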

Read the whole thing.  Their architectural requirements probably won’t be yours (unless you’re working at a company of Netflix’s scale), but it’s quite interesting to see how they solved their problems.


Built-In Query Monitoring Tools

Grant Fritchey describes a couple of built-in options for monitoring query performance:

It’s not enough to know that you have a slow query or queries. You need to know exactly how slow they are. You must measure. You need to know how long they take to run and you need to know how many resources are used while they run. You need to know these numbers in order to be able to determine if, after you do something to try to help the query, you’ll know whether or not you’ve improved performance. To measure the performance of queries, you have a number of choices. Each choice has positives and negatives associated with them. I’m going to run through my preferred mechanisms for measuring query performance and outline why. I’ll also list some of the other mechanisms you have available and tell you why I don’t like them. Let’s get started.

This is an intro-level blog post, so Grant doesn’t go into much detail, but he does provide some good links for getting started.
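
If you want to follow along, two of the simplest built-in options are SET STATISTICS TIME and SET STATISTICS IO; wrap them around a query (the table here is hypothetical) and the Messages tab reports CPU time, elapsed time, and logical reads:

-- Report timing and I/O counts for the queries that follow.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT(*)
FROM dbo.Orders;  -- hypothetical table

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;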


Upgrades And Regressions

Kendra Little explains when upgrades can cause performance to suffer:

The cluster’s servers and SQL Server configurations were built to be as close to identical as possible to the previous instance (memory, cores, disk, maxdop, CTP, etc).

After the migration, I noticed that CPU utilization jumped from the normal 25% to a consistent 75%.

I did several other migrations with similar server loads with no issues, so I’m a bit puzzled as to what might be going on here. Could the upgrade from SQL Server 2008 R2 to SQL Server 2012 simply be exposing bad queries that 2008 was handling differently?

Kendra goes through a number of reasons, building a troubleshooting guide in the process.  This is a great read.
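
Whatever the root cause turns out to be, two cheap checks after any version upgrade are the database compatibility level and the freshness of statistics. These are general post-upgrade steps rather than Kendra’s specific diagnosis:

-- Confirm each database picked up the expected compatibility level (110 = SQL Server 2012).
SELECT name, compatibility_level
FROM sys.databases;

-- Refresh statistics so the new version's optimizer isn't working from stale data.
EXEC sp_updatestats;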


Hive 2.1 Benchmarks

Nita Dembla and Gopal Vijayaraghavan compare Hive 2.1 against Hive 1:

To measure the improvement LLAP brings we ran 15 queries that were taken from the TPC-DS benchmark, similar to what we have done in the past. The entire process was run using the hive-testbench repository and data generation tools. The queries there are adapted to Hive SQL but are otherwise not modified from the standard TPC-DS queries using any of the tricks that some big data vendors routinely use to show better performance for their tools. This blog only covers 15 queries but a more comprehensive performance test is underway.

The full test environment is explored below but at a high level the tests run using 10 powerful VMs with a 1TB dataset that is intended to show performance at data scales commonly used with BI tools. The same VMs and the same data are used both for Hive 1 and for Hive 2. All reported times represent the average across 3 runs in the respective Hive version.

Hive 2.1 looks like a big step forward for Hadoop performance.
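
If you want to try it yourself, the headline gains come from routing queries through the LLAP daemons. On an LLAP-enabled cluster, the session settings look roughly like this (the query is a simplified TPC-DS-style join, not one of their actual benchmark queries):

-- Route this session's queries through LLAP (Hive 2.x on Tez).
SET hive.execution.engine=tez;
SET hive.llap.execution.mode=all;

-- Then query as usual, e.g. a simple star join over the TPC-DS schema:
SELECT d_year, SUM(ss_net_paid) AS total_paid
FROM store_sales
JOIN date_dim ON ss_sold_date_sk = d_date_sk
GROUP BY d_year;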


Custom Visuals: Chord

Devin Knight has part nine of his custom visualization series:

In this module you will learn how to use the Chord Power BI Custom Visual.  Chord diagrams show directed relationships among a group of entities using colored lines (chords); this allows for an easy representation of correlating data.

Chord diagrams, when done right, can be extremely informative.  The problem is that they’re also really confusing to the uninitiated.


Error Severity Levels Greater Than 18

Manoj Pandey was debugging an Informatica ETL and got back an uncommon error message:

So, to identify the cause I tried to execute the above MERGE statement directly and I got the same error:

EXEC spMergeTables 'STG.ABCtblXYZ','ABC.tblXYZ'

(0 row(s) affected)
Msg 2754, Level 16, State 1, Procedure spMergeTables, Line 107
Error severity levels greater than 18 can only be specified by members of the sysadmin role, using the WITH LOG option.

This is a case in which an immediate error obscured the actual error.
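
The behavior is easy to reproduce outside of the procedure: raising any error with severity 19 or higher fails unless the caller is a sysadmin and specifies WITH LOG:

-- Fails with Msg 2754: severity 19+ requires sysadmin membership plus WITH LOG.
RAISERROR ('Something went badly wrong.', 19, 1);

-- Succeeds for a sysadmin and writes the error to the SQL Server error log.
RAISERROR ('Something went badly wrong.', 19, 1) WITH LOG;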


Stopping Integration Services Packages

Andy Leonard explains various methods of stopping SSIS packages in progress:

Once you have the operation_id value, simply plug it into the stop_operation stored procedure and execute:

exec SSISDB.[catalog].stop_operation @operation_id = 24

The stop_operation stored procedure runs for a few seconds (typically less than 15 seconds) and stops the execution of the SSIS package. SSIS packages that have been stopped are listed with “Canceled” status. You can see operation_id 19 was stopped in the screenshots shown above.
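
If you don’t already have the operation_id handy, the catalog views will give it to you; something like this lists operations which are still running (status 2):

-- Find running operations in the SSIS catalog (status 2 = running).
SELECT operation_id, operation_type, object_name, status
FROM SSISDB.[catalog].operations
WHERE status = 2;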

Read on for more.


Groups Of Basic Availability Groups

Russ Thomas looks into whether you can combine Basic Availability Groups in a way that mimics Enterprise Edition’s Availability Groups functionality:

For a recent project that required HA/DR but couldn’t justify Enterprise edition we decided to take the plunge on 2016’s Basic Availability Groups.

For a quick rundown of the watered down feature set – basically what you don’t get with a Basic Availability Group (BAG) – the major points are as follows:

  • You can only have 2 nodes.

  • Only one database can be in the group.

  • You cannot make the secondary readable.

  • You cannot take backups from the secondary.

The answer is “yes,” but it’s not easy.  Read on for more.
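
For reference, creating a single Basic Availability Group is just the standard DDL plus the BASIC option (the replica and endpoint names below are made up); the hard part Russ explores is managing a stack of these side by side:

-- One database, two synchronous replicas, no readable secondary.
CREATE AVAILABILITY GROUP BillingBAG
    WITH (BASIC)
    FOR DATABASE BillingDB
    REPLICA ON
        N'SQLNODE1' WITH (
            ENDPOINT_URL = N'TCP://sqlnode1.example.com:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = AUTOMATIC),
        N'SQLNODE2' WITH (
            ENDPOINT_URL = N'TCP://sqlnode2.example.com:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = AUTOMATIC);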


Securing Kafka Streams

Michael Noll shows security features of Kafka Streams:

First, which security features are available in Apache Kafka, and thus in Kafka Streams?  Kafka Streams supports all the client-side security features in Apache Kafka.  In this short blog post we cannot cover these client-side security features in full detail, so I recommend reading the Kafka Security chapter in the Confluent Platform documentation and our previous blog post Apache Kafka Security 101 to familiarize yourself with the security features that are currently available in Apache Kafka.

That said, let me highlight a couple of important Kafka security features that are essential for implementing robust data infrastructures, whether these are used for building horizontal services at larger companies, for multi-tenant infrastructures (e.g. microservices), or for shared platforms such as in the Internet of Things.  Later on I will then demonstrate an example application where we use some of these security features in Kafka Streams.

It’s important to secure sensitive data, even in “transient” media like Kafka (though retention in Kafka is configurable, so “it’ll go away soon” isn’t really a good argument).
