Press "Enter" to skip to content

Author: Kevin Feasel

Netflix Billing Architecture

The Netflix tech blog discusses changing their billing infrastructure to be entirely in the cloud (AWS in this case):

Cleaning up Code: We started chipping away at existing code, breaking it into smaller, efficient modules, and first moved some critical dependencies to run from the Cloud. We moved our tax solution to the Cloud first.

Next, we retired serving member billing history from giant tables that were part of many different code paths. We built a new application to capture billing events, migrated only necessary data into our new Cassandra data store and started serving billing history, globally, from the Cloud.

We spent a good amount of time writing a data migration tool that would transform member billing attributes spread across many tables in Oracle into a much simpler Cassandra data structure.

We worked with our DVD engineering counterparts to further simplify our integration and got rid of obsolete code.

Purging Data: We took a hard look at every single table to ensure that we were migrating only what we needed and leaving everything else behind. Historical billing data is valuable to legal and customer service teams. Our goal was to migrate only necessary data into the Cloud. So, we worked with impacted teams to find out what parts of historical data they really needed. We identified alternative data stores that could serve old data for these teams. After that, we started purging data that was obsolete and was not needed for any function.
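Netflix doesn’t share its purge scripts, but the archive-then-delete pattern they describe might look something like this in T-SQL (table names, columns, and the cutoff date are all hypothetical):

    -- Keep the history legal and customer service still need, in a cheaper store
    INSERT INTO archive.BillingHistory (MemberID, BillingDate, Amount)
    SELECT MemberID, BillingDate, Amount
    FROM dbo.BillingHistory
    WHERE BillingDate < '2010-01-01';

    -- Then purge what no function requires anymore
    DELETE FROM dbo.BillingHistory
    WHERE BillingDate < '2010-01-01';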

All in all, a very interesting read on how to migrate large databases.  Even if you’re moving from one version of a product to another, some of these steps might prove very helpful in your environment.

R 3.3.1 Available

David Smith reports that a new version of R is now available, 3.3.1:

This minor update, codenamed “Bug in Your Hair”, makes a few small fixes to the R 3.3.0 release. Bugs fixed include mostly rarely-encountered cases like generating Gamma random numbers with zero or infinite rate parameters, and correctly matching text (with the match function) that differed only in encoding.

There are no new features in this update, and all R code and packages should work with R 3.3.1 just as they did with R 3.3.0. For a complete list of the fixes in R 3.3.1, follow the link below.

Even though this is a small update, it might be useful to check out.

New No-Longer-Features In SQL Server 2016

Bob Pusateri acts as the SQL grim reaper:

32-bit SQL Server. SQL Server 2016 is 64-bit only. If for whatever reason you’re running on a 32-bit architecture, sadly you’re now out of luck – 2014 is the end of the road. On the bright side, there’s probably some new hardware in your future!

Compatibility Level 90. If you’re using compatibility level for backwards compatibility, the oldest available version in SQL Server 2016 is 100, which corresponds to SQL Server 2008. Compatibility level 90, SQL Server 2005, is no longer an option.
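If you want to see where your databases stand before an upgrade, a quick check is easy; the database name below is a placeholder:

    -- List compatibility levels; anything still at 90 needs attention before 2016
    SELECT name, compatibility_level
    FROM sys.databases;

    -- Move a hypothetical straggler up to the SQL Server 2008 level
    ALTER DATABASE [SalesDB] SET COMPATIBILITY_LEVEL = 100;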

Bob also covers a few deprecated features, none of which (hopefully) are in regular use in your environment.

Radar Charts

Devin Knight continues his custom visuals course:

In this module you will learn how to use the Radar Chart, a Power BI Custom Visual. The Radar Chart is sometimes also known as a web chart, spider chart, or star chart.  Using the Radar Chart allows you to display multiple categories of data on each spoke (like spokes on a bicycle wheel) of the chart. The Radar Chart does support the display of multiple metrics, which allows you to compare and contrast the “pull” that each category has on your metrics.

I still say you should stick with the fish chart for all of your visualization needs.

Clustered Indexes

Derik Hammer looks at the power of clustered indexes:

The data in a clustered index is logically sorted but does not guarantee that it will be physically sorted. The physical sorting is simply a common misconception. In fact, the rows on a given page are not sorted even though all rows contained on that page will be appropriate to its place in the logical sort order. Also, the pages on disk are not guaranteed to be sorted by the logical key either.

The most likely time where you will have a clustered index that is physically sorted is immediately after an index rebuild operation. If you are trying to optimize for sequential reads, setting a fill factor to leave free space on your pages will help limit how often you have pages physically out of order at the expense of disk space.

Derik also discusses four qualities for a good clustered index.  My preferred acronym is NUSE (Narrow, Unique, Static, Ever-increasing); Derik uses slightly different terms.
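As a minimal sketch (the table is hypothetical), here is a clustered key with those qualities, plus the fill-factor rebuild Derik mentions:

    -- Narrow, Unique, Static, Ever-increasing clustered key
    CREATE TABLE dbo.Orders
    (
        OrderID int IDENTITY(1,1) NOT NULL,
        CustomerID int NOT NULL,
        OrderDate datetime2(0) NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
    );

    -- Leave free space on each page to limit physically out-of-order pages
    ALTER INDEX PK_Orders ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 90);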

Standard Deviation Estimation

Dan Goldstein gives a rule of thumb for getting standard deviations for various distributions:

Say you’ve got 30 numbers and a strong urge to estimate their standard deviation. But you’ve left your computer at home. Unless you’re really good at mentally squaring and summing, it’s pretty hard to compute a standard deviation in your head. But there’s a heuristic you can use:

Subtract the smallest number from the largest number and divide by four

Let’s call it the “range over four” heuristic. You could, and probably should, be skeptical. You could want to see how accurate the heuristic is. And you could want to see how the heuristic’s accuracy depends on the distribution of numbers you are dealing with.
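If you do happen to have a database handy, testing the heuristic against the real calculation takes one query (table and column are hypothetical):

    -- Compare STDEV() to the range-over-four heuristic
    SELECT
        STDEV(Amount) AS ActualStdDev,
        (MAX(Amount) - MIN(Amount)) / 4.0 AS RangeOverFour
    FROM dbo.Sales;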

Sometimes you just don’t have STDEV() available.

2016 SOS_RWLock

Ewald Cress continues his series on internals, and looks at how SOS_RWLock has changed in SQL Server 2016:

Allow me to call out some layout comparison points against the 2014 version:

  • There is no separate member to track the shared reader count.

  • The four-byte spinlock is gone.

  • The four-byte waiting writer count is gone.

  • The two chunks of four-byte padding (for qword alignment of pointers) are gone.

  • The WaitListEntry structure hasn’t changed at all.

Ewald also covers Compare-And-Swap operations in detail.  Definitely a good read.
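This is not how SOS_RWLock itself works, but if you want a feel for the compare-and-swap idea in familiar terms, here is the classic optimistic pattern in T-SQL (table and columns are hypothetical): the write succeeds only if the value is still what we last read.

    DECLARE @expectedVersion int = 7;

    -- The 'swap' applies only if nobody changed the row since we read it
    UPDATE dbo.LockState
    SET Holder = SUSER_SNAME(), Version = Version + 1
    WHERE LockID = 1 AND Version = @expectedVersion;

    IF @@ROWCOUNT = 0
        PRINT 'Compare failed: the value changed underneath us; read and retry.';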

T-SQL Tuesday Roundup

Michael Swart rounds up the usual suspects:

There’s always some anxiety when throwing a party. Wondering whether it will be a smash. Well I had nothing to worry about with the twenty bloggers who participated last week. You guys hit it out of the park!

Michael put a lot of effort into making his round-up look nice and making my life a little easier by exposing me to a couple blogs I didn’t know about.  Great job.

Minimizing Cloud Costs

Kenneth Fisher looks at reducing the bottom line for cloud operations:

This got me thinking about ways to reduce/minimize costs. These are some general ideas, since from what I can tell cloud billing is as complex as the tax code, and even at that I have limited experience.

  • If you aren’t using your VM, shut it down. You can do this manually, with a PowerShell script, or even at the push of a button

  • Start small. Only create the machines you need and keep them to a minimum.

  • Starting small will lead to some bottlenecks. Feel free to bounce up and down as you need. There are some restrictions (size, etc.) when you move downwards, so be careful. Again, this can be done manually or with PowerShell. Let’s say you need to do a high-volume load: bump your service tier, then once you are done, bump it back down again (see the sketch after this list).

  • And my personal favorite: don’t install Enterprise when you only need Standard.
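On Azure SQL Database, that service tier bump is plain T-SQL; the database name and tiers below are placeholders:

    -- Scale up ahead of a high-volume load
    ALTER DATABASE [ReportingDB] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');

    -- ...run the load, then scale back down
    ALTER DATABASE [ReportingDB] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S1');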

Doing business on Azure or AWS does require a bit of a shift in mindset.  Cloud costs are entirely variable—you control when services run; how much compute, storage, and bandwidth you want to use; and your SLA.  Choosing different spots on the continuum results in different pricing.  This has also helped the growth of technologies like Hadoop, in which you can separate compute from storage.  If I know that my cluster gets heavy usage during core business hours, light usage overnight, and no usage on the weekend, I can spin up and down nodes as necessary, and can even shut off clusters which don’t need to operate, and because I’m storing the data off of the cluster nodes (and on S3 or in Azure Data Lake Storage), data doesn’t become unavailable just because the primary compute process is unavailable—I could spin up another cluster or write a quick one-off data reader.

Presentation Versus Storage

Edwin Sarmiento looks at how data is stored on disk when you use Dynamic Data Masking or Always Encrypted in SQL Server 2016:

Looking at the data, the masked columns appear as they are on disk. This validates Ronit Reger’s statement in his blog post Use Dynamic Data Masking to obfuscate your sensitive data:

There are no physical changes to the data in the database itself; the data remains intact and is fully available to authorized users or applications. Note that Dynamic Data Masking is not a replacement for access control mechanisms, and is not a method for physical data encryption.

In contrast, the encrypted columns are encrypted on disk and the data types are different on disk compared to how they were defined in the table schema – SSN is defined with nvarchar(11) while CreditCardNumber is defined with nvarchar(25). This means that those “valuables” are even more secure on disk, requiring additional layers of security just to get access to them.
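As a quick sketch of the masking side (a hypothetical table whose column types mirror Edwin’s), the mask lives in the table definition but only changes what unprivileged readers see; the bytes on disk stay intact:

    CREATE TABLE dbo.Customers
    (
        CustomerID int IDENTITY(1,1) PRIMARY KEY,
        SSN nvarchar(11) MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)') NOT NULL,
        CreditCardNumber nvarchar(25) MASKED WITH (FUNCTION = 'default()') NULL
    );

    -- A reader without the UNMASK permission sees XXX-XX-6789-style output,
    -- while the stored values remain unchanged on disk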

Read the whole thing.
