Build Versus Buy For Hadoop

Kevin Feasel



Tom Phelan walks through some thoughts on whether to build versus buy when using big data platforms:

This means you absolutely must sweat the details up front. Big Data project failures are more often than not predicated by the statement: “We will do this bit now, and figure the rest out later”. But you need to begin with the end in mind.

You need to know the performance that you’ll be able to deliver and what your requirements are. You need to know how to integrate with your corporate Active Directory, and LDAP, and Kerberos services. You need to know your network topology and security requirements as well as the required user roles and responsibilities breakdown. You need to know how you’ll handle high availability, QoS, and multi-tenancy. You need to know how you’ll manage upgrades to the latest versions of your Hadoop distribution or other big data tools, and how you’ll respond to requests for new big data frameworks and new data science tools. If not, you’re just asking for trouble.

The motif in his post is building your own car, which makes sense as an extended metaphor.

The Correct Way To Load Libraries In R

Kevin Feasel



Gerald Belton opens a can of worms:

When I was an R newbie, I was taught to load packages by using the command library(package). In my Linear Models class, the instructor likes to use require(package). This made me wonder, are the commands interchangeable? What’s the difference, and which command should I use?

Interchangeable commands . . .

The way most users will use these commands, most of the time, they are actually interchangeable. That is, if you are loading a library that has already been installed, and you are using the command outside of a function definition, then it makes no difference if you use “require” or “library.” They do the same thing.

… Well, almost interchangeable

Read on to understand the differences between the two.  I end up doing something very similar to his code snippet for exactly the reason he describes.  H/T R-Bloggers

How To Create Difficult Measures In Power BI

Matt Allington walks through his process of how he creates measures in Power BI:

Killer Tip 1: Create Good Test Data

The first thing I did was to replicate the test data shown above.  As I have mentioned many times, good quality test data is essential to getting a quick correct answer to your problem.  The data from the OP looked pretty good as it had covered the relevant scenarios (1 ID had just blue, 1 ID had blue and red, several IDs had no blue – this is good test data).

Starting out with good test data is vital—it helps you clarify exactly what it is you want, and if you come up with edge cases, you have the makings of a good test workbench to ensure that your code actually works, and not just in the simplest scenario.
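As a rough sketch of what "covering the relevant scenarios" can look like in practice, here is a small Python check. The rows, the scenario names, and the `scenarios_covered` helper are my own assumptions for illustration; they are not Matt's data or his measure. The idea is only to confirm that each case you care about shows up at least once before you start writing the measure.

```python
from collections import defaultdict

# Hypothetical (id, color) rows mirroring the scenarios named in the quote:
# one ID with only blue, one with blue and red, several with no blue.
ROWS = [
    (1, "Blue"),
    (2, "Blue"), (2, "Red"),
    (3, "Red"),
    (4, "Green"),
    (5, "Red"), (5, "Green"),
]


def scenarios_covered(rows):
    """Count how many IDs fall into each scenario the test data should cover."""
    colors_by_id = defaultdict(set)
    for row_id, color in rows:
        colors_by_id[row_id].add(color)

    counts = {"only_blue": 0, "blue_and_other": 0, "no_blue": 0}
    for colors in colors_by_id.values():
        if colors == {"Blue"}:
            counts["only_blue"] += 1
        elif "Blue" in colors:
            counts["blue_and_other"] += 1
        else:
            counts["no_blue"] += 1
    return counts


if __name__ == "__main__":
    print(scenarios_covered(ROWS))
```

If any count comes back as zero, the data set is missing a scenario and is worth extending before trusting a result built on it.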

Don’t Unit Test Private Methods

Vladimir Khorikov argues that you should not unit test private methods:

When your tests start to know too much about the internals of the system under test (SUT), that leads to false positives during refactoring. Which means they won’t act as a safety net anymore and instead will impede your refactoring efforts because of the necessity to refactor them along with the implementation details they are bound to. Basically, they will stop fulfilling their main objective: providing you with the confidence in code correctness.

When it comes to unit testing, you need to follow this one rule: test only the public API of the SUT, don’t expose its implementation details in order to enable unit testing. Your tests should use the SUT the same way its regular clients do, don’t give them any special privileges. Here you can read more about what an implementation detail is and how it is different from public API: link.

In the database world, this is one reason why I like using stored procedures:  they give the equivalent of a public API for database code, so you can write tests for them.
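To make the rule concrete, here is a minimal Python sketch. The `Order` class and its method names are hypothetical and do not come from Vladimir's post. The test calls only the public `total` method, so the private `_subtotal` helper can be renamed or inlined without touching the test.

```python
class Order:
    """Hypothetical class used only for illustration; amounts are in cents."""

    def __init__(self, prices_in_cents):
        self._prices = list(prices_in_cents)

    def _subtotal(self):
        # Implementation detail: free to rename, inline, or restructure.
        return sum(self._prices)

    def total(self, tax_percent):
        # Public API: the only behavior that clients (and tests) depend on.
        subtotal = self._subtotal()
        return subtotal + subtotal * tax_percent // 100


def test_total_includes_tax():
    # Use the class the same way a regular client would, through total().
    order = Order([1000, 500])
    assert order.total(tax_percent=10) == 1650
```

A test written against `_subtotal` directly would break the moment that helper changed shape, even if `total` still returned the right answer.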

Azure SQL Database FAQ

Kevin Feasel



Dimitri Furman answers some common questions about Azure SQL Database:

Q7. Can I use Windows Authentication in Azure SQL Database?

The short answer is no. Therefore, if you are migrating an application dependent on Windows Authentication from SQL Server to Azure SQL Database, you may have to either switch to SQL Authentication (i.e. use a separate login and password for database access), or use Azure Active Directory Authentication (AAD Authentication).

The latter is conceptually similar to Windows Authentication in the sense that connections from directory principals are authenticated without the need to provide additional secrets, such as a password. Since Azure Active Directory can be federated with the on-premises Active Directory Domain Services, it can effectively authenticate the same Active Directory principals that could access the database prior to migration. However, the authentication flow for AAD Authentication is significantly different, so the analogy with Windows Authentication only goes so far.

There are some good questions in here, especially the one about retry logic; that’s good to have in any situation, but becomes vital when working with a cloud service.
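For a sense of what retry logic can look like, here is a minimal Python sketch using exponential backoff with jitter. This is not the approach from Dimitri's post; `TransientError`, `with_retry`, and the default values are assumptions for illustration. In a real application, which errors count as transient depends on the driver and the service's documentation.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for an error the service reports as retryable."""


def with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Double the wait on each attempt and add jitter so that many
            # clients do not all retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; a caller would normally write something like `with_retry(run_query)`, where `run_query` is whatever function opens the connection and executes the statement.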

How Statistics In SQL Server Have Changed Over The Years

Erin Stellato gives us a version-based timeline of how SQL Server has handled statistics over the years:

SQL Server 2008

This is a very interesting historical look.  Most interesting to me was the decrease in the number of steps available.


October 2017