Press "Enter" to skip to content

Category: Hardware

Hadoop: DAS Or NAS?

Jagdish Mirani asks whether you should prefer Direct Attached Storage (DAS) or Network Attached Storage (NAS) for your Hadoop cluster:

If you want to spin up an Apache Hadoop cluster, you need to grapple with the question of how to attach your disks. Historically, this decision has favored direct attached storage (DAS). This approach is in keeping with the fundamental Hadoop principle of moving processing to a where the data lives, thereby taking advantage of disk locality to optimize performance. Disk locality is so core to Hadoop that virtually any description of Hadoop starts with this.

The alternative is to use network attached storage (NAS). In contrast to DAS, NAS separates the compute and storage layers so that storage can be shared across a number of servers by shipping data over the network. Historically, this heavy dependence on the network made NAS an order of magnitude slower. Remember, the state of the art was 1GbE networks, and switches were slower and more expensive. I/O requirements for demanding Hadoop-based applications could only be met by DAS.

This is a very interesting discussion.  In my limited experience, I’ve had trouble selling operations teams on DAS, given the increased ops effort required to keep a bunch of attached disks going.  Hat tip Ari Amster.

Comments closed

Azure Iaas Disk Differences

Rolf Tesmer looks at whether to use SSDs or Premium Disk for SQL Server instances on Azure IaaS:

To test the throughput I will run a set test with the TempDB database on D:\ (local SSD) and then rerun the test again with the TempDB moved onto F:\ (P30 premium disk).  Between the tests SQL Server is restarted so we’re starting with a clean cache and state.

The test SQL script will create a temporary table and then run a series of insert, update, select and delete queries against that table.  We’ll then capture and record the statistics and time.

The results were interesting; read on to learn more.

Comments closed

Home Labs

Chrissy LeMaire shows off her home lab:

I like to test my scripts against a variety of versions/editions and I don’t like spinning VMs up and down all the time. As for the cost; some people spend their money on golf, Polish pottery and gaming rigs. I spend mine on servers, Belgian beer and travel 😉

As you can see, I also have an old Macbook Pro with 256 SSD, 4TB HDD and 8GB RAM in the mix. It’s for photos and videos, however. And someone gave me an old silver Shuttle from like 2002, but I haven’t had the time to set it up yet.

The “cloud versus local” lab is a tough call, as both sides have their advantages and disadvantages.

Comments closed

Getting Physical

Denny Cherry explains the value of his having physical hardware on hand:

Now you may be asking why don’t we just do all this in Azure? And we could, but the reason we didn’t is pretty straight forward. Cost. Building tons of VMs in Azure and leaving them running for a few weeks for customers can cost a decent amount pretty quickly, even with smaller VMs. Here our cost is fixed. As long as we don’t need another power circuit (we can probably triple the number of servers before that becomes an issue) the cost is fixed. And if we need more power that’s not all that much per month to add on.

All and all, this will make a really nice resource for our customers to take advantage of, and give us a place to play with whatever we want without spending anything.

Ah, the life-long struggle between cap-x and op-x…

Comments closed

Infrastructure Trends

Glenn Berry reports on hardware trends:

Right now, you can purchase extremely capable, high performance server processors with physical core counts between four and twenty-two cores per processor. I am referring to the current 14nmIntel Xeon E5-2600 v4 (Broadwell-EP) and the 22nm Intel Xeon E7-8800 v3 (Haswell-EX) families that both use high bandwidth DDR4 memory.

On March 31, 2016, Intel released the 14nm Xeon E5-2600 v4 family (Broadwell-EP) for two-socket servers. This is a Tick release, building on the current Haswell microarchitecture that has up to 22 physical cores and DDR4 2400 support. This processor will work in existing model servers such as the Dell PowerEdge R730 with a BIOS update, which means that there will be less delay before they are actually available for sale.

A lot of companies aren’t interested in The Cloud, so it’s good to know that hardware vendors are keeping up with on-premises demands.

Comments closed