Planning for a Big Data Cluster

Chris Adkin has started a series on SQL Server Big Data Clusters:

Proposing the idea of using virtual machines as Kubernetes cluster nodes to a Kubernetes purist is likely to be met with consternation. However, the different nodes in your cluster have different resource requirements. A master node can get away with as little as 2 GB of memory and 2 logical processors, worker nodes require much more resources. A best practice is never to run applications on master nodes in production. The view of the world from a Kubernetes purist, is that Kubernetes is designed to obviate the need for virtualization. Consider that you do go down the bare metal route, its unlikely that you are going to purchase blades or servers with 2 GB of memory and 2 CPU cores. At the very least consider the use of virtual machines to host master nodes on. For organizations that have standardized on a software defined virtualized infrastructure, Kubernetes will run perfectly happy on this. Also for the rapid provisioning of environments – virtualization provides the fastest means of doing this – simply create yourself a virtual machine template and base your cluster node hosts on this.

Click through for more guidance around what you need to know before you deploy a cluster.

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31