This post post will focus on creating a big data cluster so that you can get up and running as fast as possible, as such the storage type used will be ephemeral, this perfectly acceptable for “Kicking the tyres”. For production grade installations integration with a production grade storage platform is required via a storage plugin. Before we create our cluster, with the assumption we are doing this with an on premises infrastructure, the following pre-requisites need to be met:
Read the whole thing, but wait until part 4 before putting anything valuable in it.
Minikube is a good learning tool and Microsoft provides instructions for deploying a big data cluster to this ‘Platform’. However, its single node nature and the fact that application pods run on the master node means that this does not reflect a cluster that anyone would run in production. Kubernetes-as-a-service is probably by far the easiest option for spinning a cluster up, however it relies on an Aws, Azure or Google Cloud Platform account, hence there is a $ cost associated with this. This leaves a vanilla deployment of Kubernetes on premises. Based on the assumption that most people will have access to Windows server version 2008 or above, a relatively cheap and way of deploying a Kubernetes cluster is via Linux virtual machines running on Hyper-V. This blog post will provide step by step instructions for creating the virtual machines to act as the master and worker nodes in the cluster.
This is going on my “try this out when I have time” list. So expect a full report sometime in the year 2023.
Some of these technologies and concepts are not owned or created by Microsoft – the concepts are universal, and a few of the technologies are open-source. I’ve marked those in italics.
I’ve also included a few links to a training resource I’ve found to be useful. I normally use LinkedIn Learning for larger courses, along with EdX, DataCamp, and many other platforms for in-depth training. The links I have indicated here are by no means exhaustive, but they are free, and provide a good starting point.
Click through for a list of some of the technologies in play.