Nigel Meakins is starting a new series on Spark and his first post involves installing Spark on Windows:
WinUtils provides a number of HDFS-emulating utilities that allow us to run Spark as though it were talking to an HDFS storage system (at least to a certain degree). Without this you will get all manner of file system-related issues with Spark and won’t get off the launchpad.
Within the WinUtils archive you may have a number of Hortonworks Data Platform versioned folders. For the version of Spark I’m using, 2.2.1, I have chosen hadoop-2.7.1\bin for my files. Unzip and copy the contents of the bin directory to a directory of your choice. It must, however, be called ‘bin’ in order to be located by the calling programs. I actually placed mine in the C:\Spark\bin directory together with the other executables that Spark uses, but this is not essential.
Once done, you will need to set the following environment variable:
HADOOP_HOME = <your winutils ‘bin’ parent directory>
Note we don’t include the \bin, so for my example this is C:\Spark.
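A quick way to confirm the variable points at the right place is to check it from a script before launching Spark. The following is only a minimal sketch in Python: the function name check_hadoop_home is hypothetical, and it assumes the executable copied from the archive is named winutils.exe, which the excerpt above does not state explicitly.

```python
import os
from pathlib import Path


def check_hadoop_home(env=None):
    """Return a list of problems found with HADOOP_HOME (empty list = looks fine).

    Hypothetical helper for illustration only; assumes the WinUtils
    executable is named 'winutils.exe' and lives in <HADOOP_HOME>/bin.
    """
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]

    problems = []
    home_path = Path(home)

    # The variable should name the parent of 'bin', not 'bin' itself.
    if home_path.name.lower() == "bin":
        problems.append("HADOOP_HOME should be the parent of 'bin', not 'bin' itself")

    # The utilities are expected one level down, in a folder called 'bin'.
    if not (home_path / "bin" / "winutils.exe").is_file():
        problems.append(f"winutils.exe not found under {home_path / 'bin'}")

    return problems
```

Calling check_hadoop_home() with no arguments reads the current process environment. With the layout described above, HADOOP_HOME set to C:\Spark and the utilities in C:\Spark\bin, it should return an empty list.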
I have a post on installing Spark on Windows that might help if you get stuck on the WinUtils part.