Spark and dotnet in a Single Container

Ed Elliott shows how you can combine Spark and .NET Core in a single Docker container:

This is quite new syntax in docker and you need at least docker 17.05 (client and daemon), after the images “FROM blah” you can specify a name “core” in this case, then later you can copy from the first image to the second using “–from=” on the “COPY” command.

In this dockerfile I have added Spark 2.4.3 and the default environment variables we need to get spark running, if you grab this dockerfile and run “docker build -t dotnet-spark .” you should get an images you can then run which includes the dependencies for dotnet as well as spark.

Ed includes all of the scripts needed to test this out, too.

Related Posts

RDDs, DataFrames, and Datasets in Spark

Brad Llewellyn walks us through the three key data structures in Apache Spark: We see that creating an RDD can be done with one easy function.  In this snippet, sc represents the default SparkContext.  This is extremely important, but is better left for a later post.  SparkContext offers the .textFile() function which creates an RDD from […]

Read More

Flink’s State Processor API

Seth Wiesman and Fabian Hueske show off Apache Flink’s State Processor API: The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with Input and OutputFormats to read and write savepoint or checkpoint data. Due to […]

Read More

Categories

July 2019
MTWTFSS
« Jun Aug »
1234567
891011121314
15161718192021
22232425262728
293031