Spark for .NET Developers

Ed Elliott has a long-form post covering spark-dotnet:

The .NET driver is made up of two parts, and the first part is a Java JAR file which is loaded by Spark and then runs the .NET application. The second part of the .NET driver runs in the process and acts as a proxy between the .NET code and .NET Java classes (from the JAR file) which then translate the requests into Java requests in the Java VM which hosts Spark.

The .NET driver is added to a .NET program using NuGet and ships both the .NET library as well as two Java jars. One jar is for Spark 2.3 and one for Spark 2.4, and you do need to use the correct one on your installed version of Scala.

As much as I’ve enjoyed his series, getting it in a single-post format is great.

Related Posts

Calculating YARN Utilization Metrics

Dmitry Tolpeko shows how you can calculate per-second cluster utilization measures from YARN’s resource manager logs: But even if you query YARN REST API every second it still can only provide a snapshot of the used YARN resources. It does not show which application allocates or releases containers, their memory and CPU capacity, in which […]

Read More

Registering SignalR to the Cosmos DB Change Feed

Hasan Savran shows us how we can hook up SignalR to view the Cosmos DB Change Feed: SignalR allows server code to send asynchronous notifications to client-side web applications. By using it, Azure Functions can send real-time messages to your web applications. Prices can get change whenever data changes in database. Notices can be sent […]

Read More

Categories

August 2019
MTWTFSS
« Jul Sep »
 1234
567891011
12131415161718
19202122232425
262728293031