Category: Misc Languages

Using the Spark Connect GRPC API

Published 2024-01-26 by Kevin Feasel

In the first two posts, we looked at how to run some Spark code, firstly against a local Spark Connect server and then against a Databricks cluster. In this post, we will look more at the actual gRPC API itself, namely ExecutePlan, Config, and AddArtifacts/ArtifactsStatus.

Click through to see how it all works, with plenty of C# code to guide you along the way.

Comments closed

Running Spark Jobs on Databricks with Spark Connect and .NET

Published 2024-01-24 by Kevin Feasel

Ed Elliott runs a Databricks job:

This post aims to show how we can create a .NET application, deploy it to Databricks, and then run a Databricks job that calls our .NET code, which uses Spark Connect to run a Spark job on the Databricks job cluster to write some data out to Azure storage.

In the previous post, I showed how to use the Range command to create a Spark DataFrame and then save it locally as a parquet file. In this post, we will use the Sql command, which will return a DataFrame or, in our world, a Relation. We will then pass that relation to a WriteOperation command, which will write the results of the Sql out to Azure storage.

The code is available HERE

Read on for the description of how everything works.

Comments closed

Using Spark Connect from .NET

Published 2024-01-18 by Kevin Feasel

Ed Elliott keeps the hope alive:

Over the past couple of decades working in IT, I have found a particular interest in protocols. When I was learning how MSSQL worked, I spent a while figuring out how to read data from disk via backups rather than via the database server (MS Tape Format, if anyone cared). I spent more time than anyone should learning how to parse TDS (before the [MS-TDS] documentation was a thing)—having my head buried in a set of network traces and a pencil and pen has given me more pleasure than I can tell you.

This intersection of protocols and Spark piqued my interest in using Spark Connect to connect to Spark and run jobs from .NET rather than Python or Scala.

There’s a whole lot more ceremony involved than the Microsoft .NET for Apache Spark project, but read on to see how it all works. Also, I hereby officially chastise Ed for having examples in C# and VB.NET but not the greatest .NET language of them all: F#. Chastisement aside, I appreciate the work Ed put into this to bring Spark Connect to the .NET masses.

Comments closed

What’s New in .NET 8

Published 2023-12-28 by Kevin Feasel

Thao Nguyen keeps us up to date on .NET:

Microsoft has announced .NET 8 recently. It emphasized the cloud, performance, full-stack Blazor and .NET MAUI as major highlights of the latest edition of the company’s free, cross-platform, open-source developer platform.

Of special importance to enterprises, .NET 8 is a long-term support (LTS), which means it will be supported and patched for three years as opposed to 18 months for a standard term support (STS) release.

Click through for the list, as it’s good to keep at least one eye on the developers lest they run around unopposed.

Comments closed

Exploratory Data Analysis with F# and Plotly

Published 2023-12-26 by Kevin Feasel

Matt Eland is speaking my language (F#):

One of the most common tasks with data roles is the need to perform exploratory data analysis (EDA).

With EDA a data scientist, data analyst, or other data-oriented programmer can:

Understand the value distributions of their data

Identify outliers and data anomalies

Visualize correlations, trends, and relationships between multiple variables

Exploratory data analysis usually involves:

Loading the data into a DataFrame

Performing descriptive statistics to identify the raw shape of the data

Visualizing variables of interest on their own or with other variables.

In this article I’ll walk you through the process of loading data from a sample dataset into a Microsoft.Data.Analysis DataFrame (the kind featured in ML.NET). Next, we’ll look at the descriptive statistics the DataFrame class provides and then explore the process of creating some simple visualizations with Plotly.NET.

Read on for the scenario and analysis.

Comments closed

Load Balancing across Azure SQL DBs

Published 2023-12-22 by Kevin Feasel

Jose Manuel Jurado Diaz scales out:

In today’s data-driven landscape, we are presented with numerous alternatives like Elastic Queries, Data Sync, Geo-Replication, ReadScale, etc., for distributing data across multiple databases. However, in this approach, I’d like to explore a slightly different path: creating two separate databases containing data from the years 2021 and 2022, respectively, and querying them simultaneously to fetch results. This method introduces a unique perspective in data distribution — partitioning by database, which could potentially lead to more efficient resource utilization and enhanced performance for each database. While partitioning within a single database is a common practice, this idea ventures into partitioning across databases.

Click through to see what the code looks like for this.

Comments closed

Getting Started with Semantic Kernel in C#

Published 2023-12-05 by Kevin Feasel

Matt Eland tries out Semantic Kernel:

Generative AI systems use large language models (LLMs) like OpenAI’s GPT 3.5 Turbo (ChatGPT) or GPT-4 to respond to text prompts from the user. But these systems have serious limitations in that they only include information baked into the model at the time of training. Technologies like retrieval augmentation generation (RAG) help overcome this by pulling in additional information.

AI orchestration frameworks make this possible by tying together LLMs and additional sources of information via RAG. Additionally, AI orchestration systems can provide capabilities to generative AI systems, such as inserting records in a database, sending emails, or calling out to external systems.

In this article we’ll look at the high-level capabilities building AI orchestration systems in C# with Semantic Kernel, a rapidly maturing open-source AI orchestration framework.

Click through to see how things work.

Comments closed

A Primer on Functional Programming

Published 2023-10-17 by Kevin Feasel

Anirban Shaw gives us the skinny:

In the ever-evolving landscape of software development, there exists a paradigm that has been gaining momentum and reshaping the way we approach coding challenges: functional programming.

In this article, we delve deep into the world of functional programming, exploring its advantages, core principles, origin, and reasons behind its growing traction.

I like this as an introduction to the topic, helping explain what functional programming languages are and why they’ve become much more interesting over the past 15-20 years. Anirban hits the topic of concurrency well, showing how a functional approach with immutable data makes it easy for multiple machines to work on separate parts of the problem independently and concurrently without error. I’d also add one more bit: functional programming languages tend to be more CPU-intensive than imperative languages, so in an era of strict computational scarcity, imperative languages dominate. With strides in computer processing, we tend to be CPU-bound less often, so the trade-off of some CPU for the benefits of FP makes a lot more sense. H/T R-Bloggers.

Comments closed

Drawing Tubular Paths with Julia and R

Published 2023-08-07 by Kevin Feasel

Stephane Laurent writes some code:

I implemented the framed closed curves exposed in this blog post, in Julia and R. In fact it is useless with R, because the rgl function cylinder3d is faster and better.

Here is the Julia implementation:

Click through for that code, as well as equivalent R code. H/T R-Bloggers

Comments closed

Switch Statements and Expressions in C#

Published 2023-06-06 by Kevin Feasel

Hasan Savran points out the overloaded nature of switch in C# 8 and later:

It works great but the break and the case syntaxes are getting duplicated, new switch syntax gets rid of the case, and the break statements. Here how this example looks like using the new switch syntax.

Click through for Hasan’s demo. Basically, this is the difference between a statement and an expression. C#’s switch keyword has historically been a statement: given some input, perform an action but do not return an output. Performing an action within the function is known as a side effect and it adds some mental overhead to the way we process things, especially as your methods get more complex and you have to keep track of more things in your mind at once.

By contrast, Hasan’s second example is switch as an expression, which is more in the F# style and an example of why I like to joke about how what you’ll find in C# vNext is what you got in F# two versions ago. An expression is an operation which takes an input and returns an output without performing any actions causing side effects along the way. This makes expressions easier to diagram and conceptualize than statements, though statements offer more flexibility, especially when you do want to take radically different actions depending on some given input.

Comments closed