Category: Misc Languages

Building an Azure Function to Automate CHECKDB

Arun Sirpal shows us how to build an Azure Function:

The title is a mouthful and so is this post. In the past I have linked to blog posts from Microsoft that say consistency checks for Azure SQL Database are the responsibility of Microsoft. (https://azure.microsoft.com/en-gb/blog/data-integrity-in-azure-sql-database/)

However, Paul Randal got me thinking about it via his thoughts in his insider email, and those thoughts form the core of this post. If you want to run DBCC CHECKDB against Azure SQL Database (which I know people do), how can you do this? There are many ways, but for this blog post, enter Azure Functions. There are many moving parts to this, but once set up and coded it is a very satisfying experience. Let's dig in. I am NOT going to copy and paste every little element of the high-level guide from Microsoft; there is no point in that. Instead, I will show you the links you need to set up the relevant function app project, and then the tailored bits around CHECKDB form the bulk of this post.

This isn’t necessary to do, but if you want to learn how Azure Functions work, it’s a good example of working through the mechanics.
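
To give a feel for the shape of the solution, here is a minimal sketch of a timer-triggered Azure Function that runs DBCC CHECKDB against an Azure SQL Database. It is not Arun's code; the schedule, the app setting name, and the WITH NO_INFOMSGS option are my assumptions:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;

public static class CheckDbFunction
{
    // Timer-triggered function; the CRON schedule (2 AM every Sunday) is arbitrary.
    [FunctionName("RunCheckDb")]
    public static async Task Run(
        [TimerTrigger("0 0 2 * * Sun")] TimerInfo timer,
        ILogger log)
    {
        // Connection string comes from an app setting (the setting name is an assumption).
        var connectionString = Environment.GetEnvironmentVariable("AzureSqlDbConnection");

        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        using var command = new SqlCommand("DBCC CHECKDB WITH NO_INFOMSGS;", connection);
        command.CommandTimeout = 0; // CHECKDB can run for a long time on larger databases

        await command.ExecuteNonQueryAsync();
        log.LogInformation("DBCC CHECKDB completed at {Time}", DateTime.UtcNow);
    }
}
```

One practical caveat: on a Consumption plan, function execution time is capped (roughly five to ten minutes by default), so a long-running CHECKDB may need a dedicated or Premium plan, or a lighter option such as DBCC CHECKDB WITH PHYSICAL_ONLY.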


Comparing Cassandra and DynamoDB

Lewis DiFelice compares and contrasts Cassandra with DynamoDB:

In this post, we’ll look at some of the key differences between Apache Cassandra (hereafter just Cassandra) and DynamoDB.

Both are distributed databases with similar architectures, and both offer incredible scalability, reliability, and resilience. However, there are also differences, and understanding those differences and the cost trade-offs can help you determine the right solution for your application.

There’s some good info in this comparison.
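
One of the most visible differences is the access path: Cassandra speaks CQL, while DynamoDB exposes a key-value API. Here is a rough sketch of the same point read in both, using the DataStax C# driver and the AWS SDK for .NET; the keyspace, table, and key names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;
using Cassandra;

public static class PointReadExamples
{
    // Cassandra: a CQL query through the DataStax C# driver.
    public static void ReadFromCassandra(string userId)
    {
        using var cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
        using var session = cluster.Connect("app_keyspace"); // keyspace name is made up
        var row = session.Execute(
                new SimpleStatement("SELECT name FROM users WHERE user_id = ?", userId))
            .FirstOrDefault();
        Console.WriteLine(row?.GetValue<string>("name"));
    }

    // DynamoDB: a GetItem request through the AWS SDK for .NET.
    public static async Task ReadFromDynamoAsync(string userId)
    {
        using var client = new AmazonDynamoDBClient();
        var response = await client.GetItemAsync(new GetItemRequest
        {
            TableName = "Users", // table name is made up
            Key = new Dictionary<string, AttributeValue>
            {
                ["UserId"] = new AttributeValue { S = userId }
            }
        });
        Console.WriteLine(response.Item.TryGetValue("Name", out var name) ? name.S : null);
    }
}
```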


Mutation Testing in Action

Nathan Thompson walks us through a mutation testing experiment:

Since our hypothesis was that the implementation differences between Jasmine and Jest could affect the Mutation Score of our legacy and new test suites, we began by cataloging every bit of Jasmine-specific syntax in our legacy suite. We then compiled a list of roughly forty test files that we would target for Mutation Testing in order to cover the full syntax catalog. For each file we generated a Mutation Score for its legacy state, converted it to run in our new Jest setup, and generated a Mutation Score again. Our hope was that the new Jest framework would have a Mutation Score as good as or better than our legacy framework.

By limiting the scope of our test to just a few dozen files, we were able to run all mutations Stryker had to offer within a reasonable timeframe. However, the sheer size of our codebase and the sprawling dependency trees in any given feature presented other challenges to this work. As I mentioned before, Stryker copies the source code to be mutated into separate sandbox directories. By default, it copies the entire project into each sandbox, but that was too much for Node.js to handle in our repository:

In my undergrad days, I loved mutation testing mostly because of the terminology. I’m happy to see a proper implementation of mutation testing and I’m even happier to see that they have a .NET version.
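
To make the idea concrete, here is a small, hypothetical C# example (not from the article, which is Jasmine/Jest-focused) of the kind of thing a mutation tool such as Stryker.NET evaluates:

```csharp
using Xunit;

public static class Discounts
{
    // Production code: orders of 100 or more get a 10% discount.
    public static decimal Apply(decimal total) =>
        total >= 100m ? total * 0.9m : total;
}

public class DiscountsTests
{
    // Only exercises a value well above the threshold. If a mutation tool flips
    // ">=" to ">" (a typical boundary mutant), this test still passes, so that
    // mutant survives and the mutation score drops.
    [Fact]
    public void Applies_discount_to_large_orders() =>
        Assert.Equal(135m, Discounts.Apply(150m));

    // Adding the boundary case kills that mutant: with ">" instead of ">=",
    // Apply(100) would return 100 and this assertion would fail.
    [Fact]
    public void Applies_discount_exactly_at_the_threshold() =>
        Assert.Equal(90m, Discounts.Apply(100m));
}
```

Stryker.NET is typically installed as a .NET tool and run with dotnet stryker from the test project, producing a report of killed versus surviving mutants.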


Querying Power BI from Visual Studio Code

Phil Seamark shows us how to write queries against Power BI using Visual Studio Code:

It’s helpful to understand there are two main client libraries for Analysis Services. A client library is what you can add to any new Visual Studio Code Project to provide objects, methods and functions relevant for the tool you are building.

Make sure you download the NetCore (.Net Core) versions of these libraries when working with Visual Studio Code. There are .Net Framework versions of these libraries that are more suited to use with the full Visual Studio product.

Read on for links to those libraries and a thorough demonstration.
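
As a flavour of what that looks like, here is a minimal sketch (not Phil's code) that uses the .NET Core ADOMD client library to run a DAX query against the local Analysis Services instance that Power BI Desktop hosts; the port and table name are placeholders:

```csharp
using System;
using Microsoft.AnalysisServices.AdomdClient;

class Program
{
    static void Main()
    {
        // Power BI Desktop hosts a local Analysis Services instance on a random port;
        // the port below is a placeholder for whatever your instance is using.
        using var connection = new AdomdConnection("Data Source=localhost:54321");
        connection.Open();

        // A simple DAX query; the table name is hypothetical.
        var dax = "EVALUATE TOPN(10, 'Sales')";

        using var command = new AdomdCommand(dax, connection);
        using var reader = command.ExecuteReader();

        while (reader.Read())
        {
            for (var i = 0; i < reader.FieldCount; i++)
            {
                Console.Write($"{reader.GetValue(i)}\t");
            }
            Console.WriteLine();
        }
    }
}
```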


Generating Stored Procedure Mappings for Entity Framework Core

Erik Ejlskov Jensen takes us through stored procedure mapping with Entity Framework Core Power Tools:

In my previous post I showed how you can map and use stored procedures manually from EF Core, a process which involved quite a bit of code, and some changes to your derived DbContext.

With the latest release of EF Core Power Tools, you can opt in to have any SQL Server stored procedures in your database made available to you.

Click through to learn how to do this.
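
For context, the generated mapping ultimately wraps the kind of calls you could write by hand in EF Core. A rough, hand-rolled sketch is below; the procedure, entity, and connection string names are hypothetical, and this is not the code EF Core Power Tools emits:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; } = "";
}

public class ShopContext : DbContext
{
    public DbSet<Customer> Customers => Set<Customer>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlServer("Server=.;Database=Shop;Trusted_Connection=True;");
}

public static class StoredProcedureCalls
{
    // Query-style procedure: the result set is materialized into an entity type.
    public static Task<List<Customer>> GetTopCustomersAsync(ShopContext context, int count) =>
        context.Customers
            .FromSqlInterpolated($"EXEC dbo.GetTopCustomers @Count = {count}")
            .ToListAsync();

    // Command-style procedure: no result set, only the number of rows affected.
    public static Task<int> ArchiveCustomerAsync(ShopContext context, int customerId) =>
        context.Database.ExecuteSqlInterpolatedAsync(
            $"EXEC dbo.ArchiveCustomer @CustomerId = {customerId}");
}
```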


Domain Models: Purity vs Completeness

Vladimir Khorikov reflects on domain modeling:

This is an example of a rich domain model: all business rules (also known as domain logic) are located in the domain classes. There’s one such rule currently — that we can only assign to the user an email that belongs to the corporate domain of that user’s company. There’s no way for the client code to bypass this invariant — a hallmark of a highly encapsulated domain model.

We can also say that our domain model is complete. A complete domain model is a model that contains all the application's domain logic. In other words, there's no domain logic fragmentation.

Domain logic fragmentation is when the domain logic resides in layers other than the domain layer. In our example, the UserController (which belongs to the application services layer) doesn't contain any such logic; it serves solely as a coordinator between the domain layer and the database.

Domain modeling doesn’t land on too many database administrators’ doorsteps, but I enjoyed the article.
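
To make the invariant concrete, here is a compressed sketch in the spirit of the example Vladimir describes; the class and member names are mine, not his:

```csharp
using System;

public class Company
{
    public string Domain { get; }
    public Company(string domain) => Domain = domain;

    public bool IsEmailCorporate(string email) =>
        email.EndsWith("@" + Domain, StringComparison.OrdinalIgnoreCase);
}

public class User
{
    public string Email { get; private set; }
    public Company Company { get; }

    public User(string email, Company company)
    {
        Email = email;
        Company = company;
    }

    // The invariant lives in the domain layer: client code cannot set Email
    // directly, so there is no way to bypass the corporate-domain rule.
    public void ChangeEmail(string newEmail)
    {
        if (!Company.IsEmailCorporate(newEmail))
            throw new InvalidOperationException("Email must belong to the corporate domain.");

        Email = newEmail;
    }
}
```

A controller in the application services layer would then only load the user, call ChangeEmail, and save, which matches the "coordinator" role described above.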


Custom Windows in Apache Flink

Alexander Fedulov walks us through window options with Apache Flink:

In the previous articles of the series, we described how you can achieve flexible stream partitioning based on dynamically-updated configurations (a set of fraud-detection rules) and how you can utilize Flink’s Broadcast mechanism to distribute processing configuration at runtime among the relevant operators. 

Following up directly where we left the discussion of the end-to-end solution last time, in this article we will describe how you can use the “Swiss knife” of Flink – the Process Function to create an implementation that is tailor-made to match your streaming business logic requirements. Our discussion will continue in the context of the Fraud Detection engine. We will also demonstrate how you can implement your own custom replacement for time windows for cases where the out-of-the-box windowing available from the DataStream API does not satisfy your requirements. In particular, we will look at the trade-offs that you can make when designing a solution which requires low-latency reactions to individual events.

This article will describe some high-level concepts that can be applied independently, but it is recommended that you review the material in part one and part two of the series and check out the code base in order to make it easier to follow along.

It’s worth giving this a careful read.


Creating Power BI Measures via Visual Studio Code

Phil Seamark goes one step further with TOM:

My last blog introduced the idea of using Microsoft Visual Studio Code to work with Power BI models. For this article, I build on that idea by showing how you can use a TOM-based script to automatically generate measures in your Power BI (or Azure Analysis Services) model.

For simplicity, the example in this blog will do the following:

– Connect to an instance of Power BI Desktop
– Iterate through every Table in the model
– Iterate through every Column in the “current” table from the outer loop
– If the Column is numeric and not hidden, create a simple [Sum of <column>] measure

Read on for demonstration code and a walkthrough of the process.
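
Phil's post has the full script; as a taster, here is a condensed sketch of the loop described in the list above, using the NetCore Tabular Object Model library (Microsoft.AnalysisServices.Tabular). The port is a placeholder for your local Power BI Desktop instance:

```csharp
using Microsoft.AnalysisServices.Tabular;

class Program
{
    static void Main()
    {
        // Connect to the local Analysis Services instance behind Power BI Desktop;
        // the port is a placeholder for whatever your instance is using.
        var server = new Server();
        server.Connect("localhost:54321");

        var model = server.Databases[0].Model;

        foreach (Table table in model.Tables)
        {
            foreach (Column column in table.Columns)
            {
                var isNumeric = column.DataType == DataType.Int64
                             || column.DataType == DataType.Double
                             || column.DataType == DataType.Decimal;

                if (!isNumeric || column.IsHidden)
                    continue;

                var measureName = $"Sum of {column.Name}";
                if (table.Measures.Find(measureName) != null)
                    continue; // don't create duplicates on a re-run

                table.Measures.Add(new Measure
                {
                    Name = measureName,
                    Expression = $"SUM('{table.Name}'[{column.Name}])"
                });
            }
        }

        // Push the new measures back to the connected model.
        model.SaveChanges();
        server.Disconnect();
    }
}
```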


The Basics of Gremlin

Raul Gonzalez introduces us to Gremlin:

Graph databases in Cosmos DB benefit from the same features as the SQL API: they are globally distributed, scale throughput and storage independently, provide guaranteed latency, offer automatic indexing, and more. So when relational databases choke on certain queries, NoSQL databases come into play.

Gremlin is the query language used by Apache TinkerPop, and it is implemented in Azure Cosmos DB. This language enables us to traverse graphs and answer complex queries that would otherwise be very expensive to run in traditional relational database engines.

Read on for a detailed example.
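
As a quick taste of the syntax, here is a minimal Gremlin.Net sketch for submitting a traversal to a Cosmos DB Gremlin endpoint. The account, database, graph, key, and vertex values are placeholders, and the exact constructor overloads vary a little between Gremlin.Net versions:

```csharp
using System;
using System.Threading.Tasks;
using Gremlin.Net.Driver;
using Gremlin.Net.Structure.IO.GraphSON;

class Program
{
    static async Task Main()
    {
        // All of these values are placeholders for your own Cosmos DB Gremlin account.
        var server = new GremlinServer(
            hostname: "your-account.gremlin.cosmos.azure.com",
            port: 443,
            enableSsl: true,
            username: "/dbs/your-database/colls/your-graph",
            password: "your-auth-key");

        using var client = new GremlinClient(
            server, new GraphSON2Reader(), new GraphSON2Writer(),
            GremlinClient.GraphSON2MimeType);

        // Traverse the graph: people that 'alice' knows, returning their names.
        var results = await client.SubmitAsync<dynamic>(
            "g.V().has('person','id','alice').out('knows').values('name')");

        foreach (var result in results)
        {
            Console.WriteLine(result);
        }
    }
}
```

The GraphSON 2 serializers are there because the Cosmos DB Gremlin endpoint speaks GraphSON 2 rather than the newer GraphSON 3 format.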
