
Category: Misc Languages

Contrasting Scala and Python wrt Spark

Sanjay Rathore contrasts two of the three key Apache Spark languages:

Imagine the first day of a new Apache Spark project. The project manager looks at the team and asks which one to choose: Scala or Python. So let’s start with “Scala vs. Python for Spark”.

You may wonder if this is a tricky question. What does the enterprise demand say? Is this like asking iOS or Android? Is there a right or wrong answer?

So we are here to inform and provide clarity. Today we’re looking at two popular programming languages, Scala and Python, and comparing them in the context of Apache Spark and Big Data in general.

Read on for the comparison. I’m at a point where I think it’s wise to know both languages and roll with whichever is there. If you’re in a greenfield Spark implementation, pick the one you (or your team) are more comfortable with. If you’re equally comfortable with the two, pick Scala because it’s a functional programming language and those are neat.


Uncommenting XML from C#

Joy George Kunjikkur needs to remove some XML comment tags:


As part of the installation, some XML fragments (e.g., <authentication>) need to be uncommented in the web.config file based on the environment. This can be done via either PowerShell or C#, as it has to be triggered from the MSI installation and never during the runtime of the application.


We can either do string-based detection and replacement or use the .NET XML parser. Since the string-based approach is complex, let us stick with the .NET library to do the replacement.

Read on for one way to do this.
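To make the parser-based idea concrete, here is a minimal sketch in Python rather than the C# the linked post uses. It assumes the commented-out fragment is itself well-formed XML; the function names and the sample config in the usage note are hypothetical, not taken from the post.

```python
import xml.dom.minidom as minidom


def uncomment_element(xml_text, tag_name):
    """Replace comments that wrap a <tag_name> fragment with the real element."""
    doc = minidom.parseString(xml_text)
    _uncomment_in(doc, doc.documentElement, tag_name)
    return doc.documentElement.toxml()


def _uncomment_in(doc, node, tag_name):
    # Copy the child list first because we swap nodes while iterating.
    for child in list(node.childNodes):
        if child.nodeType == child.COMMENT_NODE:
            body = child.data.strip()
            if not body.startswith("<" + tag_name):
                continue
            # Assumption: the comment body parses as a standalone XML fragment.
            fragment = minidom.parseString(body).documentElement
            if fragment.tagName == tag_name:
                node.replaceChild(doc.importNode(fragment, True), child)
        elif child.nodeType == child.ELEMENT_NODE:
            _uncomment_in(doc, child, tag_name)
```

Calling `uncomment_element(config_text, "authentication")` on a config whose `<authentication>` element sits inside a comment returns the document with that element live. Comments that do not start with the requested tag are left alone.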


A Modern C++ Kafka API

Kenneth Jia and Benedek Thaler announce an open source library:

Morgan Stanley uses Apache Kafka® to publish market data to internal clients and to persist it for replay purposes. We started out using librdkafka’s C++ API, which maintains C++98 compatibility. C++ is evolving quickly, and we wanted to break away from this compatibility requirement so we could take advantage of new C++ features. This led us to create a new C++ API for Kafka that uses modern C++ features (i.e. C++14 and later). We’ve open sourced this client and hope you enjoy it.

Click through to learn more. What interests me about this is that most of the other languages’ support for Kafka (for example, .NET) is based off of librdkafka. I don’t know if there’s any benefit to moving to this new library.


Predicting Insurance Prices with ML.NET

Chandra Kudumula shows off ML.NET:

There are three ways to begin with ML.NET:

– API Model: You can start ML.NET through a Framework API and write code in C# or F#.
– GUI Model: Use ML.NET Model Builder in Visual Studio.
– CLI Model: For cross-platform development like Mac and Linux, use ML.NET CLI.

Let’s get started with API Model for predicting the insurance premium using ML.NET Framework.

I’m using Microsoft (MS) Visual Studio 2019 and creating a Console Application. Be sure that you have the latest version of VS and that .NET 5 SDK is installed.

Click through for the demo in Visual Studio using C#.


Embracing the XML

Grant Fritchey has some advice:

While XML is, without a doubt, a giant pain in the bottom, sometimes, the best way to deal with Extended Events is to simply embrace the XML.

Now, I know, just last week, I suggested ways to avoid the XML. I will freely admit, that is my default position. If I can avoid the XML, I will certainly do it. However, there are times where just embracing the XML works out nicely. Let’s talk about it a little.

Just need to do a little victory dance here. I didn’t explicitly say “embrace the XML” but close enough…

I think the biggest problem DBAs have with XML is that they end up treating it like a dreadful task: I need to shred XML for an extended event. But to do that, I have to learn how to query it using this quasi-language, and so they get stuck trying to fuss with something somebody else did, moving symbols around in the hopes that they get the right incantation. By contrast, a day or two really focusing in on how XQuery and XPath work would clarify a lot and make the process much simpler.

There is a fair counter-point in asking how often you’ll use this, and if the answer is “probably never,” then poke through and just try to get it working. But I’ve got a bit of bad news: “probably never” is probably wrong.


Retrieving Counts of Cosmos DB Collections

Manoj Pandey shows how you can retrieve counts of records in Cosmos DB using the .NET client:

In this post we will use C# .NET code (for beginners like me) to see how to:
1. Connect to a Cosmos DB instance
2. Get list of all Databases in a Cosmos DB
3. Iterate through all the Databases and get the list of all Collections (or Tables)
4. Get COUNT of all documents/items (or records) in these Collections

Click through to see how.


Avoiding Temporal Coupling in Code

Yamini Bansal explains a common error in class and method design:

Temporal means time-based. So, consider whether the time (instant or order) at which one member of a class is called affects the calling of another member. Or, if the sequence in which members of a class are called is something that must be kept in mind before using that class’s members, then those members are coupled.

Click through for an example. The basic concept is, I shouldn’t need to know that I must call setup method X() before I can take advantage of some useful method Y(). This is because a new person coming in might not realize that X() exists, will try to call Y(), and something breaks. Calling a method with a valid set of input parameters and having it immediately break is a sign of a dodgy API.
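A small sketch of the X()-before-Y() problem, written in Python rather than the language of the linked post. The class and method names are made up for illustration: the first class only works if load() is called before total(), while the second takes its data in the constructor so there is no order to remember.

```python
class CoupledReport:
    """Temporally coupled: total() only works after load() has been called."""

    def __init__(self):
        self._rows = None

    def load(self, rows):
        self._rows = list(rows)

    def total(self):
        # Raises TypeError when load() was never called, because _rows is None.
        return sum(self._rows)


class Report:
    """Decoupled: the constructor demands the data, so total() is always safe."""

    def __init__(self, rows):
        self._rows = list(rows)

    def total(self):
        return sum(self._rows)
```

With `Report`, a newcomer cannot construct the object in a half-initialized state, so a call with valid inputs never breaks because of a forgotten setup step.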


Configurable Retry in Microsoft.Data.SqlClient

Hasan Savran notes an improvement to the Microsoft.Data.SqlClient library:

You need to watch for transient errors if you use SQL Server in Azure. Transient (or retriable) errors can occur at any time, and your application should be smart enough to retry these failed operations. Azure might quickly shift the hardware resources of your database to give you better load balancing; when this happens, your application might not be able to connect to the database. Since these reconfiguration events complete quickly, your application needs to be designed to handle these faults. This adds more complexity to your code because you need to write code to handle this manually.

The preview version of the Microsoft.Data.SqlClient library now supports the RetryLogic function, so you no longer need to write any manual code to handle transient or retriable errors.

Click through for more details as well as a demonstration. I’m surprised it took this long, to be honest—useful retry logic is exactly the type of thing which should be in the bowels of a library rather than littered throughout business code (or worse, not even in business code).
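For a sense of the hand-written plumbing that a library-level feature replaces, here is a generic retry-with-backoff sketch in Python. It is not the Microsoft.Data.SqlClient API; `TransientError`, `with_retry`, and the parameter names are all hypothetical stand-ins.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def with_retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, so surface the last failure
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Every call site that talks to the database would need to be wrapped like this, which is the clutter the quoted passage describes. The `sleep` parameter exists only so the delays can be swapped out in tests.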


Using the tree Command

Denis Gobo learns a new trick:

I was watching a Pluralsight course and the person typed in the tree command... and I was like whoaaaa... How do I not know this? Perhaps because I don’t use the command window all that much? Anyway, I thought that this was pretty cool.

As you can see, tree lists all the directories and subdirectories in a tree-like structure. This is great for quickly seeing all the directories in one shot.

It’s a useful command. And if you’re on Linux, there are a lot of useful switches. If you’re on Windows, there are fewer useful switches.
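If you want to see roughly what the command is doing, here is a small Python sketch that builds a similar directories-only listing. It is an approximation for illustration, not a reimplementation of either the Windows or the Linux tool, and `tree_lines` is a made-up name.

```python
import os


def tree_lines(root, prefix=""):
    """Return lines that draw the sub-directories of root as a tree."""
    dirs = sorted(
        entry.name
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    )
    lines = []
    for index, name in enumerate(dirs):
        is_last = index == len(dirs) - 1
        lines.append(prefix + ("└── " if is_last else "├── ") + name)
        # Children of a non-last entry keep a vertical bar to show the branch continues.
        child_prefix = prefix + ("    " if is_last else "│   ")
        lines.extend(tree_lines(os.path.join(root, name), child_prefix))
    return lines
```

Printing `"\n".join(tree_lines("."))` shows the directory structure under the current folder.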


Kafka in .NET

Diogo Souza walks us through building an application which produces and consumes messages using Apache Kafka:

Kafka is just the broker, the stage in which all the action takes place. The producers send messages to the world while the consumers read specific chunks of data. How do you differentiate one specific portion of data from the others? How do consumers know what data to consume? To understand this, you need a new actor in the play: the topics.

Kafka topics are the channels, the carriages that transport messages around. Kafka records produced by producers are organized and stored into topics.

This is a nice overview of Kafka followed by the basics of building a consumer and a producer in C#. I just wish that there was more community usage of Kafka so that the Confluent .NET driver would include some of the really cool stuff they’ve added to Kafka over the past couple of years.
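As a rough mental model of the broker, topic, producer, and consumer roles described in the quote, here is an in-memory toy in Python. It uses no Kafka client library, and `ToyBroker` and its methods are hypothetical names. Real Kafka adds partitions, consumer groups, persistence, and much more.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in: each topic is an append-only list of messages."""

    def __init__(self):
        self._topics = defaultdict(list)

    def produce(self, topic, message):
        """Append a message to a topic and return its offset in that topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def consume(self, topic, offset=0):
        """Return every message in the topic from the given offset onward."""
        # .get avoids creating an empty topic just because someone read from it.
        return list(self._topics.get(topic, [])[offset:])
```

A consumer that remembers the last offset it read can call `consume(topic, offset)` again later and pick up only the newer messages, which is the basic idea behind reading "specific chunks of data" from a topic.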
