Press "Enter" to skip to content

Category: Misc Languages

Mapping In Scala When Dealing With Futures & Options

Shubham Verma shows us what happens when we use the map() and flatMap() functions in Scala on values typed as Option or Future:

Now let’s move toward the interesting part: flatMap(). What is it supposed to do in the case of Option? flatMap() gives us the liberty to return whatever type of value we want after the transformation. With map(), when the receiver is a Some, the result will be wrapped in Some no matter what; that is not the case with flatMap().

scala> option.flatMap(x => None)
res13: Option[Nothing] = None
scala>
scala> option.map(x => None)
res14: Option[None.type] = Some(None)

The code snippet above clearly shows it. So is that it? Not yet; let’s look at one more feature of flatMap() on Option[+A] that comes in really handy when we need to extract values out of Options. Suppose we have a List[Option[Int]] and we are only interested in the elements that have some value, which seems to be an obvious use case most of the time. We can do it with a single flatMap() operation.
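
As a quick sketch of that last point (my example, not Shubham's): flatMap with the identity function keeps only the values that are defined:

scala> val xs = List(Some(1), None, Some(3))
xs: List[Option[Int]] = List(Some(1), None, Some(3))

scala> xs.flatMap(x => x)
res0: List[Int] = List(1, 3)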

In short, it’s a little more complex, but you can still get useful information.


Calculating TF-IDF Using Apache Spark

Arseniy Tashoyan shows us how to calculate Term Frequency-Inverse Document Frequency using Apache Spark:

TF-IDF is used in a large variety of applications. Typical use cases include:

  • Document search.
  • Document tagging.
  • Text preprocessing and feature vector engineering for Machine Learning algorithms.

There is a vast number of resources on the web explaining the concept itself and the calculation algorithm. This article does not repeat that information; it just illustrates TF-IDF calculation with the help of Apache Spark. Emml Asimadi, in his excellent article Understanding TF-IDF, shares an approach based on the old Spark RDD API and the Python language. This article, on the other hand, uses the modern Spark SQL API and the Scala language.

Although Spark MLlib has an API to calculate TF-IDF, this API is not convenient for learning the concept. MLlib tools are intended to generate feature vectors for ML algorithms, and there is no way to figure out the weight for a particular term in a particular document. Well, let’s make it from scratch; this will sharpen our skills.
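
To see the shape of the computation before reading the article, here is a minimal sketch of TF-IDF over a toy corpus using the DataFrame API (my own example, not Arseniy's code), where tf(t, d) is the share of document d's terms that are t, and idf(t) = ln(total docs / docs containing t):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("tf-idf").master("local[*]").getOrCreate()
import spark.implicits._

val docs = Seq((1, "one flesh one bone one true religion"),
               (2, "all flesh is grass")).toDF("docId", "text")

// One row per (document, term) occurrence.
val terms = docs.select($"docId", explode(split(lower($"text"), "\\s+")).as("term"))

// TF: occurrences of a term divided by the document's total term count.
val tf = terms.groupBy("docId", "term").count()
  .withColumn("tf", $"count" / sum("count").over(Window.partitionBy("docId")))

// IDF: natural log of (total documents / documents containing the term).
val docCount = docs.count().toDouble
val idf = terms.distinct().groupBy("term").count()
  .select($"term", log(lit(docCount) / $"count").as("idf"))

// TF-IDF: the product, per (document, term) pair.
val tfIdf = tf.join(idf, "term").withColumn("tf_idf", $"tf" * $"idf")
tfIdf.orderBy($"docId", $"tf_idf".desc).show()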

Read on for the solution.  It seems that there tend to be better options today than TF-IDF for natural language problems, but it’s an easy algorithm to understand, so it’s useful as a first go.


Exception Handling In Scala

Shivangi Gupta shows off the Either type in Scala:

How to get values from Either?

There are many ways; we will talk about them all one by one. One way to get values is by doing a left or right projection. We cannot apply operations such as map or filter directly on an Either. Either provides left and right methods to get the left and right projections, and a projection allows us to apply functions like map and filter.

For example,

scala> val div = divide(14, 7)
div: scala.util.Either[String,Int] = Right(2)
scala> div.right
res1: scala.util.Either.RightProjection[String,Int] = RightProjection(Right(2))

When we apply right on an Either, it returns a RightProjection. Now we can extract the value from the right projection using get, but if there is no value, get will blow up at runtime.
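
As a safer sketch (my addition, not from the post), you can fall back to a default with getOrElse or pattern match instead of calling get:

scala> div.right.getOrElse(0)
res2: Int = 2

scala> div match {
     |   case Right(v)  => s"quotient: $v"
     |   case Left(err) => s"failed: $err"
     | }
res3: String = quotient: 2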

There’s more to Scala exception handling than just try-catch.


Serializing Data In Scala

Akhil Vijayan has a two-parter on serializing data in Scala.  In the first post, he looks at uPickle:

uPickle is a lightweight JSON serialization library for Scala. uPickle is built on top of uJson, which is used for easy manipulation of JSON without the need to convert it to a Scala case class; we can even use uJson standalone. In this blog, I will focus only on the uPickle library.

Note: uPickle does not support Scala 2.10; only 2.11 and 2.12 are supported

uPickle (pronounced micro-pickle) is a lightweight JSON serialization library which is faster than many other JSON serializers. I will talk more about the comparison of different serializers in my next blog; this blog will cover all the basic stuff about uPickle.
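
A minimal round trip looks roughly like this (my sketch of uPickle's documented API; the Person case class is just an example):

import upickle.default._

case class Person(name: String, age: Int)

// Derive a serializer/deserializer pair for the case class.
implicit val personRW: ReadWriter[Person] = macroRW

val json = write(Person("Ada", 36)) // {"name":"Ada","age":36}
val back = read[Person](json)       // Person(Ada,36)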

Then, he follows up with a comparison to other serializers:

In my previous blog, I talked about how uPickle works. Now I will be comparing it with many other JSON serializers by serializing and deserializing a Scala case class.

Before that, let me discuss all the JSON serializers that I have used in my comparison. I will compare uPickle with PlayJson, Circe, and Argonaut.

Check it out.


Selecting All Columns But One In Postgres

Lukas Eder shows off a BigQuery feature which you can partially implement in Postgres:

In BigQuery syntax, we could now simply write

SELECT * EXCEPT (rk)
FROM (...) t
WHERE rk = 1
ORDER BY first_name, last_name

Which is really quite convenient! We want to project everything, except this one column. But none of the more popular SQL databases support this syntax.

Luckily, in PostgreSQL, we can use a workaround: Nested records:

SELECT (a).*, (f).* -- Unnesting the records again
FROM (
  SELECT
    a, -- Nesting the actor table
    f, -- Nesting the film table
    RANK() OVER (PARTITION BY actor_id ORDER BY length DESC) rk
  FROM film f
  JOIN film_actor fa USING (film_id)
  JOIN actor a USING (actor_id)
) t
WHERE rk = 1
ORDER BY (a).first_name, (a).last_name;

Notice how we’re no longer projecting A.* and F.* inside of the derived table T, but instead, the entire table (record). In the outer query, we have to use some slightly different syntax to unnest the record again (e.g. (A).FIRST_NAME), and we’re done.

Read the whole thing.  Lukas has a workaround for SQL Server, but I’d really like to see SELECT * EXCEPT [something] be viable syntax.  This is something I’d want to use more for ad hoc diagnostic queries, but I have one scenario where most columns on a table are narrow but then I have a big VARBINARY(MAX) (for good reason, I promise) that I almost never want to see in diagnostic queries.  I use a third-party SSMS plugin to populate all the columns and remove the one I don’t want, but it’d be nice to specify the other way because it’s so much faster to type.


Using map And flatMap In Scala

Shubham Verma explains the map and flatMap functions in Scala:

Consider two sets, A = {-2, -1, 0, 1, 2} and B = {0.5, 1, 1.5, 2.5, 4, 4.5, 5, 5.5}, and a function f: A => B

y = x ^ 2 + 0.5, where x is an element from set A and y corresponds to an element from set B. Now we see that function f is applied to every element of set A, but the result could be just a subset of set B.

From the above, we can draw the analogy that sets A and B can be seen as collections in the programming paradigm. Now what is f? f can be seen as a function that takes an element from A and returns an element that exists in B. The point to note here is that, since Scala promotes immutability, whenever we apply map (or any other transformer) to some collection of type A, it returns a new collection of the same kind with elements of type B. It would be helpful to understand it from the snippet below.

val result: List[B] = listOfA.map(f) // listOfA: List[A], f: A => B

So when a map operation is applied to a collection (here a List) of elements of type A, passing f as its argument, it applies that function to every element of the List and returns a new collection (again a List) of elements of type B.
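
Making that concrete with the sets above (my example, not from the post):

scala> val a = List(-2, -1, 0, 1, 2)
a: List[Int] = List(-2, -1, 0, 1, 2)

scala> val result = a.map(x => x * x + 0.5)
result: List[Double] = List(4.5, 1.5, 0.5, 1.5, 4.5)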

Read the whole thing.


The Basics Of Lambda Calculus

Kevin Sookocheff walks us through some of the basics of Lambda calculus:

Functions are a bit more complicated. Michaelson states that a λ function serves as an abstraction over a λ expression, which isn’t that informative unless we take some time to understand what abstraction actually means.

Programmers use abstraction all the time by generalizing from a specific instance of a problem to a parameterized version of it. Abstraction uses names to refer to concrete objects or values (you can call them parameters if you like), as a means to create generalizations of specific problems. You can then take this abstraction (you can call it a function if you like), and replace the names with concrete objects or values to create a particular concrete instance of the problem. Readers familiar with refactoring can view abstraction as an “Extract Method” refactoring that turns a fragment of code into a method with parameters that explain the purpose of the method.
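
In Scala terms (my illustration, not Kevin's), abstraction is exactly that move from a concrete expression to a function of its named parts:

// A concrete instance: 3 squared, plus one.
val concrete = 3 * 3 + 1 // 10

// The abstraction (λx. x * x + 1): the varying part gets a name.
val f = (x: Int) => x * x + 1

// Replacing the name with a value recovers the concrete instance.
f(3) // 10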

I think having a good understanding of Lambda calculus is a huge advantage for a data platform professional, as it gives you an inroad to learning data-centric functional programming languages (e.g., Scala, R, and F#) and neatly sidesteps the impedance mismatch problem with object-oriented languages.


Accessing SQL Server From Scala

Sidharth Khattri shows how to use Slick, Scala's functional-relational mapping library, to connect to SQL Server:

Now moving onto our FRM (Functional Relational Mapping) and repository setup, the following import will be used for MS SQL Server Slick driver’s API

import slick.jdbc.SQLServerProfile.api._

And thereafter, the FRM will look the same as the rest of the FRMs delineated in the official Slick documentation. For the example on this blog, let's use the following table structure

CREATE TABLE user_profiles (
  id INT IDENTITY (1, 1) PRIMARY KEY,
  first_name VARCHAR(100) NOT NULL,
  last_name VARCHAR(100) NOT NULL
)

whose functional relational mapping will look like this:

class UserProfiles(tag: Tag) extends Table[UserProfile](tag, "user_profiles") {
  def id: Rep[Int] = column[Int]("id", O.PrimaryKey, O.AutoInc)
  def firstName: Rep[String] = column[String]("first_name")
  def lastName: Rep[String] = column[String]("last_name")
  def * : ProvenShape[UserProfile] = (id, firstName, lastName) <> (UserProfile.tupled, UserProfile.unapply) // scalastyle:ignore
}
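
For context, the mapping above implies a case class, and queries go through a TableQuery (my sketch following standard Slick conventions; these pieces aren't shown in the excerpt):

// The case class the <> projection maps to.
case class UserProfile(id: Int, firstName: String, lastName: String)

val userProfiles = TableQuery[UserProfiles]

// A simple filter; .result yields a DBIO action to run against the database.
val byLastName = userProfiles.filter(_.lastName === "Khattri").result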

I’m definitely going to need to learn more about this.


How DynamoDB Indexing Works

Shubham Agarwal explains how indexing works within DynamoDB:

Global secondary index in DynamoDB: an index with a partition key and a sort key that can be different from those on the base table. A global secondary index is very helpful when you need to query your data without the primary key.

  • The primary key of a global secondary index can be a simple partition key or composite (partition key and sort key).

  • Global secondary indexes can be created at the same time that you create a table. You can also add a new global secondary index to an existing table, or delete an existing global secondary index.

  • A global secondary index lets you query over the entire table, across all partitions (see the sketch after this list).

  • The index partition key and sort key (if present) can be any base table attributes of type string, number, or binary.

  • With global secondary index queries or scans, you can only request the attributes that are projected into the index. DynamoDB will not fetch any attributes from the table.

  • There are no size restrictions for global secondary indexes.
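
Here is a sketch of that query-through-an-index point (my example using the AWS SDK for Java v2 from Scala; the table name, index name, and attributes are hypothetical):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.{AttributeValue, QueryRequest}
import scala.jdk.CollectionConverters._

val client = DynamoDbClient.create()

// Query the hypothetical "status-index" GSI instead of the base table's key.
val request = QueryRequest.builder()
  .tableName("Orders")
  .indexName("status-index")
  .keyConditionExpression("order_status = :s")
  .expressionAttributeValues(Map(":s" -> AttributeValue.builder().s("SHIPPED").build()).asJava)
  .build()

// Each item is a map of attribute name to AttributeValue.
val items = client.query(request).items().asScala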

Click through to learn more about these as well as local secondary indexes.


The Difference Between M And DAX With Cooking

Eugene Meidinger explains the difference between M and DAX as languages using a cooking metaphor:

I like to think of M as this sous chef. It does all the grunt work that we'd like to automate. Let's say that my boss asks for a utilization report for all of the technicians. What steps am I going to do in M?

  1. Extract the data from the line of business system
  2. Remove extraneous columns
  3. Rename columns
  4. Enrich the services table with a Billable / NonBillable column
  5. Generate a date table

This is all important work, but I would have to do the same work for a variety of reports. Many of the steps tell me nothing about the final product. I would generate a date table for most of my reports, for example.

I think the metaphor holds.
