Using Akka In A Streaming Solution

Artem Rukavytsia shows us how you can easily integrate Akka into a solution with Kafka and Spark Streaming:

Akka gives you the opportunity to make logic for producing/consuming messages from Kafka with the Actor model. It’s very convenient if actors are widely used in your code and it significantly simplifies making data pipelines with actors. For example, you have your Akka Cluster, one part of which allows you to crawl of web pages and the other part of which makes it possible to index and send indexed data to Kafka. The consumer can aggregate this logic. Producing data to Kafka looks as follows:

The Actor model, which Akka implements, is something I kind of understand, but have never spent much time trying to implement.  I can see how it’d make perfect sense communicating with Kafka, though, given the scale and independence of consumers within a consumer group that Kafka provides.

What’s New In Spark 2.2?

Geetika Gupta shows us some of the updates in Apache Spark 2.2:

The major addition to this release is Structured Streaming. It has been marked as production ready and its experimental tag has been removed.

Some of the high-level changes and improvements :

  • Production ready Structured Streaming

  • Expanding SQL functionalities

  • New distributed machine learning algorithms in R

  • Additional Algorithms in MLlib and GraphX

Read on for more details.

Errors Using Native Prediction In SQL Server

Sacha Tomey walks us through a few potential issues when converting code which uses SQL Server Machine Learning Services’s sp_execute_external_script procedure to native PREDICT calls:

Stumble One:

Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001.
Model type is unsupported.


Not all models are supported. At the time of writing, only the following models are supported:

  • rxLinMod
  • rxLogit
  • rxBTrees
  • rxDtree
  • rxdForest

sp_rxPredict supports additional models including those available in the MicrosoftML package for R (I was using attempting to use rxFastTrees). I presume this limitation will reduce over time. The list of supported models is referenced in the PREDICT function (Documentation).

sp_rxPredict does require CLR, but it’s a viable alternative if you need to use a model not currently supported—like rxNeuralNet.

More On The New Service Model

Randolph West summarizes the new SQL Server patching model:

  • Every twelve months after GA, the installation files will be updated to contain all the Cumulative Updates in what is effectively now a service pack, but won’t be called that. This will also become the slipstream update. In other words, you’re more likely to be up to date when installing from scratch, later in the release cycle.

  • Customers on the GDR (General Distribution Release) release cycle will only get important security and corruption fixes, as before. You can switch to the standard CU release cadence any time, but once you do, you can’t switch back to GDR.

Brent Ozar thinks CU12 might become the new SP1 in the minds of managers:

So now fast forward to late 2018, early 2019. You’re about to build a new SQL Server for a project, and you have two choices:

  • SQL Server 2018 – which is basically the new dev branch, getting monthly updates, or
  • SQL Server 2017 (or 2016, or 2014) – which is the stable branch, getting quarterly updates

Once a version has hit CU12, and it only gets updates once a quarter, it might be considered Good Enough For Our Apps. Managers might see 2017/2016/2014 interchangeably at that point – which might be great for the second most recent version’s adoption.

It will be interesting to see how companies adopt this new model.

Active Directory On CentOS

Drew Furgiuele shows how to configure a box running CentOS to work with Active Directory:

Before we start though, there’s a few things you’re going to need to have already set up:

  • An Active Directory Domain to test in, and rights to administer it. Since we’re going to be creating (and possibly deleting, if there are errors) computer objects and a service account, you’ll need a domain account with adequate permissions.

  • My example assumes you have a Microsoft DNS server running alongside your domain services. It is possible to use a separate DNS server to get this to work, but you might need some additional network configuration (see below). Also, depending on your environment, you might need a reverse lookup zone defined. If you notice long ping times or other weird lookups, I’d set one up in your DNS.

  • A machine (virtual or otherwise) that is running CentOS 7 or later (and this guide was written and tested against CentOS 7). For this demo, we’ll be using the Server (minimal install) installation option.  If you’re new to Linux, you might opt a desktop version (server with a GUI). When you download a CentOS disk image to install it, you get all these options on the default media; you won’t need separate downloads

There are a few more prereqs, so read the whole thing.  This route is easier than Ubuntu, as Drew notes.

Unattended Installation Of SQL Server 2017 On Linux

Denzil Ribeiro walks us through an unattended installation and configuration of SQL Server 2017 on Linux:

SQL 2017 bits are generally available to customers today. One of the most notable milestones in the 2017 release is SQL Server on Linux. Setup has been relatively simple for SQL Server on Linux, but often there are questions around unattended install. For SQL Server on Linux, there are several capabilities that are useful in unattended install scenarios:

  • You can specify environment variables prior to the install that are picked up by the install process, to enable customization of SQL Server settings such as TCP port, Data/Log directories, etc.

  • You can pass command line options to Setup.

  • You can create a script that installs SQL Server and then customizes parameters post-install with the mssql-conf

The sample script link seems like it’s broken, but you can see it all on Denzil’s Github repo.

New Docker Tag For SQL Server 2017

Hamish Watson notes that there are new Docker tags for SQL Server 2017:

I have been using SQL Server 2017 running on Linux for a while now (blog post pending) and use the official images from:

To get the latest I used to run

docker pull microsoft/mssql-server-linux:latest

However today I noticed that the :latest tag had been removed:

Click through to see the tag you probably want to use.

Using XPath To Shred HTML

Shannon Lowder shows off the HTML Agility Path project to help him parse the contents of webpages:

Let’s say we wanted the table.  We could use the XPath /html/body/table to retrieve it. We can also use XPath to refer to a collection.  Let’s say we wanted all the rows. We would use the XPath /html/body/table/tr. We would get a collection of three rows.  Notice the XPath looks a lot like a Linux or windows folder path.  That’s the idea of XPath!

I would like to point out a couple of extra points.  First, XPath is case sensitive.  So if I had tried to use /html/body/table/TR, I would find no nodes.

Second, you can use “short hand” in your XPath queries.  //body/table/tr would get you to the same place /html/body/table/tr did.

This intro is part of a series Shannon has started on scraping data from websites.


October 2017
« Sep