Akka lets you build logic for producing and consuming Kafka messages with the Actor model. That's very convenient if actors are already widely used in your code, and it significantly simplifies building data pipelines with actors. For example, suppose you have an Akka Cluster in which one part crawls web pages and another part indexes the crawled data and sends it to Kafka; a consumer can then aggregate this data. The post goes on to show what producing data to Kafka looks like.
The Actor model, which Akka implements, is something I kind of understand, but have never spent much time trying to implement. I can see how it'd make perfect sense for communicating with Kafka, though, given the scale and independence of consumers within a consumer group that Kafka provides.
The major addition to this release is Structured Streaming. It has been marked as production ready and its experimental tag has been removed.
Some of the high-level changes and improvements:
Production ready Structured Streaming
Expanding SQL functionalities
New distributed machine learning algorithms in R
Additional Algorithms in MLlib and GraphX
Read on for more details.
Stumble One: Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001. Model type is unsupported.
Not all models are supported; at the time of writing, only a limited set is.
sp_rxPredict supports additional models, including those available in the MicrosoftML package for R (I was attempting to use rxFastTrees). I presume this limitation will ease over time. The list of supported models is referenced in the PREDICT function documentation.
sp_rxPredict does require CLR, but it’s a viable alternative if you need to use a model not currently supported—like rxNeuralNet.
Every twelve months after GA, the installation files will be updated to contain all the Cumulative Updates in what is effectively now a service pack, but won’t be called that. This will also become the slipstream update. In other words, you’re more likely to be up to date when installing from scratch, later in the release cycle.
Customers on the GDR (General Distribution Release) release cycle will only get important security and corruption fixes, as before. You can switch to the standard CU release cadence any time, but once you do, you can’t switch back to GDR.
So now fast forward to late 2018, early 2019. You’re about to build a new SQL Server for a project, and you have two choices:
- SQL Server 2018 – which is basically the new dev branch, getting monthly updates, or
- SQL Server 2017 (or 2016, or 2014) – which is the stable branch, getting quarterly updates
Once a version has hit CU12 and drops to quarterly updates, it might be considered Good Enough For Our Apps. Managers might treat 2017, 2016, and 2014 as interchangeable at that point, which could be great for adoption of the second most recent version.
It will be interesting to see how companies adopt this new model.
Before we start, though, there are a few things you're going to need to have already set up:
An Active Directory Domain to test in, and rights to administer it. Since we’re going to be creating (and possibly deleting, if there are errors) computer objects and a service account, you’ll need a domain account with adequate permissions.
My example assumes you have a Microsoft DNS server running alongside your domain services. It is possible to use a separate DNS server to get this to work, but you might need some additional network configuration (see below). Also, depending on your environment, you might need a reverse lookup zone defined. If you notice long ping times or other weird lookups, I’d set one up in your DNS.
A machine (virtual or otherwise) running CentOS 7 or later (this guide was written and tested against CentOS 7). For this demo, we'll be using the Server (minimal install) installation option. If you're new to Linux, you might opt for a desktop version (server with a GUI). When you download a CentOS disk image, you get all of these options on the default media; you won't need separate downloads.
There are a few more prereqs, so read the whole thing. This route is easier than Ubuntu, as Drew notes.
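Before going further, it's worth sanity-checking the name-resolution prereq from the Linux box itself. A minimal sketch, using getent (part of glibc, so present even on a minimal install); the domain controller name and IP below are hypothetical placeholders, not values from the post:

```shell
# getent consults the system resolver order from /etc/nsswitch.conf, so it
# exercises the same lookup path that services on this box will use.
getent hosts 127.0.0.1    # loopback; should print a "localhost" entry

# Forward lookup of the domain controller -- dc1.contoso.local and
# 192.0.2.10 are hypothetical; substitute your own DC's name and IP.
getent hosts dc1.contoso.local || echo "forward lookup failed"

# Reverse lookup will only succeed once the reverse zone mentioned
# above is defined in your DNS.
getent hosts 192.0.2.10 || echo "reverse lookup failed; consider adding a reverse zone"
```

If the reverse lookup is slow or fails, that matches the symptom described above (long ping times, odd lookups), and a reverse lookup zone is the fix.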
SQL 2017 bits are generally available to customers today. One of the most notable milestones in the 2017 release is SQL Server on Linux. Setup has been relatively simple for SQL Server on Linux, but often there are questions around unattended install. For SQL Server on Linux, there are several capabilities that are useful in unattended install scenarios:
You can specify environment variables prior to the install that are picked up by the install process, to enable customization of SQL Server settings such as TCP port, Data/Log directories, etc.
You can pass command line options to Setup.
You can create a script that installs SQL Server and then customizes parameters post-install with the mssql-conf utility.
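The first and third options above can be sketched roughly as follows. This is not Denzil's actual script, just a minimal outline using the documented setup environment variables (ACCEPT_EULA, MSSQL_SA_PASSWORD, MSSQL_PID, MSSQL_TCP_PORT); the password, edition, and directory values are placeholders:

```shell
#!/bin/bash
# Unattended configuration of SQL Server 2017 on Linux after the package
# is installed. Environment variables set on the command line are picked
# up by mssql-conf setup; -n suppresses interactive prompts.
sudo ACCEPT_EULA=Y \
     MSSQL_SA_PASSWORD='<YourStrong!Passw0rd>' \
     MSSQL_PID=Developer \
     MSSQL_TCP_PORT=1433 \
     /opt/mssql/bin/mssql-conf -n setup

# Post-install customization with mssql-conf: change a setting, then
# restart the service so it takes effect.
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultdatadir /var/opt/mssql/userdata
sudo systemctl restart mssql-server
```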
The sample script link seems to be broken, but you can see it all in Denzil's GitHub repo.
I have been using SQL Server 2017 running on Linux for a while now (blog post pending) and use the official images from:
To get the latest image, I used to run:
docker pull microsoft/mssql-server-linux:latest
However, today I noticed that the :latest tag had been removed.
Click through to see the tag you probably want to use.
Let's say we wanted the table. We could use the XPath /html/body/table to retrieve it. We can also use XPath to refer to a collection: to get all of the rows, we would use the XPath /html/body/table/tr, which returns a collection of three rows. Notice that an XPath expression looks a lot like a Linux or Windows folder path. That's the idea of XPath!
There are a couple of extra points worth noting. First, XPath is case sensitive, so if I had tried to use /html/body/table/TR, I would find no nodes.
Second, you can use shorthand in your XPath queries: //body/table/tr would get you to the same place /html/body/table/tr did.
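You can experiment with these expressions outside of any scraping code using xmllint (part of libxml2; assuming here that it's installed, which may require a separate package on some distros). The three-row table below is a made-up stand-in for the page used in the post:

```shell
# A minimal stand-in page: one table with three rows.
cat > page.html <<'EOF'
<html><body><table>
<tr><td>row 1</td></tr>
<tr><td>row 2</td></tr>
<tr><td>row 3</td></tr>
</table></body></html>
EOF

# Absolute path to the rows: all three match.
xmllint --xpath 'count(/html/body/table/tr)' page.html; echo    # 3
# XPath is case sensitive, so TR matches nothing.
xmllint --xpath 'count(/html/body/table/TR)' page.html; echo    # 0
# The // shorthand reaches the same rows.
xmllint --xpath 'count(//body/table/tr)' page.html; echo        # 3
```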
This intro is part of a series Shannon has started on scraping data from websites.