Blockchain For Business Notes

Allison Tharp has some notes on an edX course entitled Blockchain for Business.  This looks like it will be a multi-part series.  Part one:

distributed ledger is a data structure that is spread across multiple computers (which are usually spread across locations or regions).  Distributed ledger technologies have three basic components:

  • A data model to capture the current state of the ledger
  • A language of transactions to track the changes in the ledger state
  • A protocol that builds consensus among participants around which the transactions can be accepted

In other words, we can think of a distributed ledgers as databases which are shared among peers and do not rely on any central authority or intermediary.  Instead of having a central database, every participant has their own copy which stays in sync via the pre-established protocol.  Each participant verifies transactions and speaks a common language to ensure universal agreement on the state of the ledger.

Part two:

Another consensus algorithm is called the Proof of Stake algorithm.  With this algorithm, the nodes are known as validators and instead of mining the blockchain, they validate the transactions to earn a transaction fee.  Instead of creating new coins (as is the case in Bitcoin), all of the coins exist from the very beginning.  Another way to look at this is that the nodes are randomly selected to validate blocks.  The likelihood of the random selection will depend on how many coins the node holds (this is known as the amount of stake they hold).

Blockchain has gone from wacky idea to interesting business concept over the course of about a decade.  It’ll be interesting to see if it catches on to be a vital business concept in the next ten years.

Avro Schemas In Kafka

Kevin Feasel



Stephane Maarek explains the value of using Apache Avro as a schema structure for your Kafka topics:

  • Avro has support for primitive types ( intstringlongbytes, etc…), complex types (enumarraysunions, optionals), logical types (datestimestamp-millisdecimal), and data record (name and namespace). All the types you’ll ever need.

  • Avro has support for embedded documentation. Although documentation is optional, in my workflow I will reject any Avro Schema PR (pull request) that does not document every single field, even if obvious. By embedding documentation in the schema, you reduce data interpretation misunderstandings, you allow other teams to know about your data without searching a wiki, and you allow your devs to document your schema where they define it. It’s a win-win for everyone.

  • Avro schemas are defined using JSON. Because every developer knows or can easily learn JSON, there’s a very low barrier to entry

Read on for more about Avro as well as the possibilities of using other techniques for defining schemas in Kafka.

When Spark Meets Hive

Anna Martin and Rosaria Silipo look at combining HiveQL and SparkQL:

We set our goal here to investigate the age distribution of Maine residents, men and women, using SQL queries. But the question is… on Apache Hive or on Apache Spark? Well, why not both? We could use SparkSQL to extract men’s age distribution and HiveQL to extract women’s age distribution. We could then compare the two distributions and see if they show any difference.

But the main question, as usual, is: Will SparkSQL queries and HiveQL queries blend?

Topic: Age distribution for men and women in the U.S. state of Maine.

Challenge: Blend results from Hive SQL and Spark SQL queries.

Access mode: Apache Spark and Apache Hive nodes for SQL processing.

Using KNIME, the authors are able to blend together data from different sources.

Troubleshooting Polybase Installation Errors

John Paul Cook has an article looking at what to do when you have Oracle JRE 9 and want to use Polybase:

This article shows you how to troubleshoot a failed installation of SQL Server and how to implement a workaround to allow SQL Server 2017’s PolyBase feature to be installed when version 9 of the Java Runtime Environment (JRE) is present. An installation of all features in SQL Server 2017 has three external dependencies. Python, R, and the JRE are third party or open source software needed in a full installation. Changes to external software after the release of SQL Server 2017 can introduce breaking changes. Oracle, the company that owns Java, changed how Windows registry keys are named. This caused a breaking change for SQL Server 2017. Version 8 of the JRE is compatible with the SQL Server 2017 installer. Version 9 of the JRE is not. If version 9 of the JRE is the only version of the JRE on a Windows machine, it is not possible to install the PolyBase feature. The JRE version bug also is found in the SQL Server 2016 installer. The same workaround works for both SQL Server 2016 and 2017.

Good article.

Using mssql-cli

Kevin Feasel



Alan Yu announces mssql-cli, a command-line interface for SQL Server:

Mssql-cli is a new and interactive command line tool that provides the following key enhancements over sqlcmd in the Terminal environment:

  • T-SQL IntelliSense
  • Syntax highlighting
  • Pretty formatting for query results, including Vertical Format
  • Multi-line edit mode
  • Configuration file support

Mssql-cli aims to offer an improved interactive command line experience for T-SQL. It is fully open source under the BSD-3 license, and a contribution to the dbcli organization, an open source suite of interactive CLI tools for relational databases including SQL Server, PostgresSQL, and MySQL. The command-line UI is written in Python and the tool leverages the same microservice backend (sqltoolsservice) that powers the VS Code SQL extension, SQL Operations Studio, and the other Python CLI tool we announced earlier, mssql-scripter.

Something very cool that Alan points out is that this is “the first time our team is contributing source code to an existing open source organization with a commitment to be a good citizen in an existing open source community.”

The Power BI Gateway

Reza Rad explains how the Power BI gateway works:

You don’t need a gateway in all scenarios. Only if the data source is located on-premises, you need a gateway. For online or cloud-based data sources, no gateway is required. For example; if you are getting data from CRM Online, you don’t need a gateway. However, if you are getting data from SQL Server database located on your local domain server, then you need a gateway. For Azure SQL DB you don’t need a gateway. However, a SQL Server database located on Azure Virtual Machine is considered as on-premises and needs gateway.

This post could not have come at a better time for me, so I’m definitely happy to see it.

Take The 2018 Data Professional Salary Survey

Brent Ozar has the 2018 edition of his Data Professional Salary Survey:

A few things to know:

  • It’s totally anonymous (we’re not getting your email, IP address, or anything like that.)

  • It’s open to all database platforms.

  • As with last year’s results, we’ll publish the raw data in Excel for anyone to analyze. If you want to set up your analysis ahead of time, here’s the incoming raw results as they happen, and we’ll share them in that exact same format.

Please take the survey, especially if you’re hitting Curated SQL for the analytics or Hadoop/Spark side of things rather than the SQL Server side.  That way there’s a broader distribution of entries.

Failed To Open Loopback Connection

Arun Sirpal diagnoses a connection problem:

I could not read my error log on one of my local SQL Servers, when I executed the following code:

EXEC sp_readerrorlog

I received the below:

Msg 22004, Level 16, State 1, Line 2 Failed to open loopback connection. Please see event log for more information. Msg 22004, Level 16, State 1, Line 2 Error log location not found.

Fortunately, the error logs had a bit more detail, so Arun has the answer for you.

Unable To Create The Certificate Binding

Hamish Watson walks through changing an expired certificate in SQL Server Reporting Services:

This blog post is around the situation where you have SSRS setup to use HTTPS and thus using a certificate and the certificate expires (or just needs replacing). We had caught the initial error via our Continuous Monitoring of the SSRS site — basically when the certificate expired we got an exception and alerted on it.

The client installed a new certificate but the issue arose where in Reporting Service Configuration Manager we went to use the new certificate but when we chose it we got this error:

We are unable to create the certificate binding

Read on to see how to get past this.


December 2017
« Nov Jan »