Avro Schemas In Kafka

Stephane Maarek explains the value of using Apache Avro as a schema structure for your Kafka topics:

  • Avro has support for primitive types ( intstringlongbytes, etc…), complex types (enumarraysunions, optionals), logical types (datestimestamp-millisdecimal), and data record (name and namespace). All the types you’ll ever need.

  • Avro has support for embedded documentation. Although documentation is optional, in my workflow I will reject any Avro Schema PR (pull request) that does not document every single field, even if obvious. By embedding documentation in the schema, you reduce data interpretation misunderstandings, you allow other teams to know about your data without searching a wiki, and you allow your devs to document your schema where they define it. It’s a win-win for everyone.

  • Avro schemas are defined using JSON. Because every developer knows or can easily learn JSON, there’s a very low barrier to entry

Read on for more about Avro as well as the possibilities of using other techniques for defining schemas in Kafka.

When Spark Meets Hive

Anna Martin and Rosaria Silipo look at combining HiveQL and SparkQL:

We set our goal here to investigate the age distribution of Maine residents, men and women, using SQL queries. But the question is… on Apache Hive or on Apache Spark? Well, why not both? We could use SparkSQL to extract men’s age distribution and HiveQL to extract women’s age distribution. We could then compare the two distributions and see if they show any difference.

But the main question, as usual, is: Will SparkSQL queries and HiveQL queries blend?

Topic: Age distribution for men and women in the U.S. state of Maine.

Challenge: Blend results from Hive SQL and Spark SQL queries.

Access mode: Apache Spark and Apache Hive nodes for SQL processing.

Using KNIME, the authors are able to blend together data from different sources.

Troubleshooting Polybase Installation Errors

John Paul Cook has an article looking at what to do when you have Oracle JRE 9 and want to use Polybase:

This article shows you how to troubleshoot a failed installation of SQL Server and how to implement a workaround to allow SQL Server 2017’s PolyBase feature to be installed when version 9 of the Java Runtime Environment (JRE) is present. An installation of all features in SQL Server 2017 has three external dependencies. Python, R, and the JRE are third party or open source software needed in a full installation. Changes to external software after the release of SQL Server 2017 can introduce breaking changes. Oracle, the company that owns Java, changed how Windows registry keys are named. This caused a breaking change for SQL Server 2017. Version 8 of the JRE is compatible with the SQL Server 2017 installer. Version 9 of the JRE is not. If version 9 of the JRE is the only version of the JRE on a Windows machine, it is not possible to install the PolyBase feature. The JRE version bug also is found in the SQL Server 2016 installer. The same workaround works for both SQL Server 2016 and 2017.

Good article.

Using mssql-cli

Alan Yu announces mssql-cli, a command-line interface for SQL Server:

Mssql-cli is a new and interactive command line tool that provides the following key enhancements over sqlcmd in the Terminal environment:

  • T-SQL IntelliSense
  • Syntax highlighting
  • Pretty formatting for query results, including Vertical Format
  • Multi-line edit mode
  • Configuration file support

Mssql-cli aims to offer an improved interactive command line experience for T-SQL. It is fully open source under the BSD-3 license, and a contribution to the dbcli organization, an open source suite of interactive CLI tools for relational databases including SQL Server, PostgresSQL, and MySQL. The command-line UI is written in Python and the tool leverages the same microservice backend (sqltoolsservice) that powers the VS Code SQL extension, SQL Operations Studio, and the other Python CLI tool we announced earlier, mssql-scripter.

Something very cool that Alan points out is that this is “the first time our team is contributing source code to an existing open source organization with a commitment to be a good citizen in an existing open source community.”

The Power BI Gateway

Reza Rad explains how the Power BI gateway works:

You don’t need a gateway in all scenarios. Only if the data source is located on-premises, you need a gateway. For online or cloud-based data sources, no gateway is required. For example; if you are getting data from CRM Online, you don’t need a gateway. However, if you are getting data from SQL Server database located on your local domain server, then you need a gateway. For Azure SQL DB you don’t need a gateway. However, a SQL Server database located on Azure Virtual Machine is considered as on-premises and needs gateway.

This post could not have come at a better time for me, so I’m definitely happy to see it.

Take The 2018 Data Professional Salary Survey

Brent Ozar has the 2018 edition of his Data Professional Salary Survey:

A few things to know:

  • It’s totally anonymous (we’re not getting your email, IP address, or anything like that.)

  • It’s open to all database platforms.

  • As with last year’s results, we’ll publish the raw data in Excel for anyone to analyze. If you want to set up your analysis ahead of time, here’s the incoming raw results as they happen, and we’ll share them in that exact same format.

Please take the survey, especially if you’re hitting Curated SQL for the analytics or Hadoop/Spark side of things rather than the SQL Server side.  That way there’s a broader distribution of entries.

Failed To Open Loopback Connection

Arun Sirpal diagnoses a connection problem:

I could not read my error log on one of my local SQL Servers, when I executed the following code:

EXEC sp_readerrorlog

I received the below:

Msg 22004, Level 16, State 1, Line 2 Failed to open loopback connection. Please see event log for more information. Msg 22004, Level 16, State 1, Line 2 Error log location not found.

Fortunately, the error logs had a bit more detail, so Arun has the answer for you.

Unable To Create The Certificate Binding

Hamish Watson walks through changing an expired certificate in SQL Server Reporting Services:

This blog post is around the situation where you have SSRS setup to use HTTPS and thus using a certificate and the certificate expires (or just needs replacing). We had caught the initial error via our Continuous Monitoring of the SSRS site — basically when the certificate expired we got an exception and alerted on it.

The client installed a new certificate but the issue arose where in Reporting Service Configuration Manager we went to use the new certificate but when we chose it we got this error:

We are unable to create the certificate binding

Read on to see how to get past this.

Protecting Sensitive Data In Docker

Jatin Demla shows how to create Docker secrets:

Managing the password, access tokens and private keys are being tedious in the application. Any small mistakes accidentally expose all the secret information. Even storing such thing in docker images can be easily accessible one should just run the image in the interactive mode container and all your application code is available in containers. Docker provides secrets to protect all secret data.

This blog explains the low-level of storage information as well as secured access to docker secret. so, let’s get started.

Read the whole thing, especially if you’ve gone container-happy.

Taking A Random Walk

Dan Goldstein describes the basics of Brownian motion:

I was sitting in a bagel shop on Saturday with my 9 year old daughter. We had brought along hexagonal graph paper and a six sided die. We decided that we would choose a hexagon in the middle of the page and then roll the die to determine a direction:

1 up (North)
2 diagonal to the upper right (Northeast)
3 diagonal to the lower right (Southeast)
4 down (South)
5 diagonal to the lower left (Southwest)
6 diagonal to the upper left (Northwest)

Our first roll was a six so we drew a line to the hexagon northwest of where we started. That was the first “step.”

After a few rolls we found ourselves coming back along a path we had gone down before. We decided to draw a second line close to the first in those cases.

We did this about 50 times. The results are pictured above, along with kid hands for scale.

Javi Fernandez-Lopez then shows how to generate an animated GIF displaying Brownian motion:

Last Monday we celebrated a “Scientific Marathon” at Royal Botanic Garden in Madrid, a kind of mini-conference to talk about our research. I was talking about the relation between fungal spore size and environmental variables such as temperature and precipitation. To make my presentation more friendly, I created a GIF to explain the Brownian Motion model. In evolutionary biology, we can use this model to simulate the random variation of a continuous trait through time. Under this model, we can notice how closer species tend to maintain closer trait values due to shared evolutionary history. You have a lot of information about Brownian Motion models in evolutionary biology everywhere!

Another place that this is useful is in describing stock market movements in the short run.


December 2017
« Nov