Tips For Creating Population Share Maps

Lisa Charlotte Rost uses election results to give us some tips on building map-based comparisons:

This map shows us that both parties received a higher vote share in the east than in the west. But it also artificially increases the polarisation: If the AfD gets just one more vote than the Linke, the whole district flips from pink to blue. And we would need to create a third category, “tied”, for the nine election districts in which there were exactly as many AfD voters as Linke voters. (The New York Times created that category for their “Extremely Detailed Map of the 2016 Election”.)

There is another option: We could show the percentage point difference between the two shares. To do so, we subtract the AfD votes from the Linke votes. If the result is positive, we show the district in blue. If it’s negative, we show it in pink.

In this case the two methods produce similar maps, but the choice can make a big difference in other situations.
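The percentage point difference described in the quote can be sketched in a few lines of Python. This is only an illustration: the function names and the vote counts are hypothetical, and the sign convention (Linke share minus AfD share) simply follows the subtraction order in the quoted text.

```python
def share_difference(linke_votes, afd_votes, total_votes):
    """Return the Linke share minus the AfD share, in percentage points."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return 100.0 * (linke_votes - afd_votes) / total_votes


def leader(diff_points):
    """Label a district by the sign of the difference; zero is its own category."""
    if diff_points > 0:
        return "Linke ahead"
    if diff_points < 0:
        return "AfD ahead"
    return "tied"


# Hypothetical districts: (Linke votes, AfD votes, total valid votes)
districts = {
    "District A": (100, 120, 1000),
    "District B": (300, 150, 1000),
    "District C": (50, 50, 400),
}

for name, (linke, afd, total) in districts.items():
    diff = share_difference(linke, afd, total)
    print(f"{name}: {diff:+.1f} points ({leader(diff)})")
```

Mapping the numeric difference to a continuous color scale, rather than to the three labels, is what avoids the winner-takes-all flip described in the first quoted paragraph.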

Using Kubernetes To Support Microservices

Samir Behara walks us through a high-level explanation of how you can use Kubernetes to support development of microservices:

Kubernetes is an open source container-orchestration system for automating deployments, scaling and management of containerized applications. In this tutorial, you will learn how to get started with Microservices on Kubernetes. I will cover the below topics in detail:

  • How does Kubernetes help to build scalable Microservices?

  • Overview of Kubernetes Architecture

  • Create a Local Development Environment for Kubernetes using Minikube

  • Create a Kubernetes Cluster and deploy your Microservices on Kubernetes

  • Automate your Kubernetes Environment Setup

  • CI/CD Pipeline for deploying containerized application to Kubernetes

This sticks mostly to a high-level architecture discussion, and does a good job of it.

Choose Your Next SQL Job Title

Tomaz Kastrun has created a job title generator in T-SQL:

While writing a sample random function in T-SQL, I thought: why not write a job title generator for the T-SQL domain only? You might have seen the so-called bulls**t job title generator and similar, but this one is specific to T-SQL and SQL Server.

So, why not come up with random, yet funny T-SQL job titles? Making it, I have to tell you, was fun. I was simply hitting that F5 button in SSMS to get a new job title generated and laughing out loud.

It took me a few clicks, but I got “Qualitative R ggplot library Stackover subscriber,” which might be hitting the mark a little close.
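The general idea behind a generator like this is to pick one random entry from each of several word lists and join them. Here is a minimal Python sketch of that idea; it is not Tomaz's T-SQL code, and the word lists and the function name are made up for illustration.

```python
import random

# Hypothetical word lists; the original generator uses its own T-SQL-themed lists.
LEVELS = ["Junior", "Senior", "Principal", "Chief"]
TOPICS = ["Index", "Query Plan", "Deadlock", "Temp Table"]
ROLES = ["Wrangler", "Whisperer", "Evangelist", "Architect"]


def job_title(rng=random):
    """Build a title from one random word per list.

    `rng` can be the random module or a seeded random.Random instance.
    """
    return " ".join(rng.choice(words) for words in (LEVELS, TOPICS, ROLES))


# Each call is the equivalent of hitting F5 once more.
print(job_title())
```

Passing a seeded `random.Random` makes the output repeatable, which helps if you want to show off the same title twice.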

Enabling Preview Features In Power BI

Jeanne Combrinck walks us through how to enable Power BI preview features:

Every month Power BI releases new features. Some of the features are in preview mode, and unless you turn them on you don’t get to use them. This post explains how to turn them on.

Firstly, you need to have the latest version of Power BI to get the latest features. You can download it here.

Click through to see the remaining steps. There are some interesting preview features that I’d expect to make it to the general product in the next few months.

Gaps And Islands: Solving Stochastic Islands Problems

Itzik Ben-Gan shares with us a special case of the islands problem:

In your database you keep track of services your company supports in a table called CompanyServices, and each service normally reports about once a minute that it’s online in a table called EventLog. The following code creates these tables and populates them with small sets of sample data:


The special islands task is to identify the availability periods (serviceid, starttime, endtime). One catch is that there’s no assurance that a service will report that it’s online exactly every minute; you’re supposed to tolerate an interval of up to, say, 66 seconds from the previous log entry and still consider it part of the same availability period (island). Beyond 66 seconds, the new log entry starts a new availability period. So, for the input sample data above, your solution is supposed to return the following result set (not necessarily in this order):

It’s a neat twist on an old problem.
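To make the 66-second rule concrete, here is a small Python sketch of the grouping logic. It is not Itzik's T-SQL solution: it walks the sorted log entries and starts a new island whenever the gap from the previous entry exceeds the tolerance. The `logtime` name and the sample events are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def availability_periods(events, max_gap=timedelta(seconds=66)):
    """Collapse (serviceid, logtime) pairs into (serviceid, starttime, endtime) islands.

    An entry extends the current island when it belongs to the same service and
    arrives no more than max_gap after the previous entry; otherwise it starts a new one.
    """
    periods = []
    current = None  # [serviceid, starttime, endtime] of the island being built
    for serviceid, logtime in sorted(events):
        same_island = (
            current is not None
            and current[0] == serviceid
            and logtime - current[2] <= max_gap
        )
        if same_island:
            current[2] = logtime
        else:
            if current is not None:
                periods.append(tuple(current))
            current = [serviceid, logtime, logtime]
    if current is not None:
        periods.append(tuple(current))
    return periods


# Hypothetical log entries, not the article's sample data.
t0 = datetime(2018, 9, 12, 8, 0, 0)
events = [
    (1, t0),
    (1, t0 + timedelta(seconds=60)),
    (1, t0 + timedelta(seconds=126)),  # 66 seconds after the previous entry: same island
    (1, t0 + timedelta(seconds=193)),  # 67 seconds after the previous entry: new island
    (2, t0 + timedelta(seconds=5)),
]

for period in availability_periods(events):
    print(period)
```

A set-based SQL solution would typically flag island starts with a window function and then aggregate, but the tolerance test is the same comparison shown here.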

Backing Up Query Store Data

Grant Fritchey explains that Query Store data gets backed up like regular data, but with a caveat:

The core of the answer is very simple. Query Store, like any other data written to a database, whether a system table or a user table, is a logged operation. So, when you back up the database, you’re backing up Query Store data. When you back up the logs, you’re also backing up Query Store data. A point in time will include all the data written to the Query Store at that point.

However, that’s the kicker. At what point was the Query Store information written to disk?

Read on to learn when, and what you can do about it if you prefer otherwise.

Switching To Managed Disks In Azure

Chris Seferlis walks us through an easy method to convert unmanaged disks to managed disks in Azure:

First off, why would you want a managed disk over an unmanaged one?

  • Greater scalability due to much higher IOPS and storage limits. There’s no longer the need to add additional storage accounts when you’re adding disk space, which has been a challenge for users that were using large virtual machines and required large storage space.

  • Better availability and reliability which ensures that disks are now isolated from each other in different storage scale units.

  • Managed disks offer over 99.99% uptime and are always stored with 3 replicas of the data.

  • More granular access control by employing role-based access control (RBAC) security. You have granular capability to assign access to various people in your organization.

Keep reading to learn how to switch.

Flint: Time Series With Spark

Li Jin and Kevin Rasmussen cover the concepts of Flint, a time-series library built on Apache Spark:

Time series analysis has two components: time series manipulation and time series modeling.

Time series manipulation is the process of manipulating and transforming data into features for training a model. Time series manipulation is used for tasks like data cleaning and feature engineering. Typical functions in time series manipulation include:

  • Joining: joining two time-series datasets, usually by the time
  • Windowing: feature transformation based on a time window
  • Resampling: changing the frequency of the data
  • Filling in missing values or removing NA rows.

Time series modeling is the process of identifying patterns in time-series data and training models for prediction. It is a complex topic; it includes specific techniques such as ARIMA and autocorrelation, as well as all manner of general machine learning techniques (e.g., linear regression) applied to time series data.

Flint focuses on time series manipulation. In this blog post, we demonstrate Flint functionalities in time series manipulation and how it works with other libraries, e.g., Spark ML, for a simple time series modeling task.

Basho went all-in on a time-series product for Riak and it did not work out well for them. I’ll be curious to see if Flint has more staying power.
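For readers new to the manipulation steps listed in the quote, here is a plain-Python sketch of three of them: resampling, filling missing values, and windowing. It deliberately does not use Flint's API or Spark; the function names and sample readings are invented for illustration, and the window is counted in rows, which corresponds to a time window only because the resampled series is evenly spaced.

```python
from datetime import datetime, timedelta


def resample_mean(points, step):
    """Resampling: average (timestamp, value) points into fixed-width buckets.

    Buckets with no data get None so the gap stays visible.
    """
    start = min(t for t, _ in points)
    buckets = {}
    for t, value in points:
        buckets.setdefault((t - start) // step, []).append(value)
    series = []
    for i in range(max(buckets) + 1):
        values = buckets.get(i)
        series.append((start + i * step, sum(values) / len(values) if values else None))
    return series


def forward_fill(series):
    """Filling: replace each None with the most recent known value."""
    filled, previous = [], None
    for t, value in series:
        if value is None:
            value = previous
        filled.append((t, value))
        previous = value
    return filled


def rolling_mean(series, window):
    """Windowing: trailing mean over the last `window` rows of a gap-free series."""
    result = []
    for i, (t, _) in enumerate(series):
        recent = [v for _, v in series[max(0, i - window + 1): i + 1]]
        result.append((t, sum(recent) / len(recent)))
    return result


# Hypothetical irregular readings
t0 = datetime(2018, 9, 1, 9, 0, 0)
points = [
    (t0, 10.0),
    (t0 + timedelta(seconds=20), 12.0),
    (t0 + timedelta(seconds=130), 20.0),
]

per_minute = forward_fill(resample_mean(points, timedelta(minutes=1)))
for row in rolling_mean(per_minute, window=2):
    print(row)
```

A library like Flint exists because these operations get much harder once the data no longer fits on one machine, which this single-process sketch ignores entirely.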

Against Multi-Cloud Models

Tyler Treat argues against companies looking at multi-cloud models:

A multi-cloud strategy looks great on paper, but it creates unneeded constraints and results in a wild-goose chase. For most, it ends up being a distraction, creating more problems than it solves and costing more money than it’s worth. I’m going to caveat that claim in just a bit because it’s a bold blanket statement, but bear with me. For now, just know that when I say “multi-cloud,” I’m referring to the idea of running the same services across vendors or designing applications in a way that allows them to move between providers effortlessly. I’m not speaking to the notion of leveraging the best parts of each cloud provider or using higher-level, value-added services across vendors.

Multi-cloud rears its head for a number of reasons, but they can largely be grouped into the following points: disaster recovery (DR), vendor lock-in, and pricing. I’m going to speak to each of these and then discuss where multi-cloud actually does come into play.

It’s an interesting article. I think that Tyler is right, but that companies should be capable of switching between cloud providers or even creating hybrid approaches should the need arise.

An Update To ssisUnit

Bartosz Ratajczyk has added some functionality to ssisUnit:

Second – you can get and set the properties of the project and its elements. Like – overwriting project connection managers (I designed it with this particular need in mind). You can now set the connection string to a different server (or database) – in the PropertyPath of the PropertyCommand use \Project\ConnectionManagers, write the name of the connection manager with the extension, and use one of the Properties. You can do it during the Test setup (or all tests setup), but not during the test suite setup, as ssisUnit is not aware of the project until it loads it into memory.

Good on Bartosz for resurrecting a stable but moribund project and adding some enhancements.


September 2018