Kafka Consumer Group Assignment

Kevin Feasel

2016-10-11

Hadoop

David Brinegar discusses how consumers within an Apache Kafka consumer group get assigned work:

Or one might want some assignment that results in uniform workloads, based on the number of messages in each partition.  But until we have pluggable assignment functions, the reference implementation has a straightforward assignment strategy called Range Assignment.  There is also a newer Round Robin assignor which is useful for applications like Mirror Maker, but most applications just use the default assignment algorithm.

The Range Assignor tries to land on a uniform distribution of partitions, at least within each topic, while at the same time avoiding the need to coordinate and bargain between nodes.  This last goal, independent assignment, is done by each node executing a fixed algorithm:  sort the partitions, sort the consumers, then for each topic take same-sized ranges of partitions for each consumer.  Where the sizes cannot be the same, the consumers at the beginning of the sorted list will end up with one extra partition.  With this algorithm, each application node can see the entire layout by itself, and from there take up the right assignments.

Click through to see an example of how this is implemented.

Dockerizing ASP.Net Applications

Elton Stoneman shows how to package an ASP.Net web application into a Docker image:

It’s how you start to package “legacy” ASP.NET apps in Docker images, so you can run them in containers on Windows 10 and Windows Server 2016. Once you’ve packaged your app into a container image you have:

  • a central artifact which dev and ops teams can work with, which helps you transition to DevOps;

  • an app that runs the same on your laptop, on the server, on Azure, on AWS, which helps you move to the cloud;

  • an app platform which supports distributed systems, which helps you break down the monolith into microservices.

This is part one of a series, but if you read through this post, you’ll end up with a fully-functional app.

Scheduler Timing

Ewald Cress continues his look at schedulers:

To simplify things initially, we’ll forget about hidden schedulers and assume hard CPU affinity. That gives us an execution environment that looks like this:

  • Each CPU is physically tied to a scheduler.

  • Therefore, out of all the workers in the system, there is a subset of workers that will only run on that CPU.

  • Workers occasionally hand over control of their CPU to a different worker in their scheduler.

  • At any given moment, each CPU is expected to be running a worker that does something of interest to the middle or upper layers of SQL Server.

  • Some of this useful work will be done on behalf of the worker’s scheduler siblings.

  • However, a (hopefully) tiny percentage of a worker’s time is spent within the act of scheduling.

As usual, this is worth the read.

What Is Kafka?

I start a new series on Apache Kafka:

The broker serves several purposes:

  1. Know who the producers are and who the consumers are.  This way, the producers don’t care who exactly consumes a message and aren’t responsible for the message after they hand it off.
  2. Buffer for performance.  If the consumers are a little slow at the moment but don’t usually get overwhelmed, that’s okay—messages can sit with the broker until the consumer is ready to fetch.
  3. Let us scale out more easily.  Need to add more producers?  That’s fine—tell the broker who they are.  Need to add consumers?  Same thing.
  4. What about when a consumer goes down?  That’s the same as problem #2:  hold their messages until they’re ready again.

So brokers add a bit of complexity, but they solve some important problems.  The nice part about a broker is that it doesn’t need to know anything about the messages, only who is supposed to receive it.

This is an introduction to the product and part one of an eight-part series.

Dashboards Or Reports

Reza Rad compares dashboards to reports:

Dashboard: General

Stephen Few‘s definition of Dashboard: A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.

Report: General

A Report on the other hand is any informational work. This information can be at any format. Table, Chart, text, number or anything else.

Reza then ties it back to Power BI, showing how to take advantage of both of these concepts.

Interactive Graphics With ggiraph

David Smith sheds some light on the ggiraph project:

R’s ggplot2 package is a well-known tool for producing beautiful static data visualizations that you can include in a printed report. But what if you want to include a ggplot2 graphic on a webpage and provide the ability for the user to interact with the data? The ggiraph package by David Gohel  (available for installation via CRAN). WIth ggiraph, you can take an existing ggplot2 bar chart, scatterplot, boxplot, map, or many other types of chart and add one or both of the following iteractions:

  • Display a tooltip of your choice (e.g. data values or labels) when the cursor hovers over sections of the chart

  • Perform an action (a javascript function you provide: jump to another page, for example) when the viewer clicks on an element of the chart

I like it.

Identity As A Service

Cristian Satnic argues that we should look at Identity as a Service solutions for our applications:

What exactly is Azure Active Directory B2C?

  • Cloud identity service with support for social accounts and app-specific (local) accounts

  • For enterprises and ISVs building consumer facing web, mobile & native apps

  • Builds on Azure Active Directory – a global identity service serving hundreds of millions of users and billions of sign-ins per day (same directory system used by Microsoft online properties – Office 365, XBox Live and so on)

  • Worldwide, highly-available, geo-redundant service – globally distributed directory across all of Microsoft Azure’s datacenters

I am a big fan of OAuth and making it easy for line-of-business developers to deal with authentication (lest they get harebrained ideas like rolling their own encryption algorithms).

Word Cloud Visual

Devin Knight shows off the word cloud custom visual in Power BI:

Key Takeaways

  • Great for parsing unstructured data

  • Utilize stop words to remove commonly used filler words like a, the, an, etc…

    • You can use the default stop word that are provided and add your own that you would like to remove from the visual.
  • The size of the words in the visual tell you how frequently the word is used.

Cf. yesterday’s word cloud example.  I’m not sure how truly valuable word clouds are for visualization purposes, but at least they’re fun to peruse.

Nonclustered Columnstore Indexes On Indexed Views

Niko Neugebauer notes that non-clustered columnstore indexes can now sit on top of indexed views, as of SQL Server 2016:

From the perspective of the disk access, this is where you will definitely win at least a couple of times with the amount of the disk access while processing the information, amount of memory that you will need to store and process (think hashing and sorting for the late materialisation phases), and you will pay less for the occupied storage.

Another noticeable thing was that the memory grants for the Indexed Views query was smaller compared to the query that was processing the original columnstore table FactOnlineSales.

Clustered indexes are currently not available as an option; we’ll see if that changes in the next version of SQL Server.

Auditing In Power BI

Ginger Grant shows off some of the auditing capabilities within Power BI:

As you can see by looking at the available Power BI options, there are a number of options to choose from. If you select the top item PowerBI activities, then everything gets selected. After doing that click outside of the menu for the menu to go away. Select a date and time range of your choosing, select specific users if you wish, then click on the Search button. Depending on how big your date range is, this may take some time to load. Once you see the results, you have the ability to filter as well.

Another day, another two dozen new Power BI features…  This one’s a good one.

Categories

October 2016
MTWTFSS
« Sep Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31