An Overview Of Apache Kafka

Leona Zhang has a series going on Apache Kafka.  Part one covers some of the concepts around messaging systems:

There is a difference between batch processing applications and stream processing applications. A visible boundary determines the most significant difference between batch processing and stream processing. If it exists, it is called batch processing. For example, a client collects the data once every hour, sends this data to the server for statistics, and then saves the statistical results in the statistical database.
If the boundary doesn’t exist, the processing is called streaming data (stream processing). Here is an example of stream processing: logs and orders are generated continuously on a large website just like a data flow. If the processing of each log and order takes less than several hundred milliseconds or several seconds after its generation, the application is called a stream application. If the collection of logs and orders happens once every hour followed by a unified transmission, the original stream data converts into batch data.
Occasionally, stream processing becomes imperative. For example, Jack Ma wanted to display the orders and sales on Tmall for November 11 on a large screen. If the data center works in a T+1 mode and can obtain data for November 11 on November 12, Jack Ma would not be happy.

Part two is an overview of the architectural components used in Kafka:

Kafka uses the group concept to integrate the producer/consumer and publisher/subscriber models.
One topic may have multiple groups, and one group may include multiple consumers. Only one consumer in the group can consume one message. For different groups, consumers are in the publisher/subscriber model. All groups receive one message. 
Note: Allocate one partition to only one consumer in the same group. If there are three partitions and four consumers in one of the groups, one consumer is redundant and cannot receive any data.

This looks to be the start to a good series.

Related Posts

Contrasting Oracle’s Architecture with SQL Server’s

Kellyn Pot’vin-Gorman helps us understand where Oracle’s architecture differs from SQL Server’s: The first thing you’ll notice is what Oracle refers to as an INSTANCE is different to what SQL Server calls one. Oracle’s instance is most closely related to what SQL Server calls their database, (although it includes the files that Oracle includes in […]

Read More

MRAppMaster Errors Running MapReduce Jobs

I have a post looking at potential causes when PolyBase MapReduce jobs are unable to find the MRAppMaster class: Let me tell you about one of my least favorite things I like to see in PolyBase: Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster This error is not limited to PolyBase but is instead […]

Read More

Categories

December 2018
MTWTFSS
« Nov Jan »
 12
3456789
10111213141516
17181920212223
24252627282930
31