Introducing Confluent Operator for Kubernetes
Kubernetes has become the open source standard for orchestrating containerized applications, but running stateful applications such as Kafka can be difficult and requires a specialized skill set. Thus, we decided to automate the process for you.
For the past few months, we have been working closely with a set of customers and partners as part of a preview program to gather their early feedback. We are now ready to release Confluent Operator, our enterprise-ready implementation of the Kubernetes Operator API to automate deployment and key lifecycle operations of Confluent Platform on Kubernetes.
Looks like they’ve been busy lately.
Satish Sharma has a four-part series covering stream processing with Apache Kafka. Part 1 gives us an overview of Kafka:
Apache Kafka is an open-source distributed stream processing platform originally developed by LinkedIn and later donated to Apache in 2011.
We can describe Kafka as a collection of files, filled with messages that are distributed across multiple machines. Most of Kafka analogies revolve around tying these various individual logs together, routing messages from producers to consumers reliably, replicating for fault tolerance, and handling failure gracefully. Its architecture inherits more from storage systems like HDFS, HBase, or Cassandra than it does from traditional messaging systems that implement JMS or AMQP. The underlying abstraction is a partitioned log, essentially a set of append-only files spread over several machines. This encourages sequential access patterns. A Kafka cluster is a distributed system that spreads data over many machines both for fault tolerance and for linear scale-out.
Kafka Streams API
Kafka Streams API is a Java library that allows you to build real-time applications. These applications can be packaged, deployed, and monitored like any other Java application — there is no need to install separate processing clusters or similar special-purpose and expensive infrastructures!
The Streams API is scalable, lightweight, and fault-tolerant; it is stateless and allows for stateful processing.
For quick testing, let’s start a handy console consumer, which reads messages from a specified topic and displays them back on the console. We will use the same to consumer to read all of our messages from this point forward. Use the following command:
Linux -> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic tutorial-topic --from-beginning
Windows -> bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic tutorial-topic --from-beginning
Part 4 is forthcoming.
I teach online about Apache Kafka, and a very frequent and recurring question I get is:
How can I learn Confluent Kafka?
Let’s get right to it!
I’ve gone through a couple of Stephane’s Kafka courses and they’re excellent. There’s still a lot more for me to go through but if you’re interesting in learning Kafka, this is a great method.
One of the more interesting parts of SQL Server 2019 CTP 3.2’s release notes is the relationship between Microsoft and Azul Systems. Travis Wright covers it in some detail, as well as what it means for customers.
Prior to SQL Server 2019 CTP 3.2, installing PolyBase required an installation of Oracle’s Java Runtime Environment 7 Update 51 or higher, either directly from Oracle or through OpenJDK.
Java is still required if you want to read from or write to Hadoop or Azure Blob Storage. Oracle’s flavor of Java is no longer required, however.
I’m going to admit, that the reason I didn’t embrace Powershell at first, was most of the examples I found were of full of hardcoded values. I found it incredibly obtuse, but I started to realize that it came from many sources who might not have the scripting history that those of other shells, (this was just my theory, not a lot of evidence to prove on this one, so keep that in mind…) As Powershell scripts have matured, I’ve noticed how many are starting to build them with more dynamic values and advance scripting options, and with this, I’ve become more comfortable with Powershell.
I think the best way to learn is to see real examples, so let’s demonstrate.
Read on for those examples.
If you use replication, you have had the situation occur where you had to restore a replicated database. You’ve have doubtless been paged to restore a replicated database. You have experienced the ineffable joy of being tearing down replication-dependent indexed views (if you have them), blowing away replication, doing the restore, putting replication and indexing back together again, and finally redeploying your indexed views. I know I have.
In fact, I’ve done it enough times that I didn’t want to do it anymore. So, you may ask, did I go to a different modality of replicating my data? Did I go to Availability Groups or mirroring instead? No. I actually like replication. It’s invaluable when you need to write code around real-time data (especially from a third party database), but you aren’t able to index the original copy. It’s been around for a long time and is well vetted, and pretty forgiving, once you understand how it works. So, no need to reinvent the wheel. I decided to automate replication instead.
This is specific to transactional replication. There’s a whole ‘nother kettle of fish for merge replication.
First, look under the release documentation and search for the deb package. In my case I’m install the amd64 version.
Then, right-click on the “powershell-preview_7.0.0-preview.2-1.ubuntu.18.04_amd64.deb”, and select “Copy link address“.
This is also something you could script out. It’s quite useful in scenarios where your apt repositories don’t have the latest version of something but you still need or want it.
Although the method is fairly simple, there are simpler methods if you just need the raw data from your data model (and not the specific aggregations or measures that the visual contains):
– Use DAX Studio to download all tables from your data model at once
– Use DAX Studio to download specific tables from your data model (one by one)
– Or use R or Python to download specific tables if you’re comfortable with these languages. This method also allows scheduled refreshes in the service.
In short, this is probably a fifth-best solution, but it does work.
First, the violin plot is now a certified custom visual. This means that it has been tested by the Power BI team to ensure it meets certain requirements, one of which is that the visual does not access external services or resources. You can be confident your data isn’t being sent externally when you use the violin plot.
As for the functional enhancements, a new legend has been added. This is a great addition to make the chart clearer and more easily read, especially for audiences that may not be familiar with how the violin plot works. The customizable legend calls out what markers are used for mean, median, and quartiles.
Meagan is quite pleased with these updates.