Integrating Kafka Into A Data Scientist’s Workflow

Liz Bennett from Stitch Fix has a guest post on the Confluent blog:

Our main requirement for this new project was to build infrastructure that would be 100 percent self-service for our Data Scientists. In other words, my teammates and I would never be directly involved in the discovery, creation, configuration and management of the event data. Self-service would fix the primary shortcoming of our legacy event delivery system: manual administration that was performed by my team whenever a new dataset was born. This manual process hindered the productivity and access to event data for our Data Scientists. Meanwhile, fulfilling the requests of the Data Scientists hindered our own ability to improve the infrastructure. This scenario is exactly what the Data Platform Team strives to avoid. Building self-service tooling is the number one tenet of the Data Platform Team at Stitch Fix, so whatever we built to replace the old event infrastructure needed to be self-service for our Data Scientists. You can learn more about our philosophy in Jeff Magnusson’s post Engineers Shouldn’t Write ETL.

This is an architectural overview and a good read.

Related Posts

Working with Columns in Spark

Achilleus has a two-parter on working with columns in Spark. Part 1 covers some of the basic syntax and several functions: Also, we can have typed columns which is basically a column with an expression encoder specified for the expected input and return type. scala> val name = $"name".as[String]name: org.apache.spark.sql.TypedColumn[Any,String] = namescala> val name = […]

Read More

Creating Threadpools with ExecutorService in Kafka

Prasanth Nair shows how we can use Java’s ExecutorService to create threadpools for Kafka consumers: Apache Kafka is one of today’s most commonly used event streaming platforms. While using the Kafka platform, quite often, we run into a scenario where we have to process a large number of events/messages that are placed on a broker. […]

Read More

Categories

September 2018
MTWTFSS
« Aug Oct »
 12
3456789
10111213141516
17181920212223
24252627282930