Optimizing Apache Flink

Ivan Mushketyk has a few tips for speeding up programs using Apache Flink:

One more way to optimize your Flink application is to provide some information about what your user-defined functions are doing with input data. Since Flink can’t parse and understand code, you can provide crucial information that will help to build a more efficient execution plan. There are three annotations that we can use:

  1. @ForwardedFields: Specifies what fields in an input value were left unchanged and are used in an output value.

  2. @NonForwardedFields: Specifies fields that were not preserved in the same positions in the output.

  3. @ReadFields: Specifies what fields were used to compute a result value. You should only specify fields that were used in computations and not merely copied to the output.
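As a rough illustration of the list above, here is a minimal sketch of a map function carrying two of these annotations. It assumes a Flink version that still ships the DataSet API, where the annotations are nested in `FunctionAnnotation`. The class names `SemanticAnnotationsSketch` and `AddFlag`, and the threshold of 100, are hypothetical and do not come from the original article.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
import org.apache.flink.api.java.functions.FunctionAnnotation.ReadFields;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;

// Hypothetical example class; not taken from the original article.
public class SemanticAnnotationsSketch {

    // Input f0 is copied unchanged to output f0; input f1 is copied unchanged to output f2.
    @ForwardedFields("f0; f1->f2")
    // Only f1 is evaluated to compute a new value (the flag in output f1).
    // f0 is merely copied, so it is not listed as a read field.
    @ReadFields("f1")
    public static class AddFlag
            implements MapFunction<Tuple2<String, Integer>, Tuple3<String, Boolean, Integer>> {

        @Override
        public Tuple3<String, Boolean, Integer> map(Tuple2<String, Integer> in) {
            // The threshold of 100 is an arbitrary value chosen for illustration.
            return new Tuple3<>(in.f0, in.f1 > 100, in.f1);
        }
    }
}
```

A function like this would typically be passed to `DataSet.map(...)`. The annotations do not change what the function computes; they only tell the optimizer which fields keep their values, which may let it reuse an existing partitioning or sort order instead of reshuffling data.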

Click through for his four tips.
