Don’t Use Cron For Scheduling Hadoop Jobs

Matthew Rathbone explains why cron is not a great choice for scheduling Hadoop and Spark jobs:

Reason 3: Poor transparency for teammates

Which jobs are running right now? Which are going to run today? How long do these jobs take? How do I schedule my job? What machine should I schedule it on? These are all questions that are impossible to answer without building custom orchestration around your Cron process – time you’d be better off spending on building a better system.
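
To make the point concrete, here is a minimal sketch of the custom tooling needed just to answer "which jobs are going to run today?" from a crontab. The entries and script paths are hypothetical, and the parser is deliberately simplified: it assumes each field is either `*` or a comma-separated list of integers, so ranges, steps, and month or weekday names are not handled.

```python
from datetime import date

# Hypothetical crontab entries, for illustration only.
CRONTAB = [
    "0 2 * * * /opt/jobs/nightly_etl.sh",
    "30 6 1 * * /opt/jobs/monthly_report.sh",
    "15 4 * * 1 /opt/jobs/weekly_rollup.sh",
]

def parse_field(field):
    """Return None for '*', otherwise the set of integers listed in the field."""
    if field == "*":
        return None
    return {int(part) for part in field.split(",")}

def runs_on(entry, day):
    """Return True if the crontab entry fires at least once on the given day."""
    _minute, _hour, dom, month, dow = entry.split()[:5]
    months = parse_field(month)
    if months is not None and day.month not in months:
        return False
    doms = parse_field(dom)
    dows = parse_field(dow)
    # cron counts weekdays from 0 (Sunday) to 6 (Saturday); 7 also means Sunday.
    cron_dow = day.isoweekday() % 7
    dom_ok = doms is None or day.day in doms
    dow_ok = dows is None or cron_dow in {d % 7 for d in dows}
    # When both day fields are restricted, cron fires if either one matches.
    if doms is not None and dows is not None:
        return dom_ok or dow_ok
    return dom_ok and dow_ok

def jobs_for_day(crontab, day):
    """List the commands scheduled to run on the given day."""
    return [entry.split(None, 5)[5] for entry in crontab if runs_on(entry, day)]

if __name__ == "__main__":
    for command in jobs_for_day(CRONTAB, date.today()):
        print(command)
```

Even this covers only one of the questions above, for one machine's crontab. It says nothing about what is running right now or how long each job takes.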

Matthew then gives us four alternative products.
