Protecting Hadoop Clusters From Malware

Michael Yoder and Suraj Acharya remind us that Hadoop clusters are made up of computers on a network, which means people will try to install malicious software:

Roughly two years ago there were a spate of attacks against the open source database solution MongoDB, as well as Hadoop. These attacks were ransomware: the attacker wiped or encrypted data and then demanded money to restore that data. Just like the recent attacks, the only Hadoop clusters affected were those that were directly connected to the internet and had no security features enabled. Cloudera published a blog post about this threat in January 2017. That blog post laid out how to ensure that your Hadoop cluster is not directly connected to the internet and encouraged the reader to enable  Cloudera’s security and governance features.

That blog post has renewed relevance today with the advent of XBash and DemonBot.

The origin story of XBash and DemonBot illustrates how security researchers view the Hadoop ecosystem and the lifecycle of a vulnerability. Back in 2016 at the Hack.lu conference in Luxembourg, two security researchers gave a talk entitled Hadoop Safari: Hunting for Vulnerabilities. They described Hadoop and its security model and then suggested some “attacks” against clusters that had no security features enabled. These attacks are akin to breaking in to a house while the front door is wide open.

Their advice is simple, but simple is good here:  it means you should be able to implement the advice without much trouble.

Related Posts

Databricks Runtime 5.2 Released

Nakul Jamadagni announces Databricks Runtime 5.2: Delta Time TravelTime Travel, released as an Experimental feature, adds the ability to query a snapshot of a table using a timestamp string or a version, using SQL syntax as well as DataFrameReader options for timestamp expressions.Sample codeSELECT count() FROM events TIMESTAMP AS OF timestamp_expressionSELECT count() FROM events VERSION AS OF version Time travel looks a bit like temporal tables in SQL Server.

Read More

Kafka And The Differing Aims Of Data Professionals

Kai Waehner argues that there is an impedence mismatch between data engineers, data scientists, and ML production engineers: Data scientists love Python, period. Therefore, the majority of machine learning/deep learning frameworks focus on Python APIs. Both the stablest and most cutting edge APIs, as well as the majority of examples and tutorials use Python APIs. […]

Read More

Categories

November 2018
MTWTFSS
« Oct Dec »
 1234
567891011
12131415161718
19202122232425
2627282930