MapR Goes Spark-First

MapR has introduced a new version of their platform which is based on Spark:

With the emergence of Spark as a unified computing engine, developers can perform ETL and advanced analytics in both continuous (streaming) and batch mode either programmatically (using Scala, Java, Python, or R) or with procedural SQL (using Spark SQL or Hive QL).

With MapR converging the data management platform, you can now take a preferential Spark-first approach. This differs from the traditional approach of starting with extended Hadoop tools and then adding Spark as part of your big data technology stack. As a unified computing engine, Spark can be used for faster batch ETL and analytics (with Spark core instead of MapReduce and Hive), machine learning (with Spark MLlib instead of Mahout), and streaming ETL and analytics (with Spark Streaming instead of Storm).

MapReduce is so 2012…

Related Posts

Testing Kafka Streams Applications

Yeva Byzek continues her series on testing Kafka-based streaming applications: When you create a stream processing application with Kafka’s Streams API, you create a Topologyeither using the StreamsBuilder DSL or the low-level Processor API. Normally, the topology runs with the KafkaStreams class, which connects to a Kafka cluster and begins processing when you call start(). For testing though, connecting to a running […]

Read More

Auto ML With SQL Server 2019 Big Data Clusters

Marco Inchiosa has a model scenario for using Big Data Clusters to scale out a machine learning problem: H2O provides popular open source software for data science and machine learning on big data, including Apache SparkTM integration. It provides two open source python AutoML classes: h2o.automl.H2OAutoML and pysparkling.ml.H2OAutoML. Both APIs use the same underlying algorithm implementations, […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930