"Dean Wampler argues that Spark/Scala is a better data processing engine than MapReduce/Java because tools inspired by mathematics, such as FP, are ideal tools for working with data."
An advantage of Spark over Hadoop is that it does not need to save data to disk after each step, as MapReduce does, which yields a significant performance gain (sometimes 100x). He suggests that Spark is to Hadoop what Spring is to J2EE: a significant improvement and simplification.
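A minimal sketch of what such a pipeline looks like in Spark's Scala API, assuming a local master and a hypothetical input.txt; the point is that the intermediate results flow from step to step in memory rather than being written to disk between stages:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: a multi-step word-count pipeline.
// Unlike MapReduce, the intermediate data (words, pairs) is not
// written to disk between steps; Spark keeps it in memory.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val counts = sc.textFile("input.txt")     // step 1: read lines (hypothetical path)
      .flatMap(_.split("\\s+"))               // step 2: split lines into words
      .map(word => (word, 1))                 // step 3: pair each word with a count of 1
      .reduceByKey(_ + _)                     // step 4: sum the counts per word
      .cache()                                // keep the result in memory for reuse

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```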
Spark is written in Scala but is also usable from Java and Python, as well as SQL variants (HiveQL). It also includes modules for machine learning.
Compute model: the RDD (Resilient Distributed Dataset).
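A short sketch of the RDD model, again assuming a local master: transformations such as filter and map are lazy and only record lineage, while an action such as count triggers execution, and lost partitions can be recomputed from that lineage.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the RDD compute model: transformations are lazy,
// actions trigger execution, and lineage provides the "resilient" part.
object RddModelSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddModelSketch").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 1000000)   // base RDD from an in-memory collection
    val evens   = numbers.filter(_ % 2 == 0)     // transformation: lazy, nothing runs yet
    val squared = evens.map(n => n.toLong * n)   // another lazy transformation

    println(squared.count())                     // action: the whole chain executes here
    sc.stop()
  }
}
```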
Unified Big Data Processing with Apache Spark @ InfoQ
Apache Spark 1.2.0 Supports Netty-based Implementation, High Availability and Machine Learning APIs
Use Script Action in HDInsight to install Spark on Hadoop cluster| Azure
Spark, Storm and Real Time Analytics
Apache Spark™ - Lightning-Fast Cluster Computing