Saturday, October 03, 2015

Apache Spark vs Hadoop

Accelerating real-time analytics with Spark - O'Reilly Radar
"Apache Spark is an open source, general-purpose computational framework with more flexibility than MapReduce. Spark brings to Hadoop the productivity of functional programming with the speed of in-memory data processing. For example, as shown in Figure 1, in a Logistic Regression performance test, Spark ran several orders of magnitude faster than Hadoop MapReduce in memory."

Figure 1: Logistic regression performance test, Spark vs. Hadoop MapReduce (in memory).
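The benchmark quoted above is iterative: logistic regression trained by gradient descent makes many passes over the same data. Below is a minimal plain-Python sketch of that pattern. It does not use Spark, and the synthetic data and the function names (`load_points`, `gradient`, `add`, `train`) are hypothetical.

Each pass is written as a map over the points followed by a reduce that sums the partial gradients, and the code counts how often the input is loaded:

- With `cache=True`, the data stays in memory across iterations. This is loosely analogous to calling `cache()` on a dataset in Spark.
- With `cache=False`, the data is reloaded on every pass. This mirrors a chain of MapReduce jobs, each rereading its input.

```python
from functools import reduce
import math
import random

def load_points(n=200, seed=0):
    """Hypothetical stand-in for reading input from storage: synthetic (x, label) pairs."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        label = 1.0 if x + rng.gauss(0.0, 0.5) > 0 else 0.0
        points.append((x, label))
    return points

def gradient(w, b, point):
    """Map step: log-loss gradient contributed by a single point."""
    x, label = point
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
    return ((p - label) * x, p - label)

def add(g1, g2):
    """Reduce step: sum two partial gradients."""
    return (g1[0] + g2[0], g1[1] + g2[1])

def train(iterations=50, lr=0.5, cache=True):
    """Batch gradient descent; returns (w, b, number of times the input was loaded)."""
    data, loads = None, 0
    w = b = 0.0
    for _ in range(iterations):
        if data is None or not cache:
            data = load_points()  # with cache=False this runs on every pass
            loads += 1
        gw, gb = reduce(add, map(lambda pt: gradient(w, b, pt), data))
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b, loads
```

Calling `train(cache=True)` and `train(cache=False)` yields identical weights. Only the load count differs: 1 versus 50. The sketch only illustrates the access pattern; the quoted speedup also depends on what each reload costs on a real cluster.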



"Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. Access data in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source."
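The deployment options listed above differ mainly in which cluster manager an application is submitted to. With the `spark-submit` tool, that choice is made through the `--master` URL. The helper below is a hypothetical sketch, not part of Spark: it only builds command strings, and the host names `master-host` and `mesos-host` are placeholders.

The master URL formats (`local[*]`, `spark://HOST:7077`, `yarn`, `mesos://HOST:5050`) follow the Spark "Submitting Applications" documentation for recent releases. Older releases also accepted `yarn-client` and `yarn-cluster` as YARN masters.

```python
import shlex

# Master URL per cluster manager; host names are hypothetical placeholders.
MASTER_URLS = {
    "local": "local[*]",                       # run locally using all cores
    "standalone": "spark://master-host:7077",  # Spark's own cluster manager
    "yarn": "yarn",                            # cluster located via the Hadoop config
    "mesos": "mesos://mesos-host:5050",        # Apache Mesos master
}

def submit_command(mode, app, deploy_mode="client"):
    """Build (but do not run) a spark-submit command line for the given cluster manager."""
    if mode not in MASTER_URLS:
        raise ValueError(f"unknown mode: {mode}")
    args = ["spark-submit", "--master", MASTER_URLS[mode]]
    if mode != "local":
        # deploy_mode is passed through unchecked; not every manager supports
        # cluster mode for every application type.
        args += ["--deploy-mode", deploy_mode]
    args.append(app)
    return shlex.join(args)  # shell-quotes arguments such as local[*]
```

For example, `submit_command("yarn", "app.py", "cluster")` returns `spark-submit --master yarn --deploy-mode cluster app.py`. The application itself is unchanged across managers; only the `--master` argument (and optionally `--deploy-mode`) varies.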
