Saturday, October 03, 2015

Apache Spark vs Hadoop

Accelerating real-time analytics with Spark - O'Reilly Radar
"Apache Spark is an open source, general-purpose computational framework with more flexibility than MapReduce. Spark brings to Hadoop the productivity of functional programming with the speed of in-memory data processing. For example, as shown in Figure 1, in a Logistic Regression performance test, Spark ran several orders of magnitude faster than Hadoop MapReduce in memory."

[Figure 1: Logistic regression performance test, Spark vs. Hadoop MapReduce in memory]
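Iterative algorithms like logistic regression benefit because every iteration rescans the same dataset. Spark can keep that dataset in memory (RDD.cache()), while a chain of Hadoop MapReduce jobs re-reads its input from disk on each pass. Below is a minimal pure-Python sketch, with no Spark dependency, of logistic regression trained by batch gradient descent and written as map and reduce steps. The toy dataset, function names, and learning rate are hypothetical, and this is not Spark's MLlib implementation.

```python
import math
from functools import reduce

# Toy stand-in for a dataset parsed once and kept in memory (in Spark,
# roughly sc.textFile(path).map(parse_point).cache(), with parse_point
# hypothetical). Each point is (features, label); features[0] = 1.0 is the bias.
POINTS = [([1.0, v], 1 if v > 0 else 0)
          for v in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def point_gradient(w, point):
    """Map step: gradient of the logistic loss for a single example."""
    x, y = point
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def add_vectors(a, b):
    """Reduce step: element-wise sum of two gradient vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def train(points, iterations=100, learning_rate=0.5):
    """Batch gradient descent expressed as repeated map/reduce passes."""
    w = [0.0] * len(points[0][0])
    for _ in range(iterations):
        # Every pass scans the same in-memory data; chained MapReduce jobs
        # would instead re-read the input from disk on each pass.
        grad = reduce(add_vectors, map(lambda p: point_gradient(w, p), points))
        w = [wi - learning_rate * gi / len(points) for wi, gi in zip(w, grad)]
    return w


def predict(w, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


if __name__ == "__main__":
    weights = train(POINTS)
    print(weights, [predict(weights, x) for x, _ in POINTS])
```

Running train(POINTS) yields a positive weight on the second feature, and the model classifies all eight toy points correctly.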



"Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. Access data in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source."
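A Spark application selects its cluster manager through a master URL, passed as spark-submit --master or set with SparkConf.setMaster. The helper below is a hypothetical sketch, not part of Spark. It builds those URLs using the formats and default ports (7077 for standalone, 5050 for Mesos) given in the master URL table of the Spark documentation.

```python
def master_url(manager, host=None, port=None, cores="*"):
    """Hypothetical helper: build the --master value for spark-submit."""
    if manager == "local":
        # Single JVM on this machine; "*" means one worker thread per core.
        return f"local[{cores}]"
    if manager == "yarn":
        # Hadoop YARN: the ResourceManager address is read from the Hadoop
        # configuration (HADOOP_CONF_DIR / YARN_CONF_DIR), not from the URL.
        return "yarn"
    if manager in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"{manager} needs a master host")
        if manager == "standalone":
            # Spark's own standalone cluster manager.
            return f"spark://{host}:{port or 7077}"
        # Apache Mesos master.
        return f"mesos://{host}:{port or 5050}"
    raise ValueError(f"unknown cluster manager: {manager!r}")


if __name__ == "__main__":
    for args in (("local",), ("standalone", "master1"), ("mesos", "mesos1"), ("yarn",)):
        print(master_url(*args))
```

The 1.x releases current at the time of this post spelled the YARN modes as yarn-client and yarn-cluster rather than a bare yarn.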

Azure IoT Suite, IoT Hub, Container Service

Azure IoT Suite | Microsoft

ScottGu's Blog - AzureCon Keynote Announcements: India Regions, GPU Support, IoT Suite, Container Service, and Security Center

"The Microsoft Azure IoT Suite helps you connect and integrate with devices more easily, and to capture and analyze untapped device data by using our preconfigured solutions, which are engineered to help you move quickly from proof of concept and testing to broader deployment. Today we support remote monitoring, and soon we will be delivering support for predictive maintenance and asset management solutions."

"Azure IoT Hub service which is a fully managed service that enables reliable and secure bi-directional communications between millions of IoT devices and an application back end. Azure IoT Hub offers reliable device-to-cloud and cloud-to-device hyper-scale messaging, enables secure communications using per-device security credentials and access control, and includes device libraries for the most popular languages and platforms."
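Per-device security credentials in IoT Hub are commonly presented as shared access signature (SAS) tokens. This is a minimal sketch, assuming the token format described in Microsoft's IoT Hub security documentation. It computes an HMAC-SHA256 over the URL-encoded resource URI and an expiry time, keyed with the device's base64-encoded symmetric key. The hub name, device ID, and key are hypothetical placeholders. Real deployments would normally let the official device SDKs generate these tokens.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri, key_b64, expiry_epoch, policy_name=None):
    """Sketch of an IoT Hub SAS token for resource_uri, valid until expiry_epoch."""
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # String to sign: URL-encoded resource URI, a newline, then the expiry
    # time in seconds since the Unix epoch.
    string_to_sign = f"{encoded_uri}\n{expiry_epoch}".encode("utf-8")
    # The stored key is base64; HMAC-SHA256 is keyed with the decoded bytes.
    digest = hmac.new(base64.b64decode(key_b64), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    token = f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry_epoch}"
    if policy_name is not None:
        # Hub-level shared access policies also name the policy (skn);
        # device-scoped keys omit it.
        token += f"&skn={policy_name}"
    return token


if __name__ == "__main__":
    # Hypothetical hub, device, and key, for illustration only.
    uri = "example-hub.azure-devices.net/devices/device-001"
    fake_key = base64.b64encode(b"not-a-real-device-key").decode("ascii")
    print(generate_sas_token(uri, fake_key, int(time.time()) + 3600))
```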

"The Azure Container Service enables users to easily create and manage a Docker enabled Apache Mesos cluster."

Azurecon 2015 | Microsoft Azure


Azure Data Lake

Microsoft Announces Azure Data Lake, A Data Repository For Big Data Analytics | TechCrunch
"The idea behind Data Lake is — as the name implies — to give developers a single place to store all of their structured and semi-structured data in its native format without having to worry about storage and capacity limitations on individual files."
Data Lake @ Azure
"Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics."
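Keeping data in its native format is usually paired with schema-on-read. Records stay as they arrived, and a schema is imposed only when a query reads them. The minimal Python sketch below illustrates that general idea, using an in-memory list as a stand-in for the store. The records, field names, and read_with_schema function are hypothetical, and this is not the Azure Data Lake API.

```python
import csv
import io
import json

# Hypothetical raw store: payloads kept exactly as ingested, tagged by format.
RAW_STORE = [
    ("json", '{"device": "d1", "temp": 21.5}'),
    ("csv", "d2,19.0"),
    ("json", '{"device": "d3", "temp": 23.0, "humidity": 40}'),
]


def read_with_schema(raw_store):
    """Yield records projected onto the schema (device: str, temp: float).

    Parsing and type conversion happen here, at read time; fields outside the
    schema (such as "humidity") stay in the raw payload but are not returned.
    """
    for fmt, payload in raw_store:
        if fmt == "json":
            record = json.loads(payload)
            device, temp = record["device"], record["temp"]
        elif fmt == "csv":
            device, temp = next(csv.reader(io.StringIO(payload)))
        else:
            raise ValueError(f"no reader for format {fmt!r}")
        yield {"device": device, "temp": float(temp)}


rows = list(read_with_schema(RAW_STORE))
print(sum(r["temp"] for r in rows) / len(rows))
```

Another query could read the same raw payloads with a different schema, for example one that includes humidity, without rewriting the stored data.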



Introducing Azure Data Lake | Microsoft Azure Blog


"...a suite of big data and advanced analytics solutions like Azure HDInsight, Azure Data Factory, Revolution R Enterprise and Azure Machine Learning."

Raghu Ramakrishnan, Technical Fellow, Data Platforms

By Mary Jo Foley for All About Microsoft 
"... Microsoft made it official: The technological underpinnings of the coming Azure Data Lake service are based on the very ones that the company uses internally as part of its "Cosmos" big-data storage and analytics service."

As Amazon does with AWS, Microsoft offers through Azure the same services it uses internally.