Thursday, August 11, 2016

Big Data: Apache Spark + GPUs

Apache Spark keeps data in RAM, which lets it process data significantly faster than classic Hadoop MapReduce. It can now also use GPUs for even better performance.

"Spark has emerged as the infrastructure of choice for developing in-memory distributed analytics workloads. It provides high-level abstractions in multiple languages (e.g., Java, Scala, and Python) that hide the underlying data and work distribution operations such as data transfer to and from the Hadoop Distributed File System (HDFS) or that maintain resiliency in the presence of system failures. Spark also provides libraries for relational Online Analytical Processing (OLAP) using SQL, machine learning, graph analytics, and streaming workloads."

[Figure: Spark components being accelerated]
