Thursday, August 11, 2016

Big Data: Apache Spark + GPUs

Apache Spark leverages RAM to process data significantly faster than classic Hadoop MapReduce. Now it can also leverage GPUs for even better performance.

"Spark has emerged as the infrastructure of choice for developing in-memory distributed analytics workloads. It provides high-level abstractions in multiple languages (e.g., Java, Scala, and Python) that hide the underlying data and work distribution operations such as data transfer to and from the Hadoop Distributed File System (HDFS) or that maintain resiliency in the presence of system failures. Spark also provides libraries for relational Online Analytical Processing (OLAP) using SQL, machine learning, graph analytics, and streaming workloads."

Spark components being accelerated

Seagate 60 TB SSD (more expensive than gold)

'World's largest' SSD revealed as Seagate unveils 60TB monster | ZDNet
"Seagate has been showing off its monster 60TB solid-state drive (SSD) this week, which breezes past the 15TB SSD (2.5", $10K) that Samsung launched in March."

So this SSD's price per ounce is $10,311.99 / 4.94 oz ≈ $2,087 / oz.

That is about 50% more than the current price of gold per ounce.
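
The price-per-ounce arithmetic can be checked with a few lines of Python. This is only a sketch using the two figures quoted in the post; the gold price itself is not given here, so the second number is merely the value implied by taking the "50% more" claim at face value, not a quoted market price. The variable names are made up for illustration.

```python
# Figures quoted in the post.
price_usd = 10_311.99   # listed SSD price in US dollars
weight_oz = 4.94        # listed SSD weight in ounces

# Price per ounce of SSD.
price_per_oz = price_usd / weight_oz

# Assumption: the "50% more than gold" claim is taken at face value,
# so the implied gold price is the SSD price per ounce divided by 1.5.
premium_over_gold = 0.50
implied_gold_per_oz = price_per_oz / (1 + premium_over_gold)

print(f"SSD price per ounce: ${price_per_oz:,.0f}")                 # $2,087
print(f"Implied gold price per ounce: ${implied_gold_per_oz:,.0f}")  # $1,392
```

Changing `premium_over_gold` shows how sensitive the comparison is to the gold price used on a given day.
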
speed of memory and caching