DraganSr: data, future, person: Michael Stonebraker, No Map-Reduce, NoSQL merges with SQL

Monday, June 22, 2015

data, future, person: Michael Stonebraker, No Map-Reduce, NoSQL merges with SQL

The future of data at scale - O'Reilly Radar
The O'Reilly Radar Podcast: Turing Award winner Michael Stonebraker on the future of data science.

(There is no Nobel Prize for computer science, so the Turing Award is the field's highest recognition.)

In March 2015, database pioneer Michael Stonebraker was awarded the 2014 ACM Turing Award “for fundamental contributions to the concepts and practices underlying modern database systems.”

"It’s all going to move to data science as soon as enough data scientists get trained by our universities to do this stuff. It’s fairly clear to me that you’re probably not going to retread a business analyst to be a data scientist because you’ve got to know statistics, you’ve got to know machine learning. You’ve got to know what regression means, what Naïve Bayes means, what k-Nearest Neighbors means. It’s all statistics.

All of that stuff turns out to be defined on arrays. It’s not defined on tables. The tools of future data scientists are going to be array-based tools. Those may live on top of relational database systems. They may live on top of an array database system, or perhaps something else. It’s completely open."
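To make the "defined on arrays" point concrete, here is a minimal sketch of k-Nearest Neighbors written directly against a 2-D array of numbers (rows are points, columns are features). It is only an illustration, not anything from the podcast: the function name `knn_predict` and the toy data are made up, and plain Python lists stand in for a real array library.

```python
import math
from collections import Counter


def knn_predict(X, y, query, k=3):
    """Classify `query` by majority vote among its k nearest rows of X.

    X is a 2-D array (list of equal-length numeric rows), y holds one label per row.
    """
    # Euclidean distance from the query to every row of the array.
    distances = [math.dist(row, query) for row in X]
    # Indices of the k rows with the smallest distances.
    nearest = sorted(range(len(X)), key=distances.__getitem__)[:k]
    # Majority label among those neighbours.
    votes = Counter(y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(X, y, [1.1, 1.0]))
print(knn_predict(X, y, [5.1, 5.0]))
```

Note that nothing here needs column names, joins, or a schema: the whole computation is index arithmetic over rows and columns, which is the sense in which these methods are array-shaped rather than table-shaped.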

"Stonebraker discusses the problem of curating data at scale in more detail in his contributed chapter in a new free ebook, Getting Data Right."

He claims that Google, which created Map-Reduce 10 years ago, stopped using it 5 years ago... Will Hadoop adjust and transform itself?

JSON is apparently his data format of choice for semi-structured data,
and SQL his preferred high-level interface. ACID is required, so HDFS is not good enough.
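A small sketch of what "JSON in, SQL on top, ACID underneath" can look like. This is not any system Stonebraker describes: it uses Python's built-in `sqlite3` as a stand-in for a transactional store, and the `events` table, its columns, and the `load_events` helper are all hypothetical.

```python
import json
import sqlite3


def load_events(conn, json_lines):
    """Insert a batch of JSON records atomically: all rows land, or none do."""
    with conn:  # commits on success, rolls back if an exception escapes
        for line in json_lines:
            record = json.loads(line)
            conn.execute(
                "INSERT INTO events (user, payload) VALUES (?, ?)",
                (record["user"], json.dumps(record)),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT NOT NULL, payload TEXT NOT NULL)")

# A clean batch is committed as a whole.
load_events(conn, ['{"user": "ann", "clicks": 3}', '{"user": "bob", "clicks": 5}'])

# A batch with one bad record is rolled back as a whole, including the good row.
try:
    load_events(conn, ['{"user": "cat", "clicks": 1}', "not valid json"])
except json.JSONDecodeError:
    pass

# Query the semi-structured data through plain SQL.
rows = conn.execute("SELECT user FROM events ORDER BY user").fetchall()
print(rows)
```

The second batch leaves no trace of "cat" behind. That all-or-nothing behaviour is the atomicity part of ACID, which a plain file system such as HDFS does not provide on its own.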

free ebook
http://www.tamr.com/landing-pages/getting-data-right/
"preview edition of the O'Reilly ebook 'Getting Data Right: Tackling the Challenges of Big Data Volume and Variety' (available Fall 2015) and get early access to Dr. Michael Stonebraker's chapter on 'Data Curation at Scale'."

Google Re-Imagines MapReduce, Launches DataFlow
"Google Cloud Dataflow is a managed service for creating data pipelines that ingest, transform, and analyze massive amounts of data, up into the exabyte range."

The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google : Paper Trail

"It was known for decades that generalised dataflow engines adequately capture the map-reduce model as a fairly trivial special case. However, there was real doubt over whether such engines could be efficiently implemented on large-scale cluster computers."
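The "special case" claim is easy to see in miniature. Below is a rough sketch, assuming a toy in-memory "engine" made of two generic dataflow operators (`flat_map` and `group_by_key`, both names invented here). Map-reduce is then just one fixed three-stage pipeline built from them. This says nothing about how Google's Dataflow or any real engine is implemented.

```python
from collections import defaultdict
from itertools import chain


def flat_map(fn, records):
    """Generic dataflow operator: apply fn to each record and flatten the results."""
    return chain.from_iterable(fn(r) for r in records)


def group_by_key(pairs):
    """Generic dataflow operator: collect (key, value) pairs into (key, [values])."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()


def map_reduce(records, mapper, reducer):
    """Map-reduce expressed as one particular pipeline: map, then group, then reduce."""
    pairs = flat_map(mapper, records)
    grouped = group_by_key(pairs)
    return {key: reducer(key, values) for key, values in grouped}


def word_count(lines):
    """The classic example: emit (word, 1) per word, then sum the ones per word."""
    return map_reduce(
        lines,
        mapper=lambda line: ((word, 1) for word in line.split()),
        reducer=lambda _key, values: sum(values),
    )


print(word_count(["a b a", "b c"]))
```

A general dataflow engine lets you wire such operators into arbitrary graphs (several groupings, joins, iterative loops), while map-reduce hard-codes this single shape. The open question the quote points at was never expressiveness but whether the general form could run efficiently on large clusters.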

Syncfusion Big Data Platform | Big Data Platform simplifies working with Hadoop on Windows

"The Syncfusion Big Data Platform is the first and the only complete Hadoop distribution designed for Windows. 100% free for everyone"
