The O'Reilly Radar Podcast: Turing Award winner Michael Stonebraker on the future of data science.
("There is no Nobel Prize for computer science, so the Turing Award is its highest recognition.")
In March 2015, database pioneer Michael Stonebraker was awarded the 2014 ACM Turing Award “for fundamental contributions to the concepts and practices underlying modern database systems.”
"It’s all going to move to data science as soon as enough data scientists get trained by our universities to do this stuff. It’s fairly clear to me that you’re probably not going to retread a business analyst to be a data scientist because you’ve got to know statistics, you’ve got to know machine learning. You’ve got to know what regression means, what Naïve Bayes means, what k-Nearest Neighbors means. It’s all statistics.
All of that stuff turns out to be defined on arrays. It’s not defined on tables. The tools of future data scientists are going to be array-based tools. Those may live on top of relational database systems. They may live on top of an array database system, or perhaps something else. It’s completely open."
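To make the "defined on arrays" point concrete, here is a minimal sketch (my own illustration, not something from the podcast) of two of the methods Stonebraker names: simple linear regression and k-nearest neighbors. Both operate on numeric arrays of values or feature vectors, not on relational tables. The function names `linear_regression` and `knn_classify` and the toy data are hypothetical; plain Python lists stand in for arrays so it runs without NumPy.

```python
import math
from collections import Counter


def linear_regression(xs, ys):
    """Fit y = a + b*x by ordinary least squares over two equal-length arrays."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length arrays with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x), computed element-wise over the arrays.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("x values must not all be equal")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean)."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data lying exactly on y = 1 + 2x.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    print(linear_regression(xs, ys))  # (1.0, 2.0)

    # Two well-separated clusters of 2-D feature vectors.
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(points, labels, (0.5, 0.5)))  # a
```

Neither function needs joins, keys, or relations. Each is a pass of arithmetic over arrays, which is the shape of workload Stonebraker expects data science tools to be built around.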
"Stonebraker discusses the problem of curating data at scale in more detail in his contributed chapter in a new free ebook, Getting Data Right."
Free ebook: http://www.tamr.com/landing-pages/getting-data-right/
"Preview edition of the O'Reilly ebook 'Getting Data Right: Tackling the Challenges of Big Data Volume and Variety' (available Fall 2015) and get early access to Dr. Michael Stonebraker's chapter on 'Data Curation at Scale'."
Google Re-Imagines MapReduce, Launches DataFlow
"Google Cloud Dataflow is a managed service for creating data pipelines that ingest, transform, and analyze massive amounts of data, up into the exabyte range."
He claims that Google, which created MapReduce 10 years ago, stopped using MapReduce 5 years ago. Will Hadoop adjust and transform itself?
JSON is apparently his data format of choice for semi-structured data, and SQL his preferred high-level interface. ACID is required, so HDFS is not good enough.