Monday, June 13, 2016

Diffbot: Web + WebKit + AI => global knowledge database

Another Stanford-related startup that could "change the world"...
The "Semantic Web" vision may be realized after all, just in a very different way: built by AI :)

A very interesting podcast interview:
Using AI to build a comprehensive database of knowledge - O'Reilly Media

"Diffbot - a company dedicated to building large-scale knowledge databases. Diffbot is at the heart of many web applications, and it’s starting to power a wide array of intelligent applications.

… Roughly, what happens when our robot first encounters a page is we render the page in our own customized rendering engine, which is a fork of WebKit that's basically had its face ripped off. It doesn't have all the human niceties of a web browser, and it runs much faster than a browser because it doesn't need those human-facing components. … The other difference is we've instrumented the whole rendering process. We have access to all of the pixels on the page for each XY position. … [We identify many] features that feed into our semi-supervised learning system. Then millions of lines of code later, out comes knowledge."
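
To make the "pixels in, features out" idea a bit more concrete, here is a toy sketch of my own. It is not Diffbot's code, and every name in it (`RenderedBlock`, `visual_features`, `guess_title`) is hypothetical. It assumes the rendering engine can report a bounding box and font size for each text block, and it swaps the semi-supervised learner for a trivial hand-written rule just to show what such features might look like.

```python
from dataclasses import dataclass


@dataclass
class RenderedBlock:
    """One laid-out text block, as a rendering engine might report it (hypothetical)."""
    text: str
    x: float          # left edge, in pixels
    y: float          # top edge, in pixels
    width: float
    height: float
    font_size: float


def visual_features(block, page_width, page_height):
    """Turn raw geometry into numeric features a learner could consume."""
    return {
        "rel_x": block.x / page_width,
        "rel_y": block.y / page_height,
        "area_frac": (block.width * block.height) / (page_width * page_height),
        "font_size": block.font_size,
        "n_words": len(block.text.split()),
    }


def guess_title(blocks, page_width, page_height):
    """Toy stand-in for a trained model: largest font in the top third of the page."""
    candidates = [
        b for b in blocks
        if visual_features(b, page_width, page_height)["rel_y"] < 1 / 3
    ]
    return max(candidates, key=lambda b: b.font_size, default=None)


# Made-up layout data for a single page.
blocks = [
    RenderedBlock("Home | About | Contact", 0, 0, 1200, 30, 12),
    RenderedBlock("Using AI to build a knowledge database", 100, 80, 800, 60, 32),
    RenderedBlock("Roughly, what happens when our robot first encounters a page ...",
                  100, 200, 800, 900, 14),
]
title = guess_title(blocks, page_width=1200, page_height=3000)
print(title.text)
```

A real system would presumably feed many such features into a learned model rather than a single rule; the point here is only that layout geometry, not just HTML markup, can be the input.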

Diffbot announced it was working on its version of an automated "Knowledge Graph" by crawling the web and using its automatic web page extraction to build a large database of structured web data.

