Tuesday, August 08, 2017

GCP Cloud Dataflow, Apache Beam


Gaming Analytics Platform with Kir Titievsky, Eric Anderson, and Tino Tereshko | Google Cloud Platform Podcast




What is Google Cloud Pub/Sub?  |  Cloud Pub/Sub Documentation  |  Google Cloud Platform

Cloud Dataflow with Frances Perry | Google Cloud Platform Podcast

Cloud Dataflow - Batch & Stream Data Processing  |  Google Cloud Platform

BigQuery - Analytics Data Warehouse  |  Google Cloud Platform



Apache Beam
"An advanced unified programming model
Implement batch and streaming data processing jobs that run on any execution engine."


Beam Overview
"Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

Beam is particularly useful for Embarrassingly Parallel data processing tasks, in which the problem can be decomposed into many smaller bundles of data that can be processed independently and in parallel. You can also use Beam for Extract, Transform, and Load (ETL) tasks and pure data integration. These tasks are useful for moving data between different storage media and data sources, transforming data into a more desirable format, or loading data onto a new system."
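The "smaller bundles of data that can be processed independently and in parallel" idea from the quote above can be illustrated without Beam at all. The following is a minimal plain-Python sketch, not Beam code and not how any Beam runner is implemented: it assumes a toy word-count job, and the helper names (split_into_bundles, count_words, run_pipeline) are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_bundles(records, bundle_size):
    """Split a list of records into bundles of at most bundle_size items."""
    return [records[i:i + bundle_size] for i in range(0, len(records), bundle_size)]


def count_words(bundle):
    """Process one bundle in isolation: count the words in its lines."""
    counts = Counter()
    for line in bundle:
        counts.update(line.lower().split())
    return counts


def run_pipeline(records, bundle_size=2, workers=4):
    """Count words per bundle in parallel, then merge the partial counts."""
    bundles = split_into_bundles(records, bundle_size)
    # No bundle depends on another, so they can be handed to workers freely.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, bundles))
    # Merging is the only step that needs to see every partial result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because each bundle is counted independently, the same structure would work with processes or separate machines instead of threads, which is roughly the property a distributed back-end relies on for this class of problem.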
