Hadoop analytics - Worked on Hadoop system, migrated all the existing processes in google
cloud platform. As a primary tool we used dataproc cluster. Processes were written in apache
spark. For real time streaming POC was done on pubsub and apache kafka. Scheduling was done
through Apache airflow
Responsibility:
* Working on existing architecture of the Hadoop ecosystem.
* POC on google cloud platform tools- Dataproc, Google cloud storage, Bigquery, apache
beam. Apache Airflow
* Migrated processes into google cloud platform using Dataproc, Apache spark,
Bigquery,pubsub.