▶ Hello, world. I'm

Prashant Gangoliya
Data Engineer.

I design and build the pipelines, platforms, and infrastructure that turn raw data into reliable, scalable, and actionable intelligence.

6+ Years experience
12 Projects shipped
TBs of Data processed

01 / About

Who I am

Hi! I'm Prashant Gangoliya, a Data Engineer based in Bengaluru, India. I specialize in building robust data infrastructure — from ingestion and transformation to serving — that empowers teams to make confident, data-driven decisions.

With a background in data engineering, data analytics, and cloud architecture, I care deeply about data quality, pipeline observability, and engineering systems that are easy to reason about and maintain.

When I'm not wrangling data, you'll find me learning about data engineering or AI/ML, or exploring a new trail on my bike.

⚡ Data Pipeline Engineering

Building fault-tolerant batch and streaming pipelines at scale, from ingestion to delivery.

☁ Cloud & Platform Infrastructure

Designing lakehouse architectures and cloud-native data platforms on AWS, GCP, and Azure.

🔍 Data Quality & Observability

Instrumenting pipelines with lineage, monitoring, and testing so teams trust their data.

What I work with

Ingestion & Streaming

Databricks, Apache Spark Structured Streaming

Processing & Transform

Apache Spark, PySpark, SQL, dbt

Orchestration

Databricks Workflows, Azure Data Factory, Apache Airflow

Storage & Warehousing

Delta Lake, Apache Iceberg, Parquet

Cloud

Azure, Docker

Languages

Python, SQL, Bash, Scala

03 / Projects

Things I've built

🔄
Real-time CDC Pipeline

End-to-end change data capture (CDC) system using Debezium and Kafka to stream Postgres changes into a Delta Lake lakehouse with sub-second latency.

Kafka, Debezium, Delta Lake, PySpark
🏗️
dbt Analytics Platform

Modular dbt project implementing a Medallion architecture (Bronze / Silver / Gold) on Snowflake, with CI/CD, data contracts, and automated testing.

dbt, Snowflake, GitHub Actions
📊
Airflow Orchestration Framework

Reusable Airflow DAG framework with dynamic task generation, Slack alerting, SLA monitoring, and a custom operator library for common data tasks.

Airflow, Python, Docker, AWS
☁️
Lakehouse on AWS

Infrastructure-as-code for a production lakehouse on S3 + Glue + Athena + Iceberg, provisioned with Terraform and cost-optimized with intelligent tiering.

Terraform, AWS Glue, Iceberg, Athena
🔍
Data Quality Monitor

Lightweight data observability tool that profiles tables, detects anomalies, tracks schema changes, and pushes alerts to Slack and PagerDuty.

Python, Great Expectations, Postgres
Streaming Analytics Dashboard

Flink-powered streaming job that aggregates e-commerce events in real time and serves live KPIs to a Grafana dashboard via a Redis sink.

Flink, Redis, Kafka, Grafana

Let's build something together

I'm open to freelance projects, full-time roles, and interesting collaborations. Drop me a message and I'll get back to you within a day.

✉ E-Mail