Hello! I am Advait Khawase,
a Data Engineer dedicated to building robust, data-driven solutions.

My Skills
Databricks
PySpark
Spark
Hadoop
Kafka
Airflow
Azure Data Factory
Azure Cloud
AWS Cloud
Git
Shell Scripting
NiFi
LangGraph
SQL
Experience
Data Engineer, Dec 2023 - Present
Jio Platforms Limited (JPL)
  • Migrated 50+ data workflows handling 1M–10M+ records daily for Partner Center from Informatica to Databricks gold layer, achieving ~15% performance boost via serverless clusters and optimized joins.
  • Migrated 25+ projects from SAP HANA stored procedures to Hadoop-based PySpark pipelines, reducing SAP HANA compute costs by ~35% via partition pruning and predicate pushdown.
  • Conducted POC for NiFi and ADF during cloud migration, leading to adoption of NiFi for historical ingestion of ~1TB+ data per business unit and ADF for full & incremental ETL across 400+ pipelines.
  • Managed 300+ workflows covering 6,000+ tables, implementing Medallion architecture with schema evolution, data quality checks, and log maintenance.
  • Developed 100+ shell scripts for Sqoop and Airflow automation and 25+ Python scripts for NiFi development, cutting manual effort from hours to under 5 minutes per task.
About Me

Data Engineer at Jio Platforms owning 300+ workflows across 6,000+ tables in trade compliance, partner operations, and document processing. I design and deliver end-to-end data solutions — from raw ingestion to gold-layer analytics — on Databricks, Hive/HDFS, and Azure.

Specialized in Medallion architecture with schema evolution, data quality enforcement, and log maintenance at scale. Core stack: PySpark, ADF, Airflow, NiFi, Python, SQL. I've led cloud migrations off SAP HANA and Informatica, cutting compute costs by ~35% and reducing manual effort from hours to minutes through automation.

Beyond enterprise pipelines, I build AI-augmented tools — including a full-stack job aggregation platform powered by Gemini LLM and LangGraph orchestration that received a seed funding offer. I hold a B.Tech in Computer Science (CGPA 8.7) and am an AWS Certified Cloud Practitioner.

AWS Certified Cloud Practitioner
Projects
Python, Gemini API, LangGraph, Playwright, Node.js, React, PostgreSQL, AWS
Full-stack job platform with an AI-driven scraper using Gemini LLM to autonomously find job links, paginate, and navigate ATS iframes (Greenhouse, Lever, Workday) via 3-tier fallback. Scrapes 3,000+ jobs with async Playwright (5-tab concurrency) into AWS RDS PostgreSQL. React + Node.js frontend/backend with AWS Cognito auth and CI/CD; migrating orchestration to LangGraph. Received seed funding offer.
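As a rough illustration of the 5-tab concurrency limit mentioned above, here is a simplified sketch using an `asyncio.Semaphore`. It is not the platform's actual code: `scrape_page`, `scrape_all`, and the example URLs are hypothetical, and the Playwright page visit is replaced by a short sleep so the snippet runs with the standard library alone.

```python
import asyncio

MAX_TABS = 5  # mirrors the 5-tab concurrency limit described above


async def scrape_page(url, semaphore, stats):
    """Hypothetical stand-in for one Playwright tab visiting one URL."""
    async with semaphore:  # at most MAX_TABS coroutines get past this point
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for page navigation and parsing
        stats["active"] -= 1
        return {"url": url, "status": "ok"}


async def scrape_all(urls, max_tabs=MAX_TABS):
    """Scrape every URL while keeping at most max_tabs visits in flight."""
    semaphore = asyncio.Semaphore(max_tabs)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(scrape_page(url, semaphore, stats) for url in urls)
    )
    return results, stats["peak"]


if __name__ == "__main__":
    # Hypothetical URLs, used only to drive the sketch.
    urls = [f"https://example.com/jobs?page={i}" for i in range(20)]
    results, peak = asyncio.run(scrape_all(urls))
    print(len(results), "pages scraped, peak concurrency:", peak)
```

The semaphore keeps the number of simultaneously open tabs bounded no matter how many URLs are queued, which helps keep browser memory use predictable.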
PySpark DataFrame Lineage Tool
Python, Python AST, PySpark, ReactFlow, Dagre, React, Vite
Static analysis tool parsing PySpark via Python AST — no execution required — extracting 10+ operation types into a node-edge graph. React frontend renders interactive dataframe dependency graphs with column-level lineage and Dagre auto-layout.
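To give a feel for the static-analysis idea, here is a minimal sketch built on Python's standard `ast` module. It is an illustration under simplifying assumptions, not the tool's actual implementation: the sample PySpark snippet and the helpers `trace` and `extract_edges` are hypothetical, only simple `name = chain(...)` assignments are handled, and column-level lineage is left out.

```python
import ast

# Hypothetical PySpark snippet. It is parsed as text and never executed,
# so neither Spark nor the referenced data needs to exist.
SOURCE = '''
orders = spark.read.parquet("/data/orders")
paid = orders.filter(orders.status == "PAID")
totals = paid.groupBy("customer_id").agg({"amount": "sum"})
'''


def trace(node):
    """Walk a chain such as df.filter(...).groupBy(...) back to its base name.

    Returns (base_variable_name, [method names in call order]).
    """
    ops = []
    while isinstance(node, (ast.Call, ast.Attribute)):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Attribute):
                ops.append(node.func.attr)  # record the called method name
            node = node.func
        else:
            node = node.value
    root = node.id if isinstance(node, ast.Name) else None
    return root, ops[::-1]


def extract_edges(source):
    """Return (source_var, target_var, operations) for each simple assignment."""
    edges = []
    for stmt in ast.parse(source).body:
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        ):
            root, ops = trace(stmt.value)
            if root and ops:
                edges.append((root, stmt.targets[0].id, ops))
    return edges


if __name__ == "__main__":
    for src, dst, ops in extract_edges(SOURCE):
        print(f"{src} -> {dst} via {ops}")
```

Each tuple is one edge of a node-edge graph; a frontend could lay these out as a dependency graph between dataframes.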
Apache Airflow, Kafka, Spark, Cassandra, Docker, PostgreSQL, Python
Designed and implemented a real-time data pipeline for new user tracking utilizing Apache Airflow, Kafka, Spark, and Cassandra, with all components containerized via Docker. The pipeline automated data ingestion through a scheduled Airflow DAG that fetched user data from an API, published it to a Kafka topic, and processed it with Spark to stream into Cassandra — all within a latency of under 10 seconds.
Python, AWS S3, AWS Glue, Amazon Redshift, REST API
Developed a data pipeline that collects real-time stock market data through an external API and stores it in Amazon S3. An AWS Glue crawler automatically detects and catalogs the latest data in S3. The data is then processed using AWS Glue jobs and loaded into Amazon Redshift, enabling efficient querying and advanced analysis for reporting and business intelligence use cases.
Contact Me

Want to start a conversation?
Let's get connected.

Open to new opportunities, collaborations, and interesting conversations. I'll get back to you as soon as possible.

© 2026 Advait Khawase. All rights reserved.