Hello! I am Advait Khawase,
a Data Engineer
dedicated to building robust, data-driven solutions.

My Skills
Databricks
PySpark
Spark
Hadoop
Kafka
Airflow
Azure Data Factory
Azure Cloud
AWS Cloud
Informatica PowerCenter
Git
Shell Scripting
C++
SQL
Experience
Data Engineer | Dec 2023 - Present
Jio Platforms Limited (JPL)
I have worked extensively with technologies such as PySpark, Hive, Apache Airflow, Informatica PowerCenter, Azure Data Factory, SQL, and Linux shell scripting. My experience includes developing and optimizing data pipelines, automating processes, managing workflows, and improving ETL efficiency while ensuring high system reliability and quick incident response.
About Me
I am a data engineering professional with hands-on experience designing and implementing real-time data pipelines using Apache Airflow, Kafka, Spark, and Databricks. I have built automated workflows to ingest new user data from APIs, publish to Kafka topics, and process high-velocity data streams with Spark, achieving low-latency, reliable data delivery. My expertise also includes leveraging Databricks for scalable data processing and transformation, as well as monitoring Kafka message flow and topic health to ensure seamless operations. I am passionate about building efficient, production-ready data solutions that drive business value.

Projects
HRDocFlow - Unstructured Document Pipeline for Jio, Retail, and ENM
I developed Informatica and Linux shell-based workflows to efficiently track, filter, and transfer thousands of daily HR files from NAS to HDFS across multiple teams, including Jio, Retail, and ENM. I also scripted Apache Airflow jobs to identify and selectively move key HR documents, based on document IDs, into Azure storage for downstream processing. Using Databricks, I processed and loaded files from Azure Data Lake Storage into business-specific volumes, reading data with a three-day lag so that files delayed by weekends or late arrivals are not missed.
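Below is a minimal PySpark sketch of the lagged Databricks load described above. The storage account, landing path, file format, and target table name are illustrative placeholders, not the actual production values.

```python
# Sketch: load HR files from ADLS into a business-specific Delta table with a three-day lag.
# Paths, storage account, and table name below are placeholders for illustration only.
from datetime import date, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hrdocflow_load").getOrCreate()

# Read the partition from three days ago so weekend or late-arriving files are not skipped.
run_date = date.today() - timedelta(days=3)
source_path = (
    "abfss://hr-docs@examplestorage.dfs.core.windows.net/"   # placeholder container/account
    f"landing/{run_date:%Y/%m/%d}/"
)

df = spark.read.option("header", True).csv(source_path)

# Append into a business-specific Delta table, tagged and partitioned by the logical ingest date.
(
    df.withColumn("ingest_date", F.lit(str(run_date)).cast("date"))
      .write.mode("append")
      .partitionBy("ingest_date")
      .format("delta")
      .saveAsTable("hr.jio_documents")   # placeholder table name
)
```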
I designed and implemented a real-time data pipeline for new user tracking utilizing Apache Airflow, Kafka, Spark, and Cassandra, with all components containerized via Docker. The pipeline automated data ingestion through a scheduled Airflow DAG that fetched user data from an API, published it to a Kafka topic, and processed it with Spark to stream into Cassandra—all within a latency of under 10 seconds. Additionally, I managed Airflow metadata using PostgreSQL and monitored Kafka message flow and topic health through the Kafka Control Center to ensure smooth and reliable data processing.
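A minimal sketch of the Kafka-to-Cassandra leg of that pipeline using PySpark Structured Streaming. The broker address, topic, user schema, and keyspace/table names are assumptions for illustration, and the Spark Cassandra connector is assumed to be available on the cluster.

```python
# Sketch: consume new-user events from Kafka with Spark and write them to Cassandra.
# Topic, schema, and keyspace/table names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("new_user_stream").getOrCreate()

user_schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType()),
    StructField("last_name", StringType()),
    StructField("email", StringType()),
])

# Read the new-user events published to Kafka by the scheduled Airflow DAG.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "users_created")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON payload into typed columns.
users = raw.select(
    F.from_json(F.col("value").cast("string"), user_schema).alias("u")
).select("u.*")

# Write each micro-batch into Cassandra via the Spark Cassandra connector.
def write_to_cassandra(batch_df, batch_id):
    (
        batch_df.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="spark_streams", table="created_users")  # placeholders
        .mode("append")
        .save()
    )

query = users.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```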
I developed a data pipeline that collects real-time stock market data through an external API and stores it in an Amazon S3 bucket. An AWS Glue crawler automatically detects and catalogs the latest data stored in S3. The data is then processed with AWS Glue jobs and loaded into Amazon Redshift, enabling efficient querying and advanced analysis for reporting and business intelligence use cases.
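A minimal sketch of the ingestion step, assuming a generic REST quotes endpoint and a placeholder S3 bucket; the real API, bucket, and key layout differ.

```python
# Sketch: pull stock quotes from an external REST API and land them in S3,
# where a Glue crawler can catalog them. URL, bucket, and key layout are placeholders.
import json
from datetime import datetime, timezone

import boto3
import requests

API_URL = "https://example.com/v1/stock-quotes"   # placeholder endpoint
BUCKET = "stock-market-raw-data"                  # placeholder bucket


def ingest_quotes() -> str:
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()

    # Partition objects by ingestion timestamp so the crawler only picks up new data.
    ts = datetime.now(timezone.utc)
    key = f"quotes/{ts:%Y/%m/%d}/quotes_{ts:%H%M%S}.json"

    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(resp.json()).encode("utf-8"),
    )
    return key


if __name__ == "__main__":
    print("wrote", ingest_quotes())
```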
© 2025 Advait Khawase. All rights reserved.