Hello! I am Advait Khawase,
a Data Engineer
dedicated to building robust, data-driven solutions.

My Skills
Databricks
PySpark
Spark
Hadoop
Kafka
Airflow
Azure Data Factory
Azure Cloud
AWS Cloud
Informatica PowerCenter
Git
Shell Scripting
C++
SQL
Experience
Data Engineer | Dec 2023 - Present
Jio Platforms Limited (JPL)
I have worked extensively with technologies such as PySpark, Hive, Apache Airflow, Informatica PowerCenter, Azure Data Factory, SQL, and Linux shell scripting. My experience includes developing and optimizing data pipelines, automating processes, managing workflows, and improving ETL efficiency while ensuring high system reliability and quick incident response.
About Me
I am a data engineering professional with hands-on experience designing and implementing real-time data pipelines using Apache Airflow, Kafka, Spark, and Databricks. I have built automated workflows to ingest new user data from APIs, publish to Kafka topics, and process high-velocity data streams with Spark, achieving low-latency, reliable data delivery. My expertise also includes leveraging Databricks for scalable data processing and transformation, as well as monitoring Kafka message flow and topic health to ensure seamless operations. I am passionate about building efficient, production-ready data solutions that drive business value.

Projects
HRDocFlow - Unstructured Document Pipeline for Jio, Retail, and ENM
I developed Informatica and Linux shell-based workflows to efficiently track, filter, and transfer thousands of daily HR files from NAS to HDFS across multiple teams, including Jio, Retail, and ENM. I also scripted Apache Airflow jobs to identify and selectively move key HR documents, based on document IDs, into Azure storage for downstream processing. Using Databricks, I processed and loaded files from Azure Data Lake Storage into business-specific volumes, reading data with a three-day lag so that files delayed by weekends or late arrivals are not missed.
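Below is a minimal PySpark sketch of the lagged Databricks load described above. The storage account, landing path, file format, and target table name are illustrative placeholders, not the actual production values.

```python
# Sketch: load HR files from ADLS into a business-specific Delta table with a three-day lag.
# Paths, storage account, and table name below are placeholders for illustration only.
from datetime import date, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hrdocflow_load").getOrCreate()

# Read the partition from three days ago so weekend or late-arriving files are not skipped.
run_date = date.today() - timedelta(days=3)
source_path = (
    "abfss://hr-docs@examplestorage.dfs.core.windows.net/"   # placeholder container/account
    f"landing/{run_date:%Y/%m/%d}/"
)

df = spark.read.option("header", True).csv(source_path)

# Append into a business-specific Delta table, tagged and partitioned by the logical ingest date.
(
    df.withColumn("ingest_date", F.lit(str(run_date)).cast("date"))
      .write.mode("append")
      .partitionBy("ingest_date")
      .format("delta")
      .saveAsTable("hr.jio_documents")   # placeholder table name
)
```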
I designed and implemented a real-time data pipeline for new user tracking utilizing Apache Airflow, Kafka, Spark, and Cassandra, with all components containerized via Docker. The pipeline automated data ingestion through a scheduled Airflow DAG that fetched user data from an API, published it to a Kafka topic, and processed it with Spark to stream into Cassandra—all within a latency of under 10 seconds. Additionally, I managed Airflow metadata using PostgreSQL and monitored Kafka message flow and topic health through the Kafka Control Center to ensure smooth and reliable data processing.
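A minimal sketch of the Kafka-to-Cassandra leg of that pipeline using PySpark Structured Streaming. The broker address, topic, user schema, and keyspace/table names are assumptions for illustration, and the Spark Cassandra connector is assumed to be available on the cluster.

```python
# Sketch: consume new-user events from Kafka with Spark and write them to Cassandra.
# Topic, schema, and keyspace/table names are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("new_user_stream").getOrCreate()

user_schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType()),
    StructField("last_name", StringType()),
    StructField("email", StringType()),
])

# Read the new-user events published to Kafka by the scheduled Airflow DAG.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "users_created")                # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Parse the JSON payload into typed columns.
users = raw.select(
    F.from_json(F.col("value").cast("string"), user_schema).alias("u")
).select("u.*")

# Write each micro-batch into Cassandra via the Spark Cassandra connector.
def write_to_cassandra(batch_df, batch_id):
    (
        batch_df.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="spark_streams", table="created_users")  # placeholders
        .mode("append")
        .save()
    )

query = users.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```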
I developed a data pipeline that collects real-time stock market data through an external API and stores it in an Amazon S3 bucket. An AWS Glue crawler automatically detects and catalogs the latest data stored in S3. The data is then processed with AWS Glue jobs and loaded into Amazon Redshift, enabling efficient querying and advanced analysis for reporting and business intelligence use cases.
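A minimal sketch of the ingestion step, assuming a generic REST quotes endpoint and a placeholder S3 bucket; the real API, bucket, and key layout differ.

```python
# Sketch: pull stock quotes from an external REST API and land them in S3,
# where a Glue crawler can catalog them. URL, bucket, and key layout are placeholders.
import json
from datetime import datetime, timezone

import boto3
import requests

API_URL = "https://example.com/v1/stock-quotes"   # placeholder endpoint
BUCKET = "stock-market-raw-data"                  # placeholder bucket


def ingest_quotes() -> str:
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()

    # Partition objects by ingestion timestamp so the crawler only picks up new data.
    ts = datetime.now(timezone.utc)
    key = f"quotes/{ts:%Y/%m/%d}/quotes_{ts:%H%M%S}.json"

    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(resp.json()).encode("utf-8"),
    )
    return key


if __name__ == "__main__":
    print("wrote", ingest_quotes())
```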
© 2025 Advait Khawase. All rights reserved.