Data Engineer passionate about building powerful, real-time data systems and turning complex data challenges into efficient, innovative solutions.
About Me
I’m a Data Engineer who thrives on building real-time data systems and streamlining workflows so that data is easier to trust and use. I focus on pipelines that handle large volumes of information reliably, from automating manual processes to improving data accuracy, so organizations can make fast, informed decisions. My goal? To turn data challenges into opportunities that drive innovation and measurable impact. Let’s work together to get more out of your data.
Skills
S3
Redshift
Glue
EMR
SageMaker
Lambda
Kinesis
Athena
Step Functions
CloudWatch
DataSync
Azure Data Factory (ADF)
Synapse Analytics
Python
SQL
PySpark
Databricks
HDFS
Fivetran
Matillion
Airflow
Snowflake
Tableau
Power BI
Projects
Amazon Ads Performance Analytics
Built a scalable data pipeline for real-time Amazon Ads performance analysis and optimization.
As a Data Engineer at Amazon, I designed and deployed a production-grade data pipeline to process and analyze Amazon advertising campaign performance data. This project supported the Ads team by transforming raw impression, click, and conversion data into actionable insights, enabling optimization of ad spend and campaign strategies. The pipeline handled terabyte-scale datasets, integrated seamlessly with AWS infrastructure, and delivered near real-time KPIs to stakeholders via a centralized data warehouse.
Responsibilities & Features:
Data Ingestion: Built a scalable ingestion layer to pull raw ad event data (impressions, clicks, conversions, spend) from Amazon’s internal S3 buckets, processing millions of records daily.
ETL Pipeline: Engineered an ETL workflow using Apache Spark and Python to clean, enrich, and aggregate ad data, computing KPIs like click-through rate (CTR), cost-per-click (CPC), and return on ad spend (ROAS).
Storage: Loaded processed data into Amazon Redshift, optimizing for high-performance analytical queries across campaign, region, and time dimensions.
Analytics: Delivered structured datasets supporting queries for campaign performance trends, budget efficiency, and audience targeting effectiveness.
Automation: Orchestrated the pipeline with AWS Step Functions and Amazon CloudWatch Events, ensuring daily execution and monitoring for reliability.
Collaboration: Partnered with Data Scientists and Product Managers to refine data models and ensure outputs met reporting needs.
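The KPI computations described above can be sketched in a few lines of Pandas. This is an illustrative example only, not the production code: the column names and the pre-aggregated daily schema are assumptions made for the sake of the sketch.

```python
import pandas as pd

# Illustrative raw ad metrics: one row per campaign per day, with
# counts and spend already tallied (simplified, hypothetical schema).
events = pd.DataFrame({
    "campaign_id": ["c1", "c2"],
    "impressions": [10_000, 50_000],
    "clicks":      [200, 500],
    "spend":       [100.0, 400.0],
    "ad_revenue":  [350.0, 1_200.0],
})

# Compute the three KPIs named above per campaign.
kpis = events.assign(
    ctr=lambda d: d["clicks"] / d["impressions"],   # click-through rate
    cpc=lambda d: d["spend"] / d["clicks"],         # cost per click
    roas=lambda d: d["ad_revenue"] / d["spend"],    # return on ad spend
)
```

In the real pipeline the same aggregations would run in PySpark over terabyte-scale event data, but the KPI definitions are identical.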
Tech Stack:
Languages: Python (Pandas, PySpark for data processing)
Big Data: Apache Spark (via Amazon EMR for distributed processing)
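A daily run of a pipeline like this, orchestrated with AWS Step Functions as described above, might be defined roughly as follows. This is a minimal sketch in the Amazon States Language: the cluster IDs, bucket paths, and state names are all placeholders, not the production definition.

```json
{
  "Comment": "Daily ads ETL (illustrative sketch, not the production definition)",
  "StartAt": "RunSparkETL",
  "States": {
    "RunSparkETL": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-EXAMPLE",
        "Step": {
          "Name": "ads-kpi-etl",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/ads_etl.py"]
          }
        }
      },
      "Next": "LoadRedshift"
    },
    "LoadRedshift": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
      "Parameters": {
        "ClusterIdentifier": "example-cluster",
        "Database": "analytics",
        "Sql": "CALL load_ads_kpis()"
      },
      "End": true
    }
  }
}
```

A CloudWatch Events (EventBridge) schedule rule would trigger this state machine once per day, giving the daily execution and monitoring hooks mentioned above.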