Data Engineer passionate about building powerful, real-time data systems and turning complex data challenges into efficient, innovative solutions.
About Me
I’m a Data Engineer who thrives on building real-time data systems and streamlining workflows so that data is easier to trust and use. I focus on pipelines that handle large volumes of information reliably, from automating manual processes to improving data accuracy, so organizations can make fast, informed decisions. My goal? To turn data challenges into opportunities that drive innovation and measurable impact. Let’s work together to get more out of your data.
Skills
S3
Redshift
Glue
EMR
SageMaker
Lambda
Kinesis
Athena
Step Functions
CloudWatch
DataSync
Azure Data Factory (ADF)
Synapse Analytics
Python
SQL
PySpark
Databricks
HDFS
Fivetran
Matillion
Airflow
Snowflake
Tableau
Power BI
Projects
Amazon Ads Performance Analytics
Built a scalable data pipeline for real-time Amazon Ads performance analysis and optimization.
As a Data Engineer at Amazon, I designed and deployed a production-grade data pipeline to process and analyze Amazon advertising campaign performance data. This project supported the Ads team by transforming raw impression, click, and conversion data into actionable insights, enabling optimization of ad spend and campaign strategies. The pipeline handled terabyte-scale datasets, integrated seamlessly with AWS infrastructure, and delivered near real-time KPIs to stakeholders via a centralized data warehouse.
Responsibilities & Features:
Data Ingestion: Built a scalable ingestion layer to pull raw ad event data (impressions, clicks, conversions, spend) from Amazon’s internal S3 buckets, processing millions of records daily.
ETL Pipeline: Engineered an ETL workflow using Apache Spark and Python to clean, enrich, and aggregate ad data, computing KPIs like click-through rate (CTR), cost-per-click (CPC), and return on ad spend (ROAS).
Storage: Loaded processed data into Amazon Redshift, optimizing for high-performance analytical queries across campaign, region, and time dimensions.
Analytics: Delivered structured datasets supporting queries for campaign performance trends, budget efficiency, and audience targeting effectiveness.
Automation: Orchestrated the pipeline with AWS Step Functions and Amazon CloudWatch Events, ensuring daily execution and monitoring for reliability.
Collaboration: Partnered with Data Scientists and Product Managers to refine data models and ensure outputs met reporting needs.
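The KPI computations described above can be sketched in a few lines of Pandas. This is an illustrative example only, not the production code: the column names and the pre-aggregated daily schema are assumptions made for the sake of the sketch.

```python
import pandas as pd

# Illustrative raw ad metrics: one row per campaign per day, with
# counts and spend already tallied (simplified, hypothetical schema).
events = pd.DataFrame({
    "campaign_id": ["c1", "c2"],
    "impressions": [10_000, 50_000],
    "clicks":      [200, 500],
    "spend":       [100.0, 400.0],
    "ad_revenue":  [350.0, 1_200.0],
})

# Compute the three KPIs named above per campaign.
kpis = events.assign(
    ctr=lambda d: d["clicks"] / d["impressions"],   # click-through rate
    cpc=lambda d: d["spend"] / d["clicks"],         # cost per click
    roas=lambda d: d["ad_revenue"] / d["spend"],    # return on ad spend
)
```

In the real pipeline the same aggregations would run in PySpark over terabyte-scale event data, but the KPI definitions are identical.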
Tech Stack:
Languages: Python (Pandas, PySpark for data processing)
Big Data: Apache Spark (via Amazon EMR for distributed processing)
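A daily run of a pipeline like this, orchestrated with AWS Step Functions as described above, might be defined roughly as follows. This is a minimal sketch in the Amazon States Language: the cluster IDs, bucket paths, and state names are all placeholders, not the production definition.

```json
{
  "Comment": "Daily ads ETL (illustrative sketch, not the production definition)",
  "StartAt": "RunSparkETL",
  "States": {
    "RunSparkETL": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId": "j-EXAMPLE",
        "Step": {
          "Name": "ads-kpi-etl",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/ads_etl.py"]
          }
        }
      },
      "Next": "LoadRedshift"
    },
    "LoadRedshift": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
      "Parameters": {
        "ClusterIdentifier": "example-cluster",
        "Database": "analytics",
        "Sql": "CALL load_ads_kpis()"
      },
      "End": true
    }
  }
}
```

A CloudWatch Events (EventBridge) schedule rule would trigger this state machine once per day, giving the daily execution and monitoring hooks mentioned above.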