Data Engineer · Data Scientist · AI Systems · Los Angeles, CA · UCLA Anderson

SHREYANS SATPATHY

Architecting data platforms that power AI-driven systems and enable real-world business decisions at scale.

01 — About

Who I Am

Bridging the gap between raw data and strategic, AI-enabled outcomes.

$1.6M · Annual cost savings delivered
91% · ML clustering accuracy
40% · Reduction in claims lifecycle
20+ · Data sources unified in lakehouse

Shreyans Satpathy is a data engineer specializing in building scalable data platforms that power AI-driven systems and decision-making. With experience at Mercedes-Benz, he has architected lakehouse solutions on Azure and Databricks to unify complex data ecosystems and enable real-world business impact.

Currently pursuing a Master's in Business Analytics at UCLA Anderson, he focuses on bridging the gap between data engineering and strategic, AI-enabled outcomes. He is also actively working with UCLA Health on architecting data platforms for AI/ML medical devices and venture capital research.

UCLA Anderson School of Management
Master of Science, Business Analytics
2025–2026
Vellore Institute of Technology
B.Tech, Computer Science Engineering
2019–2023
02 — Experience

Where I've Built

Mercedes-Benz R&D
AUG 2023 – AUG 2025
Bengaluru, India
Data Science Engineer
  • Architected FACTS, a unified field quality analytics platform consolidating warranty data from 10+ sources into a single lakehouse, delivering $1.6M in annual cost savings and reducing claims lifecycle time by 40%.
  • Built a RAG-based Workshop Co-Pilot using Databricks Vector Search, hybrid retrieval (semantic + keyword), and Unity Catalog metadata filtering, enabling technicians to query historical repair data in real time.
  • Developed auto-clustering pipelines for multilingual workshop claims using LLM-driven text translation and fine-tuned BERT; achieved 91% clustering accuracy across 6 languages.
  • Implemented ML classification models on warranty claims to auto-identify Top Issues and Severity levels, enabling proactive quality containment and prioritized root-cause investigation.
  • Designed A/B experiments benchmarking GenAI vs. legacy rule-based systems for anomaly detection, achieving 91% accuracy in identifying defect patterns and fraud anomalies.
  • Engineered end-to-end Medallion Architecture pipelines (Silver → Gold → Platinum) on Databricks + Azure Data Factory, processing millions of warranty records with optimized Spark jobs.
Databricks · Delta Lake · Azure ADF · RAG · BERT · PySpark · MLflow · Tableau
Also as: Data Engineering Intern (Jan–Aug 2023)
  • Optimized data ingestion using parallelized API calls on Spark, reducing pipeline runtime by 90% for downstream ML models.
  • Built data quality validation frameworks catching logic and schema issues pre-production; reduced pipeline failure rates by 30%.
  • Re-architected semantic models with supplier analytics teams, improving dashboard latency and data accuracy for reporting layers.
UCLA Biodesign & BCG
2025 – PRESENT
Los Angeles, CA
Graduate Student Researcher, Data Platforms
  • Conducting EDA on investment trends and operational costs, identifying key correlations between funding stages and AI adoption rates to support strategic recommendations.
  • Working part-time with UCLA Health to architect a data platform consolidating use-cases around AI/ML medical devices and venture capital funding research.
Data Platforms · GCP · Python · EDA · Forecasting
03 — Stack

Tools of the Trade

Python
PySpark
Apache Spark
Databricks
Delta Lake
Azure ADF
Airflow
Kafka
LLMs
RAG Pipelines
LangChain
Vector Search
BERT
MLflow
Medallion Architecture
SQL
GCP
AWS
Tableau
Power BI
Unity Catalog
Data Mesh
Databricks Certified Data Engineer Professional
Databricks · Professional
Certified
Databricks Certified Data Analyst Associate
Databricks · Associate
Certified
Languages & Platforms
Python · SQL · PySpark · Apache Spark · Databricks · Azure (ADF, Synapse, ADLS Gen2) · GCP · AWS
Data Engineering
ETL / ELT · Apache Airflow · Kafka · Hadoop · Medallion Architecture · Data Lakehouse · Data Mesh · Delta Lake
AI / ML
LLMs · RAG Pipelines · LangChain · Vector Search · BERT Fine-Tuning · Scikit-learn · MLflow · Forecasting
Analytics & Tools
Tableau · Power BI · A/B Testing · Experimentation · Git / CI/CD · REST APIs · Unity Catalog
04 — Writing

Thinking Out Loud

LinkedIn · Shreyans Satpathy

Want the full picture?

Download my resume for a complete overview of my experience, education, and technical skills.

Download Resume →
05 — Contact

Let's Build Together.

Open to data engineering, data science, and AI platform roles. Always happy to connect, collaborate, or just talk data.