Selected work · in progress

Projects.

Things I'm building and exploring at the intersection of Big Data and AI — with a bias toward PB-scale, AI-first systems, and toward what GPUs unlock when you point massive parallel compute at massive data.

Building

Log Analytics Agent

MCP Server · AI Agent · AI Skill · AWS

An MCP server / skill / agent that runs multi-step, sequential analysis over terabytes of logs in Amazon S3 and CloudWatch — using the analytics engine of choice (Athena, Spark, or OpenSearch) — and returns an actual diagnosis: why an EMR job failed, where a Spark job is bottlenecked, which stage spilled, what to change. It turns “grep through terabytes of logs” into a conversation.

MCP · LLM agent · Amazon S3 · CloudWatch · Athena / Spark / OpenSearch · EMR
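
To make "multi-step, sequential analysis" concrete, here is a minimal sketch of the kind of chain the agent might run: parse log lines, find the stages that failed, then total the spill for those stages and attach a suggestion. The log format, field names, and function names are all hypothetical and chosen for illustration. Real EMR and Spark logs look different, and in practice the filtering would be pushed down to Athena, Spark, or OpenSearch rather than done in local Python.

```python
import re
from collections import defaultdict

# Hypothetical, simplified log lines. Real Spark/EMR logs use other formats.
SAMPLE_LOGS = [
    "INFO stage=3 task=11 spill_bytes=0 duration_ms=800",
    "INFO stage=4 task=12 spill_bytes=536870912 duration_ms=95000",
    "INFO stage=4 task=13 spill_bytes=268435456 duration_ms=70000",
    "ERROR stage=4 task=12 reason=ExecutorLostFailure",
]

FIELD = re.compile(r"(\w+)=(\S+)")


def parse(line):
    """Split a line into its level plus key=value fields."""
    level, _, rest = line.partition(" ")
    record = {"level": level}
    record.update(FIELD.findall(rest))
    return record


def find_failed_stages(records):
    """Step 1: which stages logged an error?"""
    return sorted({r["stage"] for r in records if r["level"] == "ERROR"})


def spill_by_stage(records):
    """Step 2: total spilled bytes per stage."""
    totals = defaultdict(int)
    for r in records:
        totals[r["stage"]] += int(r.get("spill_bytes", 0))
    return dict(totals)


def diagnose(lines):
    """Step 3: combine the earlier steps into one finding per failed stage."""
    records = [parse(line) for line in lines]
    spills = spill_by_stage(records)
    findings = []
    for stage in find_failed_stages(records):
        reasons = sorted(
            r["reason"]
            for r in records
            if r["level"] == "ERROR" and r["stage"] == stage
        )
        spilled = spills.get(stage, 0)
        findings.append({
            "stage": stage,
            "reasons": reasons,
            "spill_bytes": spilled,
            # Illustrative heuristic only, not a real tuning rule.
            "suggestion": (
                "check memory and partitioning for this stage"
                if spilled > 0
                else "inspect executor logs for this stage"
            ),
        })
    return findings


if __name__ == "__main__":
    for finding in diagnose(SAMPLE_LOGS):
        print(finding)
```

Each step narrows what the next one looks at, which is the property that lets an agent replace one giant scan with a sequence of cheap, targeted queries.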

Exploring

Petabyte Log Intelligence on GPUs

GPU · Big Data · AI Agent

Push the log-analytics agent to petabyte scale by crunching telemetry on GPUs: RAPIDS / cuDF for columnar log processing and FAISS-GPU for semantic search and clustering over embeddings — so anomalies and root causes surface in seconds, not hours, on volumes where CPU pipelines fall over.

RAPIDS (cuDF) · FAISS-GPU · embeddings · Parquet on S3
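
As a rough illustration of surfacing anomalies from log embeddings, the sketch below scores each vector by its distance to its k-th nearest neighbor, so isolated points stand out. It is a CPU-only, brute-force stand-in written with the standard library. At real scale the same exact search is what a FAISS flat L2 index on GPU would perform. The toy 2-D vectors and the function names are assumptions made for the example, not output from any real embedding model.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_anomaly_scores(vectors, k=2):
    """Score each vector by the distance to its k-th nearest neighbor.

    Brute force, O(n^2): fine for a sketch, not for petabyte telemetry.
    """
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(l2(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy "embeddings": four similar log lines and one outlier (hypothetical data).
EMBEDDINGS = [
    (0.0, 0.0),
    (0.1, 0.0),
    (0.0, 0.1),
    (0.1, 0.1),
    (5.0, 5.0),
]

scores = knn_anomaly_scores(EMBEDDINGS, k=2)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    print("most anomalous index:", most_anomalous)
```

The k-th-neighbor distance is only one of several possible anomaly scores. It is used here because it maps directly onto a nearest-neighbor search, which is the operation a GPU index accelerates.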

Exploring

GPU-native ETL for model training

GPU · Data platforms

Attack the data-ingestion bottleneck that actually gates training throughput: RAPIDS-based feature pipelines that turn multi-hour CPU ETL into minutes, so teams can iterate and retrain on TB-scale data daily instead of weekly.

RAPIDS · Spark-RAPIDS · Apache Arrow · feature stores

Idea

Near-real-time wildfire spread prediction

GPU · Geospatial · ML

A harder, real-world target: fuse petabytes of satellite imagery with IoT sensor and weather streams, and run GPU-accelerated geospatial processing and ML to forecast wildfire spread faster than today's systems can. The bottleneck isn't the model; it's crunching multimodal data fast enough to matter.

GPU geospatial · multimodal fusion · streaming · ML

More writeups land in the notes as these ship — code and demos to follow.