NexGen Cloud
Multi-region GPU cloud (OpenStack-based). Title updated during internal restructuring; scope unchanged.
- Built the infrastructure operations function around a clear operating model, escalation paths, and permission/IAM scopes, separating L1/L2 support from infrastructure engineering and reducing repeat escalations into the engineering team.
- Owned observability platform strategy: designed a unified monitoring architecture feeding a new Network Operations Centre (NOC) and led the build-vs-buy / total-cost-of-ownership selection (open-source Prometheus/VictoriaMetrics + DCGM versus commercial options), scaling the approach toward a large NVIDIA B200 SuperPOD region.
- Primary engineer for centralised bare-metal observability: built a NetBox-driven stack in which in-region collectors feed a central VictoriaMetrics and Grafana deployment, with an alert suite tuned for signal over noise. Currently consolidating monitoring across the EU bare-metal region (two clusters).
- Built CX and L2 enablement across OpenStack, Linux, and networking: training tracks, runbooks, decision trees, and scoped self-service workflows.
- Coordinated data-centre, hardware, and partner engagement, and led the observability procurement process through vendor evaluation, scenario presentation, and partner justification.
- Established incident-response and root-cause-analysis (RCA) practice; led major-incident response and authored RCAs for customer-impacting outages.