Role Summary:
We are seeking a strategic, hands-on Operations Lead to ensure the resilience, performance, and cost-effectiveness of our Azure-based data platform. This role sits at the heart of our data ecosystem,
combining platform reliability, incident response, SLA management, cost optimization (FinOps), and deployment oversight.
You will be the single point of contact for operational issues, driving rapid resolution during outages, leading communications with stakeholders, and shaping the processes that keep our
platform running smoothly and efficiently.
Responsibilities:
- Own the day-to-day stability and performance of our Azure data platform: Synapse, Databricks, Azure Data Factory (ADF), and Power BI.
- Act as the primary point of contact for incidents and outages — driving resolution, root cause analysis, and clear stakeholder communication.
- Define, implement, and enforce SLAs for critical pipelines, datasets, and reporting assets.
- Run FinOps forums with business stakeholders to improve cost transparency, accountability, and efficiency across the platform.
- Oversee CI/CD pipelines and deployments, ensuring reliable, safe, and compliant delivery of data platform changes.
- Champion monitoring, observability, and automation to detect and resolve issues proactively while reducing manual intervention.
- Develop and maintain operational runbooks, escalation protocols, and incident playbooks to strengthen resilience.
- Partner with data engineering and analytics teams to align operational strategy with business goals and the future platform roadmap.
Skills Required:
- Operational Leadership: Proven track record in leading operations for large-scale data platforms, ensuring stability, performance, and stakeholder trust.
- Incident & SLA Management: Skilled in incident triage, root cause analysis, escalation handling, and defining/enforcing SLAs with cross-functional teams.
- Azure Data Stack: Hands-on experience with Azure Synapse, Databricks, ADF, and Power BI, with the ability to guide best practices and optimizations.
- Automation & CI/CD: Familiar with CI/CD processes and automation to streamline deployments and reduce manual intervention.
- FinOps Mindset: Experience in cost management, usage reporting, and running forums with business stakeholders to drive accountability and efficiency.
- Monitoring & Observability: Knowledge of modern monitoring, alerting, and data quality frameworks to ensure proactive platform health management.