Responsible for ensuring the stability, availability, and performance of the organization’s technology
platforms. This position focuses on production operations, system reliability, incident management,
and continuous operational improvement across backend services, web applications, and mobile
applications. The role works closely with engineering, infrastructure, and security teams to maintain
resilient, observable, and well-governed production environments.
Responsibilities:
Production Operations & Reliability
• Own the operational health of production systems, including backend services, web platforms, and mobile applications
• Monitor system availability, performance, and capacity to ensure adherence to reliability objectives
• Lead and participate in incident response, troubleshooting, root cause analysis, and post-incident remediation
• Define, measure, and continuously improve operational metrics, SLAs, and SLOs
On-Call & Application Support
• Participate in a scheduled on-call rotation supporting 24/7 production environments
• Provide operational support for Node.js backend services, web applications, and React Native mobile applications
• Serve as an escalation point for high-severity production incidents
• Respond to alerts within defined response time objectives and ensure timely resolution
• Ensure accurate incident documentation and follow-up actions
Monitoring, Observability & Reporting
• Implement, maintain, and optimize monitoring and observability solutions
• Build and maintain monitoring dashboards to provide real-time and historical visibility into system health
• Utilize tools such as New Relic to monitor application performance, errors, and user experience
• Continuously refine alerting thresholds to improve signal quality and reduce operational noise
Technical Operations
• Maintain hands-on involvement with cloud infrastructure, operating systems, and production configurations
• Support release, deployment, and change management processes with an emphasis on system stability
• Develop, maintain, and review operational runbooks, playbooks, and escalation procedures
Automation & Continuous Improvement
• Design and implement automation to reduce manual operational effort and risk
• Improve system resilience through redundancy, failover mechanisms, and disaster recovery planning
• Identify recurring operational issues and drive permanent technical resolutions
Risk, Security & Compliance
• Collaborate with security teams to support vulnerability management and incident response
• Participate in disaster recovery testing, business continuity planning, and compliance activities
• Ensure production systems comply with operational, security, and regulatory requirements
Cross-Functional Collaboration
• Work closely with engineering teams to improve production readiness and operational reliability
• Provide operational input into system architecture and design decisions
• Act as a senior technical escalation resource without direct people management responsibilities
Skills & Qualifications:
• Production support of backend services (Node.js preferred)
• Web application support across frontend and backend components
• Mobile application support and operational troubleshooting (React Native preferred)
• Application performance monitoring using New Relic
• Design and maintenance of monitoring dashboards and alerting systems
• Cloud platforms (AWS, Azure, or GCP)
• Linux/Unix operating systems and networking fundamentals
• Infrastructure-as-code and automation tools
• Incident, change, and problem management practices
• Experience with third-party libraries and APIs
• Wide knowledge of the general mobile landscape, architectures, trends, and emerging technologies
• Excellent written, verbal, and interpersonal communication skills
• Experience operating high-availability or distributed systems
• Familiarity with CI/CD pipelines for web and mobile applications
• Experience supporting 24/7 on-call rotations
• Ability to work independently and with minimal supervision in a fast-paced, multi-project environment
• Required Technologies: React Native, JavaScript, JS Frameworks, Redux, GraphQL, AEM, Storybook, Chromatic, TypeScript, UI Frameworks
• Relevant Technologies: Next.js, Angular, Cache, Adobe Experience Manager (AEM)
Behavioural Fit:
• Work effectively in a matrix organization and lead through influence
• Get-things-done attitude
• Must be self-motivated and results-oriented
• Ability to work in cross-functional, multicultural, and remote teams in a collaborative environment
• Ability to multitask and to plan, organize, and prioritize multiple projects
• Works under pressure with constantly changing priorities and deadlines
• Must have a hands-on mentality