Key Skills: Observability, Site Reliability Engineering, DevOps
Roles and Responsibilities:
- Manage and support the organization's monitoring and observability platforms across a global footprint.
- Collaborate with various internal teams to understand requirements and design enterprise-level observability solutions.
- Maintain and support legacy monitoring systems within the Production Management organization.
- Drive the strategic development and delivery of end-to-end observability solutions across the enterprise.
- Perform in-depth analysis to identify problems and develop innovative solutions.
- Influence and contribute to strategic decisions impacting business outcomes through technical expertise and advisory support.
- Communicate effectively with stakeholders, using strong interpersonal and diplomatic skills to influence outcomes.
- Ensure system reliability, performance, and scalability through proactive monitoring and automation.
- Perform additional duties and responsibilities as assigned.
Skills Required:
- Strong experience in observability and site reliability engineering practices is required.
- Hands-on experience with monitoring tools, logging systems, and distributed tracing is expected.
- Understanding of DevOps practices, CI/CD pipelines, and automation is beneficial.
- Experience in troubleshooting complex systems and performing root cause analysis is required.
- Knowledge of cloud platforms and scalable system design is advantageous.
- Ability to analyze large datasets and derive actionable insights is important.
- Strong communication, collaboration, and stakeholder management skills are expected.
- Experience working in global or enterprise environments is preferred.
Education: Bachelor's degree in Computer Science, Engineering, or a related field is required.