Role: SRE Architect
Experience: 15-22 years
Location: Greater Noida/Pune/Hyderabad
Core Skills: SRE, Observability, Cloud, Pre-sales
Work Mode: WFO
We at Coforge are looking for SRE Architects with the following skill set.
We are looking for a highly skilled and client-facing SRE Architect to join our dynamic team. This role is pivotal in driving reliability-focused solutions across client engagements, supporting pre-sales activities, and shaping SRE strategy and implementation for diverse industries.
Preferred Location: Noida, Pune, or Hyderabad (last choice)
Client Engagement & Solutioning:
- Understand client environments, pain points, and reliability goals.
- Conduct discovery workshops and technical assessments.
- Design and propose tailored SRE solutions aligned with client needs.
- Define SLIs/SLOs, error budgets, and reliability metrics.
Architecture & Strategy:
- Architect scalable, resilient, and observable systems.
- Define SRE frameworks, tooling strategies, and automation roadmaps.
- Lead reliability reviews and other initiatives.
Presales & Consulting:
- Collaborate with sales and solution teams to support RFPs/RFIs.
- Create solution presentations, demos, and technical proposals.
- Represent the SRE CoE (Center of Excellence) in client pitches and industry forums.
Tooling & Implementation:
- Recommend and integrate observability, incident management, and automation tools.
- Guide implementation teams on best practices for reliability engineering.
- Evaluate and adopt emerging technologies in the SRE space.
Thought Leadership & Enablement:
- Develop reusable assets, templates, and accelerators for SRE adoption.
- Mentor junior SREs and contribute to internal capability building.
- Publish whitepapers and blogs, and participate in community events.
Required Skills:
- 15+ years of total experience, including 6+ years of relevant experience.
- Must have a strong understanding of SRE concepts and principles such as SLIs, SLOs, and error budgets.
- Must have working experience with AI-driven observability and monitoring.
- Must have hands-on experience with at least one observability tool (e.g., Prometheus, Grafana, Datadog, Dynatrace, Splunk, AppDynamics).
- Must have prior expertise working with at least two of the following cloud platforms: AWS, Azure, GCP.
- Must have experience with incident prediction, root cause analysis, and blameless postmortems.
- Must have prior experience working with automation and self-healing systems.
- Must have proficiency in automation tools, including but not limited to Terraform, Ansible, and CI/CD pipelines.
- Must have prior experience working with incident management tools such as PagerDuty or Opsgenie.
- Must have experience designing and proposing tailored SRE solutions, including prior work on solution presentations, demos, and technical proposals.
- Should have familiarity with incident management, reliability reviews, and automation strategies.
- Should have the ability to understand client environments and reliability goals.
- Should have strong communication and stakeholder management skills.
- Should have the capability to architect scalable, resilient, and observable systems.
- Nice to have: ability to publish whitepapers and blogs and to contribute to community events.
- Nice to have: prior experience mentoring junior engineers and building internal capabilities.