Job Description
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for a Senior Site Reliability Engineer to strengthen our platform reliability and observability capabilities. You will own the design and operation of monitoring infrastructure, including Datadog APM, alerting, and distributed tracing, across Kubernetes-based microservices on AWS. The role spans backend engineering and SRE practice in roughly a 65/35 split, with direct involvement in CI/CD integration and observability automation. You will also support internal teams in adopting monitoring best practices as we modernize our R&D platform.
WHAT YOU WILL DO
- Design, build, and maintain scalable backend and platform components;
- Implement and manage observability solutions across distributed systems;
- Configure dashboards, alerts, and APM for tracing, metrics, and logging;
- Monitor and improve system reliability, scalability, and performance;
- Deploy, operate, and maintain services in Kubernetes environments;
- Integrate observability tools into CI/CD pipelines and cloud infrastructure;
- Automate monitoring and operational workflows using scripting;
- Provide operational and training support for observability platforms, especially Datadog;
- Collaborate with engineering teams to improve system visibility and reliability practices.
MUST HAVES
- 4+ years of experience with Python, Node.js, or Java;
- Hands-on experience with API integrations;
- Strong experience in Kubernetes environments;
- Experience with Datadog or similar tools such as Prometheus and Grafana;
- Ability to configure dashboards, alerts, and APM;
- Experience monitoring containerized and microservices architectures;
- Hands-on experience with AWS;
- Experience integrating observability tools into cloud environments;
- Experience with CI/CD integrations for observability;
- Ability to automate monitoring and operational tasks using scripting;
- Upper-intermediate English level.
NICE TO HAVES
- Experience owning and operating an internal engineering platform, especially observability platforms;
- Demonstrated ownership of reliability, scalability, and performance;
- Ability to proactively lead maintenance and platform improvements;
- Experience installing and configuring Datadog agents and integrations;
- Experience managing API keys and secure configurations;
- Experience managing user roles and access controls;
- Familiarity with Go (Golang);
- Experience with additional observability tools such as New Relic, Dynatrace, Elastic Stack, or Splunk.
PERKS AND BENEFITS
- Remote work & Local connection: Work where you feel most productive and join your team at periodic meet-ups to strengthen your network and connect with other top experts.
- Legal presence in India: We ensure full local compliance with a structured, secure work environment tailored to Indian regulations.
- Competitive Compensation in INR: Fair pay, with dedicated budgets for your personal growth, education, and wellness.
- Innovative Projects: Leverage the latest tech and create cutting-edge solutions for world-recognized clients and the hottest startups.