Description
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we're looking for talented people who want to help.
You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You'll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
As a Network Development Engineer on the Capacity Restoration Team, you are an experienced Builder. You will own the design and delivery of automation systems that transform capacity restoration from a manual, labor-intensive process into a scalable, self-service operation. You will architect solutions, lead technical projects end-to-end, mentor NDEs, and partner with service teams to integrate restoration automation into the broader tooling ecosystem. This role demands strong software engineering skills, deep networking knowledge, and the ability to drive results across organizational boundaries.
Technical strategy is defined; component design is not. You are trusted with autonomy and are expected to make pragmatic trade-off decisions at product and component levels, identify and eliminate patterns affecting reliability and availability, and define and simplify team processes.
Key job responsibilities
Automation & System Design
- Own design and delivery of automation systems that restore out-of-service capacity with minimal manual intervention.
- Develop end-to-end link automation frameworks that transform manual troubleshooting into automated, system-guided restoration workflows.
- Build and improve centralized device health validation services for border, backbone, and regional network layers.
- Design self-service frameworks and next-step engines for automated troubleshooting and remediation.
- Architect solutions that are scalable, secure, maintainable, and extensible across a growing multi-timezone team.
Metrics & Monitoring
- Create and maintain the metrics and monitoring infrastructure for CRT: capacity out-of-service dashboards, burndown tracking, time-to-remediate (TTR), restoration success rate, and SLA compliance.
- Build systems to track and optimize bandwidth utilization trends, available capacity vs. projected peak, and redundancy coverage.
- Design automation to improve in-team resolution percentage and reduce escalation to Operations and Engineering teams.
Technical Leadership & Collaboration
- Lead technical projects end-to-end; drive engineering best practices for development, testing, deployment, and operational excellence.
- Mentor NDEs; actively participate in hiring and conduct technical assessments.
- Partner with Engineering, Operations, Tooling, and Software teams to integrate restoration automation into the broader ecosystem.
- Identify and proactively address architectural or process deficiencies affecting restoration performance, reliability, and scalability.
Operations
- Participate in on-call rotations and operational reviews for follow-the-sun coverage.
- Lead complex incident response for capacity events; drive root-cause analysis and remediation for systemic failures.
A day in the life
- Review the capacity out-of-service dashboard; triage restoration priorities by customer impact and network health blocking status.
- Design and implement scalable automation to systematically restore capacity at fleet level (e.g., end-to-end link lifecycle automation).
- Solve complex network problems applying correct technologies and best practices; lead design reviews and architecture discussions for restoration tooling.
- Identify patterns affecting reliability/availability and drive them out through automation improvements.
- Partner with service teams to integrate self-service automation capabilities into the broader tooling ecosystem.
- Mentor engineers on software engineering best practices and system design.
- Participate in hiring, interview processes, and technical assessments for the team.
- Contribute to long-term technical strategy and automation roadmap; hand off active restoration work to your counterpart in the next time zone.
About The Team
The Capacity Restoration Team (CRT) sits within the Backbone, Enterprise, Regional Engineering (BERE) organization. Our mission is to own end-to-end restoration of out-of-service inter-metro and intra-metro network capacity.
CRT reduces operational backlog, improves network health monitoring systems, and drives time-to-remediate (TTR) down. Today, a significant amount of network capacity is out of service at any given time. The operational workload is capacity-related and heavily manual, and we are building the team and automation to change that.
We operate across Seattle, Dublin, Hyderabad, and Sydney for follow-the-sun coverage. CRT acts as the connective layer between Engineering, Operations, Tooling, and Software teams.
About AWS
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying.
Why AWS
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that's why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Work/Life Balance
We value work-life harmony. Achieving success at work should never require sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there's nothing we can't achieve in the cloud.
Inclusive Team Culture
Here at AWS, it's in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness.
Mentorship and Career Growth
We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Basic Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent work experience
- Experience working in a Linux/Unix environment
- 5+ years of experience in network development engineering, software engineering for networking, or network automation.
- Proficiency in Python or another scripting/programming language for building network automation tools and services.
- Experience with routing protocols (BGP, OSPF, IS-IS) in large-scale network environments.
- Experience designing and implementing automation for network operations at scale.
- Experience with workflow orchestration systems and service-oriented architectures.
Preferred Qualifications
- Knowledge of network hardware and packet forwarding architectures
- Experience working in a large-scale networking environment
- Experience building end-to-end network automation (link lifecycle management, device provisioning, health validation).
- Experience with network design and selecting platforms for implementation at hyperscale.
- Experience with workflow orchestration engines and event-driven architectures.
- Experience with distributed systems concepts applied to networking at scale.
- Experience building automation for large-scale networks with no scheduled downtime.
- Familiarity with deployment automation tools, link testing services, and operational event monitoring systems.
- Experience with data analysis and mining for operational insights.
- CCNP, JNCIP, or equivalent networking certification.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
Company - Amazon Dev Center India - Hyderabad
Job ID: A10426734