Role Overview:
SourceFuse Technologies is seeking a Managed Services L2 Support Engineer/Senior Engineer to join its India team and support one of its clients.
Key Responsibilities:
- As a service desk agent, you will perform basic troubleshooting, collect all pertinent details of the customer's issue or request, set correct expectations, and manage the creation and handling of support tickets. This is a work-from-office role on a 24x7 rotational shift pattern.
- Continuously monitor applications using the monitoring tools deployed in the infrastructure, and keep watch over the monitoring dashboards.
- Incident Management & Triage:
- Act as the first point of contact for all production incidents, alerts, and user-reported issues related to the Plan & Build applications (Site Manager, Procurement Manager, and Facility Manager).
- Proactively monitor application performance, infrastructure health, and system alerts using monitoring tools.
- Perform initial diagnosis and root cause analysis (RCA) to quickly identify the source and scope of issues within a microservices architecture.
- Categorize, prioritize, and escalate incidents to appropriate internal teams (e.g., Network, Security, Firewall, Cloud, Infrastructure, Application Development, User Management) in a timely manner.
- Troubleshooting & Resolution:
- Collaborate effectively with various technical teams to drive incident resolution, acting as a central coordinator.
- Skillfully diagnose and troubleshoot a wide range of customer issues, from basic inquiries to more technical challenges observed within the system framework.
- Utilize logs, monitoring dashboards, and diagnostic tools to pinpoint issues across different microservices and underlying infrastructure components.
- Document troubleshooting steps, findings, and resolutions accurately for knowledge base articles and future reference.
- Communication & Stakeholder Management:
- Provide timely, clear, and professional communication to internal stakeholders and end users regarding incident status, expected resolution times, and post-incident reports.
- Prepare and deliver comprehensive Root Cause Analysis (RCA) reports for critical incidents, outlining the problem, impact, resolution, and preventative measures.
- Act as a bridge between engineering/technical teams and product/business teams, translating technical details into understandable business impacts and vice versa.
- SLA Adherence & Performance:
- Ensure all incidents are handled within agreed-upon Service Level Agreements (SLAs) for response, resolution, and communication.
- Contribute to the continuous improvement of incident management processes and tooling.
- Knowledge Management & Process Improvement:
- Develop and maintain comprehensive knowledge base articles, runbooks, and troubleshooting guides.
- Identify recurring issues and collaborate with engineering teams to implement permanent solutions and improve system resilience.
- Participate in post-incident reviews to identify lessons learned and implement corrective actions.
Qualifications:
- Bachelor's degree in Electronics and Communication, Information Technology, or a related field.
- Strong domain knowledge of microservices applications.
- 3+ years of experience in a technical support, operations, or SRE/L2 role, preferably supporting enterprise-level applications.
- Ability to differentiate between application and platform issues, and a proven track record of taking them to closure.
- Strong interpersonal skills and the ability to collaborate effectively with cross-functional teams.
Skills & Abilities:
- Experience with containerization technologies: Kubernetes, Docker Swarm, Mesos/Marathon, or Cloud Foundry.
- Experience with RDBMSs such as Oracle, MySQL, and Sybase.
- Basic understanding of networking concepts (TCP/IP, DNS, load balancers, firewalls) and security principles.
- Proficiency in monitoring tools (e.g., Prometheus, Grafana) for application and infrastructure monitoring.
- Familiarity with cloud computing concepts and platforms (e.g., AWS, Azure, GCP) and data center environments.
- Collaboration: Ability to work in a team-oriented environment and effectively communicate with both technical and non-technical stakeholders.
Preferred:
- Experienced in:
- AWS platform, or certified in AWS (Solutions Architect/SysOps)
- Lambda, API Gateway, Kinesis, Elasticsearch, ElastiCache, DynamoDB, Athena
- Linux
- NoSQL databases (DynamoDB preferred)
- Trouble ticketing tools (Jira Software and Jira Service Desk preferred)
- Hands-on experience with New Relic and AWS CloudWatch
- ITIL certification