Job Summary:
The Service Operations Engineer is responsible for monitoring, maintaining, and supporting IT services and infrastructure to ensure high availability and performance. This role acts as the first line of defense for service incidents, working closely with cross-functional teams to resolve issues, implement improvements, and maintain operational excellence.
Key Responsibilities:
- Monitor IT systems and services to ensure uptime, performance, and reliability
- Respond promptly to incidents, service requests, and alerts
- Perform root cause analysis and implement permanent fixes for recurring issues
- Collaborate with engineering, development, and support teams to resolve technical problems
- Document incidents, resolutions, and operational procedures
- Participate in change management processes and deployment activities
- Maintain system logs, metrics, and dashboards for visibility and reporting
- Identify opportunities for automation and process improvements
- Support business continuity and disaster recovery activities
- Ensure compliance with internal policies, security standards, and SLAs
Required Skills and Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field
- 2+ years of experience in IT operations, service desk, or system administration roles
- Familiarity with monitoring tools, ticketing systems, and ITSM frameworks
- Experience with Linux or Windows server environments
- Basic knowledge of cloud platforms such as AWS, Azure, or GCP
- Understanding of networking fundamentals, system health monitoring, and incident response procedures
- Strong troubleshooting, communication, and documentation skills
Preferred Qualifications:
- ITIL Foundation certification
- Experience with scripting languages such as Bash, PowerShell, or Python
- Exposure to CI/CD tools and DevOps environments
- Familiarity with tools such as Splunk, Nagios, ServiceNow, or PagerDuty
- Understanding of compliance standards (e.g., ISO 27001, SOC 2)