
Company: Kensaltensi Powered by Alkimi
Location: Bangalore, Karnataka, India
Employment Type: Full-time
Work Schedule: Rotational Shifts (24x7 Coverage including weekends)
Immediate joiners preferred.
The maximum compensation for this role is 7 LPA.
Please apply only if this aligns with your expectations.
About the Role
We are seeking two experienced Production Support Engineers to join our operations team. These critical positions will be responsible for monitoring and supporting our production infrastructure round the clock, ensuring optimal system performance, availability, and incident resolution. You will work closely with our development teams who build applications using Java, Node.js, and Python backends, along with Node.js frontend applications. The ideal candidates will have strong DevOps backgrounds and the ability to handle production issues independently while collaborating effectively with development teams for root cause analysis and incident resolution.
Key Responsibilities
Production Monitoring & Support
Monitor production systems 24x7 using our observability stack (Prometheus, Grafana, Loki)
Provide first-line support for all production incidents and alerts
Ensure system availability and performance across all production environments
Track and respond to alerts within defined SLAs
Perform health checks and routine maintenance tasks
Incident Management & Resolution
Provide immediate response to production incidents and outages
Collaborate with Java, Node.js, and Python development teams to perform root cause analysis
Debug application-level issues across backend services and frontend applications
Implement workarounds and fixes for production issues
Escalate complex issues to development teams or senior engineers when necessary
Document incident resolutions and update the knowledge base
Application Support
Support Java, Node.js, and Python applications in production
Analyze application logs and metrics to identify issues
Perform application restarts, deployments, and rollbacks as needed
Monitor batch jobs and scheduled tasks
Assist with production deployments during maintenance windows
Collaboration & Communication
Work closely with Java, Node.js, and Python development teams to identify root causes of issues
Collaborate with backend and frontend teams for end-to-end troubleshooting
Provide clear and timely updates on production issues to stakeholders
Participate in handover meetings during shift changes
Maintain detailed documentation of incidents and resolutions
Bridge the gap between operations and development teams during incident resolution
Required Qualifications - Technical Skills
Experience: 3-5 years in Production Support/DevOps/SRE roles
Monitoring Tools: Hands-on experience with Prometheus, Grafana, and log aggregation systems (Loki preferred)
Programming Languages: Working knowledge of Java, Node.js, and Python to effectively troubleshoot backend services
Frontend Technologies: Basic understanding of Node.js frontend applications for end-to-end debugging
Cloud Platforms: Experience with cloud infrastructure (AWS/GCP/Azure)
Container Orchestration: Working knowledge of Kubernetes and Docker for application support
Scripting: Proficiency in Python, Bash, or similar scripting languages for automation
CI/CD: Familiarity with Jenkins and GitHub Actions for deployment support
Databases: Experience with database operations, queries, and basic troubleshooting
Application Servers: Experience with application server management and troubleshooting
Soft Skills
Strong problem-solving abilities with a methodical troubleshooting approach
Excellent communication skills for incident reporting and stakeholder updates
Ability to work under pressure during production outages
Customer service mindset with a focus on resolution times
Strong documentation skills for knowledge base management
Team player with a collaborative mindset
Ability to prioritize multiple incidents based on business impact
Patience and persistence in resolving complex production issues
Preferred Qualifications
Experience in 24x7 production support environments
ITIL certification or knowledge of ITIL processes
Experience debugging Java, Node.js, and Python applications in production
Familiarity with Java application servers and JVM troubleshooting
Experience with Node.js performance monitoring and debugging
Knowledge of message queuing systems (Kafka, RabbitMQ)
Experience with database administration (MySQL, PostgreSQL, ClickHouse)
Experience with ticketing systems (ServiceNow, JIRA)
Understanding of change management and release processes
Job ID: 136865433