  • Posted 6 hours ago

Job Description

Senior DevOps & Site Reliability Engineer (DevOps + SRE)

About the Role

We are seeking a highly experienced Senior DevOps & Site Reliability Engineer to support and scale our cloud-native, containerized IoT platform built on AWS. You will work closely with the Technical Manager to automate infrastructure, build CI/CD pipelines, manage large-scale deployments, and ensure the platform's reliability, security, and performance.

This role requires deep hands-on expertise in AWS, Docker/Kubernetes, serverless workflows, infrastructure automation, scripting (Python), and IoT-scale distributed systems reliability.

Key Responsibilities

DevOps Responsibilities

Design, implement, and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, or GitLab CI.

Develop and automate deployment workflows following DevOps strategy and best practices.

Manage Docker containerization, including multi-stage builds, optimization, and image security.

Orchestrate containers using Kubernetes (EKS) or AWS ECS (Fargate/EC2).

Manage and optimize ECR for image storage and versioning.

Implement Infrastructure-as-Code using AWS CDK, Terraform, or CloudFormation.

Build automated workflows for backend, microservices, and IoT services deployment.

Support serverless architectures using AWS Lambda, Step Functions, EventBridge, etc.

Implement secure secrets management using AWS IAM, KMS, and Secrets Manager.

Handle configuration, environment management, and zero-downtime deployment strategies.

Site Reliability Engineering (SRE) Responsibilities

Build and maintain monitoring, logging, and tracing pipelines using CloudWatch, Grafana, Prometheus, X-Ray, and OpenTelemetry.

Define and implement SLIs, SLOs, error budgets, and reliability dashboards.

Ensure high availability, resilience, and performance of all production systems.

Conduct incident management, root cause analysis, and post-incident reviews.

Optimize cost, compute utilization, autoscaling policies, and failover strategies.

Implement cloud reliability patterns: circuit breakers, retries, throttling, and canary and blue-green deployments.

Manage production readiness, release safety, and operational excellence.

Required Skills & Qualifications

7+ years of experience in DevOps, SRE, or Cloud Infrastructure roles.

Deep hands-on experience with:

  • Docker containerization & orchestration

  • Kubernetes (EKS) and/or AWS ECS

  • AWS ECR (image lifecycle management)

  • AWS IoT Core, Lambda, API Gateway, VPC, S3, IAM, CloudWatch

Strong scripting experience; Python expertise preferred (Bash is a plus).

Proficiency with GitHub for code management, automation, and CI/CD workflows.

Strong background in Infrastructure-as-Code: AWS CDK, Terraform, or CloudFormation.

Experience with reliability engineering frameworks, large-scale distributed systems, and HA/DR design.

Knowledge of serverless computing and event-driven architectures.

Strong understanding of cloud security, identity management, and compliance.

More Info

Open to candidates from:
Indian

Job ID: 137687667