Staff Software Development Engineer

Experience: 2-5 years

Job Description

Position Summary

We are seeking a highly skilled Site Reliability Engineer (SRE) / DevOps Engineer to join our infrastructure team. You will be responsible for designing, building, and maintaining resilient, scalable, and secure infrastructure in cloud-native environments. This role will involve close collaboration with development, QA, and security teams to automate operations, streamline deployments, and drive best practices in observability, security, and performance.

Key Responsibilities

  • Design, implement, and manage cloud infrastructure (GCP/AWS/Azure) using Infrastructure as Code (Terraform)
  • Build, maintain, and optimize CI/CD pipelines with tools such as GitLab CI, CircleCI, and ArgoCD
  • Ensure high availability and performance of applications running on Kubernetes (GKE/EKS/AKS) and other container orchestration platforms
  • Implement observability solutions using Prometheus, Grafana, ELK, and other monitoring/logging tools
  • Work with development teams to enhance application performance and deployment workflows
  • Automate and manage IAM, RBAC, network policies, and vulnerability scanning
  • Participate in incident management, root cause analysis, and postmortem processes
  • Continuously improve infrastructure reliability and reduce manual operational work (toil)

Basic Qualifications

  • Strong knowledge of Linux system administration
  • Proficiency in Python, Bash, or Go for scripting and automation
  • Solid hands-on experience with cloud platforms (GCP preferred; AWS or Azure acceptable)
  • Proficiency in Kubernetes operations, including Helm charts, service meshes, and operators
  • Experience with Terraform and Infrastructure as Code best practices
  • Experience building and maintaining CI/CD pipelines (e.g., GitLab CI, CircleCI, ArgoCD)
  • Familiarity with observability tools (Prometheus, Grafana, ELK, etc.)
  • Good understanding of networking concepts: TCP/IP, DNS, load balancing, and firewalls

Preferred Qualifications

  • Experience with advanced networking and service meshes (e.g., Istio)
  • Familiarity with SRE principles: SLOs, SLIs, and error budgets
  • Exposure to multi-cluster or hybrid-cloud infrastructure setups
  • Experience with incident response and post-incident review processes

Key Skills

Site Reliability Engineering, DevOps, GCP, AWS, Azure, Terraform, CI/CD, GitLab CI, CircleCI, ArgoCD, Kubernetes, GKE, EKS, AKS, Helm, Prometheus, Grafana, ELK, Python, Bash, Go, IAM, RBAC, Network Policies, Service Mesh, Istio, TCP/IP, DNS, Load Balancers, Firewalls, Monitoring, Logging, Error Budgets, SLOs, SLIs, Incident Management

About Company

Gruve was founded on the premise that new technologies in Machine Learning, Data Science, Artificial Intelligence, and Software Development are transforming Enterprise Services. Our goal is to harness these advancements to deliver services with superior efficiency and tangible outcomes.

Job ID: 122884789