Emergent

Software Engineer - Infrastructure

  • Posted 21 hours ago

Job Description

Platform And Infrastructure

The candidate will have responsibilities across the following functions:

  • Maintain stability of our platform consisting of distributed microservices closely interacting with Kubernetes and cloud providers (GCP, AWS).
  • Manage Kubernetes workloads with ArgoCD (GitOps): deploy, monitor, and troubleshoot application syncs, resource trees, and rollouts.
  • Debug and resolve complex Kubernetes issues across clusters.
  • Manage CDN and edge infrastructure (Cloudflare) for performance, caching, and traffic management.
  • Automate infrastructure lifecycle operations and workflows.

Observability And Incident Response

  • Own the observability stack: Grafana (dashboards, Loki logs, Prometheus metrics) and New Relic (APM, golden metrics, transaction analysis).
  • Enhance monitoring, alerting, and distributed tracing across services.
  • Participate in on-call rotation via PagerDuty, handle incident response, and perform root cause analysis.
  • Proactively identify reliability risks before they become incidents.

AI Agent Infrastructure

  • Support the platform that runs AI agent workloads: job scheduling, trajectory tracking, environment provisioning, deployments, and cost attribution.
  • Develop Kubernetes controllers and operators to extend platform capabilities for agent orchestration.

Collaboration and Internal Tooling

  • Work closely with product and backend teams to ensure platform scalability and reliability.
  • Build internal tools, automate workflows, and integrate systems to improve team productivity.
  • Stay current with Kubernetes releases, CNCF ecosystem updates, and cloud-native best practices.

Core Requirements

The core requirements for the job include the following:

  • 3+ years of software/platform engineering experience with production systems.
  • Strong proficiency in Go or Python; you write production code in at least one of them daily.
  • Hands-on experience building and deploying services on Kubernetes, beyond writing YAML; you've developed something that runs on K8s.
  • Experience with GitOps tooling (ArgoCD, Flux, or similar).

Systems Fundamentals

  • Strong networking and DNS fundamentals: TCP/IP, HTTP, load balancing, DNS resolution, TLS, and debugging connectivity issues.
  • Solid Linux/OS fundamentals: process management, filesystem, memory, and systemd; comfortable debugging with tools like strace, tcpdump, and netstat.

Data And Messaging Infrastructure

  • Relational databases: experience with PostgreSQL, MySQL, or similar, including indexing, query optimisation, replication, and backup/restore procedures.
  • NoSQL databases: familiarity with MongoDB, DynamoDB, Redis, or similar for document/key-value workloads.
  • Caching: experience with Redis, Memcached, or similar for application and infrastructure-level caching.
  • Message queues and streaming: hands-on experience with Kafka, SQS, RabbitMQ, or similar for event-driven architectures.
  • Strong SQL skills for debugging and operational queries.

Infrastructure And Observability

  • Comfortable with the CNCF ecosystem: Helm, Kustomize, cert-manager, Ingress controllers, and CNI/CSI interfaces.
  • Hands-on with at least one observability stack (Grafana/Prometheus/Loki, New Relic, Datadog, or similar).
  • Familiarity with GCP and/or AWS managed Kubernetes (GKE/EKS), networking, IAM, storage, and cloud-native services (SES, SQS, S3, etc.).
  • Experience with CDN/edge platforms (Cloudflare, CloudFront, or similar).

Nice To Have

  • Experience building Kubernetes Operators (kubebuilder, operator-sdk, or controller-runtime).
  • Experience tuning Kubernetes core components (API server, kubelet, scheduler).
  • Familiarity with AI/LLM infrastructure: token management, cost tracking, and agent orchestration.
  • Experience with CI/CD pipelines (GitHub Actions, automated testing, deployment pipelines).
  • Infrastructure as Code experience (Terraform, Pulumi, or similar).
  • Previous work on large-scale distributed systems or platform-as-a-service.
  • Startup experience; you thrive in fast-paced, ambiguous environments.

Expectations

  • You're a generalist who can context-switch between debugging a K8s deployment, setting up a Grafana alert, and configuring CDN rules, all in the same day.
  • You enjoy solving complex infrastructure challenges and automating away toil.
  • You dig deep when something breaks; you find the root cause, not just the workaround.
  • You communicate clearly and can collaborate effectively in a fast-moving, distributed team.

Tech Stack

  • We don't require previous experience with our entire stack, but enthusiasm for learning is key.
  • Go, Python, Kubernetes, ArgoCD, Helm, GCP, AWS, Cloudflare, Grafana, Prometheus, Loki, New Relic, PagerDuty, PostgreSQL, MongoDB, Redis, Kafka, GitHub.

This job was posted by Akhil Girijan from Emergent.

Job ID: 147189231