Senior DevOps Engineer

Kensaltensi

Bengaluru, India

7-9 Years

Job Description

Company Description

Alkimi Exchange is a decentralised programmatic ad exchange restoring the value exchange between advertisers, publishers, and users. Our custom blockchain infrastructure on Sui delivers a fast, scalable, and transparent solution with 0% fraud and low transaction fees. We process 25,000-30,000 queries per second (QPS) with sub-100ms latency requirements for real-time bidding auctions.

Learn more at www.alkimi.org.

About This Role

This is not a typical Senior DevOps position. We're hiring a senior engineer who is ready to grow into a DevOps Architect within 6-12 months, and we'll actively build the conditions for that transition.

You'll join as a hands-on senior individual contributor with immediate ownership of our cloud and on-premise infrastructure, while simultaneously shadowing and co-designing the architectural and people decisions that will define Alkimi's next stage of scale. By the time you're ready to step up, the title and responsibility will already be yours in practice.

If you've been the most capable person in your current team but haven't had the platform to prove it at an architectural or leadership level, this is your opportunity.

What You'll Own From Day One

Platform reliability - Maintain 99% uptime and meet SLAs across all environments; own incident response, on-call rotation, and post-mortems
Cost optimisation - Drive a 20-30% reduction in infrastructure spend through architectural improvements, rightsizing, and automation
High-throughput infrastructure - Design and operate deployment architecture for 25,000-30,000 QPS systems with sub-100ms latency requirements
Multi-cloud IaC - Manage AWS, DigitalOcean, and GCP environments using Terraform, Terragrunt, and Ansible
CI/CD & automation - Build and maintain pipelines, monitoring systems, and automation across distributed microservices
Data systems health - Troubleshoot and tune Kafka, RabbitMQ, ClickHouse, Elasticsearch, and MySQL in production
Application performance - Diagnose and resolve Node.js, Python, and Java/Spring Boot application bottlenecks under load
Security & compliance - Implement and uphold best practices across OAuth, OIDC, SSO, disaster recovery, and data privacy

What You'll Be Building Toward (Architect Scope)

Team architecture - Input into hiring, mentoring, and structuring a growing infrastructure engineering team; transition into formal people leadership as the team scales
Technical vision - Co-own the infrastructure roadmap with the CTO; define standards, patterns, and guardrails that the whole engineering org follows
Cross-functional influence - Act as the DevOps voice in product and engineering planning cycles; translate business requirements into scalable infrastructure decisions
Architectural governance - Lead the design and review process for new systems, integrations, and migrations; own the DR and business continuity strategy
Organisational maturity - Drive improvements in engineering culture around reliability, observability, and operational excellence (SLOs, error budgets, runbooks)

Required Skills & Experience

7+ years in DevOps or Infrastructure Engineering roles, including 2+ years operating high-throughput systems (10,000+ QPS). You should be at or near the ceiling of your current role and ready to operate at the next level.

Infrastructure & Cloud

Production-grade Infrastructure as Code with Terraform, Terragrunt, and Ansible
Kubernetes and Docker at scale across complex microservices architectures
Deep AWS expertise (VPC, EC2, ECS, Fargate, S3, Glacier, RDS, Route 53, CloudFront, Lambda, API Gateway, CloudWatch); DigitalOcean, Azure, or GCP experience also valued
Advanced Linux system administration (RHEL, Ubuntu, Amazon Linux) and networking

Data & Messaging Systems

Kafka - consumer/producer optimisation, lag management, high-volume tuning
RabbitMQ - cluster management, message routing, Kubernetes failure debugging
ClickHouse - production operations and query optimisation at billions-of-records scale (or similar columnar/time-series database experience)
MySQL - administration, replication, and backup/recovery
Elasticsearch - cluster health management and bulk indexing optimisation

Development & CI/CD

CI/CD tooling: GitHub Actions, Jenkins, GitLab CI, ArgoCD (GitOps preferred)
Python (required), Shell scripting (required); Rust or Go strongly preferred
JVM profiling, GC tuning, memory leak detection in Java/Spring Boot environments
Strong understanding of microservices architectures and API design patterns
Comfortable operating within agile and rapid-iteration engineering cultures

Observability & Incident Management

Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana, Filebeat)
Systematic debugging under production load - CPU, memory, network, latency
RED and USE metrics methodology; SLO/SLA definition and error budget management
Structured post-mortem culture and preventive engineering mindset

Leadership & Communication (Critical for Progression)

Demonstrated ability to influence engineering decisions without formal authority
Experience mentoring junior or mid-level engineers
Ability to communicate infrastructure trade-offs clearly to non-technical stakeholders
A track record of driving projects end-to-end, including stakeholder alignment
Blockchain infrastructure experience or knowledge is an added advantage (optional)