blessing softtech

DevOps Engineer - Real-Time Voice Platform (Docker, Terraform, AWS/GCP)

3-5 Years

Job Description

Senior DevOps Engineer: Docker, Terraform, Real-Time Infra

Employment type: Full-time
Seniority level: Mid-Senior
Workplace: In person / Hybrid (India preferred, IST hours)

Industry: Software Development

Job function: Engineering, Information Technology

About the role

You will work on Docker, CI/CD, infrastructure as code (IaC), monitoring, cloud deployment, and secrets management.

The product is "just" a Python FastAPI app that spawns Node.js subprocesses, juggles WebRTC signaling and WebSocket audio, and handles long-lived real-time voice sessions.

Your job: make the software deployable, observable, and reliable, starting from scratch. If the phrase "connection draining for WebRTC during rolling deploys" makes you smile instead of wince, we should talk.

What you'll do
  • Containerize the app with multi-stage Docker builds: Python 3.12 + Node.js 20 (for Pipedream MCP via npx) + FAISS CPU + audio dependencies
  • Build the CI/CD pipeline: ruff → pytest → integration → Docker build → registry → staging → smoke → manual gate → production
  • Write Terraform for the full cloud stack: ECS Fargate or Cloud Run, ALB with WebSocket upgrade, managed Redis, managed Postgres, S3/GCS, CDN, DNS, ACM
  • Build the observability stack: structured JSON logs, Prometheus metrics (call latency, LLM TTFB, tool execution, concurrent connections), Grafana dashboards, PagerDuty alerts with runbooks
  • Migrate secrets from .env to AWS Secrets Manager / Vault, with key rotation and per-tenant credential storage
  • Configure networking: TLS, WebSocket upgrade through ALB, CORS, infra-level rate limiting, DDoS protection, VPC with private subnets
  • Build load testing (k6 or Locust) simulating concurrent voice calls, chat, and MCP tool invocations
  • Write operational runbooks: incident response, DR, rollback, on-call rotation, post-incident reviews

Tech you'll work with

Docker · docker-compose · GitHub Actions · Terraform · AWS (ECS Fargate, ALB, ElastiCache, RDS, S3, CloudFront, Route53, ACM, Secrets Manager, CloudWatch) or GCP equivalents · Prometheus · Grafana · k6 / Locust · Python 3.12 · Node.js 20 · WebSocket · WebRTC

What you bring
  • 3+ years in DevOps / SRE / Platform Engineering
  • Strong Docker: multi-stage, multi-runtime, security hardening
  • Production CI/CD pipeline design (GitHub Actions, GitLab CI, or Jenkins)
  • Terraform on real infrastructure (not tutorial-scale)
  • AWS or GCP at production scale
  • WebSocket / real-time app deployment: sticky sessions, connection draining, stateful health checks
  • Prometheus / Grafana or equivalent observability stack
  • Strong Linux and networking fundamentals

Nice to have
  • Load testing experience (k6, Locust)
  • Python app deployment (uv/pip, FastAPI, Uvicorn)
  • WebRTC operational experience

Skills
  • Docker · Terraform · AWS · GCP · CI/CD · GitHub Actions · Kubernetes · Prometheus · Grafana · Infrastructure as Code · DevOps · Site Reliability Engineering (SRE) · WebSocket · WebRTC · Linux · Networking · Observability · ECS · Fargate · PostgreSQL · Redis

Data: SQL / NoSQL databases, message queues

Observability: CloudWatch, logging, and metrics

Job ID: 145802785