Pumex Computing, LLC

Senior DevOps Engineer (Multi-Stack & LLMOps)

Fresher
  • Posted 3 hours ago

Job Description

Senior DevOps Engineer (Multi-Stack & LLMOps) - India (Remote/Hybrid)

We are hiring a versatile Senior DevOps Engineer to own automation, deployment, and infrastructure operations across a diverse application ecosystem. This is not a single-stack role: you will support legacy PHP environments, modern Node.js/React applications, high-performance .NET services, and an expanding set of GenAI/LLM-powered features.

The ideal candidate is a polyglot infrastructure engineer who is comfortable operating in both AWS and Azure, and who treats reliability, security, and cost controls as first-class production requirements across traditional web workloads and AI workloads.

Location: India (Remote/Hybrid, depending on city/team needs)

Work Hours Requirement: Must be able to work overlapping hours through 2:30 PM Eastern Time (ET).

What You'll Do (Key Responsibilities)

Multi-Stack CI/CD

  • Design, build, and maintain robust CI/CD pipelines for .NET Core, Node.js (React/Express), and PHP (Laravel/Symfony) using GitHub Actions, Azure DevOps, and/or GitLab CI.
  • Standardize build, test, security scanning, and release workflows across multiple product lines.

Infrastructure as Code (Hybrid Cloud)

  • Manage a hybrid footprint across AWS and Azure using Terraform or Pulumi, ensuring consistent, repeatable environments (dev/stage/prod).
  • Improve provisioning speed, environment parity, and drift detection.

Container Orchestration

  • Operate production-grade Kubernetes environments (EKS/AKS) including scaling, upgrades, networking, and cluster security.
  • Optimize compute for both standard web traffic and AI workloads, including scheduling and capacity planning for resource-intensive services.

LLMOps / GenAI Platform Operations (Highly Desired)

  • Build and operate the plumbing for GenAI initiatives, including model-serving stacks and integrations.
  • Deploy and manage model-serving containers (e.g., vLLM, Ollama) and support vector database infrastructure (e.g., Pinecone, Milvus).
  • Implement operational controls such as:
      • Prompt versioning and lifecycle management (repo-driven workflows, approvals, rollback)
      • Model switching/routing (by cost, latency, quality, and availability across providers like OpenAI/Anthropic and/or self-hosted)
      • Token/usage monitoring, rate-limit governance, and spend controls with cost attribution (by environment, feature, and tenant)
      • Evals/regression testing to catch prompt/model degradation before production impact

Observability & Reliability Engineering

  • Implement end-to-end observability for services and pipelines (metrics, logs, traces) using tools such as OpenTelemetry, Grafana, Datadog, or New Relic.
  • Build alerting and runbooks; participate in incident response, root-cause analysis, and reliability improvements.

Database & Data Platform Support

  • Support a range of data needs across relational systems (SQL Server, MySQL, PostgreSQL) and modern stores including NoSQL and vector databases.
  • Assist with backup/restore strategies, performance tuning basics, and production readiness.

Security & Compliance

  • Implement consistent security controls across CI/CD (SAST/DAST, dependency scanning, container scanning).
  • Manage secrets and key material with AWS Secrets Manager and/or Azure Key Vault.
  • Enforce least-privilege IAM/RBAC patterns across cloud and Kubernetes.

Twilio / Real-Time Communications (Nice-to-Have)

  • Support production usage of Twilio Voice/Video/SMS including secure webhook configuration, operational monitoring, and reliability concerns for real-time workflows.

Required Qualifications (Must Have)

Cloud & Platform Engineering

  • Hands-on experience in both AWS and Azure, including compute, networking, identity, managed services, and deployment patterns.

Application Delivery Across Multiple Stacks

  • Proven experience deploying and scaling:
      • .NET/C# (IIS, Kestrel, Azure App Service)
      • Node.js/React (Nginx, PM2, S3/CloudFront or equivalent)
      • PHP (FPM, Apache/Nginx, Composer)

Automation & Orchestration

  • Strong production experience with Docker and Kubernetes (required).

Infrastructure as Code

  • Strong experience with Terraform or Pulumi in real-world production environments.

CI/CD & Developer Enablement

  • Deep experience with CI/CD tooling (GitHub Actions/Azure DevOps/GitLab CI) and the ability to improve developer velocity safely.

Operational Excellence

  • Strong troubleshooting skills across application, infrastructure, networking, and Kubernetes layers.
  • Experience supporting production systems, including on-call rotations and incident response.

Highly Desired (LLMOps / GenAI Operations)

  • Prompt lifecycle management: prompt repositories, versioning, templating, approvals, and rollback.
  • Model operations: model switching/routing across OpenAI/Anthropic/self-hosted options, with gateway/proxy patterns and policy enforcement.
  • Usage & cost governance: token monitoring, per-tenant attribution, budget alerts, and rate-limit controls.
  • Quality workflows: eval harnesses, regression testing for prompts/models, A/B testing, safe rollout strategies.
  • Vector + retrieval operations: operating Pinecone/Milvus and supporting retrieval pipelines.

Nice-to-Have

  • Twilio Voice/Video/SMS in production (webhooks, auth, monitoring, incident response).
  • GPU scheduling/optimization experience in Kubernetes.
  • Experience with service mesh, policy-as-code, or advanced cluster security (OPA/Gatekeeper, Kyverno).

What Success Looks Like

  • Stable, repeatable releases across multiple stacks with minimal manual work.
  • Clean infrastructure workflows with low drift and fast environment provisioning.
  • Reliable Kubernetes operations with strong security posture and observability.
  • Production-grade GenAI ops: prompt versioning, model switching, token/cost monitoring, and quality guardrails.

Work Authorization / Schedule

  • Role is based in India.
  • Candidate must be able to work overlapping hours through 2:30 PM Eastern Time (ET).

Job ID: 138934657