  • Posted 12 days ago
  • Over 50 applicants
Job Description

Role summary

Senior Cloud/Platform Engineer on Oracle Cloud Infrastructure (OCI) focused on secure, reliable delivery of AI/ML and LLM workloads. Own IaC/GitOps, the Kubernetes platform, MLOps/LLM serving, service mesh and progressive delivery, and production SLOs.

We are looking for experienced candidates with 10+ years of DevOps engineering experience who also have at least one of the following: working experience implementing AI solutions (Agentic AI, MCP, Python, Machine Learning (ML)); experience implementing service mesh/Istio, specifically around blue/green deployments; or expert-level observability experience, for example implementing/configuring Grafana/Prometheus.

Must have qualifications

OCI platform (4+ years overall cloud, 2+ years hands-on OCI): OKE, OCIR, OCI API Gateway/WAF, Vault, Logging/Monitoring/Alarms, Identity Domains/IAM, VCN/NSGs.

IaC/GitOps (3+ years): Terraform (OCI provider), Helm/Kustomize; Git-based workflows; CI/CD with Jenkins or GitHub Actions; artifact/version promotion across environments.

Kubernetes at scale (3+ years): cluster/node pool design, autoscaling, upgrade strategy, RBAC, network policies, Ingress/Gateway controllers, secrets management.

Linux and networking: solid Linux admin (SELinux bonus), TCP/HTTP, TLS/mTLS, DNS, load balancing; container image hardening and SBOM awareness.

Programming/automation: proficient in Python and Bash; working knowledge of Terraform HCL and at least one of Go/Ansible. Comfortable writing reusable modules and pipelines. SQL basics for troubleshooting/data checks.

Oracle Database integration: connectivity patterns (ATP/ADW), Wallets, connection pooling, secrets rotation, and performance-aware app connectivity.

Observability and SLOs: Prometheus/Grafana or OCI Monitoring, OpenTelemetry traces, logs/metrics/traces correlation, alerting on latency/error budgets/capacity.

Security and compliance: mTLS, least-privilege IAM, KMS/Vault for secrets, audit trails, change management.

Service mesh and progressive delivery: Istio (or OCI Service Mesh) traffic policies, retries/timeouts/circuit breakers, and hands-on blue/green, canary, and A/B testing.

Communication and teamwork: clear written runbooks/diagrams, ability to drive incident/postmortem processes.

AI/ML and LLM delivery (required exposure)

LLM/RAG fundamentals: retrieval patterns, vector search integration, prompt/config management, guardrails/safety filters, offline/online evaluations.

MCP (Model Context Protocol): concepts (tools/resources), building and operating MCP servers on Kubernetes; secure tool/resource exposure, auditability, and RAG via MCP resources.

Vector databases/indices: pgvector, OpenSearch/Elastic, Milvus, Pinecone (or equivalent); hybrid search patterns and embedding pipelines.

Certifications: OCI Architect Professional strongly preferred, plus one of CKA/CKS and an AI/ML or Data Science professional certification.

Key responsibilities

Design, build, and operate OCI-based Kubernetes platforms for AI/ML/LLM services with strong security, observability, and reliability.

Implement and manage IaC/GitOps for repeatable environments, model/inference deployments, and traffic policies.

Enable progressive delivery (blue/green, canary, A/B) with metric-gated rollouts and fast rollback.

Stand up and optimize LLM serving stacks, vector search, and RAG pipelines; enforce guardrails and monitor quality/cost SLOs.

Integrate Oracle Databases and OCI services securely; manage secrets, credentials, and network segmentation.

Establish SLOs, dashboards, runbooks, and incident/DR procedures; lead operational readiness reviews and postmortems.

Additional Details:

Work mode: WFO (work from office)

Work type: Contract

Work location: Bangalore

About Company

HighPoints Technologies India Private Limited

Job ID: 143865493
