
Job Description:
Role summary
Senior Cloud/Platform Engineer on Oracle Cloud Infrastructure (OCI) focused on secure, reliable delivery of AI/ML and LLM workloads. Own IaC/GitOps, Kubernetes platform, MLOps/LLM serving, service mesh progressive delivery, and production SLOs.
Experienced candidates with 10+ years of DevOps engineering experience who also have working experience implementing AI solutions (Agentic AI, MCP, Python, Machine Learning (ML)), or experience implementing service mesh/Istio, specifically around blue/green deployments, or expert-level observability experience, for example implementing/configuring Grafana/Prometheus.
Must have qualifications
OCI platform (4+ years overall cloud, 2+ years hands-on OCI): OKE, OCIR, OCI API Gateway/WAF, Vault, Logging/Monitoring/Alarms, Identity Domains/IAM, VCN/NSGs.
IaC/GitOps (3+ years): Terraform (OCI provider), Helm/Kustomize; Git-based workflows; CI/CD with Jenkins or GitHub Actions; artifact/version promotion across environments.
Kubernetes at scale (3+ years): cluster/node pool design, autoscaling, upgrade strategy, RBAC, network policies, Ingress/Gateway controllers, secrets management.
Linux and networking: solid Linux admin (SELinux bonus), TCP/HTTP, TLS/mTLS, DNS, load balancing; container image hardening and SBOM awareness.
Programming/automation: proficient in Python and Bash; working knowledge of Terraform HCL and at least one of Go/Ansible. Comfortable writing reusable modules and pipelines. SQL basics for troubleshooting/data checks.
Oracle Database integration: connectivity patterns (ATP/ADW), Wallets, connection pooling, secrets rotation, and performance-aware app connectivity.
Observability and SLOs: Prometheus/Grafana or OCI Monitoring, OpenTelemetry traces, logs/metrics/traces correlation, alerting on latency/error budgets/capacity.
Security and compliance: mTLS, least privilege IAM, KMS/Vault for secrets, audit trails, change management.
Service mesh and progressive delivery: Istio (or OCI Service Mesh) traffic policies, retries/timeouts/circuit breakers, and hands-on blue-green, canary, and A/B testing.
Communication and teamwork: clear written runbooks/diagrams, ability to drive incident/postmortem processes.
AI/ML and LLM delivery (required exposure)
LLM/RAG fundamentals: retrieval patterns, vector search integration, prompt/config management, guardrails/safety filters, offline/online evaluations.
MCP (Model Context Protocol): concepts (tools/resources), building and operating MCP servers on Kubernetes; secure tool/resource exposure, auditability, and RAG via MCP resources.
Vector databases/indices: pgvector, OpenSearch/Elastic, Milvus, Pinecone (or equivalent); hybrid search patterns and embedding pipelines.
Certifications: OCI Architect Professional strongly preferred; plus one of (CKA/CKS), and AI/ML or Data Science professional certifications.
Key responsibilities
Design, build, and operate OCI-based Kubernetes platforms for AI/ML/LLM services with strong security, observability, and reliability.
Implement and manage IaC/GitOps for repeatable environments, model/inference deployments, and traffic policies.
Enable progressive delivery (blue-green/canary/A/B) with metric-gated rollouts and fast rollback.
Stand up and optimize LLM serving stacks, vector search, and RAG pipelines; enforce guardrails and monitor quality/cost SLOs.
Integrate Oracle Databases and OCI services securely; manage secrets, credentials, and network segmentation.
Establish SLOs, dashboards, runbooks, and incident/DR procedures; lead operational readiness reviews and postmortems.
Additional Details:
Work mode: WFO
Work type: Contract
Work location: Bangalore
HighPoints Technologies India Private Limited
Job ID: 143865493