Styli

Principal Engineer - DevOps

  • Posted 3 hours ago

Job Description

Role: Principal Engineer – DevOps

Location: Bangalore

Department: Platform Engineering / DevOps

Experience: 8–12 Years

About Styli Marketplace

Launched in 2019 by Landmark Group, Styli Marketplace is the first e-commerce venture of the group, quickly becoming a leading online destination for fashion and lifestyle across the GCC, including Saudi Arabia, the UAE, Kuwait, Bahrain, and beyond. Styli connects global sellers and creators with millions of fashion-forward customers, offering the latest trends, exceptional value, and convenient services like same-day to 48-hour delivery and flexible payment options. Our mission is to make style accessible, aspirational, and exciting for all, backed by a passionate team fostering a culture of creativity and innovation. At Styli, we aim to revolutionize fashion retail and bring unique experiences to our customers.

Role Overview

We are looking for a senior, hands-on Principal Engineer – DevOps to set the technical direction and own the cloud infrastructure, container platform, and delivery pipelines that power Styli's commerce business on GCP. As the most senior individual contributor on the platform engineering team, you will combine architectural ownership with deep hands-on execution — writing Terraform, tuning GKE, debugging incidents, and driving step-changes in reliability, velocity, and cost. You will define the standards and golden paths that engineering teams build on, mentor senior engineers, and partner with leadership to scale Styli reliably through flash sales, regional expansion, and traffic peaks of 10x–20x baseline.

What You'll Do

Cloud Architecture & Infrastructure (GCP & AWS)

• Own end-to-end cloud architecture for Styli's commerce platform on GCP (primary): compute, networking, storage, data, and managed services across multiple regions and projects.

• Architect multi-region, highly available environments with strong fault tolerance, disaster recovery (RTO/RPO), and cost efficiency.

• Lead IAM strategy, organisation policy, VPC design, Shared VPC, VPC Service Controls, Private Service Connect, Cloud Interconnect, and cross-cloud networking.

• Drive cloud cost optimisation (FinOps): right-sizing, committed-use and Spot/preemptible strategies, chargeback/showback, and continuous review of multi-million-dollar cloud spend.

• Manage cloud-native services at scale: GKE, Cloud Run, Cloud SQL, AlloyDB, Spanner, Pub/Sub, Cloud CDN, Cloud Armor, Apigee, and equivalent AWS services where used.

Kubernetes & Container Orchestration

• Architect and operate production Kubernetes clusters on GKE (Standard and Autopilot) at scale: multi-cluster, multi-region, with strong tenancy and security boundaries.

• Design workload scheduling strategies: HPA, VPA, KEDA, node pool strategy, topology spread, PDBs, and graceful shutdown patterns for stateful and stateless services.

• Own GitOps delivery with ArgoCD or Flux, plus Helm and Kustomize patterns that scale across dozens of services and environments.

• Enforce cluster security: RBAC, network policies, OPA/Gatekeeper or Kyverno admission control, signed images (Cosign/Sigstore), Workload Identity, and secrets management (GCP Secret Manager / HashiCorp Vault).

• Run a production service mesh (Anthos Service Mesh / Istio) for mTLS, traffic shaping, progressive delivery, and L7 observability.

Scaling, Reliability & Performance Engineering

• Design and implement horizontal and vertical scaling strategies for traffic peaks: flash sales, campaign launches, and seasonal events driving 10x–20x baseline traffic.

• Build and tune auto-scaling pipelines across cloud instances, Kubernetes workloads, and managed services; validate behaviour end-to-end with load testing and game days.

• Define SLOs, SLIs, and error budgets for critical services; lead capacity planning with defined headroom and continuous tuning.

• Drive observability across logs, metrics, traces, and events using Prometheus, Grafana, OpenTelemetry, Cloud Operations Suite, and tools like Datadog or New Relic.

• Lead incident response (on-call structure, runbooks, blameless postmortems) and drive systemic reliability improvements (chaos engineering, dependency hardening, elimination of single points of failure).

CI/CD & Developer Experience

• Design and own fast, secure CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, or Cloud Build, supporting microservices, containerised builds, and multi-environment promotion.

• Build the internal developer platform and golden paths so engineering teams can ship safely and quickly without re-solving infrastructure problems.

• Implement progressive delivery patterns: blue-green, canary releases, and feature flags with automated rollback driven by SLO breach.

• Embed security gates in CI/CD (SAST, DAST, SCA, IaC scanning, container image scanning, and secret scanning) with developer-friendly feedback loops.

• Drive build optimisation (caching strategies, parallelisation, and pipeline observability) to reduce feedback cycle times.

Infrastructure as Code, Automation & Technical Leadership

• Author and maintain infrastructure using Terraform (primary) at scale: module design, state management, policy-as-code (OPA/Checkov), and CI-integrated plan/apply workflows.

• Eliminate operational toil through automation in Python, Bash, or Go; build self-healing systems and reproducible environments across dev, staging, and production.

• Set technical standards, reference architectures, and golden paths; lead architecture reviews for new services and major platform changes.

• Mentor senior engineers and influence the broader engineering organisation on cloud, container, and delivery practices, without becoming a bottleneck.

What We're Looking For

Required Skills

• 8–12 years of hands-on DevOps / Platform / Infrastructure Engineering experience, with 3+ years at a senior, staff, or principal IC level.

• Deep hands-on GCP expertise: designing and operating large-scale, multi-region production environments using GKE, Cloud SQL/Spanner, Pub/Sub, Cloud CDN, Cloud Armor, Apigee, Anthos Service Mesh, and GCP networking (Shared VPC, PSC, VPC-SC, Interconnect).

• Working knowledge of AWS for multi-cloud workloads: EKS, RDS, networking, and IAM.

• Expert-level Kubernetes operations: multi-cluster GKE, workload scheduling, autoscaling, RBAC, networking (CNI, Ingress, Service Mesh), and admission control.

• Expert-level Terraform for multi-cloud IaC: module design, state management, policy-as-code, and CI-integrated workflows.

• Strong experience with CI/CD tooling (GitHub Actions, GitLab CI, Jenkins, Cloud Build) and progressive delivery patterns.

• Strong scripting skills in Python and/or Go; comfortable reading application code in mainstream languages.

• Solid understanding of Linux internals, distributed systems, networking (DNS, TLS, mTLS, HTTP/2, gRPC), and modern data stores.

• Proven experience designing for high availability at e-commerce scale: load balancing, stateless service design, multi-tier caching, and graceful degradation.

• Excellent communication and influence: able to drive alignment across engineering, product, and leadership without relying on positional authority.

Good to Have

• Experience running infrastructure for high-traffic e-commerce events (flash sales, product launches, or festive-season scaling) at GCC, EMEA, or APAC scale.

• SRE foundations: SLOs, error budgets, capacity planning, and incident command.

• FinOps leadership: managing multi-million-dollar cloud spend with chargeback/showback models.

• Service mesh (Anthos Service Mesh / Istio) and GitOps (ArgoCD / Flux) in production.

• Platform-level security hardening: container security (Falco), signed images, admission control.

• Experience with data platform infrastructure: Kafka, Spark, Airflow, dbt, or BigQuery at production scale.

• Chaos engineering and continuous resilience testing experience.

• Relevant certifications: Google Professional Cloud Architect / DevOps Engineer, AWS Solutions Architect / DevOps Professional, CKA / CKS.

The Scale You'll Work At

• Millions of active users across multiple Middle East countries, including Saudi Arabia, UAE, Kuwait, Qatar, Bahrain, and Oman.

• High-concurrency commerce events: flash sales and campaign launches driving 10x–20x baseline traffic.

• Microservices deployed across multi-region Kubernetes clusters with strong tenancy and security boundaries.

• Cloud architecture hosted on GCP (primary) with additional workloads on AWS, with a focus on regional proximity to end users across the GCC.

Job ID: 147871881
