Tessell

DBRE Lead – Multi-Cloud DBaaS Platform

  • Posted 3 hours ago

Job Description

Role Summary
We are seeking an experienced DBRE Lead (Database Reliability
Engineering Lead) to ensure the reliability, performance, scalability, and
operational excellence of our multi-cloud DBaaS platform across:
. Microsoft Azure
. Amazon Web Services
. Google Cloud Platform
This role combines deep database expertise with SRE principles to build
highly available, automated, and resilient database platforms. The DBRE
Lead will drive operational standards, automation frameworks, and
reliability engineering practices across distributed cloud environments.

Core Mission
. Ensure 99.9%+ reliability and SLA adherence for DBaaS workloads
. Embed SRE principles into database platform engineering
. Drive automation-first operations across multi-cloud environments
. Lead incident management and reliability improvement programs

Key Responsibilities
1 Reliability & Availability Engineering

. Define and implement SLOs, SLAs, and error budgets for DBaaS
workloads
. Architect high-availability (HA) and disaster recovery (DR)
strategies
. Design cross-region and cross-cloud failover models
. Lead reliability reviews and post-incident analysis

2 Database Performance & Optimization
. Monitor and optimize performance across Oracle, PostgreSQL,
MySQL, and SQL Server
. Identify bottlenecks across compute, storage, and networking
. Lead capacity planning and scaling strategies
. Conduct database health audits and tuning initiatives

3 Automation & Platform Engineering
. Drive infrastructure automation using Terraform / ARM /
CloudFormation
. Implement Kubernetes-based database orchestration (where
applicable)
. Automate:
o Provisioning
o Patching
o Backup validation
o DR drills
o Scaling operations
. Integrate CI/CD with database lifecycle management

4 Observability & Monitoring
. Define monitoring standards using Prometheus, Grafana, ELK, and
OpenTelemetry
. Build real-time telemetry dashboards
. Establish proactive alerting frameworks
. Reduce MTTR through automation and runbooks

5 Incident & Problem Management
. Lead Major Incident Management (MIM) for database outages
. Drive root cause analysis (RCA) and preventive action plans
. Improve system resilience through chaos testing and failure
simulations

6 Security & Compliance Operations
. Ensure encryption, key management, and IAM integration
. Enforce secure configuration baselines
. Support compliance audits (SOC 2, ISO, GDPR)
. Conduct vulnerability assessments and patch governance

7 Cross-Functional Leadership
. Collaborate with:
o Solution Architects
o Platform Engineers
o DevOps & SRE
o Security Teams
o Product Leadership
. Standardize operational practices across engineering squads
. Mentor DBRE engineers and define a skill development roadmap

Required Qualifications
. 10+ years of experience in Database Administration / Platform
Engineering
. 3+ years in SRE or reliability-focused roles
. Strong hands-on expertise in:
o Oracle
o PostgreSQL
o MySQL
o SQL Server

. Experience managing databases in at least two hyperscalers
(Azure/AWS/GCP)
. Strong automation mindset (Terraform, scripting, Python/Bash)
. Experience with Kubernetes (preferred)

Preferred Qualifications
. Experience working in a DBaaS or SaaS environment
. Multi-cloud networking knowledge (VPC/VNet, peering, private
endpoints)
. Experience with large-scale, multi-tenant systems
. Cloud certifications (Azure/AWS/GCP)
. Exposure to FinOps and cost optimization practices

Core Competencies
. Strong incident leadership capability
. Automation-first thinking
. Data-driven decision making
. Deep troubleshooting expertise
. Ability to operate under pressure
. Clear communication and stakeholder management

KPIs / Success Metrics
. SLA adherence & uptime improvements
. Reduction in P1/P2 incidents
. Reduced MTTR and improved recovery automation
. Performance optimization benchmarks achieved
. Automation coverage percentage across operations

Job ID: 143712243