The ideal candidate brings deep PostgreSQL expertise, hands‑on Patroni experience, and a strong background in production operations and advanced troubleshooting. Experience with MySQL in enterprise environments is highly desirable.
Key Responsibilities
PostgreSQL Architecture & Design
- Design and maintain scalable, fault‑tolerant PostgreSQL architectures aligned with Fareportal's business and growth strategy
- Define standards for PostgreSQL cluster design, configuration, versioning, and lifecycle management
- Architect and operate high‑availability PostgreSQL environments using Patroni with distributed consensus systems (etcd / Consul)
- Lead PostgreSQL upgrades, migrations, and platform modernization initiatives with minimal downtime
- Evaluate and implement PostgreSQL extensions, features, and best practices suitable for large‑scale production workloads
Operations & Production Support (Primary Focus)
- Own the day‑to‑day operational health of PostgreSQL platforms across production and non‑production environments
- Lead major incident response, including failovers, performance degradation, replication issues, and data consistency problems
- Perform deep root‑cause analysis (RCA) and drive long‑term corrective and preventive actions
- Tune PostgreSQL for performance and stability, including:
  - Memory and storage optimization
  - WAL and checkpoint tuning
  - Autovacuum and bloat management
  - Query optimization and execution plan analysis
- Manage and validate backup, restore, and point‑in‑time recovery strategies (pgBackRest, Barman, or equivalent)
- Act as a senior escalation point during on‑call rotations and high‑severity production events
Patroni & High Availability
- Design, configure, and operate Patroni‑based PostgreSQL clusters in production
- Troubleshoot and resolve complex HA scenarios such as:
  - Leader election failures
  - Split‑brain conditions
  - Network partitions and fencing issues
- Define and tune HA behavior (failover, switchover, synchronous replication settings)
- Establish safe maintenance, patching, and upgrade procedures for HA environments
Automation, Monitoring & Reliability
- Drive automation initiatives for PostgreSQL provisioning, configuration, patching, and deployments
- Implement proactive monitoring and alerting for performance, replication, capacity, and availability
- Partner with SRE and infrastructure teams to improve platform reliability, alert quality, and operational tooling
- Develop and maintain operational documentation, runbooks, and standard operating procedures
Cross‑Database Expertise (Desired)
- Provide architectural and operational guidance for MySQL platforms where applicable
- Troubleshoot MySQL performance, replication, and availability issues
- Support coexistence or migration strategies between PostgreSQL and MySQL environments
Infrastructure & Tooling
- Strong Linux fundamentals
- Experience with cloud or hybrid environments (AWS / Azure / GCP)
- Familiarity with monitoring tools (Prometheus, Grafana, Datadog, etc.)
- Scripting skills in Bash, Python, or similar