Job Description
We are looking for an experienced Senior Cloud Database Engineer with deep expertise in Cassandra/Scylla to manage, scale, and optimize distributed data platforms. The role involves hands-on ownership of production clusters, automation to reduce operational toil, and close collaboration with engineering teams to ensure high availability, performance, and reliability.
Key Responsibilities
- Manage and operate large-scale Cassandra/Scylla clusters in production environments.
- Design, implement, and maintain backup and restore mechanisms to meet DR and compliance requirements.
- Automate operational tasks to reduce manual effort using Shell, Python, Ansible, and Terraform.
- Monitor system health and performance using Grafana and other open-source monitoring tools.
- Troubleshoot and resolve Cassandra/Scylla cluster issues, including node failures, data consistency and table-level issues, high load and performance bottlenecks, and client connectivity problems.
- Create and maintain runbooks for alert handling and operational procedures.
- Implement and manage AWS security and networking components, including security groups and network access controls; IAM roles, policies, and permissions; S3 bucket policies and access management; Auto Scaling Groups; and VPC networking, subnets, routing, and connectivity considerations.
- Collaborate with teams to adopt best practices in AWS security and networking.
- Participate in on-call rotations and handle incident responses.
- Collaborate with teams to support data modeling best practices and optimize schema design.
- Support additional data technologies such as StarRocks, AWS RDS, SQL Server, Snowflake, and Amazon Redshift.
- Mentor, train, and guide peers; share knowledge and operational best practices.
- Continuously evaluate and adopt new technologies to improve platform reliability and efficiency.
Qualifications
Required Skills & Experience:
- 8+ years of hands-on experience managing Cassandra and/or Scylla in production.
- Strong experience with AWS Cloud services.
- Proficiency in Git for version control and collaboration.
- Solid experience in automation and infrastructure as code, including Shell scripting, Python, Ansible, and Terraform.
- Experience in backup/restore strategies, automation, and disaster recovery planning.
- Experience with monitoring and alerting tools such as Grafana or other open-source solutions.
- Strong troubleshooting skills across distributed systems and databases.
- Expert-level knowledge of data modeling, especially for NoSQL systems.
- Excellent communication skills, both verbal and written.
- Ability to mentor, monitor, and train team members effectively.
- Willingness and ability to participate in on-call support.
- Experience in managing AWS RDS.
- Exposure to analytical and relational databases such as StarRocks, Amazon Redshift, Snowflake, and SQL Server.