Staff Site Reliability Engineer (SRE), S3 Storage

Tesla

5-7 Years

Bengaluru, India

Job Description

Key Responsibilities:

System Reliability and Monitoring: Design and implement monitoring, alerting, and automation for S3 storage clusters to achieve 99.99%+ uptime. Use tools like Prometheus, Grafana, or Catchpoint to track performance metrics and capacity utilization and to detect anomalies.

Capacity Planning and Scaling: Forecast storage needs from data growth trends (e.g., fleet-driven data growth exceeding 80 PB) and proactively scale S3 buckets, lifecycle policies, and multi-region replication to support capacities of 150 PB and beyond.

Incident Management: Lead on-call rotations, troubleshoot storage-related incidents (e.g., data access latency, replication failures), and perform root cause analysis using methodologies like blameless post-mortems.

Automation and Infrastructure as Code: Develop and maintain automation scripts (e.g., using Terraform, Ansible, or Python) for provisioning, configuring, and managing S3 resources, including security policies, encryption, and access controls.

Performance Optimization: Optimize data ingestion, retrieval, and archival processes to handle high-throughput workloads, and reduce costs through intelligent tiering (e.g., S3 Intelligent-Tiering) and data compression.

Security and Compliance: Ensure storage systems comply with data protection standards (e.g., GDPR, SOC 2) by implementing features like bucket policies, versioning, and encryption at rest and in transit.

Collaboration and Innovation: Work with data engineering, AI, and energy teams to integrate S3 with other systems (e.g., Kubernetes, Spark). Contribute to open-source tools or internal projects for advanced storage solutions.

Documentation and Knowledge Sharing: Maintain runbooks, contribute to knowledge bases (e.g., in Confluence), and mentor junior engineers on best practices for object storage reliability.

Qualifications

Experience: 5+ years in SRE, DevOps, or systems engineering roles, with at least 3 years focused on AWS S3 or similar object storage (e.g., GCS, Azure Blob). Proven track record managing large-scale (PB-level) storage systems.

Technical Skills:

Expertise in AWS services (S3, EC2, Lambda, CloudWatch) and infrastructure tools (Terraform, Kubernetes, Docker).

Proficiency in scripting/programming (Python, Go, Bash) for automation and tooling.

Strong understanding of distributed systems, networking, and storage concepts (e.g., eventual consistency, Cross-Region and Same-Region Replication (CRR/SRR)).

Experience with monitoring and logging tools (Prometheus, Grafana, Splunk).

Soft Skills: Excellent problem-solving abilities, strong communication skills, and a collaborative mindset. Ability to thrive in a fast-paced, high-stakes environment.

Education: Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).

More Info

Job Type: Permanent Job

Industry: Other

Function: SRE, DevOps, or Systems Engineering

Employment Type: Full time

About Company

Tesla

Job ID: 148621415

Last Updated: 21-07-2026 03:18:11 AM

