Staff Site Reliability Engineer (SRE), Engineering Tools

Tesla

3-5 Years

Bengaluru, India

Job Description

About The Team
Engineering Tools owns and operates the on-prem developer platforms that every Tesla engineer depends on every day: GitHub Enterprise, JFrog Artifactory, GitHub Copilot (self-hosted), Cursor (on-prem), and the Atlassian suite (Jira Service Management + Confluence). We also run the AI-augmented support layer that fronts these platforms - a Mattermost support bot backed by our internal Nabu RAG platform, observability via OpenTelemetry, and a GitOps-driven Kubernetes deployment footprint in our cluster.

If one of our systems is down, thousands of Tesla engineers stop shipping. We're hiring a Staff SRE to own the reliability, scalability, and operational maturity of that footprint.

Key Responsibilities

Platform administration: Manage GitHub Enterprise (Cloud and/or Server) organizations, teams, repos, branch protection rules, Actions runners, and Apps. Administer JFrog Artifactory repositories (local, remote, virtual), permissions, replication, and storage policies.

User support: Triage and resolve tickets covering access requests, repo migrations, build/artifact failures, authentication issues, and integrations. Define and meet SLAs.

Migrations & onboarding: Lead repo migrations into/out of GitHub (e.g., GitHub Migrations API, gh-migration tooling) and Artifactory repository imports/exports. Onboard new teams with templates and standards.

Automation: Build scripts and tooling (Bash, Python, Terraform, GitHub Actions, JFrog CLI) to automate provisioning, permission audits, cleanup, and reporting. Eliminate repetitive support work.
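As an illustration of the kind of permission-audit automation described above, here is a minimal Python sketch. It assumes a GitHub Enterprise Server host whose REST API lives at `https://HOSTNAME/api/v3` and a token with read access to the repository. It uses the documented `GET /repos/{owner}/{repo}/collaborators` endpoint, where each entry carries a `permissions` object. The allowlist and the helper names `fetch_collaborators` and `find_unexpected_admins` are hypothetical.

```python
import json
import urllib.request
from typing import Iterable


def fetch_collaborators(api_base: str, owner: str, repo: str, token: str) -> list[dict]:
    """Page through GET /repos/{owner}/{repo}/collaborators, 100 entries per page.

    api_base is assumed to be e.g. "https://HOSTNAME/api/v3" for GitHub Enterprise Server.
    """
    collaborators: list[dict] = []
    page = 1
    while True:
        url = f"{api_base}/repos/{owner}/{repo}/collaborators?per_page=100&page={page}"
        req = urllib.request.Request(
            url,
            headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            break
        collaborators.extend(batch)
        if len(batch) < 100:  # a short page means this was the last page
            break
        page += 1
    return collaborators


def find_unexpected_admins(collaborators: Iterable[dict], allowlist: set[str]) -> list[str]:
    """Return sorted logins that hold admin on the repo but are not on the allowlist."""
    return sorted(
        c["login"]
        for c in collaborators
        if c.get("permissions", {}).get("admin") and c["login"] not in allowlist
    )


if __name__ == "__main__":
    # Offline sample shaped like the API response. No network call is made here.
    sample = [
        {"login": "alice", "permissions": {"admin": True, "push": True, "pull": True}},
        {"login": "bob", "permissions": {"admin": True, "push": True, "pull": True}},
        {"login": "carol", "permissions": {"admin": False, "push": True, "pull": True}},
    ]
    print(find_unexpected_admins(sample, allowlist={"alice"}))  # prints ['bob']
```

The offline sample prints `['bob']`. In a real audit, you would pass the output of `fetch_collaborators` instead and send the result to a report or ticket. Keeping the fetch separate from the check makes the policy logic easy to test without network access.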

Reliability & monitoring: Monitor platform health, storage usage, runner capacity, and license consumption. Coordinate upgrades, patches, and incident response with the relevant vendors.

Security & compliance: Enforce SSO/SAML, SCIM provisioning, secret scanning, signed commits, audit logging, and least-privilege access. Support SOC 2 / ISO audits.

Integrations: Maintain integrations with CI/CD (Jenkins, GitHub Actions, GitLab CI), SAST/SCA scanners, Jira, Slack, and internal developer portals.

Documentation & enablement: Write runbooks, FAQs, and self-service guides. Host office hours and training sessions for developers.

Required Qualifications

3+ years administering GitHub Enterprise (Cloud or Server) at scale (500+ users or 1000+ repos).

2+ years administering JFrog Artifactory (or comparable: Nexus, Cloudsmith, Harbor).

Strong scripting in Bash and Python; comfortable with REST APIs and curl/jq.

Working knowledge of Git internals (refs, packfiles, LFS, submodules) and ability to debug repo corruption, large-file issues, and merge problems.

Hands-on experience with at least one CI/CD system (GitHub Actions, Jenkins, GitLab CI, CircleCI).

Familiarity with SSO/SAML, SCIM, OIDC, and personal/fine-grained access tokens.

Excellent written communication - you can turn a confusing incident into a clear postmortem and a vague ticket into a fixable problem.

Preferred Qualifications

Experience with GitHub Migrations API, gh-migration-tool, or gei (GitHub Enterprise Importer).

Experience operating Artifactory in HA mode, with S3/blob storage, and Xray for vulnerability scanning.

Infrastructure-as-Code: Terraform providers for GitHub and Artifactory.

Container/package format expertise: Docker, npm, Maven, PyPI, Helm, Conan.

Familiarity with secret scanning tools (GitHub Advanced Security, GitGuardian, TruffleHog) and dependency management (Dependabot, Renovate).

Prior on-call or production support experience.

Exposure to GHAS, Copilot for Business, or Copilot Enterprise rollouts.

Bonus

Experience operating self-hosted LLM inference (Copilot Enterprise, on-prem Cursor backend, vLLM, or similar), RAG pipelines, or vector databases.

Soft Skills
Excellent written communication - you can write a postmortem that engineering leadership reads to the end, and a runbook that a junior on-call engineer can execute at 3 AM. Strong technical influence without authority; you raise the reliability bar across teams by example and through reviews, not by mandate. Calm under pressure during sev-1 incidents affecting thousands of engineers.

Education
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.

Why This Role Is Different

Customer = every Tesla engineer. Your platforms unblock Vehicle Software, Autopilot, Energy, and Manufacturing teams. The impact of every reliability improvement compounds across the company.

On-prem by design. We don't outsource our critical paths to SaaS. You'll own the full stack - hardware, network, OS, platform, application, observability - and you'll have the authority to change it.

AI-augmented support. We're not just operating platforms; we're building the AI tooling (Nabu RAG + Mattermost support bot + Copilot/Cursor integrations) that lets a small SRE team serve a very large engineering org. You'll help shape that.

High autonomy, high ownership. Engineering Tools is small and senior-heavy. As a Staff SRE you'll set technical direction for multiple platforms - not just execute someone else's roadmap.