
Tesla

Staff Site Reliability Engineer (SRE), Engineering Tools

  • Posted a day ago

Job Description

About The Team
Engineering Tools owns and operates the on-prem developer platforms that every Tesla engineer depends on every day: GitHub Enterprise, JFrog Artifactory, GitHub Copilot (self-hosted), Cursor (on-prem), and the Atlassian suite (Jira Service Management + Confluence). We also run the AI-augmented support layer that fronts these platforms: a Mattermost support bot backed by our internal Nabu RAG platform, observability via OpenTelemetry, and a GitOps-driven Kubernetes deployment footprint in our cluster.

If one of our systems is down, thousands of Tesla engineers stop shipping. We're hiring a Staff SRE to own the reliability, scalability, and operational maturity of that footprint.

Key Responsibilities

  • Platform administration: Manage GitHub Enterprise (Cloud and/or Server) organizations, teams, repos, branch protection rules, Actions runners, and Apps. Administer JFrog Artifactory repositories (local, remote, virtual), permissions, replication, and storage policies.
  • User support: Triage and resolve tickets covering access requests, repo migrations, build/artifact failures, authentication issues, and integrations. Define and meet SLAs.
  • Migrations & onboarding: Lead repo migrations into/out of GitHub (e.g., GitHub Migrations API, gh-migration tooling) and Artifactory repository imports/exports. Onboard new teams with templates and standards.
  • Automation: Build scripts and tooling (Bash, Python, Terraform, GitHub Actions, JFrog CLI) to automate provisioning, permission audits, cleanup, and reporting. Eliminate repetitive support work.
  • Reliability & monitoring: Monitor platform health, storage usage, runner capacity, and license consumption. Coordinate upgrades, patches, and incident response with the vendor.
  • Security & compliance: Enforce SSO/SAML, SCIM provisioning, secret scanning, signed commits, audit logging, and least-privilege access. Support SOC 2 / ISO audits.
  • Integrations: Maintain integrations with CI/CD (Jenkins, GitHub Actions, GitLab CI), SAST/SCA scanners, Jira, Slack, and internal developer portals.
  • Documentation & enablement: Write runbooks, FAQs, and self-service guides. Host office hours and training sessions for developers.

Required Qualifications

  • 3+ years administering GitHub Enterprise (Cloud or Server) at scale (500+ users or 1000+ repos).
  • 2+ years administering JFrog Artifactory (or comparable: Nexus, Cloudsmith, Harbor).
  • Strong scripting in Bash and Python; comfortable with REST APIs and curl/jq.
  • Working knowledge of Git internals (refs, packfiles, LFS, submodules) and ability to debug repo corruption, large-file issues, and merge problems.
  • Hands-on experience with at least one CI/CD system (GitHub Actions, Jenkins, GitLab CI, CircleCI).
  • Familiarity with SSO/SAML, SCIM, OIDC, and personal/fine-grained access tokens.
  • Excellent written communication - you can turn a confusing incident into a clear postmortem and a vague ticket into a fixable problem.

Preferred Qualifications

  • Experience with GitHub Migrations API, gh-migration-tool, or gei (GitHub Enterprise Importer).
  • Experience operating Artifactory in HA mode, with S3/blob storage, and Xray for vulnerability scanning.
  • Infrastructure-as-Code: Terraform providers for GitHub and Artifactory.
  • Container/package format expertise: Docker, npm, Maven, PyPI, Helm, Conan.
  • Familiarity with secret scanning tools (GitHub Advanced Security, GitGuardian, TruffleHog) and dependency management (Dependabot, Renovate).
  • Prior on-call or production support experience.
  • Exposure to GHAS, Copilot for Business, or Copilot Enterprise rollouts.

Bonus

  • Experience operating self-hosted LLM inference (Copilot Enterprise, on-prem Cursor backend, vLLM, or similar), RAG pipelines, or vector databases.

Soft Skills
Excellent written communication: you can write a post-mortem that engineering leadership reads to the end, and a runbook that a junior on-call can execute at 3 AM. Strong technical influence without authority; you raise the reliability bar across teams by example and through reviews, not by mandate. Calm under pressure during sev-1 incidents affecting thousands of engineers.

Education
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.

Why This Role Is Different

  • Customer = every Tesla engineer. Your platforms unblock Vehicle Software, Autopilot, Energy, and Manufacturing teams. The impact of every reliability improvement compounds across the company.
  • On-prem by design. We don't outsource our critical paths to SaaS. You'll own the full stack - hardware, network, OS, platform, application, observability - and you'll have the authority to change it.
  • AI-augmented support. We're not just operating platforms; we're building the AI tooling (Nabu RAG + Mattermost support bot + Copilot/Cursor integrations) that lets a small SRE team serve a very large engineering org. You'll help shape that.
  • High autonomy, high ownership. Engineering Tools is small and senior-heavy. As a Staff SRE you'll set technical direction for multiple platforms - not just execute someone else's roadmap.







    Job ID: 148538009