
Billtrust

Site Reliability Engineer


Job Description

Who We Are

Finance leaders choose Billtrust to get paid faster, control costs, and maximize customer satisfaction. As the leader in B2B accounts receivable workflow and payment software, we provide the world's leading brands with AI-powered solutions across the full AR lifecycle—from invoice presentment and payment processing to cash application and collections. With over 2,600 global customers, more than $1 trillion in invoice dollars processed, and a proprietary network of 13 million buyers, Billtrust delivers business value through deep industry expertise and a culture relentlessly focused on meaningful customer outcomes.

We're an AI-first company, not just in what we build for our customers, but in how we work. Across every function, our teams use AI tools daily to work faster, make better decisions, and deliver higher-quality outcomes. We hire exceptional people, give them cutting-edge AI capabilities, and measure success by the impact they create. If you want to do the best work of your career at the frontier of AI and fintech, Billtrust is the place to do it.

Our Values

Customers

We relentlessly increase value for our customers and do the right thing for them.

Action

We make thoughtfully fast decisions, act quickly, cut through red tape, deliver progress over perfection, and take ownership and accountability.

Team Spirit

We put the team ahead of ourselves, foster trust and respect, collaborate with passion, despise toxic politics, value our differences, and celebrate together.

Innovation

We challenge the status quo, experiment thoughtfully, and are novel and brilliant in what we create.

Excellence

We love to win, but we hate losing even more. We aspire to be the best and take pride in our work. When we fall short, we own it and come back stronger.

Site Reliability Engineer

As a Site Reliability Engineer within our Operations Engineering Center, you'll ensure the reliability, scalability, and performance of Billtrust's infrastructure that powers mission-critical order-to-cash operations. You'll participate in our follow-the-sun SRE coverage across time zones. You'll respond to incidents, implement monitoring and alerting strategies, and engineer autonomous incident response systems through agentic runbooks and intelligent triage. Your work will directly impact billions of dollars in transactions processed through our platform while pioneering AI-driven operational excellence.

Key Responsibilities

  • Respond to incidents, perform root cause analysis, and lead post-mortem discussions
  • Implement and maintain comprehensive monitoring, alerting, and observability across infrastructure
  • Establish and maintain SLO frameworks, tracking and improving reliability metrics
  • Engineer autonomous alert triage agents and agentic runbooks for incident response
  • Design and build intelligent incident correlation engines using AI/ML techniques
  • Develop and maintain infrastructure automation, CI/CD pipelines, and deployment procedures
  • Manage Kubernetes clusters, container orchestration, and cloud platform resources (AWS)
  • Lead toil reduction initiatives through automation, focusing on high-impact pain points
  • Collaborate with platform and product teams on infrastructure requirements and capacity planning

Required Qualifications

Experience & Technical Background

  • 5+ years of hands-on experience in Site Reliability Engineering or infrastructure operations
  • Strong proficiency with Linux/Unix systems administration and shell scripting
  • Experience with cloud platforms (AWS preferred, Azure or GCP acceptable)
  • Hands-on Kubernetes and container orchestration experience
  • Demonstrated expertise in incident response, troubleshooting, and post-mortem analysis
  • Strong background with monitoring tools (Datadog, Prometheus, Grafana, PagerDuty)
  • Experience with infrastructure automation and infrastructure-as-code tools (Terraform)
  • Proficiency with at least one programming/scripting language (Python, Go, Bash preferred)
  • Proficiency with Claude Code, GitHub Copilot, or similar AI coding assistants

Soft Skills & Attributes

  • Excellent communication skills, particularly during high-stress incident situations
  • Problem-solving mindset with a focus on automated solutions over manual workarounds
  • Reliability-first mentality with attention to detail and systems thinking
  • Ability to thrive in a distributed, follow-the-sun team environment
  • Comfort with on-call responsibilities and 24x7 operational commitment

Job ID: 148884871

Similar Jobs

Hyderabad, India

Skills:

JavaMicroservice architectureMqPostgreSQLSpring BootKafkaJIRAJenkinsGcpOpenshiftKuberneteslogging toolsOAQTektonChaos Engineering conceptsprivate public key managementGitHub ActionsCloud WAF securityCamunda process orchestration engine

Hyderabad, India

Skills:

ShellSpring BootGrafanaNginxJvmKafkaRedisPrometheusPythonBashCustom ExportersVictoria MetricsYugabyteDBAlertmanagerFluentdOpenTelemetryVictoria LogsTraces

Hyderabad, India

Skills:

Windows ServicesWindows ServerGcpElkLinuxIisPowerShellAzurePythonAWSActive Directory

Hyderabad, India

Skills:

JenkinsWindows ServerTerraformIisScriptingLinuxAWS Cloud PlatformInfrastructure as Code

Hyderabad, India

Skills:

CeleryDockerTerraformCosmos DBPostgres SqlPowerShellBashItilDatadogSqlArmKubernetesChecklyLog AnalyticsOpenTelemetryOpenAI APIsBicepApplication InsightsLangChainMicrosoft Azure CloudAI ML-based anomaly detectionPlaywrightKustoAzure Monitor