Bespoke Labs

Software Debugging Engineer

3-5 Years

This job is no longer accepting applications

  • Posted a month ago

Job Description

About the Role

We are looking for a Software Debugging Engineer with deep expertise in diagnosing and resolving complex issues in large-scale software systems. In this role, you will be the go-to expert for uncovering hard-to-find bugs, performance bottlenecks, and production failures across distributed infrastructure.

Your work will directly improve system reliability, performance, and operational stability in production environments.

What You'll Do

  • Debug complex issues across large-scale and distributed software systems
  • Perform root cause analysis for production incidents and outages
  • Diagnose performance bottlenecks, memory leaks, and resource contention
  • Analyze logs, traces, and metrics to identify system failures
  • Build debugging tools, scripts, and automation to accelerate issue resolution
  • Create reproducible test cases from real production failures
  • Partner with engineering teams to implement fixes and preventive measures
  • Build and maintain observability systems including logging, tracing, and alerting
  • Write clear post-mortems and technical documentation
  • Improve system reliability through better monitoring, error handling, and diagnostics

What We're Looking For

  • 3+ years of software engineering experience with a strong focus on debugging
  • Proven experience debugging large-scale or distributed systems
  • Strong proficiency in Python for scripting, automation, and analysis
  • Deep understanding of Linux internals, system calls, and command-line tooling
  • Hands-on experience with debugging tools such as gdb, strace, perf, tcpdump, and Wireshark
  • Experience using profiling tools for CPU, memory, and I/O analysis
  • Familiarity with observability stacks such as Prometheus, Grafana, ELK, Jaeger, or similar
  • Excellent analytical and problem-solving skills
  • Strong written communication skills for documentation and post-mortems

Nice to Have

  • Experience debugging containerized systems using Docker and Kubernetes
  • Background in SRE, reliability engineering, or infrastructure roles
  • Knowledge of database internals and query optimization (PostgreSQL, Redis)
  • Experience with asynchronous systems and message queues (Kafka, RabbitMQ)
  • Familiarity with memory debugging tools such as Valgrind or AddressSanitizer
  • Experience participating in production incident response or on-call rotations

Why Join Us

  • Be the go-to expert for debugging complex, real-world systems
  • Direct impact on system reliability and production stability
  • Work with modern infrastructure, observability, and performance tooling
  • Collaborate with world-class engineers and researchers
  • Contribute to systems trusted by leading AI labs and Fortune 500 enterprises

Job ID: 141448301