Erlang Developer – Distributed Systems
Role Overview
We are looking for high‑ownership engineers who do more than write code. This role requires individuals who can understand complex system problems end‑to‑end, take accountability for outcomes, and drive solutions from design to production.
You will work on large-scale, distributed, real-time backend systems where correctness, resilience, and performance directly impact live operations. Success in this role demands strong technical judgment, system-level thinking, and the ability to operate under ambiguity.
What This Role Expects (Beyond Coding)
- Ability to understand problem statements deeply, ask the right questions, and propose technically sound solutions
- Ownership of services across their full lifecycle: design → implementation → deployment → monitoring → incident resolution
- Proactive identification of system risks, performance bottlenecks, and failure scenarios
- Willingness to make trade-offs, defend design decisions, and improve systems iteratively
- Comfort operating in production-first environments where uptime and correctness matter
Key Responsibilities
- Design, build, and own distributed backend services using Erlang/OTP
- Model complex workflows using concurrency and message-passing primitives
- Design and maintain robust supervision hierarchies and fault-recovery strategies
- Build systems that run thousands of concurrent processes with predictable behavior
- Ensure production readiness:
  - Observability (logging, metrics, alerts)
  - Graceful degradation and recovery
  - Safe deployment strategies
- Debug and resolve production incidents, including distributed failures
- Collaborate closely with cross-functional teams on system interfaces and behavior
- Continuously improve system reliability, performance, and maintainability
Required Technical Skills
Core Expertise
- Erlang with strong production experience
- OTP Framework
  - gen_server, supervisor, and gen_statem (or legacy gen_fsm) behaviors
  - ETS and Mnesia
- Distributed Erlang
  - Clustering
  - Inter-node communication
  - Handling partial failures and network partitions
Systems & Architecture
- Distributed systems design
- Actor-model concurrency
- Event-driven and asynchronous architectures
- Real-time / near-real-time systems
Platform & Infrastructure
- Linux (debugging, profiling, system behavior analysis)
- Networking fundamentals and API protocols (TCP/UDP, HTTP/REST, gRPC)
- Messaging systems (Kafka, RabbitMQ, or similar)
- Datastores:
  - In-memory: ETS, Redis
  - Persistent: PostgreSQL or equivalent
- Containerization and orchestration (Docker, Kubernetes)
- CI/CD, monitoring, and alerting pipelines
What We Value in Candidates
- Engineers who take responsibility, not just tickets
- Ability to reason about failure modes, not just happy paths
- Strong debugging skills in distributed, stateful systems
- Comfort working with long-running, always-on services
- Clear communication and technical articulation
- Bias toward making systems work in production
Nice to Have
- Experience with automation, robotics, or control systems
- Experience running 24x7 high-availability systems
- Cloud infrastructure experience (GCP)
Experience Expectations
- 4+ years of backend or distributed systems development
- 3+ years of hands-on Erlang/OTP experience
Summary
This role is not for engineers who:
- Only implement predefined logic
- Avoid production ownership
- Prefer isolated problems with narrow scope
This role is for engineers who:
- Think in systems
- Own outcomes
- Turn complex problems into working, reliable solutions