Position Description
We are seeking a highly skilled Confluent Kafka Engineer with deep, systems-level expertise in designing, operating, and governing large-scale distributed streaming platforms across on-prem and cloud environments. You will work closely with application teams, architects, and DevOps engineers to ensure reliable, secure, and scalable messaging services across the organization.
Core Expertise
- Mastery of Apache Kafka and the Confluent ecosystem, including topic/partition architecture, producer/consumer semantics, exactly-once processing, Kafka Connect, Kafka Streams, ksqlDB, Schema Registry, REST Proxy, and end-to-end stream governance
- Proven experience operating Kafka on-prem and in cloud environments (Azure and GCP), with a strong understanding of high availability, fault tolerance, and disaster recovery design patterns
- Deep hands-on expertise with Confluent Replicator and Cluster Linking, including cross-cluster data replication, latency trade-offs, failure handling, and multi-region architectures
- Strong experience in Kafka cluster provisioning, lifecycle management, upgrades, and capacity planning in complex enterprise environments
- Advanced experience with Confluent for Kubernetes (CFK), including CRD-level configuration, JVM tuning, pod-level resource isolation, and storage and network optimization
- High proficiency in Kubernetes, Terraform, and observability, including Kafka-specific monitoring, alerting, and performance diagnostics
- Strong DevOps and CI/CD understanding for automating Kafka infrastructure, connectors, and streaming applications
- Deep expertise in security and access provisioning, including LDAP integration, RBAC models, and automated entitlement scripts
- Experience integrating Kafka with enterprise platforms such as MuleSoft, Databricks, SAP, MFT/IFT systems, and microservices
- Proficiency in Python and Bash, with strong Linux systems knowledge
- Demonstrated capability in cluster cost optimization, performance tuning, and resource efficiency at scale
- Experience with monitoring tools (e.g., Dynatrace, Prometheus, Grafana).
Platform Ownership & Leadership
- Acts as L3 support owner for the Kafka platform, leading deep-dive troubleshooting, RCA, and production incident resolution
- Strong experience documenting and operating platforms using Confluence, including standards, runbooks, and architectural guidelines
- Ability to drive architectural and design decisions, define long-term Kafka strategy, and influence platform evolution across teams
- Operates with a distributed-systems mindset, anticipating failure modes and trade-offs
- Treats Kafka as a mission-critical data platform, not middleware
- Comfortable owning complex, multi-cluster, production-critical streaming ecosystems
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
- Ability to work independently and in a team-oriented environment.