Data Dynamics

Software Platform Engineer

3-5 Years
  • Posted 2 months ago

Job Description

Overview

We are seeking a skilled Platform Engineer to join our team and drive the development, deployment, and supportability of our Kubernetes-based microservices platform, which customers deploy on-premises. You will build comprehensive observability, enable log and report extraction for service cases where we have no real-time access to the environment, and reduce our over-reliance on Kafka by integrating Redis and batch processing. This role requires expertise in Kubernetes, Azure DevOps, C++ support, deployment sizing, and designing for reliability, availability, and serviceability (RAS).

Responsibilities
  • Build Comprehensive Observability: Implement centralized metrics, logging, and tracing (e.g., Prometheus, Fluentd, OpenTelemetry) for .NET, Python, Java, C++, Kafka, and Redis, ensuring supportability in on-premises environments.
  • Enable Log/Report Extraction: Design customer-facing tools (e.g., CLI scripts, Helm chart options) to collect and export logs/metrics from on-premises deployments for service cases, without real-time access.
  • Optimize Kafka Usage: Audit and optimize Kafka configurations (e.g., topics, partitions, compression) to reduce metadata streaming overhead, and monitor the results with Prometheus or Azure Monitor.
  • Implement Alternatives: Integrate Redis (e.g., Azure Cache for Redis) for metadata caching/pub-sub and batch processing (e.g., Azure Data Factory, Kubernetes Jobs) for high-volume data, reducing Kafka dependency.
  • Troubleshoot Customer Environments: Debug issues in on-premises customer deployments for services (C++, .NET, Python, Java), Kafka, and Redis, using exported logs and metrics.
  • Enhance Product Supportability: Build Azure DevOps pipelines and installers (e.g., Helm charts) for consistent, supportable deployments, with documentation for customer support.
  • Contribute to RAS: Own serviceability by building observability and diagnostic tools; support reliability/availability via Kubernetes optimization, autoscaling, and fault-tolerant designs.
  • Enforce Standards: Implement and enforce structured logging (e.g., JSON with correlation IDs) and resource sizing standards via Azure DevOps pipelines.
  • Optimize Deployment Sizing: Set Kubernetes resource requests/limits and autoscaling policies (e.g., HPA, VPA) for services, Kafka, Redis, and batch jobs, based on profiling.
  • Evaluate Service Meshes: Assess service meshes (e.g., Linkerd) for improving microservice and data platform observability and communication.
  • Support C++ Services: Assist developers in containerizing, deploying, and debugging C++ services, ensuring integration with observability, Kafka, Redis, or batch workflows.
  • Automate with Azure DevOps: Build CI/CD pipelines in Azure DevOps for automated builds, tests, and deployments, integrating with AKS, Kafka, and Redis.
Qualifications
  • Experience: 3-5 years with Kubernetes, Azure DevOps (AKS, pipelines), and Kafka administration.
  • Technical Skills:
      • Expert in Kubernetes (CKA/CKAD preferred) and Azure DevOps (YAML pipelines, AKS integration).
      • Proficient in observability tools (e.g., Prometheus, Grafana, Fluentd, OpenTelemetry, Azure Monitor) for metrics, logs, and tracing.
      • Experience with on-premises Kubernetes deployments and log/report extraction for service cases.
      • Proficient in Kafka optimization (e.g., topic management, consumer groups) and monitoring.
      • Knowledge of Redis (e.g., Azure Cache for Redis, pub/sub) and batch processing (e.g., Azure Data Factory, Kubernetes Jobs).
      • Familiarity with C++ build systems (e.g., CMake) and debugging (e.g., gdb) in Kubernetes.
      • Proficiency in Kubernetes resource management and autoscaling (e.g., HPA, VPA).
      • Scripting skills (e.g., Python, Bash) for automation, diagnostics, and log extraction.
  • Customer Focus: Proven ability to troubleshoot on-premises customer environments and build supportable deployment and observability tools.
  • Standards Enforcement: Experience enforcing logging, sizing, and data platform standards via Azure DevOps pipelines.
  • RAS Expertise: Ability to design for serviceability (observability, diagnostics) and contribute to reliability/availability through platform optimization.
Nice-to-Haves
  • Experience with service meshes (e.g., Linkerd, Istio) and their integration with Azure.
  • Familiarity with .NET, Python, or Java for developer collaboration.
  • Knowledge of air-gapped Kubernetes deployments (e.g., Kubeadm, K3s).

Job ID: 127708751