We're looking for a Solution Engineer (Infrastructure SDE3) to design, automate, and operate secure, scalable cloud infrastructure on Azure and/or AWS. You will coordinate production releases, own platform reliability, and lead deep-dive troubleshooting using Python and modern DevOps tooling to reduce toil and improve time-to-recovery.
Responsibilities
- Release and Environment Management: Coordinate production deployments across environments; maintain CI/CD pipelines, versioning, and tested rollback paths; and ensure clean promotion flows and audit-ready release notes.
- Cloud Infrastructure Ownership: Design, implement, and maintain resilient networks and services (VNet/VPC, subnets, VPN/peering/gateways, load balancers, storage, IAM, secrets/keys); right-size and harden resources for security, cost, and performance.
- Automation and Observability: Automate routine ops with Python and Bash (provisioning, health checks, drift detection, runbooks); improve metrics/logs/traces and build actionable dashboards/alerts.
- Container and Build Tooling: Manage Docker images/registries and supply-chain basics; administer build systems (e.g., Jenkins/GitHub Actions/Azure DevOps); and enforce branch/release strategies in Git.
- Incident Response and RCA: Triage infra, networking, and deployment issues; perform root-cause analysis; implement preventive fixes that reduce MTTR and alert noise; keep stakeholders informed.
- Documentation and Enablement: Maintain up-to-date runbooks, architecture diagrams, and how-to guides; mentor engineers on platform usage and best practices.
Requirements
- Strong Python (3.x) for infra/ops automation; solid Bash on Linux.
- Hands-on with Azure and/or AWS core services: networking (VNet/VPC, VPN, gateways), storage, compute, IAM/roles, secrets management, and load balancing.
- CI/CD expertise (Jenkins, GitHub Actions, or Azure DevOps): pipeline design, artifact/version management, automated rollbacks, and environment promotions.
- Containers (Docker): image hardening, registry workflows, and basic runtime troubleshooting.
- Git proficiency: branching strategies, code review workflows, and release tagging.
- Observability: metrics, logs, and alerting (CloudWatch/Log Analytics/Prometheus/Grafana or similar).
- Proven incident management experience: on-call participation, RCA writing, and stakeholder communication.
Nice To Have
- Kubernetes (AKS/EKS) fundamentals.
- Infrastructure as code (Terraform/Bicep/CloudFormation) and policy as code (OPA/Azure Policy).
- Security/compliance awareness (least privilege, key rotation, CIS benchmarks) and cost optimization.
- RDBMS basics (MySQL/PostgreSQL) for operational tasks and performance triage.
This job was posted by Laveena Soni from Skeps.