Senior DevOps Engineer

5-8 years
Job Description

As a member of our Site Reliability Team, you will help drive collaboration across departments and provide global insights by being the consistent eyes on production. You will work with application, infrastructure, and product teams to ensure smooth launches while meeting proper security gates. You will have the chance to skill up on communicating with global teams and help bring standards to teams with unique SDLCs.
This group places a very high value on constant learning and shared education to avoid silos. To be most effective, you will want to have a solid grasp of engineering principles, infrastructure design, and a mature background in iterative product delivery.
On the team, you will participate in:
  • Driving Agile teams to support both interrupt and project work.
  • Estimating and delivering projects on time and within budget through scope shifting and solid communication.
  • Analyzing trends in Golden Signal KPIs (or others) to provide useful feedback on anomalies.
  • Building ideas to shift conversations from outage/retro or symptom/response to prevention.
  • Partnering with Engineering teams to remove drag from internal processes (knowledge of GitLab, GitHub/Git, Ansible, Terraform, Kubernetes, Docker, Consul).
  • Documenting and training others on your team and providing group training and demos.
  • Constantly improving Change, Release, Incident, and Patching processes, with the goal of making them non-events.
  • Optimizing debug procedures for production issues across a variety of technical stacks.
  • Enforcing standards for design, implementation, and security through communication.
You will be charged with having:
  • 5-8 years in Cloud commerce system delivery.
  • Expertise in at least two areas of application and infrastructure engineering.
  • A strong knowledge of AWS technologies and a willingness to self-teach.
  • Experience with CI/CD automation (our environment: Ansible, GitLab, Git/GitHub, Artifactory, Terraform).
  • Capabilities in design and delivery, bringing in projects on budget.
  • An understanding of capacity planning and how to set appropriate limits to optimize cost and performance.
  • The ability to identify system scale, backoff, or other throughput challenges to help prevent incidents or resolve them quickly.
  • Experience with performing to metrics (SLIs/SLOs/SLAs) and making meaningful commitments to customers.
  • History with product behavior, edge cases, failure modes, negative boundary behaviors, load mishaps, etc., to stop issues before they enter production.
  • A history of building and supporting multiple versions of Linux and Windows.
Behavioral skills:
  • Team player, as you will be part of a global team distributed across different countries.
  • Problem solver and self-motivator.
  • Good at time management and task prioritization.

Technical Skills you should have:

  • Cloud platforms: AWS (preferably) or Azure, with a strong understanding of services like IAM, EC2, ECS, S3, Route 53, DNS, Lambda, Elastic Beanstalk, ElastiCache, RDS, etc.
  • Containerization: Hands-on experience with Docker or Kubernetes. Be able to create and publish your own container images and deploy with Docker.
  • Infrastructure as code: Tools like Terraform for defining infrastructure in configuration files and creating environments.
  • Process automation (CI/CD): Strong knowledge of automation tools, preferably GitLab, Jenkins, etc.
  • Scripting: Shell scripting (or PowerShell on Windows).
  • Modern programming languages: Programming skills in a preferred language, such as Python, Java, or PowerShell.
  • Observability: Good understanding of infrastructure, application, and log monitoring tools (like New Relic, Datadog, Zabbix, Grafana, etc.).

