Golang Job: Senior Site Reliability Engineer - Azure

Company

Cambridge Resources Inc
United States of America

Location

Remote Position
(From Everywhere/No Office Location)

Job type

Full-Time

Golang Job Details

**This is a fully remote, employee role**

We are seeking a Senior Site Reliability Engineer who is passionate about delivering reliable, scalable, efficient, and highly available platforms. Working under the general supervision of the Director of Enterprise Technology Services, you will continually optimize and automate our processes and systems to improve reliability and scalability and to reduce toil.

You will also participate in systems design and deployment and take on real-time responsibilities such as monitoring, incident management, and recovery.

Primary Job Duties

  • Collaborate with and across teams to design, develop, test, implement, and support technical solutions for container orchestration platforms
  • Build standard processes and procedures to automate the deployment, troubleshooting, monitoring, and recovery of cloud infrastructure, leveraging infrastructure-as-code practices
  • Architect and execute the migration of existing workloads from traditional on-prem infrastructure to the cloud
  • Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal and external technology communities, and mentoring other members of the team.
  • Build tools to monitor systems and automate processes around the core storage and network infrastructure
  • Serve as a core contributor to our Architecture Review Board, change management, and blameless postmortem processes
  • Collaborate with teams and assist in troubleshooting issues across the whole stack – hardware, software, applications, and network
  • Work on capacity planning and performance engineering projects
  • Collaborate with other business functions to bring best-of-breed products and solutions to fruition, with automation, reliability, scalability, and observability as core tenets
  • Build for resilience. Our goal is that nobody gets called off-hours, ever! While we work on that, participate in a weekly on-call rotation.
  • Improve our infrastructure capabilities, optimizing for cost, simplicity, and maintainability

Basic Qualifications:

  • Experience running high-availability cloud deployments with a major provider, namely Microsoft Azure
  • Experience automating systems administration tasks using tools like Ansible and Terraform and languages such as Python, Bash, or Go
  • Experience with cloud monitoring and observability
  • Comfortable with git and Infrastructure as Code workflows
  • At least 4 years of Linux system administration experience, including in-depth experience with RHEL, CentOS, and Windows Server, and strong debugging, troubleshooting, and problem-solving skills
  • At least 3 years of experience in DevOps Engineering (internship experience will be considered)
  • At least 2 years of experience with Cloud Native technologies, namely Microsoft Azure
  • 2+ years of experience with scripting and coding (Bash, Python, SQL, Golang, or comparable languages)

Preferred Qualifications:

  • 2+ years of experience with Terraform, Docker, or Ansible, as well as Git and Jenkins
  • 2+ years of experience with multi-tenant container orchestration platforms and services including Docker or Kubernetes
  • 2+ years of experience working with Agile Development Practices
  • Experience with Kubernetes-based cloud-native technologies such as Argo, Kubeflow, Istio, Linkerd, and Dex is a plus
  • Experience using Docker or Kubernetes to create and manage portable, extensible, containerized workloads and services is a plus