Golang Job: DevOps - SRE (Canada)

Company

Platform.sh

Location

Toronto - Canada

Job type

Full-Time

Golang Job Details

Mission

Platform.sh is a groundbreaking hosting and development tool for web applications.

To reinforce our technical prowess, we are looking to grow our operations team. If you're looking for an exciting, high-growth opportunity with an award-winning, cutting-edge company, this could be just the job for you.

For its PaaS solution, Platform.sh (https://platform.sh) is looking for an Operations and Service Reliability Engineer with a taste for Python and Go, a strong understanding of Linux systems, and a real hunger for the challenges of building robust, distributed systems.

Platform.sh is a PaaS shrouded in a lot of black magic (we can consistently clone a whole running cluster, with its state, databases, and indexes, in a matter of seconds). We want to get this down to hundreds of milliseconds. Interested? There is more...

Our external API is pure Hypermedia REST + OAuth on top of Pyramid. It mechanizes the Git layer and needs more features.

From the same manifest, we can consistently generate a Docker container, an LXC container, or VM disk images (AWS, Azure, OpenStack). We want more targets.

We probably have the highest container density in the industry. We need to get it higher.

We support Python, Ruby, Node.js, PHP, Java, and .NET. It's time to roll out Elixir, of course (and Rust. We need Rust).

Reporting directly to the Director of Operations and Engineering, and working in close collaboration with Customer Support and the rest of the Engineering teams, you will be responsible for:

  • Cloud operations: configuring clusters and systems, deploying changes and container images, provisioning capacity, and helping the Support team debug production issues.
  • Automation: working with the rest of the Engineering teams to automate all processes, and improving, securing, and updating existing automation.
  • Reliability: maintaining core infrastructure services as code, and working on monitoring systems and capacity planning.
  • Quality: being part of the on-call schedule to receive real-time alerts, writing or updating documentation and runbooks for alerts created by services, and responding to incidents.

This is a fully remote position for a candidate based in Canada.

The ideal candidate

  • Has proven successful experience in an operations role
  • Has demonstrated the ability to successfully manage cloud-based infrastructure for a fast-growing organization
  • Has experience with containerization technologies
  • Has had exposure to cloud services such as AWS, Azure, or GCP
  • Understands how an OS works, knows networking, how Git works, and the constraints of a distributed system
  • Has Puppet experience
  • Is proficient in Python (Golang a plus)

Nice to have

  • Knowledge of Magento Ecommerce, Symfony, Drupal, eZ Platform, or TYPO3
  • Ability to cover weekends

Note: we don't like stress, so we build everything to be robust and resilient, but stuff does break. This is a role with on-call duties and fire drills. If this fills you with dread... well, this might not be a fit for you.


A typical month in our team would look like this:

  • Development week: writing the code and automation that make our infrastructure run smoothly, in Puppet, Go, and Python. The work ranges from monitoring tasks up to self-healing and updating
  • Deploy week: everything that goes live on PSH is deployed by us, and the project managers assign those cluster updates to whoever is working that week (during off hours). We're always improving :)
  • Escalation week: whenever there's a tough problem Support can't deal with, our team investigates why and helps solve it
  • On-call week: whenever a person is on call, we don't assign them any additional work, so that teammate has time to learn something new while being available in case something happens

About Platform.sh

Platform.sh is an idea-to-cloud application platform that simplifies cloud infrastructures.

We give developers the tools they need to experiment, innovate, get rapid feedback and deliver better-quality features with speed and confidence thanks to our unique rapid cloning technology.

Platform.sh serves thousands of customers worldwide including The Financial Times, Gap, Magento Commerce, Orange, Hachette, Ikea, Stanford University, Harvard University, The British Council, and Lufthansa.

We want people who are passionate, open, multicultural, friendly, humble and smart to join us and help this fast-growing, award-winning company to revolutionize the tech industry.