Site Reliability Engineer (CF-33912732)
A fast-growing IT company with a compound annual growth rate of 120% over the last few years.
Here are some highlights:
- Very international work environment; most of the developers are foreigners
- Fully remote work and flextime
- Flat organizational structure and a great, dynamic work environment
Main responsibilities are:
- Join the on-call rotation to respond to availability incidents and support service engineers with customer incidents.
- Use your on-call shift to prevent incidents from ever happening.
- Run our infrastructure as code with Pulumi.
- Make monitoring and alerting fire on symptoms rather than on outages.
- Document every action so that your findings turn into repeatable procedures, then automate them.
- Improve the deployment process to make it as systematic as possible.
- Debug production issues across services and various stack levels.
- Plan the growth and future of the company's infrastructure.
Here are the main requirements:
- Ability to communicate effectively in English.
- Strong programming skills in Ruby and/or Go.
- 2+ years' production experience with cloud platforms such as GCP or AWS.
- 2+ years' experience monitoring systems with tools such as Stackdriver or Datadog.
- 2+ years' experience with Kubernetes in production, with configuration management tools such as Kustomize, and with the Helm package manager.
- 1+ years' implementation experience with CI/CD pipeline tools such as CircleCI and ArgoCD.
- Implementation experience with infrastructure as code and with setting up SLOs, SLIs, and error budgets.
To find out more about Computer Futures, please visit www.computerfutures.com
Award winner of:
Great Place to Work 2019 | Growth Company of the Year by TALint Recruitment Awards 2019 | Best IT & Technology Recruitment Company of the Year by Recruitment International Awards 2018