Site Reliability Engineer

PayFit is hiring!

About

PayFit simplifies payroll management and HR processes for SMBs. We have built a fast, intuitive and automated SaaS solution that helps business owners and HR professionals save time and money, allowing them to refocus on what really matters: their employees. Through PayFit, employees have dedicated access to their payslips and can efficiently manage their leave and expense requests. To build this solution, we created our own programming language, JetLang. Thanks to JetLang, we were able to code the Labour Code and collective agreements, and today we continue to add new features.

Our mission is to support the digital transformation of HR management through our ever-growing range of product features and services. We have had a strong presence in France since 2015 and are growing quickly in Germany, Spain and the UK. More than 3,500 companies, including Big Mamma, MinuteBuzz and Sellsy, already trust us. Over 500 PayFiters have already joined the adventure, and we have raised €95M to keep growing.

Job Description

As an SRE, you will:
Make monitoring and alerting fire on symptoms rather than on outages (Observability).
Document every action so that your findings turn into repeatable procedures, and then into automation (Toil).
Use PayFit's homemade product to run app.payfit.com as a first resort, and improve product resilience as much as possible (Reliability).
Improve the deployment process to make it "Easy Peasy Lemon Squeezy".
Design, build and maintain the core infrastructure pieces that allow PayFit to scale to thousands of concurrent users.
Debug production issues across services and levels of the stack.
Plan the growth of PayFit's infrastructure.

Projects you could work on:
Coding infrastructure automation with ArgoCD and Terraform
Improving our Datadog monitoring or building new metrics
Planning, preparing for and executing the architecture design of new cloud environments
Developing relationships with several Tribes to help them define SLIs, SLOs and SLAs, so that production events can be anticipated and capacity can be planned

Execution:
Team organization and planning
Issue, epic and homemade product lifecycle

Collaboration and Communication:
Creating Slack posts when needed
Completing Post Mortem and Root Cause Analysis (RCA) investigations
Contributing to the handbook, runbooks and general documentation
Leading and contributing to designs for issues, epics and OKRs
Improving team practices for handing off work and incidents

Preferred Experience

You may be a fit for this role if you:
Think in terms of systems: edge cases, failure modes, behaviors and specific implementations.
Know your way around Linux and the Unix shell.
Understand the purpose of configuration management systems such as Helm, Ansible, Chef, etc.
Have strong programming skills in Python and/or Go.
Have an urge to collaborate and communicate.
Have an urge to document all the things so you don't need to learn the same thing twice.
Have an enthusiastic, go-for-it attitude.
Are driven to deliver high-quality work.
Share our values and work in accordance with them.
Have experience with ArgoCD, Redis, MongoDB, HAProxy, Docker, Kubernetes, Terraform or similar technologies.
Are comfortable using GitHub.

Site Reliability Engineering Leveling at PayFit
Areas of expertise and contribution used for leveling
Technical:

Using Helm, ArgoCD, ... to efficiently manage our fleet of APIs and applications
Implementing "Infrastructure as Code" using Terraform, and CI/CD for automation
Load balancing the application using proxies
Kubernetes and containerization of our systems
Product knowledge
Monitoring and metrics in Datadog, Prometheus and Grafana, with Slack/PagerDuty integrations
Logging infrastructure
Backend storage management and scaling
Disaster Recovery and High Availability strategy

Recruitment Process

What we offer ❤️
• an amazing working environment, designed for kindness and personal fulfillment,
• an attractive remuneration package,
• a gym inside the office and a Gymlib subscription at a preferential rate,
• meal vouchers (restaurant tickets),
• 4 weeks of paternity leave (fully covered) and 20 weeks of maternity leave (fully covered),
• Henner insurance (60% covered by PayFit),
• MacBooks are our standard, but we'll provide whatever equipment you need to help you get your job done!

Additional Information

  • Contract Type: Full-Time
  • Location: Paris, France (75017)