As a Senior Site Reliability Engineer for Arrive Logistics, you will be responsible for building a purposeful, proactive, and sustainable approach to reliability based on core SRE principles and practices. Your role covers the entire lifecycle of a product, from helping engineering teams with architecture and delivery to on-call incident response and triage. You will work with tools and platforms like New Relic, Docker, Azure, and Kubernetes to help teams rapidly deliver code to production that aligns with our reliability goals.
You’ll sit within the Engineering Organization supporting technology powering Arrive Logistics in the fast-paced freight industry. If you are passionate about Site Reliability Engineering and being part of a new team, read on.
What You’ll Do
- Create monitoring, alerting and dashboarding solutions that improve visibility into Arrive’s application performance and business metrics.
- Perform root cause analysis and post-mortems with an eye towards future prevention.
- Create automation to ensure repeatability, eliminate toil, and reduce mean time to detection and resolution.
- Deliver features and fix bugs to ensure the reliability of systems, including supporting teams by contributing code to their applications.
- Partner with stakeholders across the organization to understand and drive requirements that account for all parties’ needs and reach optimal outcomes. Use knowledge of business needs and project requirements to assess benefits and risks and advise on the best course of action.
- Assess and advise on course corrections as needed.
- Translate high-level or incomplete direction into planned tasks. Use industry knowledge to contribute to the direction of projects.
- Design CI/CD pipelines for applications and infrastructure.
- Produce high-quality documentation and support tooling.
- Mentor junior engineers and share SRE knowledge across teams.
- Participate in on-call rotations and incident management.
Qualifications
- 4+ years of experience in systems, software, platform, site reliability, or DevOps engineering
- 2+ years of experience building or operating cloud applications on a major provider such as Azure (preferred), AWS, or GCP
- Fluency in English and Spanish
- Experience leading teams or large-scale projects preferred
- Capable of owning technical design of moderate complexity without significant guidance
- Experience defining and monitoring infrastructure and applications using SLIs and SLOs with tools like New Relic (preferred), Datadog, Prometheus, or similar
- Demonstrated experience with CI/CD tools such as Azure DevOps, CircleCI, GitHub Actions, or similar
- Ability to handle projects end-to-end with little or no oversight
- Familiarity with cloud automation using infrastructure-as-code tools like Terraform (preferred), Chef, or Puppet
- Capable of communicating technical decisions and designs to non-technical stakeholders
- Experience writing code in .NET, Java / Kotlin, or Python
- A history of mentoring junior engineers on technical and soft skills
- Experience being on-call and handling incidents
- Familiarity with multiple types of datastores such as SQL Server, Postgres, Redis, Cassandra, or similar
- Experience working with distributed and event-based architectures
- Understanding of on-call rotations and incident management, and ability to work an after-hours on-call rotation
Your Arrive Experience
Our award-winning company culture is designed with you in mind. We are committed to supporting your personal and professional growth and making Arrive a place we all love to work.