Senior Site Reliability Engineer

Location: Remote

Type: Full Time

Job Description

Posted Yesterday
Hiring Remotely in United States
Remote or Hybrid
175K-200K Annually
Senior level
eCommerce • Fintech • Payments • Software
Order.co is a B2B Ecommerce Platform that simplifies purchasing.

Order.co is the System of Action for the Office of the CFO, transforming the way businesses purchase and pay into an intuitive, B2C-like shopping experience. Order.co leverages embedded AI agents and embedded financial products to reinvent the way businesses connect with their vendors.


End users enjoy a seamless, zero-training buying experience, while finance and procurement leaders gain a single platform to orchestrate how the business “should operate”. The result is an all-in-one solution that serves as a gravitational pull for spend and data, automating and eliminating procurement and finance workflows from requisition to reconciliation along the way.


Order.co is on the cutting edge of B2B Agentic Commerce, poised to be the market leader in creating a more predictive, prescriptive, and personalized experience for users.


Founded in 2016 and headquartered in New York City, Order.co oversees nearly half a billion in annualized spend across hundreds of customers like WeWork, SoulCycle, Lume, and [solidcore]. Order.co has raised $75M in funding from industry-leading investors like MIT, Stage 2 Capital, Rally Ventures, 645 Ventures, and more. Order.co has been proudly named a 50 to Watch by Spend Matters and a Best Place to Work by BuiltIn and Inc. Magazine.

The Role

As a Senior Site Reliability Engineer on the Platform team, you will ensure that software systems are reliable, scalable, performant, and operationally efficient. You blend software engineering skills with infrastructure and operations expertise to keep critical systems running smoothly while enabling rapid product development.

Responsibilities
Reliability Engineering & Infrastructure Ownership
  • Design, build, and operate highly available, scalable, and fault-tolerant infrastructure and platform services
  • Own reliability, availability, latency, and operational excellence for critical production systems and services
  • Define and maintain service level objectives (SLOs), service level indicators (SLIs), and error budgets across platform systems
  • Lead incident response efforts for complex production outages; drive root-cause analysis and long-term remediation actions
  • Build resilient systems that gracefully handle failures, traffic spikes, dependency degradation, and regional outages
  • Continuously improve system reliability through automation, observability, performance tuning, and capacity planning
Automation & Platform Engineering
  • Develop infrastructure automation and self-service tooling to reduce operational toil and improve engineering velocity
  • Build and maintain CI/CD pipelines, deployment automation, and release engineering workflows
  • Implement infrastructure as code (IaC) practices using tools such as Terraform, CloudFormation, and container orchestration
  • Improve developer experience by building reliable internal platforms, operational tooling, and standardized deployment patterns
  • Drive adoption of GitOps, immutable infrastructure, and automated remediation patterns
Observability & Operational Excellence
  • Design and maintain comprehensive monitoring, logging, tracing, and alerting systems for distributed services
  • Establish actionable alerting standards that reduce noise while improving incident detection and response times
  • Analyze production trends, system bottlenecks, and failure patterns to proactively prevent incidents
  • Lead operational readiness reviews, disaster recovery planning, and game-day exercises
  • Improve mean time to detect (MTTD) and mean time to recovery (MTTR) through tooling, automation, and process refinement
Systems Architecture & Scalability
  • Participate actively in architecture and infrastructure design reviews
  • Propose scalable and reliable platform designs that account for multi-region deployment, redundancy, failover, and security considerations
  • Evaluate trade-offs between reliability, scalability, operational complexity, and engineering velocity
  • Identify systemic risks and operational gaps before they become production incidents
  • Partner with engineering teams to ensure services are designed with operability, observability, and resilience in mind from day one
Security & Compliance
  • Approach infrastructure and operational practices with a strong security mindset
  • Implement and maintain secure cloud networking, secrets management, IAM policies, and infrastructure hardening standards
  • Partner with Security and Compliance teams to ensure systems meet organizational and regulatory requirements
  • Drive operational best practices around vulnerability management, patching, and production access controls
End-to-End Ownership & Collaboration
  • Scope and estimate infrastructure and reliability initiatives accurately
  • Coordinate production rollouts, maintenance events, and reliability improvements across teams
  • Communicate operational risks, dependencies, and incident impacts clearly to technical and non-technical stakeholders
  • Collaborate closely with Software Engineering, Security, Product, and Operations teams to improve platform reliability and scalability
  • Serve as a trusted escalation point during critical production incidents
Mentorship & Technical Leadership
  • Mentor junior and mid-level engineers on reliability engineering principles, operational excellence, and infrastructure best practices
  • Raise the operational maturity of the engineering organization through documentation, reviews, and technical guidance
  • Drive improvements in team standards around observability, incident management, automation, and infrastructure design
  • Influence technical decisions through credibility, operational expertise, and strong engineering judgment
Qualifications
  • You are motivated by accountability: you own outcomes, not just tasks
  • You are results-oriented and measure success by shipped, working software
  • You are motivated by correctness in code that touches money: the consequences of a bug land on real customer balances, and you take that seriously
  • You love helping people on your team grow and improve
  • Writing tests is an integral part of your development process, not an afterthought
  • You know how to design and build software incrementally; you don't need a complete spec to make progress
  • Collaborating with the people around you to achieve a goal motivates you
  • You are collaborative, open-minded, and actively developing your craft
  • You are curious and pragmatic about AI-driven solutions: you apply them where they add real value and stay skeptical where they don't
  • You are familiar with AI-assisted development tools: you understand how they work, where they help, and where they fail. Prior hands-on use is a plus; intellectual curiosity and the instinct to evaluate AI output critically are what matter
Technical Skills
  • Strong foundation in computer science fundamentals: data structures, algorithms, and system design
  • Familiarity with building production-grade applications and services using Ruby and Ruby on Rails
  • Deep expertise with Linux systems administration and production troubleshooting
  • Strong experience operating cloud infrastructure at scale, particularly within AWS environments
  • Experience with Kubernetes container orchestration and cloud-native infrastructure patterns
  • Proficiency with infrastructure as code tools such as Terraform or CloudFormation
  • Expertise designing and operating CI/CD pipelines and deployment automation systems
  • Deep understanding of observability tooling, including Datadog, OpenTelemetry, or similar platforms
  • Strong knowledge of distributed systems reliability patterns, including redundancy, failover, autoscaling, rate limiting, and graceful degradation
  • Experience building automation and operational tooling using languages such as Python, Go, Bash, or Ruby
  • Strong understanding of networking fundamentals, including DNS, load balancing, TLS, VPNs, firewalls, and service discovery
  • Hands-on experience with incident response, root-cause analysis, and production operations in high-availability environments
  • Familiarity with SRE methodologies, including SLOs, SLIs, error budgets, capacity planning, and operational maturity modeling
  • Experience implementing secure infrastructure and cloud security best practices, including IAM, secrets management, and vulnerability remediation
  • Proven ability to design scalable, resilient, and maintainable platform systems and APIs
  • Experience supporting distributed microservices architectures and event-driven systems
  • Strong understanding of operational excellence principles, including automation-first engineering and toil reduction
  • Experience using AI-assisted engineering tools (e.g., Claude, GitHub Copilot) as force multipliers while applying sound operational and engineering judgment
  • Excellent debugging and systems thinking skills across infrastructure, networking, application, and platform layers
What Great Looks Like

A Senior Site Reliability Engineer on the Platform team who is thriving at this level demonstrates:


  • Reliable delivery of complex work — consistently ships multi-part solutions on time with low defect rates
  • Low defects in owned areas — proactively monitors and improves the quality of the systems they own; that means incident-free quarters in code paths that move funds and clean reconciliation against vendor reports
  • Measurable mentorship impact — engineers around you write better code because of your reviews and guidance


"Someone we can depend on for the work that matters — especially the work that touches money."

Failure Modes We Screen Against

We actively evaluate candidates for the following anti-patterns during the interview process:


Failure Mode | What It Looks Like
Strong coder, weak owner | Ships code but doesn't manage to the task: owns the merge, not the outcome; hands off and moves on without monitoring or fixing post-release issues
Solo expert | Hoards knowledge instead of sharing; becomes a single point of failure and blocks team growth
Overconfident designer | Proposes solutions without considering trade-offs; jumps to conclusions, resists alternative approaches
Rubber-stamper | Produces AI-generated output without verifying it against the codebase, tests, or business context

Interview Process

Our 5-round process is designed to evaluate you across all competency areas. AI tools are permitted in technical rounds.


Round | Format | What We Evaluate
1. Hiring Manager Screen | 60 min, conversational | Career trajectory, mentorship philosophy, technical influence examples, communication style
2. Take-Home + PR Discussion | 72h take-home + 60 min live | Navigating unfamiliar code, ownership and decomposition discipline visible in your PR, root-cause judgment, AI tool usage
3. System Design + Artifact Critique | 60 min, Miro board | Requirements gathering, schema/API design, trade-off articulation, calibrated code-review judgment on a teammate's PR
4. Team Interview (conditional) | 30 min, behavioral | Collaboration patterns, mentorship behavior, negotiation behavior with cross-functional partners
5. Culture Add | 30 min, People Team | Organizational values alignment


Round 4 is conditional: it runs when the team needs additional behavioral signal after Rounds 2 and 3 and is otherwise skipped. Your recruiter will tell you whether it's scheduled before your loop is finalized.


The Round 2 (Take-Home + PR Discussion) and Round 3 (System Design) exercises are drawn from real problems, so the technical evaluation is grounded in the work you'd actually be doing.


What You’ll Receive
  • Competitive compensation, including base salary, bonus, and equity
  • Employer-sponsored 401(k) with match
  • Comprehensive medical, dental, and vision coverage
  • Flexible time off and hybrid work environment

The anticipated annual salary range for this role is $175,000 - $200,000. Actual compensation and title will be commensurate with experience, qualifications, knowledge, and skills.

Skills Required

  • Experience in building production-grade applications and services using Ruby and Ruby on Rails
  • Deep expertise with Linux systems administration and production troubleshooting
  • Experience operating cloud infrastructure at scale, particularly within AWS environments
  • Familiarity with infrastructure as code tools such as Terraform or CloudFormation
  • Strong knowledge of distributed systems reliability patterns and engineering velocity
  • Experience implementing secure infrastructure and cloud security best practices

Order.co Compensation & Benefits Highlights

  • Leave & Time Off Breadth: Flexible or unlimited PTO is offered with a broad mix of paid days, including holidays, sick time, bereavement, volunteer time, and emergency leave. Materials note recent revamps that increased usage.
  • Healthcare Strength: Comprehensive medical, dental, and vision coverage is provided. Access to Wellhub (Gympass) and Talkspace augments core health benefits.
  • Parental & Family Support: Generous parental leave is available from day one for birthing and non-birthing parents. Childcare, fertility benefits, family medical leave, and company-sponsored family events are included.

The Company
HQ: New York, NY
120 Employees
Year Founded: 2016

What We Do

Our strength has always been our unique edge: transforming how businesses connect with vendors through our marketplace. We're not just improving workflows - we're redefining how procurement, operations, accounting, and payments come together to drive efficiency and innovation. Every step - requisition, approval, payment, and reconciliation - is curated and automated to make purchasing across all your vendors, locations, and teams as easy as purchasing for your personal life. Founded in 2016 and headquartered in New York City, Order.co oversees nearly half a billion dollars in annualized spend across hundreds of customers like WeWork, SoulCycle, and Lume. Order.co has raised $70M+ in funding from industry-leading investors like MIT, Stage 2 Capital, Rally Ventures, 645 Ventures, and more. Order.co has been proudly named a 50 to Watch by Spend Matters and a Best Place to Work by BuiltIn and Inc. Magazine.

Why Work With Us

With our core values as our North Star, Order.co and its team work tirelessly to foster an inclusive, psychologically safe environment where team members are empowered to do their best work. We pride ourselves on solving hard problems in order, with humility, and most importantly, together.

Order.co Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Team members at Order.co are empowered to make the best decision for themselves regarding where they work, whether from home, the office, or otherwise!

Typical time on-site: Flexible
HQ: New York, NY

Date Posted

05/27/2026
