Staff Site Reliability Engineer

Globality, Inc. · Peninsula

Type

Full Time

Job Description

Joel Hyatt and Lior Delgo founded Globality with a vision to create prosperous and healthy economies, companies, communities, and individuals. In this new era of the Autonomous Enterprise, Globality is on a mission to unleash productivity and purpose through autonomous sourcing and procurement. Leveraging our sophisticated AI, Globality empowers leading global companies to automate their purchasing processes and optimize how they spend their money – improving their profits, advancing their objectives, and extending their impact. Our customers love Globality. You will too.

Our culture is built on our values: Trust, Collaboration, and Innovation. Our goal is to create an environment where each person feels valued and experiences a natural sense of belonging. Not only have we been recognized for our transformational technology, but we’re also humbled to be recognized for the workplace culture we’ve built here. So we encourage you to bring your work and your life experiences. Bring your problem-solving skills, sure, but don’t forget your joy and passion. Bring the talent that makes you stand out but also bring the communities that ground and support you. We are a greater, more resilient world through the power of us.

Role Summary:

Site Reliability Engineers (SREs) work as a unit within the broader Production Engineering team and are responsible for keeping all customer-facing services and other Globality production systems running smoothly.

SREs are a blend of pragmatic technical operators and tooling craftspeople who apply sound engineering principles, operational discipline, and mature automation to our production environment and the Globality codebase. We have a DevOps-driven culture with a particular team interest in improving our product stack insight, automation tooling, and scalability.

Globality is a unique product stack which brings unique challenges – it’s a ground-breaking technology utilizing a world-class, AI-powered microservices platform that revolutionizes how businesses buy and sell services. The experience of our team feeds back into other engineering groups within the company, perpetuating product improvement. We are an open, inclusive, and diverse organization and our employees are at the heart of the great products we create.

As an SRE you will:
  • Be part of the team responsible for managing an enterprise-grade AI-driven data and messaging platform.
  • Protect the health of the Production environment.
  • Respond to Globality availability incidents and provide support for other customer-impacting incidents.
  • Work to prevent incidents from even happening.
  • Run our infrastructure with tools like Spacelift, Harness, and Kubernetes.
  • Help make monitoring and alerting alert on precursor symptoms and not on
  • Document every action so your findings turn into repeatable actions…and then into automation.
  • Work with the QA/TestEng team to make the deployment process as efficient and boring as possible.
  • Design, build, and maintain core production infrastructure pieces.
  • Work with devs to implement the baseline technologies, policies, and practices to build a high-velocity, high-security, strong-compliance platform that allows Globality to scale to support exponential growth.
  • Keep a keen eye on security issues in every project you work on, contributing to improving security in the systems that were already in place.
  • Debug production issues across services and levels of the stack.
  • Help plan the growth of Globality's infrastructure.
  • Establish strong relationships with other teams in order to positively influence them in their pursuit of automation and toil reduction, and to keep the rest of our team apprised of upcoming initiatives.
You may be a fit for this role if you:
  • Think deeply about edge cases, points of failure, failure modes, and systemic behaviors.
  • Embrace a DevOps philosophy.
  • Know your way around Linux and the command line.
  • Feel comfortable working toward delivering an end-to-end seamless CI/CD pipeline, with a goal of delivering code into production as swiftly as possible, while working with the QA/TestEng and Infrastructure teams to ensure that code is production worthy.
  • Have strong programming skills in languages such as Python, Go, Rust, or Ruby.
  • Maintain “production grade” adherence to best practices for the lowliest tools and scripts.
  • Embrace collaboration and are comfortable with communicating asynchronously.
  • Are driven to document, document, document so you don't need to learn (or teach) the same thing twice.
  • Have an enthusiastic, driven, go-for-it attitude. Are compelled to fix broken things and improve less-than-ideal things.
  • Have experience with Drone.io, Kubernetes, Ansible, or similar technologies.
  • Have experience using the advanced tools of AWS, GCP, Azure, or other cloud providers.
Projects you could work on:
  • Improve production infrastructure automation.
  • Improve Metrics collection scope / improve our metrics-driven Monitoring story.
  • Work with the QA / Test Engineering team to fully pipeline our internal tools.
  • Work with Test Engineering on scale testing initiatives.
  • Reduce the noise-to-signal ratio in our alerting.
  • Develop a relationship with a product group, define their SLOs, help analyze our metrics data on those SLOs and improve their reliability.

Leveling of Site Reliability Engineers at Globality

Areas of expertise/contribution for up-leveling:

Technical:
  • Use Ansible to efficiently manage our infrastructure
  • Further our "Infrastructure as Code" mission using Terraform and CI/CD-focused automation
  • Administration of a variety of high-availability clusters.
  • Firm grasp of Metrics and Monitoring systems and Grafana visualization.
  • Implementation and delivery of well-targeted alerting with Slack/PagerDuty integrations.
  • Logging infrastructure (we use Loki / fluentbit)
  • Backend storage management and scaling
  • Disaster Recovery and High Availability strategy
  • Script / tool authoring
  • Knowledge of Globality product stack and service interoperations
  • Contributing to code in Globality
Execution:
  • Team organization and planning
  • Issue, Epic, OKR/KPI leadership and completion
Collaboration and Communication:
  • Creating blog posts / Confluence articles
  • Completing Root Cause Analysis (RCA) investigations
  • Contributions to handbook, runbooks, general documentation
  • Leading and contributing to designs for issues, epics, KPIs
  • Improving team practices in handoffs of work and incidents
Influence and Maturity:
  • Involvement in hiring process – developing/reviewing questionnaires, involved in interviews, qualifying candidates
  • Knowledge sharing, mentoring
  • Accountability, self-awareness, handling conflict in the team and receiving feedback
  • Maintaining good relationships with other engineering teams in Globality that help improve the product
Levels for Site Reliability Engineer

Senior Site Reliability Engineer I/II

Technical:
  1. Deep knowledge in 2+ areas of expertise and general knowledge of all areas of expertise. Capable of mentoring SRE-Is in all areas and other SREs in their area of deep knowledge.
  2. Are able to design and build tools to improve the management of the production environment and/or infrastructure
  3. Are able to contribute small improvement PRs to the Globality codebase to resolve issues
Execution:
  1. Identifies significant projects that result in substantial cost savings or revenue
  2. Identifies changes for the product architecture from the reliability, performance, and availability perspective with a data-driven approach.
  3. Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage, making Globality cheaper to run.
  4. Identifies parts of the system that do not scale, provides immediate palliative measures, and drives long-term resolution of the resulting incidents.
  5. Identifies Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.
Collaboration and Communication:
  1. Know a domain really well and radiate that knowledge through recorded demos, discussions in ProdEng design meetings, or Incident/Root-Cause Reviews
  2. Perform and run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again.
Influence and Maturity:
  1. Set an example for the team of SREs with positive and inclusive leadership and discussion on work.
  2. Contributes to the hiring process by being part of the interview team to qualify SRE candidates
  3. Show ownership of a major part of the infrastructure.
  4. Trusted to de-escalate conflicts inside/with the team
Staff Site Reliability Engineer

Are Senior SREs who meet the following criteria:

Technical:
  1. Able to conceptualize, design, and create innovative solutions that push Globality's technical abilities ahead of the curve
  2. Deep knowledge of Globality and 4 areas of expertise. Enough knowledge of each area of expertise to mentor and guide other team members in those areas.
  3. Contributes to Globality codebase to resolve issues and add new functionality
  4. Significant modification to open source or major from-scratch tooling to deliver best-of-breed implementation of our production ecosystem.
Execution:
  1. Strives for automation either by coding it or by leading and influencing developers to build systems that are easy to run in production.
  2. Measure the risk of introduced features to plan ahead and improve the infrastructure.
  3. Proposes and drives architectural changes that affect the whole company to solve scaling and performance problems
  4. Leads significant project work for KPI level goals for the team
Communication and Collaboration:
  1. Works with engineers across the whole company, influencing design to create features that will work well in multi-region/multi-cloud, massive-scaling implementations
  2. Runs RCAs and epic level planning meetings to get meaningful work scheduled into the plan
Influence and Maturity:
  1. Writes in-depth documentation that shares knowledge and radiates Globality technical strengths
  2. Has a high level of self-awareness
  3. Trusted to de-escalate conflicts inside and outside the team
  4. Routinely has an impact on the broader Engineering organization
  5. Helps to develop other team members into more senior levels and leaders in the team

 

Senior Staff Site Reliability Engineer

Are Staff SREs who meet the following criteria:

Technical:
  1. Able to lay out vision-level direction of tooling and solutions that push Globality's technical abilities ahead of the curve
  2. Deep knowledge of the Globality product stack and 90%+ of areas of expertise.
  3. Enough knowledge of each area of expertise to mentor and guide other team members in all areas.
  4. Contributes to Globality codebase to resolve issues and add new functionality
  5. Mentorship and coordination of major modification to open source solutions/tools or major from-scratch tooling to deliver best-of-breed implementation of our production ecosystem.
Execution:
  1. Strives for automation either by coding it or by leading and influencing developers to build systems that are easy to run in production.
  2. Measure the risk of introduced features to plan ahead and improve the infrastructure.
  3. Proposes and drives architectural changes that affect the whole company to solve scaling and performance problems
  4. Leads significant project work for KPI level goals for the team
Communication and Collaboration:
  1. Works with engineers across the whole company, influencing design to create features that will work well in multi-region/multi-cloud, massive-scaling implementations
  2. Runs RCAs and epic level planning meetings to get meaningful work scheduled into the plan
Influence and Maturity:
  1. Writes in-depth documentation that shares knowledge and radiates Globality technical strengths
  2. Has a high level of self-awareness
  3. Trusted to de-escalate conflicts inside and outside the team
  4. Routinely has an impact on the broader Engineering organization
  5. Helps to develop other team members into more senior levels and leaders in the team

The anticipated annual pay scale for this position is $140,000-$260,000. Actual salaries will vary depending on factors including but not limited to location, experience, and performance. The range listed is just one component of Globality's total compensation package for employees. This information is provided per the California Equal Pay Act. We are an equal opportunity employer and a participant in the E-Verify program. We believe diversity makes teams better and that discrimination based on race, gender, or anything else is self-defeating.

Date Posted

08/05/2023
