Senior Site Reliability Engineer

Company: Workato
Location: Remote
Type: Full Time

Job Description

About Workato

Workato is the only integration and automation platform that is as simple as it is powerful — and because it’s built to power the largest enterprises, it is quite powerful. 

Simultaneously, it’s a low-code/no-code platform. This empowers any user (dev/non-dev) to painlessly automate workflows across any apps and databases.

We’re proud to be named a leader by both Forrester and Gartner and trusted by 7,000+ of the world's top brands such as Box, Grab, Slack, and more. But what is most exciting is that this is only the beginning. 

Why join us?

Ultimately, Workato believes in fostering a flexible, trust-oriented culture that empowers everyone to take full ownership of their roles. We are driven by innovation and looking for team players who want to actively build our company. 

But we also believe in balancing productivity with self-care. That’s why we offer all of our employees a vibrant and dynamic work environment along with a multitude of benefits they can enjoy inside and outside of their work lives.

If this sounds right up your alley, please submit an application. We look forward to getting to know you!

Also, feel free to check out why:

  • Business Insider named us an “enterprise startup to bet your career on”

  • Forbes’ Cloud 100 recognized us as one of the top 100 private cloud companies in the world

  • Deloitte Tech Fast 500 ranked us as the 17th fastest-growing tech company in the Bay Area, and 96th in North America

  • Quartz ranked us the #1 best company for remote workers

Responsibilities

We are looking for a Senior Site Reliability Engineer. In this role, you will:

  • Run the production environment to provide the highest levels of uptime, performance, and reliability

  • Identify toil in the day-to-day operations and automate whatever can be automated

  • Work with development teams to make sure the applications are production-ready, scalable, reliable, and observable from day zero

  • Identify and drive opportunities to improve automation for code deployment, management, and visibility of application services

  • Establish end-to-end monitoring and alerting on all critical components within the platform

  • Participate in the on-call rotation, supporting the platform and production applications

  • Manage end-to-end availability and performance of critical services and build automation

  • Perform root cause analysis on issues, and participate in blameless post-mortems so we can learn from incidents and prevent their recurrence through automation

  • Independently troubleshoot complex systems and environments including applications, microservices, DNS, and networking components

  • Create load test scenarios and streamline their execution so performance regressions can be caught pre-production

  • Enable developers and product teams to move rapidly with features without sacrificing reliability, availability, and overall performance of our systems

  • Participate in architecture reviews and work cross-functionally with Engineering teams on operational readiness and tactical day-to-day scenarios

  • Work with engineering teams to better address needs and enable more effective and efficient developer throughput

  • Identify performance bottlenecks and triage with Engineering teams to design and implement a secure and performant solution

  • Guide development teams towards security, reliability, and availability best practices during the SDLC

Daily and Monthly Responsibilities

  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding

  • Partner with development teams to improve services through rigorous testing and release procedures

  • Participate in system design consulting, platform management, and capacity planning

  • Create sustainable systems and services through automation and uplifts

  • Balance feature development speed and reliability with well-defined service level objectives (SLOs) and service level indicators (SLIs) to honor SLAs

If you’re looking for a real challenge in terms of mission criticality, multi-geographic region deployments, diversity of managed services, and the chance to work with cutting-edge technologies like Kubernetes, Kafka, Serverless, ArgoCD, and more, then this might be the position for you!

Requirements

Qualifications / Experience / Technical Skills
  • Experience administering Kubernetes-based microservices, ingress controllers, web servers (nginx), and databases (Postgres, MySQL, MongoDB; desirable: Redis, ClickHouse)

  • Strong experience with AWS technologies such as EKS, ELB, RDS, S3/EBS/Glacier and VPC

  • Experience architecting highly scalable, fault-tolerant, secure, and available systems within the AWS ecosystem

  • Strong troubleshooting experience in the realm of networking fundamentals, web applications, and DNS

  • Hands-on experience developing automation to streamline development processes

  • Experience working with modern CI/CD tools such as CircleCI, ArgoCD, CodeShip, GitHub Actions, or similar solutions

  • Experience with Infrastructure as Code tools (e.g. Terraform, CloudFormation)

  • Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript

Soft Skills / Personal Characteristics
  • BS or MS from a top-notch CS program (or equivalent experience)

  • 5+ years of professional experience in hands-on engineering roles (DevOps/SRE)

  • 3+ years of experience operating high-traffic production environments in public clouds: AWS, GCP, or Azure

  • Python programming experience in production environments

  • Experience with modern cloud environments: containerization, infrastructure-as-code, DevOps, CI/CD pipelines, and general automation

  • Hands-on experience with network security, database systems, and related tools

  • Proficiency in spoken and written English

Preferred Experience
  • Experience operating Kubernetes clusters in a compliance-regulated environment

  • Experience with stress testing, load testing, and failure analysis of applications

  • Experience with cloud and infrastructure security regulations and compliance programs: SOC 2, ISO 27001, HIPAA, GDPR, CCPA

  • Experience with MLOps: Spark, TensorFlow, GPUs

Date Posted: 02/12/2023
