Staff Site Reliability Engineer

VGS · Remote

Company: VGS

Location: Remote

Type: Full Time

Job Description

VGS is the world's leader in payment tokenization. Large banks, aspiring fintechs, and growing merchants embed our universal token vault into their technology stack to manage the complexities of payment data tokenization across processors and networks, open banking, card issuance, omnichannel loyalty, PCI compliance, payment orchestration, and more. We empower our clients and partners by tokenizing sensitive payment data, limiting compliance scope, and consolidating payments to unlock revenue and business opportunities.


VGS provides processor-agnostic tokenization solutions via secure universal token vaults, iframes, mobile SDKs, tokenization proxies, APIs, and data orchestration tooling to support payment acceptance, card issuance, PII and bank account tokenization, and other payments value-added services. Some of the use cases we enable include multi-processor Network Tokenization, Account Updater, payment orchestration, secure settlement file processing, 3DS, and Risk provider connectivity.


We are looking for a well-versed, passionate Staff Engineer who wants to play a key role in the site reliability engineering and cloud operations of our global cloud infrastructure.


We’re seeking individuals with an equal flair for creative problem solving, enthusiasm for new technologies, and a desire to contribute to our product. You will likely be successful in this role if you identify with the following traits: attentive to detail, problem-solving, customer-oriented, versatile, resilient, and confident. If all of this sounds interesting to you, we’d love to hear from you.

What you will be doing at VGS

  • Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
  • Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
  • Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
  • Performance tuning and capacity planning: Identify bottlenecks and optimization opportunities and execute scaling strategies to ensure efficient handling of traffic spikes and growing workloads.
  • Collaborate with cross-functional teams: Work closely with software engineers, product teams, and DevOps to enhance system reliability and delivery pipelines.
  • Improve operational processes: Champion continuous improvement initiatives in deployment, scaling, and performance testing, while advocating for the adoption of SRE best practices across the organization.
  • Mentorship and leadership: Provide technical mentorship to junior engineers, contribute to strategic decisions around infrastructure, and ensure best practices are implemented at scale.
  • Be proactive and innovative: We rely on your feedback to build a world-class product.
  • Be a part of a team that believes in the core values of transparency, collaboration, grit, and humility; in going above and beyond what is required in order to do the right thing for our customers and the company; and in having fun while doing all this!

What we are looking for from you (Requirements)

  • Proven experience in SRE or DevOps roles at a staff level, with a track record of managing production systems in complex, large-scale environments.
  • Strong proficiency in AWS, including infrastructure-as-code tooling (Terraform, CloudFormation, etc.).
  • Solid understanding of cloud-native architecture, Linux systems, microservices, infrastructure-as-code (Terraform, CloudFormation, CDK), CI/CD (CircleCI, GitHub Actions, Argo), GitOps, authentication and authorization, APIs and API Gateway, Docker, Kubernetes (EKS), Kafka (MSK), Java, Spring Framework, Python, and AWS services.
  • Expertise in monitoring and observability tools such as Prometheus, Grafana, Honeycomb, Datadog, OpenTelemetry, New Relic, or similar tools to measure system health and performance.
  • Programming and scripting experience in languages such as Python, Go, Bash, or other relevant languages used in automating infrastructure.
  • Solid understanding of networking, security, and load balancing in cloud-native environments.
  • Experience with configuration management tools like Ansible, Chef, or Puppet is a plus.
  • Strong communication and collaboration skills, with the ability to lead cross-functional initiatives and mentor junior team members.
  • Experience with incident management and disaster recovery best practices.
  • Strong written and verbal communication skills.

What you get from us...


  • Flexible work hours and flexible PTO
  • Competitive health benefits
  • VGS stock options
  • 401k plan, with employer matching 4% and immediate vesting (available only for US employees)
  • Life & disability insurance
  • Pre-tax flexible spending accounts, dependent and healthcare FSA (available only for US employees)
  • Global parental leave program
  • Employee Assistance Program
  • Home Internet reimbursement
  • New hire home office set up allowance
  • Professional learning reimbursement



At VGS we have a remote-first philosophy, believing employees should have a comfortable work-life balance. We value great talent. Striving to provide the best experience for our candidates, VGS appreciates your candidacy.


We consider applicants without regard to race, color, national origin, sex, age, religion, sexual orientation, gender identity, veteran status, marital status, physical or mental disability, or other protected classes under all local, state, and federal laws and ordinances (AA/EOE/W/M/Vet/Disabled).


Qualified applicants with arrest and conviction records will be considered for the position in accordance with the San Francisco Fair Chance Ordinance.


VGS is not able to support employment sponsorship of any kind at this time.

Date Posted

10/18/2024
