(USA) Senior Director I, Software Engineering - Site Reliability Engineering

Walmart Global Tech · South Bay

Company

Walmart Global Tech

Location

South Bay

Type

Full Time

Job Description

Position Summary...
Sr. Director, Software Engineering (Site Reliability Engineering)
Walmart Global Tech's Site Reliability Engineering team is made up of hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the uptime and availability of Walmart's e-commerce, stores, and omni-channel platform. Our goal is to build, scale, and guard the systems that delight our customers.
As part of Reliability Engineering & Operations, you'll help define and execute a unified, reliable, operationally robust set of processes and tools for Walmart Technology and its customers across all channels and geographies.
As a Sr. Director on the Reliability Engineering and Operations team, you will be responsible for ensuring that critical parts of Walmart's business are prepared for known events and able to address any contingency. This role sits at the heart of the Command-and-Control Center. If you enjoy leveraging your natural curiosity and inventive problem-solving skills to understand the needs of our drivers and to lead strategic measures that improve driver experience at scale, this role is for you.
What you'll do...
  • Design, write, and build tools to improve the reliability, latency, availability, and scalability of the Walmart tech stack.
  • Engender reliability and availability starting with metrics and measurements
  • Enable scaling by providing tools, developing training and/or augmenting processes
  • Build tools and automation to prevent recurrence of problems in mission-critical products and services.
  • Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure.
  • Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
  • Develop a deep understanding of the various services and applications that come together to deliver Walmart e-commerce products
  • Design and architect new monitoring tools and smart alerts that help detect failures and issues in a timely fashion, and work with engineers to identify root causes and fix issues
  • Influence, design and create new architectures, standards, and methods for large-scale enterprise systems.
  • Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance
  • Participate in on-call rotation.
  • Secure the system from issues, be they real, perceived, or notional
  • Maintain a strong focus on collecting metrics and drawing inferences from them
  • Experience with configuration management tools such as Ansible, SaltStack, Chef, and Puppet
  • Build and drive the automation systems that maintain system health
  • Eliminate single points of failure and regularly test disaster recovery and high availability (HA).

You'll sweep us off our feet if you...
  • Have a strong engineering focus, with sharp skills in Service Management (Incident, Problem & Change Management) and in performance and capacity engineering, to deliver meaningful outcomes that create reliable Customer, Merchant, and Associate experiences.
  • Possess solid knowledge of reliability engineering and operations, ensuring that deployment and design principles are followed to drive quality and consistency across a very diverse technology developer community
  • Engage in high-priority incidents with business or corporate impact, and effectively run post-incident management flows to reduce risk and drive compliance with operational goals and objectives
  • Collaborate effectively with various functions such as Product, Design, Business, Operations, and other Engineering teams to gain commitments on improvement/remediation initiatives
  • Have a deep understanding of scalability, reliability, resiliency, and availability, and of the various data sources needed to apply AI/ML methods that derive better outcomes for highly available, resilient, and scalable systems.
  • Are a transformation agent, constantly striving for excellence in a rapidly evolving environment
  • Are a multiplier: Build a high performing team with strong engineering-driven culture that promotes diversity, innovation, and creative problem solving
  • Possess a global mindset and foster a positive, healthy environment while working internationally across teams

You'll make an impact by...
  • Driving Operational Excellence & supporting business objectives across the year
  • Establishing new ways of working to fix fundamentals for operational stability
  • Ensuring reliability and availability of systems 24/7, 365 days a year.
  • Predicting service performance degradation and disruptions, and moving toward a preventive state.
  • Partnering with the observability and telemetry team and other engineering teams to improve visibility and drive improvements in incident management, problem management, availability, and resiliency.
  • Actively engaging in Holiday (Peak Trading) business volume discussions with a view to building scalable and resilient production systems that lower risk exposure

What you'll bring...
  • A recognized Bachelor's or Master's degree in Engineering and 15+ years of experience in Site Reliability Engineering, including Service Management (Incident, Problem & Change Management) and Performance and Capacity Engineering.
  • 5+ years of experience managing, leading, and developing Site Reliability Engineering-focused teams with around 15 indirect reports.
  • Experience in retail or site-facing transactional web services and/or internet-based services environments is an added advantage.
  • Experience running a Site Reliability or DevOps team with KPI targets on MTTD, MTTR, and availability is an added advantage.
  • You are a highly resilient and responsive team player with a high degree of cross-functional collaboration
  • You thrive in a fast-paced, dynamic, startup-like environment, drive business results, and are passionate about customer delight with every transaction
  • You have a great sense of urgency, are a great communicator (written and verbal) and an active listener, and have experience working in a highly matrixed environment with a global footprint.
  • You possess a high degree of emotional intelligence (EQ), enabling you to listen actively and to respond with and mobilize effective plans for our Associates.

At Walmart, we offer competitive pay as well as performance-based incentive awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable. For information about PTO, see https://one.walmart.com/notices .
Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms. For information about benefits and eligibility, see One.Walmart at https://bit.ly/3iOOb1J .
The annual salary range for this position is $224,000.00-$336,000.00
Additional compensation includes annual or quarterly performance incentives.
Additional compensation for certain positions may also include:
- Regional Pay Zone (RPZ) (based on location)
- Stock equity incentives
Minimum Qualifications...
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 7 years' experience in software engineering or related area.
Option 2: 9 years' experience in software engineering or related area.
4 years' supervisory experience.
Preferred Qualifications...
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Master's degree in Computer Science or related field and 6 years' experience in software engineering
Primary Location...
640 W California Avenue, Sunnyvale, CA 94086-4828, United States of America

Date Posted

01/07/2023

Similar Jobs

Senior Front-End Software Engineer - Percipient.ai

Percipient.ai, founded in 2017, is a cutting-edge technology company specializing in Computer Vision, Artificial Intelligence, and Deep Learning. They develo...

Senior Program Manager, Global Occupational Health & Safety - ServiceNow

ServiceNow is seeking a Health & Safety Program Manager to design, implement, and lead a comprehensive corporate safety program. The role involves develop...

Senior Developer, Data Engineer - Tarana Wireless, Inc.

Tarana is seeking a Senior Developer, Data Engineer with 5 years of experience in building large-scale data pipelines. The role involves designing, buildin...

Technologist, System Design Engineering - Western Digital

Western Digital is seeking a Technologist with expertise in SSD design, hardware design, Product Management, Memory Systems, and system architecture to le...

Staff Engineer, System Design Verification Engineering - Western Digital

Western Digital is seeking a validation engineer to define and track test plans, characterize and optimize SSDs, and lead bug review meetings. The ideal ...

Senior Finance Manager, Central FP&A - Palo Alto Networks

Palo Alto Networks is seeking a Senior Finance Manager with 10 years of experience in FP&A. The role involves leading ad hoc projects, collaborating with...
