Senior Site Reliability Engineer

RXMG · Orange County

Company

RXMG

Location

Orange County

Type

Full Time

Job Description

RXMG is a California-based digital advertising company that uses its own state-of-the-art analytical and consumer intelligence platform to match people with the products they need to enrich their financial well-being.

We seek a Senior Site Reliability Engineer to join our engineering team to help develop an inclusive, innovative, and collaborative team environment.

The ideal candidate is an experienced Senior Site Reliability Engineer with a strong technical background. A Site Reliability Engineer (SRE) is uniquely positioned at the crossroads of software engineering and systems operations. Your role is to develop and implement scalable, reliable, and efficient systems, ensuring that both internal and external services meet the highest standards of uptime and performance.

You will be working 100% remotely and should be extremely comfortable working via Slack, Google Meet, Zoom, etc.

Benefits of working with us:

  • Unlimited PTO: Many organizations try this, but we do it successfully.
  • Paid health, dental, and vision insurance for you & your family
  • Fully remote work: You don't have to come to an office! Our team works over Slack, Google Meet, and Zoom.
  • 401(k) Plan: 100% match on the first 4%
  • Company-provided hardware: We don’t want you to be held back by hardware - we provide the newest Apple hardware (MBP), extra monitors, and peripherals.
  • Employee education programs: Do you want to continue to learn and grow? We will pay for your training, courses, materials, and certifications.
  • Great Company Culture: Monthly events (poker, guest speakers, etc.) and 6 weeks of paid parental leave.

What we expect from every member of our Cloud Support Engineering team:

  • You will be expected to manage various domains within the Enterprise Technology cloud platform, focusing on developing solutions that enhance the developer experience, foster autonomy, and empower other developers to excel in cloud environments.
  • Contribute to automating security best practices and safeguarding customers from threats, while working on scaling, securing, and automating our cloud systems, ensuring a reliable, flexible, and user-friendly cloud environment.
  • You are eager to learn new technologies and excel at your core technical competencies. 
  • Be involved in our stand-ups every morning, participate in team deep dives that describe what we are making, join in on our Lunch & Learn to showcase a piece of technology you want us to adopt, and work with our project managers to stay on track and update our users!
  • Be organized and be able to communicate your objectives to your peers.
  • Contribute to a positive and supportive team culture of diversity and growth.
  • Be able to participate in our rotating on-call schedule to respond to outages outside of business hours.

Main Responsibilities:

  • Infrastructure Optimization: Tune our infrastructure for peak performance, which is especially crucial as we transition to the RXP platform.
  • Uptime & Platform Support: The candidate's main focus will be to ensure the uptime and reliability of our internal platform. This includes proactive monitoring, troubleshooting, and timely resolution of any issues to maintain continuous operational functionality.
  • Site Monitoring & Support: The role also requires maintaining the uptime of our company websites. The candidate should be capable of managing site performance, addressing downtime promptly, and implementing strategies to enhance overall site stability.
  • Security: The candidate must also contribute to the security of our systems. This involves implementing basic security measures, responding to security incidents, and collaborating with the security team to uphold the integrity and safety of our digital assets.
  • Incident Management: Develop robust incident response protocols to address and mitigate any issues quickly, maintaining service continuity.
    • Lead and participate in weekend testing (e.g., capacity testing, fail-over, etc).
    • Provide 24x7x365 on-call technical support for the Engineering and Operations teams as needed.
    • Provide technical leadership, support, and operational oversight to sustain resiliency and high availability of critical business operations. 
    • Monitor production, disaster recovery, and certification systems for issues. Troubleshoot and drive resolution of issues. 
    • Analyze and optimize the performance of core platforms. 
    • Investigate software defects. 
    • Assist the Engineering team in resolving build/deployment issues.
    • Analyze application logs (e.g., GCP GKE and AWS EKS logs and various platform logs) to troubleshoot or explain perceived issues. 
    • Execute SQL queries against databases to identify potential performance issues and/or create upgrade recommendations.
    • Drive capacity planning decisions for RXMG platforms and systems and support capacity planning needs. 
    • Be an active voice in Capacity Planning meetings with engineering and technical operations management staff.

Requirements:

  • 4+ years of experience as a Site Reliability Engineer. 
  • Deep understanding of containerized ecosystems 
  • Expert working knowledge of:
    • Google Cloud Platform (GCP), Amazon Web Services (AWS) components, monitoring tools, and alerting systems.
    • NGINX/Apache configuration and PHP module installation through apt or PECL.
    • Firewalls, including setting up, managing, and understanding their role in network security. 
    • User and file permissions across different operating systems, granting appropriate access rights without compromising security.
    • Using '.htaccess' for web server configuration, such as URL redirection and access control.
    • Hashing algorithms, particularly for securing sensitive information and ensuring data integrity.
  • Experience hosting blameless postmortems to share findings, discover gaps, embrace transparency, and improve reliability across our services
  • Demonstrated use of configuration management to build and maintain consistency across platform components and services
  • Willing to work Pacific Time hours as well as off-hours

Nice to haves:

  • At least one scripting language under your belt (Bash, PHP, Python, etc.) and/or infrastructure-as-code experience with Terraform.
  • Google Cloud or AWS certifications.
  • Experience with Kubernetes clusters.
  • CI/CD application deployment in a cloud-based infrastructure such as GCP.
  • Experience using the UNIX shell and MySQL, MongoDB, PostgreSQL, or other databases.
  • Server administration with SaltStack on Linux systems.
  • Experience with high-performance applications.
  • Good understanding of, and comfort with, the Linux/Unix command line.

About You:

  • Solution-oriented: able to diagnose an issue and execute a solution promptly
  • Proactive, proactive, and proactive. 
  • Excellent communication skills.
  • Enjoy mentoring peers.
  • Self-directed and motivated.

RX Marketing's Tech Stack:

  • Code: PHP (Laravel Framework 8+), Vue/Nuxt, and Python
  • Infrastructure and DevOps: Ubuntu Linux, Kubernetes, Docker, Terraform, AWS & Google Cloud, and Sentry.
  • Databases: Elasticsearch, MongoDB, InfluxDB, Redis, MySQL, and ClickHouse.
  • Version Control: GitHub & GitLab
  • Project Tracking and Roadmaps: Jira, Monday, and Smartsheet; Agile Scrum
  • AI/ML: OpenAI and SVMs

Date Posted

12/06/2023

Similar Jobs

Quality Engineer, RM & Pre-Production - ARC'TERYX

Arcteryx is seeking a Quality Engineer with 3 years of experience in manufacturing, preferably in the apparel industry. The role involves developing and...

Sr RF Engineer - Universal Electronics

Universal Electronics is hiring a Sr RF Engineer to lead the design and optimization of advanced RF solutions for IoT and smart home products. The role...

Mission Systems Engineer - Maxar Technologies

Maxar Intelligence is currently hiring for a Mission Systems Engineer in Westminster, CO. The role involves collaborating with experts to explore remote...

Lead AIT Systems Engineer - Maxar Technologies

Maxar Intelligence is currently hiring for a Lead AIT Systems Engineer in Westminster, CO. The role involves managing a team ensuring performance from c...

Spacecraft Systems Engineer - Maxar Technologies

Maxar Intelligence is seeking a Spacecraft System Engineering Team member with a Bachelor's degree in engineering physics or a related field and 510 ye...

Quality Engineer (Internal Assignment / Project Hire) - The Walt Disney Company

The job posting is for a Quality Engineer position in Worldwide Safety Assurance Disneyland Resort Quality Engineering team. The role involves providin...