Site Reliability Engineer
Job Description
Radiant Digital delivers technology consulting and business solutions for commercial and government clients.
Our flexible delivery model allows us to provide end-to-end solution delivery, single project execution, and/or strategic resources.
We are CMMI Maturity Level III and ISO 9001:2015 certified.
Responsibilities:
Note: Candidate must be LEGALLY ABLE TO SUPPORT SERVERS THAT HOST GOVERNMENT AND LEGAL ENTITIES.
Description:
• Five or more years of experience as a Site Reliability Engineer or full-stack Linux Systems/Application Support Engineer
• Identify opportunities to improve architecture/engineering practices
• Develop new processes to prevent problem recurrence and automate recoveries
• Enhance SLO trending and centralized reporting (e.g., Grafana dashboard integration)
• Mentor staff to replace manual processes with automation
• Collaborate across all levels of the organization to drive the SRE model
• Advanced experience in supporting enterprise container-based platforms
• Strong Systems & Network Architecture experience
• Good knowledge of Tomcat, MySQL/Percona support and SQL queries, RHEL, RabbitMQ, Elasticsearch, nginx, HAProxy
• Understanding of HA design, cross-site replication, local and global load balancers, etc.
• Experience with Security Hardening & Vulnerability/Compliance, OS patching
• Strong knowledge of performance monitoring, metrics, capacity planning, and management
• Hands-on scripting and programming - React, Java, JavaScript, Python, Bash, Ansible, YAML, etc.
• Understanding of data parsing and regex syntax
• Experience with application onboarding - capturing requirements, understanding data sources and application relationships, managing meetings, training, etc.
• Familiarity with Splunk, HP OMi/Infrastructure agents, APM/New Relic, Oracle OEM, Catchpoint, syslog events, SNMP events, Zabbix, ServiceNow, etc.
• Strong skills in creating documentation - engineering runbooks, support procedures, user onboarding and support documentation
• Familiarity with Confluence and JIRA
• Familiarity with supporting enterprise container-based platforms
• Data ingestion & enrichment from various sources, webhooks, and REST APIs with JSON/XML payloads
• Strong knowledge of Unix/Linux based systems, and experience troubleshooting applications running on these systems
• Experience with software design lifecycle, including testing, implementation, and delivery
• Understanding of CMDB and asset relationships, topology maps, and alert enrichment
• Strong data analytics and centralized reporting (e.g., Grafana dashboard integration)
• Experience with cloud technologies, such as architecting, developing, or maintaining cloud solutions in public cloud environments (AWS/OCI/GCP)
• Data ingestion & enrichments – Webhooks, REST API design, JSON, XML, SMTP
• CI/CD - Deployment pipeline experience (Jenkins, Ansible)
• DevOps container/orchestration tools (Kubernetes, Docker, Puppet, etc.)
• Good knowledge of Python, bash or similar scripting languages
• Experience with Configuration Management systems
• Experience with software lifecycle including design, implementation, and delivery
• Expertise in designing, analyzing and troubleshooting large-scale distributed systems
• Ability to apply a systematic approach to solve problems with a sense of ownership and focus
• Effective communication skills with the ability to articulate technical details to different audiences
AIOps Requirements:
Installation, Infra & Config:
• Linux Systems Administration and Operations experience.
• Network Administration experience.
• JavaScript experience.
• Familiarity with the Moogsoft installation procedures.
Integrations & Dev:
• Familiarity with WebHooks, REST API, JSON, XML, SMTP.
• Development experience with a popular scripting language (Python) and Unix Shell Scripting.
• Familiarity with SQL queries.
• Proficient in Jenkins & Ansible.
• Proficient in Grafana reporting tools.
Clustering & Workflows:
• Familiarity with Operations (SRE) workflows, responsibilities and organizational structures.
• Familiarity with predetermined and dynamic correlation, entropy, anomaly detection concepts.
• Strong SQL/Percona DB experience.
• Experienced communicator and collaborator.
Platform Monitoring:
• Systems Administration and Operations experience.
• Network Administration experience.
• Development experience with a popular scripting language (Python, Go, Ruby), JavaScript and Unix Shell Scripting.
• Familiarity with Moogsoft components and data flows.
• Understanding of monitoring and metrics concepts (volume, performance, capacity).
Date Posted: 05/05/2023