Job Description
• Jacksonville, Florida; Atlanta, Georgia
• Operations
• In-Office
• 10550
Overview
Job Purpose
Assist with day-to-day activities supporting Mortgage Servicing Application services related to production support, releases, and incident management. Build actionable alerts/automation for preventing incidents, detecting performance bottlenecks, and identifying maintenance activities.
Responsibilities
• Build and maintain tools and solutions for our operations platform, ensuring that we meet our customer service standards and reduce errors
• Actively troubleshoot any issues that arise during testing and production
• Update existing processes and design new processes as needed to optimize performance
• Work with the customer to understand their infrastructure automation solution requirements
• Actively participate in or own continuous improvement projects driven by automation
• Work closely with the other team members to improve existing projects
• Provide technical analysis, resolve problems, and propose solutions in a 24/7 production environment
• Participate in on-call rotation
• Employ deep troubleshooting skills to improve the availability, performance, and security of IMT Services
• Implement automated tests, automated deployments, and operational tools
• Collaborate with Product and Support teams to plan and deploy product releases
• Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams
• Ensure services are designed with 24/7 availability and operational readiness and rigor
• Implement proactive monitoring, alerting, trend analysis, and self-healing systems
• Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems
• Contribute to product development and engineering as needed to ensure quality of service for highly available services
• Identify, evaluate, and execute preventive measures to minimize or avoid impact to the customer's experience, favoring proactive detection over customer-escalated issues
• Resolve product/service defects and address design, infrastructure, or operational changes
• Partner with other engineers and analysts and lead by example, as a contributor more than a delegator
• Develop partnership-oriented relationships with business executives and functional leaders, especially as it relates to operations and technology
• Any other activities as directed by management
Knowledge and Experience
• 3+ years of functional experience working as a DevOps Engineer in 24x7 production support environments
• Prior experience with infrastructure development, or development and operations
• Strong experience with Microsoft Windows Server, Linux Administration, and AWS
• Experience with scripting languages such as Python or PowerShell
• Experience in architecting an automation framework
• Proficiency in configuration management, CI, and automation tools such as Jenkins, Chef, Puppet, Ansible, or similar
• Experience with Agile methods (Scrum/Kanban) to organize project deliverables and to track and report progress (Jira)
• Experience with git, git repo services (BitBucket, GitHub), and branching strategies
• Experience with open-source technologies and cloud services (AWS/Azure)
• Experience with monitoring and alerting tools (Splunk, BigPanda, PagerDuty)
• Experience with infrastructure as code (Terraform, CloudFormation)
• Experience with automation of business continuity/disaster recovery/application resiliency
• Knowledge of and exposure to container technology and orchestration is a plus
• Excellent problem-solving and troubleshooting skills
• Process-oriented with great documentation skills (Confluence)
• Experience with data structures/formats such as XML, JSON, YAML, and HCL
• BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
• Fluency with one or more current-generation scripting languages (Python/Shell/Perl/PHP/Ruby) and/or Java or .NET development
• Excellent troubleshooting skills, utilizing a systematic problem-solving approach
• Demonstrated experience in designing, analyzing, and diagnosing large-scale distributed systems, as well as Windows Server and/or Linux systems internals (system libraries, file systems, client-server protocols)
• Experience with elastic scalability, fault tolerance, and other cloud architecture patterns
• Experience with Continuous Integration and Continuous Delivery concepts
• Experience with containerization and orchestration technologies such as Kubernetes is good to have
• Must be able to multitask in a fast-paced environment with focus on timeliness, documentation, and communications with peers and business users alike
• Expertise with monitoring, alerting and incident response tools and performing root cause analysis
• Experience with deployment automation tools like UCD and Azure DevOps (ADO)