PySpark Developer - REMOTE

NTT DATA Services · Pittsburgh, PA

Company

NTT DATA Services

Location

Pittsburgh, PA

Type

Full Time

Job Description

Req ID: 210029

NTT DATA Services strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking a PySpark Developer - REMOTE to join our team in Pittsburgh, Pennsylvania (US-PA), United States (US).

Experience & Skills:
  • 4+ years of experience in Data Lake, Data Analytics & Business Intelligence solutions, and at least 1 year as an AWS Data Engineer
  • Full life cycle project implementation experience in AWS using PySpark/EMR, Athena, S3, Redshift, AWS API Gateway, Lambda, Glue and other managed services
  • Strong experience in building ETL data pipelines using PySpark on the EMR framework
  • Hands-on experience with S3, AWS Glue jobs, S3 Copy, Lambda and API Gateway.
  • Working SQL experience to troubleshoot SQL code. Redshift knowledge is an added advantage.
  • Strong experience in DevOps and CI/CD using Git and Jenkins; experience in cloud-native scripting such as CloudFormation and ARM templates
  • Hands-on experience with system and application log tools such as Datadog, CloudWatch and Splunk.
  • Experience working with Python, Python ML libraries for data analysis, wrangling and insights generation
  • Experience using Jira for task prioritization and Confluence and other tools for documentation.
  • Experience in Python and common Python libraries.
  • Strong analytical experience with databases, including writing complex queries, query optimization, debugging, user-defined functions, views, indexes, etc.
  • Experience with source control systems such as Git and Bitbucket, and with build and continuous integration tools such as Jenkins.
  • Strong understanding of AWS data lakes and Databricks.
  • Exposure to Kafka, Redshift and SageMaker would be an added advantage
  • Exposure to data visualization tools like Power BI, Tableau etc.
  • Functional knowledge in the areas of Sales & Distribution, Material Management, Finance and Production Planning is preferred


Knowledge, Skills, Abilities

  • Full life cycle implementation experience in AWS using PySpark/EMR, Athena, S3, Redshift, AWS API Gateway, Lambda, Glue and other managed services
  • Experience with agile development methodologies, following DevOps, DataOps and DevSecOps practices.
  • Manage the life cycle of ETL pipelines and other cloud platform tools, including GitHub, Jenkins, Terraform, Jira, and Confluence.
  • Excellent written, verbal, interpersonal and stakeholder communication skills.
  • Ability to analyze trends associated with huge datasets.
  • Ability to work with cross-functional teams from multiple regions/time zones by effectively leveraging multiple forms of communication (email, MS Teams for voice and chat, meetings).
  • Excellent prioritization and problem-solving skills.
  • Action Oriented: Has a sense of urgency, high energy and enthusiasm in managing systems and platforms.
  • Drives Results: Consistently achieves results, even under tough circumstances.
  • Global Perspective: Takes a broad view when approaching issues, using a global lens.
  • Learn and train other team members
  • Communicates Effectively: Provides timely and consistent updates and recommendations on BI Operational issues and improvements to stakeholders.
  • Drive to meet and exceed BI Operational SLAs for ServiceNow incidents, Major Incidents, xMatters alerts, Employee Experience Metrics and BI application/process availability metrics.


What will you do?
  • Own and deliver enhancements associated with Data platform solutions.
  • Maintain and enhance scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity.
  • Enhance/support solutions using PySpark/EMR, SQL and databases, AWS Athena, S3, Redshift, AWS API Gateway, Lambda, Glue and other Data Engineering technologies.
  • Write complex queries and edit them as required to implement ETL/data solutions.
  • Monitor application performance and the application environment with system and application log tools, and act to improve them accordingly.
  • Implement solutions using AWS and other cloud platform tools, including GitHub, Jenkins, Terraform, Jira, and Confluence.
  • Follow agile development methodologies and DevOps, DataOps and DevSecOps practices to deliver solutions and product features.
  • Propose data load optimizations and continuously implement them to improve data load performance.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
  • Keep the data separated and secure across multiple data centers and AWS regions.
  • Be available for and participate in an on-call schedule to address critical operational incidents and business requests.

Basic qualifications

Minimum 1 year of Data Engineering experience using AWS services and PySpark/EMR

About NTT DATA Services

NTT DATA Services is a global business and IT services provider specializing in digital, cloud and automation across a comprehensive portfolio of consulting, applications, infrastructure and business process services. We are part of the NTT family of companies, a partner to 85% of the Fortune 100.

NTT DATA Services is an equal opportunity employer and considers all applicants without regard to race, color, religion, citizenship, national origin, ancestry, age, sex, sexual orientation, gender identity, genetic information, physical or mental disability, veteran or marital status, or any other characteristic protected by law. We are committed to creating a diverse and inclusive environment for all employees. If you need assistance or an accommodation due to a disability, please inform your recruiter so that we may connect you with the appropriate team.

Nearest Major Market: Pittsburgh
Job Segment: Business Intelligence, Database, SQL, Consulting, Programmer, Technology

Date Posted

11/21/2022
