Data Engineer - PySpark / Spark
Company
IBM
Location
MX Guadalajara
Type
Full Time
Job Description
In this role you’ll work in one of our IBM Consulting Client Innovation Centers (Delivery Centers) where we deliver deep technical and industry expertise to a wide range of public and private sector clients around the world.​ Our delivery centers offer our clients locally based skills and technical expertise to drive innovation and adoption of new technology.
Your Role and Responsibilities
Day-to-day troubleshooting of forecasting systems, mainly working through data anomalies that cause inaccurate forecasts or prevent forecast generation.
Collaborate with the data science team to enhance existing forecasting systems for the trade floors.
Create dynamic object-oriented methods, full-stack solutions, and integrations with existing code.
Develop individual Python classes, methods, and functions that support the data flow of existing and new projects.
Contribute code that seamlessly supports project data flows, including logging and support, with little to no supervision.
Apply experience in modifying packages, tests, and repository instances to support CI/CD.
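The class-and-logging responsibilities above can be sketched as a small, self-contained example. This is a hedged illustration only: the names (`CleanRecords`, `run`, the field names) are hypothetical and not from the posting; it simply shows a reusable Python data-flow step that logs what it did, in the spirit of "support the data flow ... including logging."

```python
import logging

# Hypothetical data-flow step: drop records with missing required fields
# so data anomalies do not reach the downstream forecast.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("forecast_flow")


class CleanRecords:
    """Filters out records missing any required field."""

    def __init__(self, required_fields):
        self.required_fields = required_fields

    def run(self, records):
        cleaned = [
            r for r in records
            if all(r.get(f) is not None for f in self.required_fields)
        ]
        # Log how many records survived, for troubleshooting anomalies.
        logger.info("kept %d of %d records", len(cleaned), len(records))
        return cleaned


rows = [{"sku": "A", "qty": 3}, {"sku": "B", "qty": None}]
clean = CleanRecords(["sku", "qty"]).run(rows)
# clean == [{"sku": "A", "qty": 3}]
```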
Required Technical and Professional Expertise
1. PySpark and Spark: Proficiency in PySpark, including the Spark DataFrame API and the RDD (Resilient Distributed Dataset) programming model. Knowledge of Spark internals, data partitioning, and optimization techniques is advantageous.
2. Data Manipulation and Analysis: Ability to manipulate and analyze large datasets using PySpark's DataFrame transformations and actions. This includes filtering, aggregating, joining, and performing complex data transformations.
3. Distributed Computing: Understanding of distributed computing concepts such as parallel processing, cluster management, and data partitioning. Experience with Spark cluster deployment, configuration, and optimization is valuable.
4. Data Serialization and Formats: Knowledge of different data serialization formats such as JSON, Parquet, Avro, and CSV. Familiarity with handling unstructured data and working with NoSQL databases such as Apache HBase or Apache Cassandra.
5. Data Pipelines and ETL: Experience in building data pipelines and implementing Extract, Transform, Load (ETL) processes using PySpark. Understanding of data integration, data cleansing, and data quality techniques.
Preferred Technical and Professional Expertise
NA
Date Posted
11/28/2024