LLM Data Engineer | United States | Fully Remote
Job Description
We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques. This role sits in the AI COE within DX Tech & Digital. As an AI/LLM Data Engineer, you will report to the Director, AI Solutions & Development, who oversees the AI COE.
You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions.
The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.
Responsibilities
• Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes
• Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform
• Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data
• Benchmark and implement various vector stores, embedding techniques, and retrieval methods
• Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search)
• Implement and maintain auto-tagging systems and data preparation processes for LLMs
• Develop tools for text and image data crawling, cleaning, and refinement
• Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models
• Work with data lakehouse architectures to optimize data storage and processing
• Integrate and optimize workflows using Snowflake and various vector store technologies
Qualifications
• Master's degree in Computer Science, Data Science, or a related field
• 3-5 years of work experience in data engineering, preferably in AI/ML contexts
• Proficiency in Python, JSON, HTTP, and related tools
• Strong understanding of LLM architectures, training processes, and data requirements
• Experience with RAG systems, knowledge base construction, and vector databases
• Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts
• Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated)
• Knowledge of data crawling techniques and associated ethical considerations
• Strong problem-solving skills and ability to work in a fast-paced, innovative environment
• Familiarity with Snowflake and its integration in AI/ML pipelines
• Experience with various vector store technologies and their applications in AI
• Understanding of data lakehouse concepts and architectures
• Excellent communication, collaboration, and problem-solving skills.
• Ability to translate business needs into technical solutions.
• Passion for innovation and a commitment to ethical AI development.
• Experience building LLM pipelines using frameworks such as LangChain, LlamaIndex, Semantic Kernel, or OpenAI functions
• Familiarity with LLM sampling parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM outputs
Preferred Skills
- Experience with popular LLM/RAG frameworks
- Familiarity with distributed computing platforms (e.g., Apache Spark, Dask)
- Knowledge of data versioning and experiment tracking tools
- Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing
- Understanding of data privacy and security best practices
- Practical experience implementing data lakehouse solutions
- Proficiency in optimizing queries and data processes in Snowflake or Databricks
- Hands-on experience with different vector store technologies
- US employee benefits package
Date Posted
11/30/2024