Web Crawling & Indexing Engineer (Paris/London)
Company
Mistral AI
Location
Other US Location
Type
Full Time
Job Description
About Mistral
- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.
- Our mission is to make AI ubiquitous and open.
- We are creative, low-ego, team-spirited, and have been passionate about AI for years.
- We hire people who thrive in competitive environments, because they find them more fun to work in.
- We hire passionate women and men from all over the world.
- Our teams are distributed across France, the UK, and the USA.
Role Summary
- We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.
- The ideal candidate will have a strong background in web scraping, data extraction, and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources.
- The role is based in Paris or London.
Key Responsibilities
- Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites.
- Use headless browsers, driven through tools such as Chrome DevTools, to automate and optimize data collection processes.
- Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives.
- Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction.
- Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks.
- Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process.
- Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges.
Qualifications & profile
- Bachelor’s or master’s degree in computer science, information systems, or information technology.
- Strong understanding of web technologies, data structures, and algorithms.
- Knowledge of database management systems and data warehousing.
- Programming Languages: Proficiency in programming languages such as Python, Java, or C++ is essential.
- Mastery of Web Technologies: Understanding of HTML, CSS, and JavaScript is crucial to navigate and scrape data from websites.
- Knowledge of HTTP and HTTPS protocols.
- A good understanding of data structures (like queues, stacks, and hash maps) and algorithms is necessary.
- Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data.
- Understanding of distributed systems and technologies like Hadoop or Spark.
- Experience using web scraping libraries and frameworks like Scrapy, BeautifulSoup, Selenium, or MechanicalSoup.
- Understanding how search engines work and how to optimize web crawling.
- Experience in machine learning to improve the efficiency and accuracy of web crawling.
- Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.
Benefits
- Daily lunch vouchers
- Contribution to a Gympass subscription
- Monthly contribution to a mobility pass
- Full health insurance for you and your family
- Generous parental leave policy
Date Posted
08/11/2024