Tech Lead / Manager - AI Evaluation Science
Company
Diligent Robotics
Location
Remote
Type
Full Time
Job Description
What we’re doing isn’t easy, but nothing worth doing ever is.
We envision a future powered by robots that work seamlessly with human teams. We build artificial intelligence that enables service robots to collaborate with people and adapt to dynamic human environments. Join our mission-driven, venture-backed team as we build out current and future generations of humanoid robots.
The Tech Lead / Manager, AI Evaluation Science will lead the team responsible for advancing the state of the art in measuring the performance of physical AI systems and for validating how our AI systems perform in the real world. This group defines requirements, builds metrics, and creates rigorous evaluation pipelines. This work ensures that our robots meet high bars for safety, reliability, task performance, and human trust. You’ll own simulation, testing, labeling, and interpretability frameworks, making sure our robots not only work but work safely, repeatably, and explainably.
This is a hands-on leadership role in a startup environment. You’ll be both strategist and player-coach: defining evaluation standards, coding tools and models, and building the team that ensures our embodied AI is ready for deployment.
Responsibilities
- Lead the AI Evaluation Science team, owning evaluation strategy for robot perception, planning, control, and multimodal models.
- Define metrics and benchmarks for AI performance across safety, reliability, user experience, and robustness.
- Develop and maintain large-scale simulation environments to test robot behaviors under diverse real-world conditions (edge cases, adversarial scenarios, rare failures).
- Design evaluation frameworks that cover offline experiments, simulation, and live deployments.
- Build scalable pipelines for test coverage, automated evaluation, and regression tracking.
- Oversee labeling and data curation pipelines to generate high-quality ground truth for training and validation.
- Drive interpretability and explainability in embodied AI models, ensuring failures are measurable, diagnosable, and improvable.
- Collaborate closely with AI/Robotics engineering teams to define product requirements, set acceptance thresholds, and close the loop between evaluation and development.
- Actively mentor engineers and scientists while contributing hands-on to code, experiments, and metrics design.
Skills and Experience
- MS or PhD in Computer Science, Robotics, ML, EE, or a related field, along with 8+ years of AI/ML experience.
- Proven leadership experience: built and managed technical teams in AI, simulation, or robotics evaluation.
- Hands-on expertise building and evaluating large multimodal ML models (vision, language, action).
- Strong background in defining and operationalizing metrics for AI/robotics systems (safety, robustness, reliability).
- Demonstrated success in designing end-to-end evaluation pipelines: from data labeling and test definition to automated reporting and regression tracking.
- Experience in evaluation, benchmarking, or safety in robotics, AVs, or similar domains.
- Experience with simulation platforms for robotics or AVs.
- Technical depth in ML interpretability, error analysis, and data-driven model improvement.
- Ability to operate in a startup context: strategic but hands-on in code and experimentation.
- Excellent communication and cross-functional alignment skills: able to articulate risks, metrics, and trade-offs to executives, engineers, and non-technical stakeholders.
Date Posted
11/18/2025