Tech Lead / Manager - AI Evaluation Science

Diligent Robotics • Remote

Company

Diligent Robotics

Location

Remote

Type

Full Time

Job Description

What we’re doing isn’t easy, but nothing worth doing ever is.

We envision a future powered by robots that work seamlessly with human teams. We build artificial intelligence that enables service robots to collaborate with people and adapt to dynamic human environments. Join our mission-driven, venture-backed team as we build out current and future generations of humanoid robots.

The Tech Lead / Manager, AI Evaluation Science will lead the team responsible for advancing the state of the art in measuring the performance of physical AI systems and for validating how our AI systems perform in the real world. This group defines requirements, builds metrics, and creates rigorous evaluation pipelines. This work ensures that our robots meet high bars for safety, reliability, task performance, and human trust. You’ll own simulation, testing, labeling, and interpretability frameworks, making sure our robots not only work but work safely, repeatably, and explainably.

This is a hands-on leadership role in a startup environment. You’ll be both strategist and player-coach: defining evaluation standards, coding tools and models, and building the team that ensures our embodied AI is ready for deployment.

Responsibilities

Lead the AI Evaluation Science team, owning evaluation strategy for robot perception, planning, control, and multimodal models.
Define metrics and benchmarks for AI performance across safety, reliability, user experience, and robustness.
Develop and maintain large-scale simulation environments to test robot behaviors under diverse real-world conditions (edge cases, adversarial scenarios, rare failures).
Design evaluation frameworks that cover offline experiments, simulation, and live deployments.
Build scalable pipelines for test coverage, automated evaluation, and regression tracking.
Oversee labeling and data curation pipelines to generate high-quality ground truth for training and validation.
Drive interpretability and explainability in embodied AI models, ensuring failures are measurable, diagnosable, and improvable.
Collaborate closely with AI/Robotics engineering teams to define product requirements, set acceptance thresholds, and close the loop between evaluation and development.
Actively mentor engineers and scientists while contributing hands-on to code, experiments, and metrics design.

Skills and Experience

MS or PhD in Computer Science, Robotics, ML, EE, or a related field, along with 8+ years of AI/ML experience.
Proven leadership experience: built and managed technical teams in AI, simulation, or robotics evaluation.
Hands-on expertise building and evaluating large multimodal ML models (vision, language, action).
Strong background in defining and operationalizing metrics for AI/robotics systems (safety, robustness, reliability).
Demonstrated success in designing end-to-end evaluation pipelines: from data labeling and test definition to automated reporting and regression tracking.
Experience in evaluation, benchmarking, or safety in robotics, AVs, or similar domains.
Experience with simulation platforms for robotics or AVs.
Technical depth in ML interpretability, error analysis, and data-driven model improvement.
Ability to operate in a startup context: strategic but hands-on in code and experimentation.
Excellent communication and cross-functional alignment skills, with the ability to articulate risks, metrics, and trade-offs to executives, engineers, and non-technical stakeholders.

Date Posted

11/18/2025
