Staff Product Manager - AI Eval Platform

Dropbox • USA

Company

Dropbox

Location

USA

Type

Full Time

Job Description

Role Description

As a Staff Product Manager within the Dash organization, you will play a crucial role in shaping how we measure and evaluate our AI-powered assistant and features. Dropbox is seeking a Staff Product Manager to lead AI Evaluations (Evals): the systems, metrics, and processes that measure the quality and reliability of AI-powered features across Dropbox. In this role, you’ll define how we evaluate model performance, accuracy, and user satisfaction across diverse AI surfaces like Dash, search, summarization, and intelligent organization. You will be responsible for a core platform that enables every product team at Dropbox to launch new AI features with confidence, armed with the tools to measure their success both online and offline.

You’ll collaborate closely with Applied AI, Data Science, and Research to design frameworks that ensure our AI features are helpful, safe, and high-quality. This includes everything from defining success metrics for model improvements to building scalable pipelines that assess qualitative and quantitative signals.

This role sits at the intersection of AI systems, data rigor, and product judgment. It is ideal for a PM who loves turning ambiguity into measurable progress and ensuring that every AI interaction meets a bar of excellence.

Responsibilities

  • Define and drive the roadmap for Dropbox’s AI Evaluation Framework, covering both quantitative metrics and human-in-the-loop systems.

  • Define the strategic vision and north-star framework for how Dropbox measures AI performance, setting unified principles for quality, correctness, relevance, and reliability across Dash and other AI features.

  • Own, end to end, the offline scoring pipelines, online instrumentation, dashboards, APIs, and LLM-as-Judge components used by all product teams.

  • Build and scale a self-serve measurement platform that enables any Dropbox team to launch features, run experiments, and measure performance with minimal friction.

  • Collaborate cross-functionally with ML, product, engineering, research, and data science to operationalize evaluation pipelines, design rubrics, and ensure metrics are valid, reproducible, and reliable.

  • Establish and maintain company-wide evaluation standards by defining rubrics, extending scorer taxonomies, and setting guidelines that become the foundation for AI quality measurement and benchmarking.

  • Integrate measurement systems into the product lifecycle by partnering with PMs and engineering to ensure evaluation and feedback loops are embedded from ideation through launch and iteration.

  • Communicate results, insights, and trade-offs to senior leadership, influencing product decisions and roadmap prioritization through clear storytelling backed by rigorous data.

Requirements

  • 10+ years of experience building measurement, analytics, or evaluation platforms, ideally in an ML/AI context (e.g. experimentation platforms, metrics infrastructure, evaluation pipelines), particularly with an understanding of the end-to-end AI development lifecycle, from model training to deployment and monitoring.

  • BS/MS in Computer Science, Engineering, Business Information Systems, Applied Math, or Statistics, or relevant experience.

  • Experience designing and deploying evaluation frameworks and pipelines, e.g. solid offline vs. online evaluation, metric definition and calibration, and human + model adjudication where needed.

  • Deep understanding of ML evaluation metrics and statistics, e.g. AUC, precision/recall, calibration, bias detection, variance, and error analysis.

  • Technical fluency and ability to partner with software engineers and data scientists. You are comfortable reasoning about pipelines, APIs, performance, scale, latency, system trade-offs, and more, and can engage in deep technical discussions and translate complex technical concepts into clear product requirements.

  • Strong cross-functional collaboration skills. You will need to work with PMs, researchers, engineers, data teams, labeling teams, and senior leaders.

  • Exceptional written and verbal communication skills, with a demonstrated ability to create clear, structured product documents and effectively communicate vision, trade-offs, and progress to stakeholders at all levels, including executives.

  • Bias, fairness, and robustness mindset. Experience (or sensitivity) in designing evaluations with fairness, adversarial robustness, and edge cases in mind.

Preferred Qualifications

  • Experience developing or implementing LLM-based evaluation frameworks within a RAG (Retrieval-Augmented Generation) context, including using LLM-as-Judge for online evaluations.

  • Hands-on experience with prompt evaluation, rubric design, human-in-the-loop evaluation, and adversarial test design.

  • Familiarity with experimentation at scale, including test design and measurement, e.g. A/B testing systems, causal inference, and counterfactual measurement.

  • 5+ years of experience in building self-service internal platforms / ML infrastructure / SDKs / APIs.

  • Experience building platforms or internal tools for technical users or developers and non-technical audiences alike.

  • PhD or advanced degree in a quantitative field (CS, ML, statistics, etc.).

Compensation

US Zone 1

$229,500–$310,500 USD

US Zone 2

$206,600–$279,500 USD

US Zone 3

$183,600–$248,400 USD

Date Posted

11/15/2025
