No Priors

No Priors Ep. 110 | With Mercor CEO and Co-Founder Brendan Foody

On this episode of No Priors, Sarah and Elad sit down with Brendan Foody, CEO and Co-Founder of Mercor, to discuss the company’s rapid growth and their vision for the future of the labor market. They dive into how AI is reshaping the workforce in real, tangible ways and what skills are worth investing in today. Brendan shares insights on evaluating talent in an AI-driven world, including how models might identify outlier or 10x candidates and even assess “taste.” The conversation also touches on the evolving role of human data, the future of hiring in fast-scaling startups, and whether AI will act as an individual contributor or a data-centric manager.

Show Notes:
0:00 Introduction
0:16 Building Mercor
3:00 Identifying outlier talent with AI
9:07 How AI is reshaping the workforce: job displacement & evolution
11:18 What skills should we invest in now?
12:18 Verifiability
13:36 Evaluating models
16:07 What should kids learn today?
17:05 Evaluating taste in talent assessments
18:45 Future of data collection
26:07 Humans’ role in the AI economy
28:53 AI as a contributor vs. a manager
33:03 Mercor’s goals
34:50 Evolution of labor markets
36:00 Hiring advice

Sarah Guo (host), Brendan Foody (guest), Elad Gil (host)
Apr 10, 2025 | 41m

CHAPTERS

  1. 0:00 – 1:57

    Mercor in one sentence: LLM-driven hiring and performance prediction

    Sarah introduces Brendan Foody and Mercor’s rapid growth, framing the conversation around where training data and evaluations come from and how labor markets may change. Brendan explains Mercor’s core product: automating resume review, interviewing, and hiring decisions with LLMs to predict job performance.

    • Mercor automates key steps in recruiting with LLMs
    • Goal is to predict on-the-job performance better than humans
    • Mercor works with top AI labs hiring large numbers of model trainers
    • Episode themes: data sources, evals, and labor market shifts
  2. 1:57 – 2:52

    From global talent matching to “human data” for frontier labs

    Brendan describes Mercor’s origin story: starting as a way to connect overlooked global talent with opportunities. Customer demand pulled them into the “human data” shift—from low-skill crowdsourcing to high-skill expert vetting for frontier model work—while keeping a broader hiring-marketplace ambition.

    • Initial thesis: talented people lack access/opportunity globally
    • Market shifted from crowdsourcing text to vetting world-class experts
    • Human data now involves working closely with researchers to push capabilities
    • Many customers hire via Mercor for both human-data roles and standard jobs
  3. 2:52 – 3:30

    Will AI run hiring? Where models beat human hiring managers

    Elad probes whether AI will eventually dominate knowledge-work hiring. Brendan argues models already outperform humans on many assessments and will become the default recommendation engine, with humans possibly retained only for legal or formal sign-off.

    • Models are already better than many human hiring managers on Mercor evals
    • Trust in model recommendations likely becomes economically inevitable
    • Humans may remain as “final click” for compliance and accountability
    • Hiring automation adoption is still early but accelerating
  4. 3:30 – 5:03

    Finding 10x outliers: power laws, pricing, and measurable outcomes

    The conversation turns to identifying outlier performers and whether performance follows bell curves or power laws. Brendan explains how power-law dynamics vary by domain (investing vs. factory work vs. software) and why strong performance prediction reshapes labor economics and team construction.

    • Outlier identification is central because knowledge work can be power-law distributed
    • Domain variation: investing is extremely power-law; factory work less so
    • Performance prediction can change comp/value trade-offs (cheaper top-quartile talent)
    • Key is tying model signals back to real customer outcomes
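
The bell-curve versus power-law distinction in this chapter can be made concrete with a small simulation. The sketch below is illustrative only: the two distributions, their parameters (a normal with mean 100 and standard deviation 15, a Pareto with shape 1.2), and the helper name `top_decile_share` are assumptions chosen for the example, not figures or methods from the episode.

```python
import random


def top_decile_share(outputs):
    """Fraction of total output produced by the top 10% of performers."""
    ranked = sorted(outputs, reverse=True)
    cutoff = max(1, len(ranked) // 10)
    return sum(ranked[:cutoff]) / sum(ranked)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
n = 10_000

# Bell curve: output clusters around the mean (hypothetical parameters).
bell = [max(0.0, rng.gauss(100, 15)) for _ in range(n)]

# Power law: a heavy tail where a few performers dominate (hypothetical shape).
heavy = [rng.paretovariate(1.2) for _ in range(n)]

bell_share = top_decile_share(bell)
heavy_share = top_decile_share(heavy)

print(f"bell curve: top 10% produce {bell_share:.0%} of output")
print(f"power law:  top 10% produce {heavy_share:.0%} of output")
```

Under a bell curve the top decile contributes only slightly more than its headcount share, while under a heavy-tailed distribution it contributes a much larger fraction of the total. That gap is why identifying outliers matters more in power-law domains.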
  5. 5:03 – 6:59

    What signals models can read today: text dominance, multimodal lag, and high-volume advantage

    Sarah asks where AI evaluations will be strongest first. Brendan says anything measurable in text (interview transcripts, written responses) is a near-term sweet spot, while multimodal cues (passion, persuasion, sales) lag; he also notes automation arrives first in high-volume, comparable-role pipelines.

    • Text-based interviews/transcripts are highly legible to models
    • Multimodal human cues (authentic motivation, persuasion) take longer
    • High-volume hiring enables clearer attribution and faster learning
    • Comparing many candidates for the same job yields stronger feature discovery
  6. 6:59 – 9:42

    Hidden features and underused data: GitHub, portfolios, study-abroad signals, intrinsic motivation

    Brendan shares surprising or underexploited predictors of talent, especially online artifacts for engineers and designers that humans don’t have time to review deeply. For less-online professions, Mercor looks for indirect indicators (e.g., international candidates who studied in the West) and repeated evidence of intrinsic motivation aligned to specific tasks.

    • Online artifacts provide rich signal (GitHub, personal projects, blogs, portfolios)
    • Manual hiring processes miss large amounts of available information
    • Less-online domains can still have hidden predictors (e.g., study-abroad correlates)
    • Intrinsic motivation is a recurring high-value signal
    • Matching includes not just who to hire, but what to assign them to
  7. 9:42 – 11:44

    Job displacement, politics, and the “physical world” buffer

    Elad asks how model-vs-human evaluations translate into displacement. Brendan predicts rapid, painful automation in many digital roles (support, recruiting), likely triggering political backlash; he expects more human work to shift toward the physical world and human-preference roles, with robotics/data creation also growing.

    • Fast displacement in digital roles; major political/populist response likely
    • Key challenge: reallocating work and wealth as capabilities rise
    • Physical-world automation expected to lag virtual-world automation
    • Potential growth areas: robotics data, service work, therapy/human interaction
  8. 11:44 – 14:03

    Which skills to invest in: versatility, learning speed, and the rule of verifiability

    Sarah asks what humans should learn to stay economically valuable. Brendan echoes a versatility-first approach and introduces a heuristic: models improve fastest where tasks are verifiable (clear feedback loops), while non-verifiable, sparse-signal domains like “taste” are harder to automate.

    • Optimize for adaptability: learn quickly and switch domains as needed
    • Verifiability drives rapid model improvement (math, soon code)
    • Feedback loops/utility functions accelerate capability gains
    • Non-verifiable domains (taste, judgment) resist automation longer due to sparse data
  9. 14:03 – 16:34

    Extending verifiability and the eval crisis: measuring economically valuable agent work

    Sarah shifts to the “evaluation crisis” as models near benchmark saturation. Brendan argues evals must move beyond academic/zero-shot questions toward economically valuable, end-to-end work, especially agent behaviors like coordination, tool use, and real-world workflows, starting with homogeneous tasks (e.g., customer support).

    • Traditional benchmarks miss end-to-end economic work
    • Agent evals (coordination, tool use, workflow execution) are the bottleneck
    • Start with uniform interfaces/tasks (customer support) before complex roles
    • Software engineering evals are hard because they include taste, coordination, product sense
  10. 16:34 – 17:31

    What should kids learn? Reasoning, passion, and market taste over “just coding”

    Elad asks what to teach young children given AI progress. Brendan is cautious about prioritizing computer science for its own sake; he recommends cultivating deep interest, general reasoning, and the ability to spot what markets lack—i.e., contrarian insight and product/solution taste.

    • Don’t over-optimize for coding as the enduring moat
    • Encourage intellectual passion that builds transferable reasoning skills
    • Entrepreneurial exploration (building, hustling) can be formative
    • Long-term advantage may be contrarian insight and product taste
  11. 17:31 – 18:58

    Measuring “taste” in assessments: work-sample realism over proxies

    Sarah presses on how to evaluate taste and create a feedback loop. Brendan describes using tasks that mirror real job work, comparing candidates against peers, and capturing thought processes; he emphasizes measuring what you truly care about (e.g., MVP building) and only using proxies when horizons are long.

    • Taste is contextual but can be surfaced via realistic problem tasks
    • Scoring comes from comparative performance across candidates
    • Collect intermediate reasoning/thought process (akin to code review signals)
    • Avoid over-reliance on proxies; measure the real thing when possible
    • Use structured proxies only when outcomes take longer to observe
  12. 18:58 – 19:47

    The future of data collection: eval-building as a dominant knowledge-work job

    Sarah asks how many people will do data collection/eval creation at scale. Brendan predicts massive volume as every enterprise builds evals to define “good” for agents, potentially making eval-building the world’s most common knowledge-work job—even though it can feel like people are helping automate themselves.

    • Economy-wide push to create evals for ‘everything’ businesses do
    • Mix of in-house employees and contractors building domain evals
    • Eval creation could become the most common knowledge-work job
    • Tension: people may be training systems that later reduce their role
  13. 19:47 – 25:29

    Who grades superhuman models? Frontiers, bootstrapping, and limits of human feedback

    Elad explores how long humans remain useful as evaluators and whether models can create their own evals. Brendan argues there’s always a frontier short of full superintelligence; models can propose criteria while experts validate, and models may learn to discount low-quality human feedback. He also cautions that success on narrow benchmarks doesn’t imply full job replacement due to coordination complexity.

    • Two categories: variable-cost task execution vs fixed-cost eval creation
    • Models may help generate criteria; humans validate and ground to expert reality
    • Risk of human feedback degrading superhuman models; models may filter it
    • We may overestimate proximity to superintelligence from benchmark gains
    • Large gap remains between narrow SWE benchmarks and replacing engineers end-to-end
  14. 25:29 – 29:46

    Humans in the AI economy and AI as manager: rationalizing orgs, incentives, and regulation

    The discussion broadens to underappreciated shifts: how humans fit into a future economy and how organizations respond when employee value becomes measurable. Brendan suggests AI may become better as a manager than as an individual contributor, changing performance management and potentially accelerating layoffs; Sarah and Elad discuss sectors insulated from economic pressure due to regulation and institutional inertia.

    • Public underestimates speed of displacement; need proactive policy response
    • AI may excel at management: decomposing work, prioritizing, performance-managing
    • Better analytics could pressure companies to cut ‘UBI-like’ inefficiencies
    • Adoption may face legal/political pushback but remains economically compelling
    • Regulated sectors (healthcare, education) may change differently due to weak market pressure
  15. 29:46 – 33:29

    RFT and the path to true agents: post-training, tool use, and enterprise customization

    Sarah asks what’s missing for assistants/agents to be truly proactive and useful. Brendan argues base models largely have the reasoning, but need agent-specific evals and reinforcement fine-tuning (RFT) to learn tool usage and ‘what good looks like’ in each company’s environment—making customization far more data-efficient than classic supervised fine-tuning.

    • Gap today: models ace tests but struggle with end-to-end assistant workflows
    • Agent performance hinges on tool use and company-specific knowledge/processes
    • RFT vs SFT: reward outcomes/criteria rather than learning from many labeled pairs
    • RFT is more data-efficient (hundreds/thousands of examples vs massive corpora)
    • Forecast: enterprise-wide buildout of evals to create custom agents
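
The SFT versus RFT contrast above is largely a difference in what the training data looks like. The sketch below is a toy illustration under stated assumptions: the class names `SFTExample` and `RFTTask`, the keyword-based `rubric_grader`, and the sample customer-support text are all hypothetical and do not come from the episode or from any particular library. It shows only the grading step; an actual RFT pipeline would use these rewards to update model weights.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SFTExample:
    """Supervised fine-tuning: the model imitates a labeled target response."""
    prompt: str
    target: str


@dataclass
class RFTTask:
    """Reinforcement fine-tuning: the model is rewarded by a grader, not a fixed target."""
    prompt: str
    grader: Callable[[str], float]


def rubric_grader(required_phrases: List[str]) -> Callable[[str], float]:
    """Hypothetical grader: reward is the fraction of rubric phrases present."""
    def grade(response: str) -> float:
        text = response.lower()
        hits = sum(1 for phrase in required_phrases if phrase.lower() in text)
        return hits / len(required_phrases)
    return grade


prompt = "Customer asks for a refund on order 1234."

# SFT needs one written-out ideal answer per prompt.
sft_example = SFTExample(
    prompt=prompt,
    target="I have issued a refund for order 1234 and emailed a confirmation.",
)

# RFT needs a definition of "what good looks like" that can score any answer.
rft_task = RFTTask(prompt=prompt, grader=rubric_grader(["refund", "confirmation"]))

candidates = [
    "Please contact support.",
    "Your refund is on its way.",
    "I issued the refund and sent a confirmation email.",
]
rewards = [rft_task.grader(c) for c in candidates]
best = candidates[rewards.index(max(rewards))]

for candidate, reward in zip(candidates, rewards):
    print(f"{reward:.1f}  {candidate}")
```

Because the grader can score any response the model produces, a single task definition yields a training signal for many attempts, which is one way to read the data-efficiency point in the bullets above.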
  16. 33:29 – 41:52

    Mercor’s roadmap and labor-market endgame: global unified marketplace + hiring discipline

    Brendan outlines Mercor’s priorities: building the best global talent supply and improving job-performance prediction via customer outcome data. He argues the labor market’s biggest inefficiency is fragmentation; automating matching enables a unified global market spanning humans and AI agents. The episode closes with hiring advice: prioritize early talent density, be explicit and data-driven about the traits that drive outcomes, and avoid vibes-based processes.

    • Top goals: attract the smartest global talent + improve performance prediction
    • Free candidate tools (mock interviews, career advice, shareable profiles) to scale supply
    • Data flywheel from customer outcomes becomes increasingly defensible
    • Long-term vision: unified global labor market; humans and agents compete/coordinate
    • Startup hiring: optimize for quality early; define measurable traits tied to outcomes
