Y Combinator: Ian Fischer on How Stilts Beat a Frontier Model on ARC-AGI V2
Poetic's "stilts" pair recursive self-improvement with an inference harness rather than fine-tuning; the system topped ARC-AGI V2 at lower cost than frontier deep-thinking modes.
CHAPTERS
Engineers should use AI daily: rapid prototyping mindset
Ian opens with practical advice for engineers: the pace of AI progress is so fast that the best strategy is to constantly experiment. He illustrates this with a personal example of building an iPhone app with GPT in a single weekend after years away from iOS development.
What Poetic is building: recursively self-improving reasoning harnesses
Poetic positions itself as a layer on top of frontier models that makes them reliably better at hard reasoning tasks. The company frames its work as “recursive self-improvement,” aiming to improve performance without retraining new base models from scratch.
The fine-tuning trap and the “bitter lesson” for startups
Jared and Ian discuss why fine-tuning is often a losing strategy for startups: it is costly, time-consuming, and quickly obsoleted by the next frontier model release. Poetic’s pitch is to provide improvements that persist—and even compound—when better base models arrive.
“Stilts” for LLMs: portable performance gains across model upgrades
Poetic describes its product as “stilts”: a system that consistently outperforms the underlying model and gets an automatic boost when the base model improves. The harness is intended to be model-agnostic and forward-compatible, making it a durable layer for AI products.
ARC-AGI V2: taking the top spot with lower cost than frontier reasoning modes
Ian recounts Poetic’s emergence from stealth via ARC-AGI V2 results, surpassing a strong baseline shortly after a major release. A key claim is that Poetic beat a more expensive “deep thinking” approach by using a cheaper base model plus a better reasoning system.
Humanity’s Last Exam: beating Claude’s SOTA with sub-$100K optimization
Poetic reports state-of-the-art performance on Humanity’s Last Exam, a difficult expert-written dataset. The discussion emphasizes that a seven-person team achieved these gains with optimization costs under $100K—tiny compared to foundation model training budgets.
Inside the “Poetic meta-system”: automated generation and optimization of agents
Ian explains at a high level that Poetic uses a recursively self-improving meta-system whose output is problem-specific reasoning systems. Rather than hand-building complex agents, Poetic automates discovering prompts, code strategies, and data transformations that improve reliability on hard tasks.
Beyond RL: shifting S-curves with recursive improvement
The conversation frames Poetic as a new paradigm distinct from standard pretraining and RL, with its own evolving performance curve. As both the meta-system and underlying models improve, the achievable performance frontier keeps moving upward—potentially toward AGI-level capability.
Automating prompt & context engineering—and accepting non-human artifacts
Poetic’s approach delegates deep dataset understanding to the meta-system rather than human inspection. Ian notes the generated prompts can look strange, include unexpected choices, and even contain errors—yet still improve performance—highlighting the non-intuitive nature of automated optimization.
Why harnesses beat prompt-only optimization: from 5% to 95% with code strategies
Ian contrasts small gains from prompt optimization with large gains from adding explicit reasoning strategies implemented in code. He cites a prior DeepMind-era result where prompt tuning barely moved the needle, but adding structured reasoning mechanisms produced a dramatic jump.
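To make the contrast concrete, here is a hedged toy illustration (the `flaky_model` stand-in and all names are hypothetical, not the cited DeepMind setup): a single prompt call returns whatever the model says, while a harness adds an explicit code-level strategy of verifying the answer and retrying until a checker accepts it.

```python
# Hypothetical stand-in for an LLM that is unreliable on early attempts.
def flaky_model(prompt, attempt):
    return "42" if attempt >= 2 else "I think maybe 41?"

def prompt_only(question):
    """Prompt-only: one call, no verification of the output."""
    return flaky_model(question, attempt=0)

def harness(question, check, max_tries=5):
    """Code strategy: generate, verify with an explicit checker, retry."""
    answer = ""
    for attempt in range(max_tries):
        answer = flaky_model(question, attempt)
        if check(answer):
            return answer
    return answer  # best effort after exhausting retries

is_valid = lambda a: a.isdigit()
print(prompt_only("6*7?"))        # "I think maybe 41?"
print(harness("6*7?", is_valid))  # "42"
```

The verifier here is trivial; the point is structural: wrapping the model in even a simple programmatic check-and-retry loop can turn an unreliable generator into a reliable system, which is the kind of jump prompt tuning alone rarely delivers.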
Early access: how startups can put their agents “on stilts”
Poetic is not broadly released yet; they invite teams with especially difficult, reliability-critical problems to apply for early access. The company is looking for cases where standard methods have plateaued and additional robustness is needed.
From YC founder to DeepMind researcher to Poetic: career pivot story
Ian shares how the acquisition of his YC startup created space to rethink his trajectory and ultimately move into AI research at Google. After an initial exploration in robotics, he shifted fully into machine learning research for about a decade, setting the stage for founding Poetic.
Advice for engineers in the AI era: push boundaries by building constantly
Ian closes by reinforcing a hands-on approach: use AI tools to build real projects and continuously test what’s possible. The key message is to avoid self-imposed constraints and to treat AI as a daily multiplier for learning and creation.