Y Combinator

Why The Next AI Breakthroughs Will Be In Reasoning, Not Scaling

There's an ongoing debate about whether AI scaling laws will hold or hit a wall in the near future. What's clear now, however, is that today's models already have the power to increase productivity in ways that would have been unimaginable just a few years ago. In this episode of the Lightcone, we dig into the results of a recent o1 hackathon hosted by YC to find out what can be unlocked when founders leverage a SOTA reasoning model.

Chapters (Powered by https://bit.ly/chapterme-yc)

- 0:00 Intro
- 1:15 The intelligence age
- 4:18 YC o1 hackathon
- 12:09 4 orders of magnitude
- 14:42 The architecture of o1
- 21:52 Getting that final 10-15% of accuracy
- 32:06 The companies/ideas that should pivot because of o1
- 34:44 Outro

Harj Taggar (host) · Garry Tan (host) · Jared Friedman (host) · Diana Hu (host)
Nov 14, 2024 · 35m · Watch on YouTube ↗

EVERY SPOKEN WORD

  1. 0:00 – 1:15

    Intro

    1. HT

      I remember about a year ago, one of these conversations around, "Are we going to have AGI? What would that look like?" One- one of the arguments for it was that, "Well, like, at some point the AI will get good enough to just, like, design chips better than, like, humans can and then it will just, like, eliminate one of its bottlenecks for, like, getting greater intelligence." And so it feels we're on the pathway to that in a way that we just weren't before.

    2. GT

      The last episode, we were talking about, you know, what are you going to do with these two more orders of magnitude? Since then, Sam has, uh, told me that he actually wants to go to four orders of magnitude. It's the worst that these models are ever going to be right now, right this moment. Week to week, you know, there are things that you couldn't do maybe a month ago that you could do really, really well right now. So that sounds like a pretty crazy moment in history. (laughs) Welcome back to another episode of The Light Cone. I'm Gary. This is Jared, Harj, and Diana. And at Y Combinator, we've funded companies worth more than $600 billion, and we fund hundreds of companies every single year. So we're right there on the edge of seeing what is going to work, both in startups and in AI.

  2. 1:15 – 4:18

    The intelligence age

    1. GT

      Recently, Sam Altman wrote this pretty wild essay that predicted that AGI and ASI are coming within thousands of days. Seeing him on Monday, he actually directly estimated, you know, between 4 and 15 years. Have you guys read this essay yet and what do you think?

    2. JF

      Yeah, I read it, and one- one of the interesting places where I think we have a unique perspective is that we were- we had a front row seat to the very beginnings of OpenAI 'cause OpenAI basically spun out of YC. And so what was cool to me reading this essay is that it's literally the same ideas that Sam was talking about in 2015 when he started OpenAI. Like, he's been talking about this, like, basically since I've known the guy. Um, and in 2015, when he said these things, he sounded kind of like a crazy person-

    3. GT

      (laughs)

    4. JF

      ... and not that many people took him seriously. And now, 10 years later, it turns out he was right, and actually, we were much closer to AGI than anybody thought in 2015. And now it doesn't sound crazy at all, it sounds, like, totally plausible.

    5. GT

      I mean, the essay itself is pretty much the most techno-optimist thing I've read in a really long time. Some of the things that he says are coming are pretty wild. Space colonies, fixing the climate problem, um, y- intelligence on tap, being able to solve abundant energy. Uh, yeah, I think he's basically ushering in this sort of Star Trek future on the back of literally human intelligence, being able to figure out all of physics.

    6. JF

      Yeah, Sam was always a re- like, I- I remember back when he was starting OpenAI, one of the things that really motivated him to do it was he believed that when we actually had AGI, basically it'd be better at doing science than humans were, and therefore it would accelerate the rate of all scientific progress in every scientific field. That was- that was part of the motivation from the very beginning, and I think it's really connected to o1. Even when Sam came and spoke at our- at our- at our batch a year ago, this is long before o1 was publicly released but it was being worked on in, you know- in, uh, in secrecy by OpenAI. That was the thing that he was most excited to talk about, was giving GPT more advanced reasoning capabilities. And I think this is the reason, it's because, like, the thing that's missing from its ability to actually do science and, like, accelerate technological progress is it needs- it needs to be able to, like, think through things. (laughs)

    7. HT

      One thing that really stands out about o1, um, in particular, is if you read, um, one of the papers talking about it, so Capabilities and Potentials for the Future, it talks about how it does really well in chip design. Um, and I remember about a year ago, one of- like, one of these conversations around, "Are we going to have AGI? What would that look like?" One- one of the arguments for it was that, "Well, like, at some point, the AI will get good enough to just, like, design chips better than, like, humans can and then it will just, like, eliminate one of its bottlenecks for, like, getting greater intelligence." And so it feels like that's already kind of, like, we're on the pathway to that in a way that we just weren't before.

    8. JF

      Diana

  3. 4:18 – 12:09

    YC o1 hackathon

    1. JF

      is going to show a cool demo of her doing exactly that.

    2. DH

      (laughs) It's fun because we ran this, um, hackathon with OpenAI and Sam came over and judged the winners. And one of the participants was actually chip design. This company is called Diode Computer, I think we mentioned them earlier. What they're building is basically an AI designer for circuit design. And their previous product, it could handle- if you think about PCB design, there's four ma- major steps. The big expensive part, where all of these need a lot of expertise, is the system design: how do you really put together the architecture of it? How do you design all the components, like the resistors they need, the sensors, the specific, um, processing units? Then you need to go do the layout with schematics, placing, then doing the routing, and routing is known to be an NP-complete problem because as you have different layers in circuit boards, there's interference. And this is why companies like NVIDIA, Intel, Apple have a gazillion electrical engineers, because this is an NP-complete problem. Up to GPT-4, which is what this company had built, it actually put some constraints and was able to automate a lot of the, uh, schematic design, where you as a human had to decide what components needed to go on the design, and to some extent the routing, if it was simple, which is still pretty cool up to that point. So they were able to automate all that, but the thing that they demonstrated now with o1 was that it was actually able to do the system design and component selection, which is... crazy. So, it would be able to read all the data sheets and select the right components. So, the way the product would work, it could say, "I want to build a wearable heart rate monitor with an accelerometer and a microcontroller." Very high level. And given these constraints and looking at the database, it would be able to match the specific accelerometer and microcontroller and heart rate monitor sensor and connect it and just output the end result.

    3. DH

      What we are trying to build today is a wearable heart rate monitor, something like you would see in a Whoop, for example. Um, o1 is amazing, but one of the downsides is that it's a bit slow. So, we actually cached a pre-generated, like, system diagram that o1 was able to generate. It's pretty good. It has a USB-C connector, an IMU, like we requested, a heart rate sensor, um, and, uh, like, this is a microcontroller. So, I'm going to show you how you can go from this, uh, and, like, build a PCB. So, we are going to, like, build the project. Um, the output of this is code. We actually use atopile, which is an, uh, electronics-as-code, uh, language. And you can see that it took all the blocks in the block diagram, stitched them together exactly how we want. Um, and, uh, the second step is it actually is going to generate a layout for the board. Um, and so now, like, we can directly open it and, uh, here you go. (laughs) Here's the board. It's, uh, quite i- it's quite nice. Uh, there's still, like, a couple of, like, fine-tuning steps required. For example, um, we could, like, move, uh, like, this USB Type-C connector slightly. Um, we can, like, change the shape of the board. But, but these are all the components. Um, and then, like, thanks to the system that we built, uh, we can, like, call the auto-router on this specific board and actually get a fully working, uh, printed circuit board back.

    4. DH

      So, this is, uh, actually one of the examples on the O1 paper.

    5. DH

      Yeah.

    6. DH

      That it would do EDA, but actually they went a step for- forward, because in the example in the paper, they described EDA as this multi-step process, this set of tools for circuit design. It does sort of the design of the schematic, also the simulation and bug verification. It's easier to verify stuff than to select and write it.

    7. DH

      Hmm.

    8. DH

      So this company actually went a step further beyond the paper.

    9. DH

      Okay, cool.

    10. DH

      Because the paper did mostly the last stages of, uh, verification and simulation.

    11. GT

      I guess it's an interesting example of using different models for different tasks and in different workflows. So, in order to actually pick the correct components off the bat, you know, even before you, you know, place it on a circuit board, you've got to actually have probably RAG on structured... You know, taking unstructured data like PDF documentation and turning it into a structured form, and then 4o-mini, it sounds like, is being used to actually extract the data and then put it into, um, like, a usable form.

    12. DH

      Into a format for O1.

    13. GT

      Yeah.

    14. DH

      I think this is actually a very common pattern that we're seeing. A lot of the interesting products built with AI use different kinds of models. So yes, 4o-mini for PDF extraction and then o1 for the reasoning, because it's actually very hard to select the components for parts. I know, uh, Jared, you also work with a lot of hard tech companies, and the whole part of selecting whatever, the servos, the motors, the sensors, it's like so... It, it takes a lot of thinking for a human.
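[Editor's note] The multi-model pattern Diana describes, a cheap model structuring raw documents and a reasoning model working over the result, can be sketched as a small pipeline. This is a hypothetical illustration, not Diode's actual code; the prompts and the `extract`/`reason` callables are assumptions:

```python
def run_pipeline(datasheet_text, extract, reason):
    """Two-model pattern: a cheap extraction model structures raw
    datasheet text, then a reasoning model selects components.
    `extract` and `reason` are injected callables (e.g. wrappers
    around chat-completion calls), so the flow is testable with stubs."""
    # Step 1: turn unstructured PDF text into a structured record.
    structured = extract(
        "Extract part number, voltage range, and interface as JSON "
        "from this datasheet:\n" + datasheet_text
    )
    # Step 2: reason over the structured data plus a high-level spec.
    return reason(
        "Given these components, pick parts for a wearable heart rate "
        "monitor with an accelerometer:\n" + structured
    )
```

In production the two callables would wrap calls to a small, cheap model and a reasoning model respectively; keeping them injectable makes each stage easy to evaluate in isolation.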

    15. JF

      Yeah. The, the other thing I think is interesting about this example is, like, during the batch, before O1 came out, Diode had tried to do this with GPT-4o, and it just flat out didn't work. And then they basically tried the same thing, the same prompts, but fed it to O1 and boom, all of a sudden it worked. (laughs) And so there really is a sort of like step function capability unlocked.

    16. DH

      They were so excited when I talked to them and they showed me. They were, they had this big smiles, like, "Wow."

    17. JF

      (laughs)

    18. DH

      They themselves were super impressed.

    19. JF

      This hackathon that Diana ran, incidentally, I, I think is a really interesting concept for a hackathon. Like most, most hackathons are, like, people who are just sort of, like, building something that they plan to throw away. Um, and the cool thing about this hackathon is it was all actual YC-funded startups that have real businesses, that have funding, that have, like, a real thing with real users, and they were all building actual features for their product that they plan to release to real users. It was, it was really cool, I think, for us to see how o1 unlocked capabilities for real companies, not just, like, toy projects.

    20. GT

      Yeah.

    21. DH

      There's another one that was similar in here in terms of reasoning for o1. I think, uh, Harj, you work with, uh, Camfer.

    22. HT

      Yep.

    23. DH

      So, want to tell us what Camfer does?

    24. HT

      Uh, it's... I mean, the tagline is Devin for CAD, but basically they, um, let you create CAD designs with just natural language. Like, you just type in, like, you know, something that you want to design and it just, like, spits out, like, a CAD design for you.

    25. GT

      So can you design me five airfoils optimized for 50 miles per hour with a minimum drag to lift of 15 at a five-degree angle of attack? This is very specific.

    26. HT

      Yep.

    27. DH

      Normally this would require, uh, actually a mechanical engineer to be running all the simulations and solving through the equations. And, well, you're seeing why it's, like, flashing: it's running multiple simulations for them at the same time.

    28. GT

      So it's actually kind of like a copilot to, uh, SolidWorks.

    29. HT

      Yeah. They actually built their... Like, initially they were going to build this as a plugin to SolidWorks, but they went for, like, the even harder technical approach, which was like, "No, this is just, like, an executable that runs on your desktop and it essentially opens up SolidWorks for you and..."

    30. JF

      And then just starts, like, clicking around in the UI-

  4. 12:09 – 14:42

    4 orders of magnitude

    1. GT

      I mean, pretty wild. But on the other hand, like, you could see where that might go, you know. You could imagine an airfoil is still... it's, you know, very impressive and complex, but sort of what's... we're capable of doing today in 2024, you could imagine abstracting that to, like, understanding the nature of (laughs) physics, I suppose. Like, it, it would be sort of hard to see that maybe in the current version of o1, but if the scaling laws hold, it seems entirely plausible that, you know, far more difficult engineering challenges such as, um, you know, room temperature fusion. Like, these are all sort of ultimately engineering problems.

    2. DH

      Fluid mechanics, there's weather prediction, there's all these complex physical phenomena that are very hard to solve, and you basically need PhDs. And to Sam's essay, this is a glimpse into where AI and o1 are heading with, with this chain of thought and reasoning.

    3. JF

      Especially, like, Sam's essay, the vibes are sort of this new age of intelligence, and then the o1 paper just... I think this whole idea of now you can actually give, like, feedback not just on, like, the outputs and whether you got the correct answer, but, like, on all of the steps to get there, and, like, you're basically teaching a model how to think. Like, the Camfer guys were mentioning it too, right? The reasoning traces, and they can probably go back and, like, fine-tune the various steps for, like, every output to make sure that the model's thinking how they want it to think. That one, that just is, again, very... the AGI conversations I feel, like, a year ago were all sort of in this direction of, like, what happens once you can actually start teaching the models to think better versus just, like, um, spitting out the correct answers. Uh, and then the scaling laws, this is, like, even more surface area for, like, throwing compute at the problem, right?

    4. GT

      Yes.

    5. JF

      Like, now you can just basically put compute at the inference step and-

    6. GT

      And iteratively have something come out that, you know, you can actually spend more money and more time and have a result that iteratively gets better-

    7. JF

      Yeah.

    8. GT

      ... similar to what you might expect from a human scientific organization.

    9. JF

      Yep.

    10. GT

      Maybe more consistently, even. (laughs)

    11. JF

      Diana, do you, do you want to talk about the architecture and how they actually created o1?

    12. DH

      I think a lot of it is inspired from what they've been working on for many years since the beginning of, uh, OpenAI. I think one of the inspirations

  5. 14:42 – 21:52

    The architecture of o1

    1. DH

      is a lot of the work they did with DOTA.

    2. JF

      Yeah. Does everyone remember when, like, before OpenAI was famous for GPT, the one thing that it was, like, kind of famous for, that at least people in the tech industry knew, was DOTA, (laughs) was, like, winning video game competitions. That was their first big breakthrough.

    3. DH

      And the funny thing to think, back then, DOTA wasn't something that took the world by storm. I mean, maybe only the research community kind of knew about it, but it wasn't anything practical. But what was impressive, it was beating a lot of the best DOTA players. So, DOTA is this complex game of resources and planning, right? And they implemented a lot of, uh, kind of reinforcement learning type of techniques in there, which I think were also inspired in the early days by AlphaGo and AlphaZero as well, and how they solved Go.

    4. JF

      Yeah.

    5. DH

      It wasn't just brute forcing through it, but actually having a reward function and, and trying to solve towards it.

    6. JF

      Yeah.

    7. DH

      And even... This is why there's just so, so much talk about Q-learning, because that's sort of the fundamental algorithms behind... family of algorithms behind RL.
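[Editor's note] Q-learning, the family of algorithms Diana mentions, is easiest to see in its tabular form. The sketch below is a didactic toy on an invented chain environment, not how o1 or the DOTA bots were actually trained:

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy chain: states 0..n_states-1,
    actions 0 (left) and 1 (right), reward 1 only on reaching the
    rightmost state. Returns the learned Q-table."""
    random.seed(0)  # deterministic for reproducibility
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy: mostly exploit, occasionally explore
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = 1 if q[s][1] >= q[s][0] else 0
            s_next = s + 1 if a == 1 else max(0, s - 1)
            reward = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action
            q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q
```

After training, the greedy policy (pick the action with the larger Q-value in each state) walks straight to the rewarded state; scaled up with function approximation and self-play, this is the lineage behind systems like the DOTA bots.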

    8. JF

      So, yeah. So, like, because of DOTA, they got really good at doing reinforcement learning. That's how they got it to work. They just had it play against itself, like, a million games. (laughs) Um, and then how, how does that connect to, to, to o1?

    9. DH

      So, I think this is where there's a bit of a big step function, because how do you then incorporate that into the family of GPT-type models? GPT is all generative, based on predicting the next token and patterns, and then getting those results to check that they're correct. So, I th- I think a lot of it is you had to have a lot of data that was factually correct and fed into, probably, the model and the training, and having a reward function that gets it to reason a bit more about the output and make sure that it's correct. So, they've probably done a lot of, um, interesting techniques with that, and there's probably a lot of secret sauce in the type of, uh, data sources they've used. Maybe one of the speculations we could do is a lot of very factually correct information.

    10. JF

      Like math problems and science problems and things like that, yeah.

    11. DH

      And that's why it outperforms so much in those.

    12. JF

      Yes. Yeah, one of the things I think is interesting, Gary, to your point about the scaling laws, is a lot of people are really focused on the next, like, scale up of the model, like the GPT-5 series of models, which are being trained now, and people are working on them, and they are gonna come out. Um, but I think people may be underappreciating how big an unlock this other direction is, 'cause there's, there, there's two research directions being explored in parallel, right? Like, one is the straightforward scale up of the underlying LLM, and then this o1 direction is, like, a totally orthogonal research direction in which you unhobble the model by having it do reinforcement l- learning while actually trying to do things in the real world and getting better at them. The version that's come out so far, it's still only o1 mini, and if you look at the actual-

    13. DH

      o1-preview.

    14. JF

      Sorry, o1, o1-preview, and, like, if you look at the performance tests that they released, like, the full o1 model, which is coming out any day now, is a huge step function- Huh.

    15. GT

      Yeah.

    16. JF

      ... above even o1-preview, which is what enabled all these incredible results at the hackathon. Sam was just telling us that, like, o2 and o3 are not far behind, and so, like, I think people may be underappreciating just how big an unlock we're gonna get.

    17. GT

      Yeah. And o1 also is really opaque still. I mean, it... From a, you know, sort of business perspective, this is a new method. I think at great cost to themselves, they actually did create a new dataset to train the chain of thought. You know, it, it's essentially a giant dataset of, you know, given task X, can you break it down and, uh, into... you know, break it down into parts? And you know, what's funny is this sort of rhymes with what Jake... uh, Heller figured out for Casetext, that-

    18. JF

      Yeah.

    19. GT

      ... if, uh, a given task that you give an LLM is hallucinating or, you know, not consistently giving the output you want, you're trying to make that particular prompt do too many things, and you need to break it down into steps. And so what's funny is Jake's prescription is really two parts. You know, one is break it down into steps, and then the other part is evals. And it sounds like basically with o1, the chain of thought will replace the workflow. So, you might not need to break it down into steps yourself, but the evals are still really important. Um-

    20. JF

      Mm-hmm.

    21. GT

      Even like in the aftermath of that episode with Jake Heller, it sounds like some YC alums are reaching out and saying, "That episode helped us figure out and unlock something really big." Like-

    22. JF

      Yeah.

    23. GT

      ... a lot of people really were just raw dogging their prompts. (laughs)

    24. JF

      Raw dogging their prompts. (laughs) Yeah.

    25. DH

      They got to, uh, you have the example of a company you worked with, Jared, that got to 100%.

    26. JF

      Yeah, just by doing exactly what Jake recommended, which is like having a really big eval set and being very careful about testing every step of your reasoning pipeline.
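[Editor's note] The discipline Jared describes, a big eval set checked at every step of the pipeline, can reduce to a small harness. A minimal sketch, where the step function and the eval cases are placeholder assumptions:

```python
def run_evals(step_fn, cases):
    """Run one pipeline step against an eval set of (input, expected)
    pairs. Returns accuracy plus the failing cases, so a regression
    in any single step is caught before it compounds downstream."""
    failures = []
    for inp, expected in cases:
        got = step_fn(inp)
        if got != expected:
            failures.append((inp, expected, got))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures
```

Running this per step, rather than only end to end, is what lets you localize which prompt in a multi-step chain drifted when a model or prompt changes.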

    27. GT

      So, one of the theories that I have now is ultimately like if you superimpose that on what is a moat? I mean, that's one of the key-

    28. JF

      Right.

    29. GT

      ... questions that everyone's sort of asking themselves right now, you know. Okay, like, GPT-5's coming, two more orders, maybe four more orders of magnitude are gonna come in terms of a trillion dollars spent on more training. That's pretty wild. You know, if I'm a wrapper company or I'm trying to do vertical SaaS or I'm trying to, you know, build my own business, what do I do? My theory would be it's the evals. It's, you know, write the 10,000 test cases, and the only way you get access to the test cases that are proprietary data, that are not av- like, commonly available, is that you literally, you know... that's what a bunch of our companies in this current YC batch are doing. They're doing the hard work of doing enterprise sales, they're getting embedded and sort of going "undercover" into these, uh, you know, sometimes really boring jobs, sometimes really complex or arcane jobs. You know, it's everything from, you know, I think accounts receivable all the way over to how do you do, like, financial accounting or forensic accounting? Like, it's just all kinds of things that are really, uh, not readily available. Um, you- you can almost argue that anything that is consumer and publicly available on the internet, that's gonna be in the base model.

    30. JF

      Yeah.

  6. 21:52 – 32:06

    Getting that final 10-15% of accuracy

    1. HT

      are actually a good example of it, where there's lots of interest in this sort of text to CAD design amongst like hobbyists or people who want to prototype things and get something up and running very, very quickly. Um, but there's also like a segment of the market where it's people who are literally designing like, you know, airplane parts where there is no room or like margin for error. And O1 makes it quite easy or easier now, right, to get to like the prototype like, you know, 80% of the way there. But I think the strongest technical teams have the option to go all the way and go after this segment of customers who want like 100% accuracy and will pay a lot for it. Um...

    2. GT

      Harj, always go all the way.

    3. HT

      Yeah, well, I can always go all the way. (laughs)

    4. GT

      You have to go all the way. Yeah.

    5. HT

      But I- I think it's interesting because o- one, one of the questions that gets raised is: does o1, or does AI in general, actually commoditize a lot of the tech and make it less important to be a strong technical team? Um, and it just seems unlikely to me. It seems like actually the last time...

    6. JF

      It seems it's the opposite.

    7. HT

      Yeah.

    8. JF

      Yes.

    9. GT

      (laughs)

    10. HT

      Like all of the value is probably going to be captured by like the strongest technical teams who can build on top of whatever the base level of tech is and get the final 10%.

    11. JF

      Hey Gary, I think it's the prompts, it's- it's the evals, and it's also like the UI layer and the integrations that go around it. 'Cause like just the prompts themselves are not a product. For a company to actually adopt Camfer, like it needs to actually integrate into their existing tools, it needs to have a well-thought-through UI and-

    12. HT

      Yeah.

    13. JF

      ... workflow and all the tooling to sort of-

    14. HT

      Yeah.

    15. JF

      ... make the prompts useful.

    16. HT

      Yeah.

    17. GT

      Well, and then it's distribution, right? Like-

    18. JF

      Yeah.

    19. GT

      ... how do you actually get in front of people?

    20. JF

      Yeah.

    21. GT

      How do you establish your brand? And then, uh, a perfectly good moat is difficulty switching, actually.

    22. JF

      Totally.

    23. GT

      And once you have all your data and it's working-

    24. JF

      Yeah.

    25. GT

      ... and you're paying $10,000 or $100,000 ACV, sometimes a million to 10 million ACV, (laughs) uh, you know, man, it's gonna be hard to switch. So, all the classic moats still apply. You know, this is still software, but, you know, you can unlock this capability and i- i- you know, this is a moment, you know?

    26. DH

      Another point to double down on the importance of evals is that that still applies in the world of o1, as founders are wondering, "How am I gonna still build the best product on top of o1? Does it change?" And everything we discussed in the episode with Jake Heller applies, because GigaML is this company that, uh, Harj worked with.

    27. HT

      Yeah, and Gary too.

    28. JF

      (laughs)

    29. DH

      And Gary, right?

    30. HT

      Yeah.

  7. 32:06 – 34:44

    The companies/ideas that should pivot because of o1

    1. DH

      benefited as much from o1 and perhaps maybe people even should pivot? Because they're getting... they might just get deprecated from the improvements of o1, o2, o3?

    2. HT

      I wouldn't go all the way and suggest they should pivot, but I do think companies that are building AI coding agents or AI, um, software engineers potentially, um... have stuff to think about here, because it seems like o1 in particular is, like, outperforming on just, you know, solving programming problems, essentially. Um, and I, I certainly know some of the teams I've worked with in the past, like, a lot of what they've invested in is, like, the chain of thought infrastructure behind this stuff, and now o1 is not actually, like, any leap forward for them. They've already, like, invested in that already. And so-

    3. GT

      I think that might be a function of, um, basically the opaque nature of what the chain of thought is. And once you get it to be directable, that's actually... I mean, frankly, that's what users of codegen tools are struggling with even right now. Like, once it starts going down a certain path, you can't really alter things. Like, you, you want it to ask you, "Hey, do you want me to do it like this or that?" And, you know, all of the systems are a little bit struggling with that right now.

    4. JF

      I was gonna ask the, the inverse question, Diana, which is, like, each new model capability unlocks a new set of startup ideas. Like, um, a year ago, doing startup ideas where, like, the AI agent would talk on the phone just, like, didn't work. We had a bunch of companies that tried, and all the companies didn't, didn't work. And over the summer, it really started working.

    5. HT

      Yeah.

    6. JF

      And w- under the trends from the, the, the past two batches, like, anything around, like, phone calling is, like, blowing up right now because the models finally work. So, like, with this new o1 series of models, what are the startup ideas that, like, just became possible?

    7. DH

      To connect to Sam's essay, is a lot of things that are gonna g- make the atom world, physical world better because it's really good at math and physics. So any startup that's working around mechanical engineering, electrical engineering, chemical engineering, bioengineering, all of these things that really will make our lives better, I think really will... are getting an unlock as we've seen from the demos we highlighted.

    8. GT

      That's exciting. I mean, it can't just be helping people click a little bit faster. (laughs)

    9. HT

      (laughs)

    10. GT

      It's gotta be things that actually create real world abundance for everyone. And, um, th- it might just be a little bit of a race. Like, I think there's sort of the fear of AI out there in society right now, and then,

  8. 34:44 – 35:16

    Outro

    1. GT

      um, it's sort of up to the technologists to try to usher in this age of abundance sooner rath- rather than later. And if we can do that, then, um, abundance will win out over fear. So with that, I think we're out of time for this week of The Light Cone. We'll see you guys next time.

Episode duration: 35:16


Transcript of episode JiwiqYGw4iU
