Y Combinator: How To Build Generative AI Models Like OpenAI's Sora
- 0:00 – 1:13
Coming Up
- Harj Taggar
A lot of the sci-fi stuff is actually now becoming possible. What happens when you have a model that's capable of simulating real world physics?
- Diana Hu
Wouldn't it be cool if this podcast were actually an Infinity AI video?
- Jared Friedman
One thing I noticed is that, like, the lip syncing is, like, extremely accurate. Like, it really looks like he's actually speaking Hindi.
- Diana Hu
How do YC companies build foundation models during the batch with just $500,000?
- Jared Friedman
This is literally built by 21-year-old new college grads, and they built this thing in two months. I think he, like, locked himself in his apartment for a month and just read AI papers.
- Speaker
You can actually be on the cutting edge in relatively short order, and that's an incredible blessing. Welcome back to another episode of The Light Cone. Today, we're talking about generative AI. First there was GPT-4, then there was Midjourney for image generation, and now we're making the leap into video. Harj, we got access to Sora, and we're about to take a look at some clips that they generated just for us.
- Harj Taggar
Yeah, should we take a look?
- 1:13 – 5:05
Sora Videos
- Harj Taggar
Okay, so here's the first one. The prompt is, "It's the year 2050. A humanoid robot, acting as a household helper, walks someone's golden retriever down a pretty, tree-lined suburban street." What do we think?
- Speaker
I like how it actually spells out "helper." It's like a flex.
- Jared Friedman
Yeah.
- Speaker
Like, "I can spell now."
- Jared Friedman
Yeah, which was not true with the image models, like-
- Harj Taggar
It would always screw up the text in the image.
- Jared Friedman
Yeah.
- Harj Taggar
Yeah, that's true.
- Jared Friedman
Stable Diffusion and DALL·E were notoriously bad at spelling text, so that is a major advance that no one's really talked about yet.
- Harj Taggar
I mean, it's wild how high definition it is. Like, that's almost realistic.
- Diana Hu
And the other really cool thing is the physics. The way the robot walks, for the most part, is-
- Jared Friedman
Yeah.
- Diana Hu
... very accurate.
- Jared Friedman
Accurate.
- Diana Hu
You do notice a little kind of, like, shuffle that's a little bit off, but for the most part, it's believable.
- Jared Friedman
And the way the golden retriever moves, I have a golden retriever.
- Harj Taggar
Yeah, but look at the tail.
- Jared Friedman
So I can personally vouch that, like, they perfectly modeled the, like-
- Harj Taggar
(laughs) Yeah, you have one, right? So you would know.
- Jared Friedman
(laughs)
- Diana Hu
Like your dog, right?
- Jared Friedman
Yeah, this is a perfect representation of how a golden retriever walks. I also like that, um, with DALL·E and Stable Diffusion, as you made your prompts longer and longer, it would just start ignoring them and not actually do exactly what you told it to do. And like, we gave it a very specific prompt here, and it did exactly the thing that we told it to.
- Harj Taggar
You can see it's not... It's still not exactly perfect.
- Jared Friedman
Y-
- Harj Taggar
So, I think towards the end, you see, like, a floating dog or something in there.
- Jared Friedman
Okay. I- I- I was gonna call out a couple other imperfections here-
- Harj Taggar
There you go.
- Jared Friedman
... which is that, like, the street is not a street, guys.
- Speaker
(laughs)
- Diana Hu
(laughs)
- 5:05 – 8:19
How Sora works under the hood?
- Harj Taggar
One thing I'm really curious about is just how Sora works under the hood and how they're generating these videos. So Diana, could you give us a, uh, brief... Like a, uh, a primer on just, like, what's actually going on? And one thing I was particularly curious about is, like, is this a new model, or is this an extension of the transformer model that we all know about as, um, powering ChatGPT?
- Diana Hu
I think the TL;DR, and the really cool thing here, is that it's really a combination of a transformer model, which typically has been mostly used for text, and a diffusion model, which is a lot of the tech behind DALL·E and Midjourney to generate images. So it's combining these two and then adding a temporal component so you can see the consistency between frames over time. And I think the key thing that, uh, OpenAI did was to train this with videos and with what they call spacetime patches. So it is like this, uh, basically this three by three matrix of pixels. So you have the spatial patches, and then patches of, uh, temporal, which is like multiple frames that create a video. And the way they do it, they vary the sizes of these patches. They could be certain sizes, smaller to bigger in XYZ basically, right? And then they basically train all this in this giant architecture, which probably is really expensive.
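To make "spacetime patches" concrete, here is a minimal sketch that cuts a video array of shape (frames, height, width, channels) into non-overlapping blocks spanning a few frames and a small pixel square, then flattens each block into one vector, the rough video analogue of a text token. The function name and the patch sizes (2 frames × 16 × 16 pixels) are assumptions for illustration; OpenAI hasn't published Sora's settings, and its technical report describes patchifying a compressed latent version of the video rather than raw pixels as done here.

```python
import numpy as np


def spacetime_patchify(video, pt=2, ph=16, pw=16):
    """Split a (T, H, W, C) video into non-overlapping spacetime patches.

    pt, ph, pw are the patch extents in time, height, and width (assumed
    values, not Sora's). Returns (num_patches, pt * ph * pw * C): one
    flattened vector per patch.
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, position within a patch).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the patch-grid axes to the front and the within-patch axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)


video = np.random.rand(16, 64, 64, 3)  # 16 frames of 64x64 RGB
tokens = spacetime_patchify(video)
print(tokens.shape)  # (128, 1536): 8 x 4 x 4 patches, 2*16*16*3 values each
```

Changing `pt`, `ph`, and `pw` trades sequence length against detail per token, which is one way to read the comment about varying patch sizes.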
- Harj Taggar
And so are the patches, are these sort of spacetime patches the video equivalent of tokens?
- Diana Hu
Sort of. I think there's a lot of prior work behind, uh, Sora, because the first thing is transformers have been mostly applied to text. And one piece of prior art was, uh, Google's work demonstrating that you could use transformer models not just for English text, but for images. So that was a foundational paper; I think they published it in 2020. And the paper was called "An Image Is Worth 16x16 Words."
- Harj Taggar
Hmm.
- Diana Hu
So they call it a Vision Transformer. They demonstrated that you could create and use transformer models for image recognition tasks, because the state of the art up to then was convolutional neural networks, which were very expensive to compute. So that was one piece of the puzzle. The other piece of the puzzle was, um, kind of the spacetime concept. And I think that some of that comes from stitching together different work from the past. There's, um, this other paper, "World Models," that came out in 2018. It's for robotics actually, and it separates the detection piece, so that's kind of the perception of the visual part. And then the other piece is the memory model for the temporal aspect. The temporal aspect in the World Models paper uses RNNs, and then there's a controller model that combines it. OpenAI doesn't explain too much. This is just me looking at it; I don't know. This is one of these things that OpenAI's a bit cagey about. But we can only speculate it's a combination of, like, robotics papers plus transformers plus text.
- 8:19 – 10:01
How expensive is it to generate videos vs. texts?
- Harj Taggar
And then how much more expensive is it to generate one of these videos compared to generating the text? Like, how do we even think about that?
- Diana Hu
Oh, man. So imagine GPT-4 is like a trillion parameters, and that's only two dimensions, right?
- Harj Taggar
Yeah.
- Diana Hu
Text is just a matrix of two by two. Now this is like an order of magnitude more. So I can imagine it's at least one order of magnitude, 10 trillion. (laughs)
- Harj Taggar
Okay. That's amazing.
- Diana Hu
So probably 10 times the number of GPUs. I can only imagine; I think it was about 20 or 30 thousand, I forget the exact number of GPUs that it took for GPT-4.
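Diana's figures are a back-of-the-envelope guess. Another rough way to see why video costs more than text is to count tokens and remember that transformer self-attention scores every token against every other token, so that part of the cost grows with the square of the sequence length. The clip size, patch size, and 1,000-token text baseline below are hypothetical numbers chosen for illustration, not figures for Sora or GPT-4.

```python
def video_tokens(seconds, fps, height, width, pt=2, ph=16, pw=16):
    """Number of spacetime patches for a raw-pixel clip (assumed patch sizes)."""
    frames = seconds * fps
    return (frames // pt) * (height // ph) * (width // pw)


def attention_pairs(n_tokens):
    """Self-attention scores every token against every token: n^2 pairs."""
    return n_tokens * n_tokens


text_tokens = 1_000  # hypothetical: a long prompt plus answer
clip_tokens = video_tokens(seconds=10, fps=24, height=256, width=256)

print(clip_tokens)  # 30720
print(attention_pairs(clip_tokens) // attention_pairs(text_tokens))  # 943
```

Compressing the video first, as Sora's technical report describes, or using larger patches shrinks these numbers, but the quadratic term is why sequence length matters so much for video.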
- Harj Taggar
Okay. Well, what's crazy is that we have companies within YC that have also been able to achieve similar types of functionality, and they clearly have way fewer resources than OpenAI does. And so I'm curious how they managed to do that. And the way I kind of think about this is that there are three components of building one of these foundational models: data, compute, and expertise. Should we talk through some of the YC companies and how they've managed to like hack-
- Diana Hu
And build-
- Harj Taggar
... each or all of those things?
- Diana Hu
Basically, how do YC companies build foundation models during the batch with just $500,000?
- Jared Friedman
Yeah, I think this is an important topic because people know how much money OpenAI is spending on GPUs, so there's this meme going around that in order to do this, you need to, like, have raised billions of dollars and have a data center full of GPUs. And we've actually seen that it's not true. There's actually a bunch of companies in the current batch, Winter '24 right now, that just in the time of the batch, with just the $500K that YC gives them, have actually built really awesome foundational models that are producing like magical results.
- Harj Taggar
Should we look at some of these demos?
- 10:01 – 11:23
Infinity AI
- Jared Friedman
Yeah.
- Harj Taggar
And see how, talk about how they managed to get this to work? (laughs)
- Jared Friedman
Yeah.
- Harj Taggar
Uh, let's start with Infinity AI.
- Jared Friedman
Infinity AI is a company in the current batch. And what they do is they make deepfake videos of a particular person. So for example, they have an AI replica of Elon Musk. And you can just tell Infinity AI what you want Elon Musk to say, and they will produce a video of Elon Musk saying exactly that thing.
- Harj Taggar
(laughs) That's pretty cool.
- Diana Hu
Should we watch a demo?
- Harj Taggar
Yeah, let's see a demo.
- Jared Friedman
Yeah. Let's watch the demo.
- Speaker
Speaking of YC companies training their own models, did you guys see the Infinity AI demo last week? Yeah, they're a company in my group. Infinity allows people to make videos by just typing out a script. Wouldn't it be cool if this podcast were actually an Infinity AI video? That'd be super cool. You think they'd be up for that? Well, guys, I have a surprise for you. (laughs)
- Jared Friedman
(laughs) There we are.
- Harj Taggar
(laughs) That was pretty good.
- Jared Friedman
(laughs)
- Diana Hu
Wow.
- Jared Friedman
So special thanks to the Infinity AI team, who made a model of the Light Cone podcast. And the way that they did this is they literally just downloaded our YouTube videos from the first three episodes and trained their model on that. And the cool thing about these models now is that you don't need that much data, once you've trained the foundation model, to adapt it to a new person. So just the hour or so of YouTube video that we had was enough for them to get a really accurate representation.
- 11:23 – 13:41
Sync Labs
- Diana Hu
I could talk about another company. So, uh, let's explain what Sync Labs is. Sync Labs is an API for creating real-time lip syncing. And the crazy thing about this team is that they trained the models on a single A100 and are generating these kinds of results. Let's take a look at it.
- Speaker
(Hindi)
- Jared Friedman
I'm guessing this guy doesn't actually speak Hindi?
- Harj Taggar
No.
- Diana Hu
No.
- Jared Friedman
Okay. Oh, one thing I noticed is like the lip syncing is like extremely accurate. Like, it really looks like he's actually speaking Hindi.
- Diana Hu
Yeah. And if we put it in this framework that you were mentioning, Harj, for how YC companies do this, there are different vectors. There's, uh, computation, dataset, and speed. And they hacked all of those. For the dataset, the clever thing they've done, which is unheard of for a video/audio model built with so few resources, is that they compress a lot of the data and use low-res video. You don't need high-res video, because if you have 1080p versus, let's say, the 240p version, that's a quadratic factor less data, because it's two dimensions, right? So they've done that. The other thing that enabled them to move a lot faster is the deal that we did with Azure, where we have a dedicated GPU cluster for companies in the batch.
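The resolution saving described here is easy to check: scaling both width and height down by the same factor divides the pixel count by that factor squared. The sketch assumes common 16:9 frame sizes, 1920×1080 for 1080p and 426×240 for 240p; Sync Labs' actual training resolution isn't stated in the conversation.

```python
def pixels(width, height):
    return width * height


def reduction_factor(high, low):
    """How many times fewer pixels per frame the low-res format has."""
    return pixels(*high) / pixels(*low)


FULL_HD = (1920, 1080)  # 1080p
LOW_RES = (426, 240)    # 240p at roughly 16:9

ratio = reduction_factor(FULL_HD, LOW_RES)
print(f"{ratio:.1f}x fewer pixels per frame")  # 20.3x fewer pixels per frame
print(f"{(1080 / 240) ** 2:.2f}")              # 20.25, the per-axis factor 4.5 squared
```

The same factor applies to every frame, so a dataset stored at 240p holds roughly 20 times fewer raw pixels than its 1080p original.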
- Jared Friedman
Hmm.
- Diana Hu
They've been able to iterate a hundred times faster than they were before in the batch.
- Speaker
So a lot of companies out there, uh, they decide they need to do fine-tuning, they need access to GPUs and they just can't get it. Or you've got to pay an arm and a leg and pre-pay for a year in advance and maybe you'll get it in 2025. But if you're in the YC batch, turns out you can get them.
- Diana Hu
Yeah. You get over half a million in credits, and there's no, uh, contention for resources. You actually get access to a GPU cluster within 24 hours.
- Jared Friedman
Which is pretty cool because YC invests half a million dollars, but I think all the companies in the YC batch that trained these models literally didn't have to touch the YC money to train the models. That was all extra, free money, unrelated to the YC investment.
- 13:41 – 15:44
Sonauto
- Jared Friedman
Should we talk about-
- Harj Taggar
Yeah.
- Jared Friedman
... Sonauto?
- Harj Taggar
Yeah.
- Jared Friedman
So Sonauto is another company in the Winter '24 batch, um, and they have built a text-to-song model. So you can give their model lyrics to a song and tell it who you want to perform the song. Like you can tell it, "I want Taylor Swift to sing a birthday song for my dog." And it will make exactly that song.
- Harj Taggar
(laughs)
- Jared Friedman
There are only like two or three models in the world that have ever been trained that actually do this, and I think Sonauto is actually the best one.
- Harj Taggar
Oh, wow.
- Jared Friedman
Um, and the really cool thing is that the founders of Sonauto are literally like 21 years old. So Harj, to your point about expertise, this was not built by PhD machine learning researchers who have been working in machine learning for like 10 years or something. This is literally built by 21-year-old new college grads.
- Harj Taggar
Yep.
- Jared Friedman
And they built this thing in two months, and basically they just taught themselves.
- Harj Taggar
That's amazing.
- Jared Friedman
They just went online and they like figured out how to do it. (laughs)
- Harj Taggar
(laughs) That is very impressive. Should we take a look at it?
- Jared Friedman
Yeah. So this is, um, a song that they made for the YC batch and it's like a power march about Y Combinator.
- Harj Taggar
(laughs)
- Speaker
Forward we march. The dreamers in line. Bearing the torch of ideas so fine. In the heart of the valley. Where futures are made. We the founders of YC will never fade. 3, 2, 1.
- Harj Taggar
That's amazing. (laughs)
- Jared Friedman
(laughs)
- Harj Taggar
This is how we're going to open the batch.
- Speaker
Yeah.
- Harj Taggar
(laughs) From now on. (laughs)
- Speaker
That's a good idea.
- Jared Friedman
(laughs)
- Harj Taggar
(laughs)
- Speaker
We need big, uh, orange banners behind us.
- Jared Friedman
(laughs)
- Harj Taggar
(laughs)
- Speaker
And we have to wear, um, military garb though.
- Harj Taggar
(laughs)
- 15:44 – 17:40
Metalware
- Diana Hu
And to your point, Jared, there's another company that also didn't have PhD expertise in machine learning. It is called Metalware. They're building a copilot for hardware. These founders used to work as hardware engineers at SpaceX, and they had to build all these hardware designs. So they're very familiar with building hardware. And when they came into the batch, they decided to build basically a copilot for hardware design. They didn't have much AI background, and they figured it out. And one of the cool things about them is they also trained a foundation model for this, because there was no model available for it. And they were able to do it during the batch. And in that same framework, they hacked data and computation. In terms of, uh, the data, they got away with using less data but higher quality. What they did is they took a bunch of figures and information from textbooks on hardware, and they scanned all of that and used that as input, which is clever, right? The other thing: because they didn't need as much data, they could choose to work with a model that's less computationally intensive. So they actually used GPT-2.5, which seems counterintuitive because GPT-2.5 only has like one billion plus parameters. I think? I think it's one billion, right. Yeah. Versus GPT-4 is like a trillion.
- Jared Friedman
Yeah.
- Diana Hu
And they were able to get away with using less computational resources because they used a smaller model and better data, and then they could do all these hardware design copilot tasks, which is really cool. So when you constrain your task a lot, you're very specific, and the dataset is very high quality, that's another way you could hack it and build a foundation model during the batch. And these are for all different kinds of applications, not just generating video and text.
- 17:40 – 19:29
Guide Labs
- Diana Hu
There's one I'm really excited about in the current batch called Guide Labs. They're building an explainable foundation model, because one of the things with all these foundation models and deep learning is that they're kind of black-box magic. Nobody knows what's going on. You put in the data, it kind of predicts the label, and you have no idea how that happened. Prior to deep learning, you could, because you had the weights and could understand which feature contributed to the label and by how much. So this team is building a foundation model that can explain its outputs, and they trained a model during the batch.
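The contrast with pre-deep-learning models can be shown with a linear model: its prediction is a bias plus a weighted sum of features, so each weight × value term is an exact, additive explanation of the output. This only illustrates that older style of interpretability; the weights and feature names are hypothetical, and it is not Guide Labs' method, which the conversation doesn't describe.

```python
def explain_linear(weights, bias, features):
    """Predict with a linear model and return each feature's contribution.

    Because prediction = bias + sum(w_i * x_i), the contributions add up
    exactly to the prediction minus the bias.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    prediction = bias + sum(contributions.values())
    # Rank features by how strongly they moved the prediction, in either direction.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prediction, ranked


# Hypothetical model and input, purely for illustration.
weights = {"income": 0.5, "debt": -0.25, "years_employed": 2.0}
prediction, ranked = explain_linear(
    weights, bias=1.0, features={"income": 4.0, "debt": 2.0, "years_employed": 1.5}
)
print(prediction)  # 5.5
print(ranked)      # [('years_employed', 3.0), ('income', 2.0), ('debt', -0.5)]
```

A deep network has no such term-by-term decomposition, which is the gap an explainable foundation model is trying to close.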
- Harj Taggar
Nice. As a founder, like when is it the right call to invest in building your own model versus just using one of the existing open source models and fine-tuning and tweaking it to fit what you need?
- Diana Hu
Well, I guess it depends, right? It depends on what you're really looking to build. If you're in a very specific, and it can be niche, space, you can get away with training your own foundation model like the Metalware guys. But if you're, let's say, doing something more with language, GPT-4 gets you a lot further along. So it depends on the task too, right?
- Harj Taggar
So, so if we're thinking about it as, like, data, compute, expertise, like, we're basically saying expertise is maybe overrated. (laughs)
- Diana Hu
(laughs)
- Harj Taggar
Like, we're, like, proving that if you're just, like, smart and, like, willing to read the papers, you can figure it out. Compute, being in YC is one way to get around that. Like, you can get credits and you can take some of that cost off. And so then is it, like, the data piece is sort of where all the edge is? Like, if you can find high quality, um, sorry, you say it like, high quality, but not, like, giant datasets, that's the hack?
- 19:29 – 24:21
Phind
- Diana Hu
Oh, yes. Let's talk about, uh, Phind. So Phind is this company that's building a copilot for software. The answers that they're generating are even better than Stack Overflow.
- Harj Taggar
Ah, interesting.
- Diana Hu
And these were also kids out of college (laughs) without a lot of, like, AI background, and they did a very clever hack on the data side to build their own model. They created a bunch of synthetic data for programming competitions. So they would have a bunch of those datasets generated, and that got, like, a lot higher quality. Imagine that. It's, like, basically infinite if it's synthetic.
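One common pattern for synthetic programming data is to generate problems programmatically and label them with a reference solver, cross-checking against an independent method so only verified answers enter the dataset. The sketch below does that for a maximum-subarray task; the task, sizes, and prompt format are hypothetical, and the conversation doesn't say how Phind actually built its data.

```python
import random


def max_subarray(nums):
    """Reference solver (Kadane's algorithm) used to label each problem."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def brute_force(nums):
    """Independent O(n^2) check over every non-empty contiguous slice."""
    return max(sum(nums[i:j]) for i in range(len(nums)) for j in range(i + 1, len(nums) + 1))


def make_sample(rng, n=8):
    nums = [rng.randint(-10, 10) for _ in range(n)]
    answer = max_subarray(nums)
    # Refuse to emit a sample whose label the two methods disagree on.
    if answer != brute_force(nums):
        raise ValueError(f"solver mismatch on {nums}")
    prompt = f"Given the array {nums}, what is the maximum sum of a contiguous subarray?"
    return {"prompt": prompt, "answer": answer}


rng = random.Random(0)  # seeded, so the dataset is reproducible
dataset = [make_sample(rng) for _ in range(1000)]
print(dataset[0])
```

Because the generator and the solver are both code, the dataset can be made as large as needed, which is the "basically infinite" property of synthetic data.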
- Harj Taggar
It's interesting because I feel like synthetic data is being looked down on by-
- Jared Friedman
It was controversial-
- Harj Taggar
Yeah.
- Jared Friedman
... initially, yeah.
- Harj Taggar
Uh, well, why, like, w- why was it originally controversial, and why does it actually seem to be working?
- Jared Friedman
It seemed, like, circular. (laughs) It seemed like it would be impossible for a model to generate its own data and how, like, how can you learn from the data that you generated yourself?
- Harj Taggar
Yeah.
- Jared Friedman
It wasn't obvious that such a thing could be possible. It seemed to, like, violate some, like, conservation of energy kind of law.
- Harj Taggar
I, I remember it was, like, the, the meme that was going around on Twitter was, like, the mosquito drinking its own blood.
- Jared Friedman
Its own blood, yes.
- Diana Hu
(laughs)
- Harj Taggar
And, like, this is how synthetic data works.
- Jared Friedman
Yeah. Um, but then it turns out it actually works. (laughs)
- Harj Taggar
Interesting.
- Speaker
I think maybe this is related to the idea that, you know, uh, some of these, you know, LLMs are actually capable of reasoning. (laughs) And once you can reason, maybe that's the part that sort of spins up the flywheel and makes it possible. And, you know, there are other interesting analogues, though I think there's a healthy debate out there about whether or not this will come together. But look at self-driving car models: they're often trained on massive amounts of simulation data instead of actual real drive time, you know, sometimes by a factor of 10 to one or more. And that might end up being true for some of the generative AI models too.
- Harj Taggar
Is it possible Sora will do that as well? Like, can Sora generate its own video to continue training and improving its own model?
- Diana Hu
Probably. Uh, I don't... OpenAI doesn't share much about their data sources because that's part of the secret sauce, but 100% they're using, uh, video footage that's generated from Unreal Engine probably, or Unity, one of these game engines, because they have a full physics simulator. So you could create multiple, uh, scenes of the same thing. Let's say, the example of the car driving on the cliff: they could generate it from multiple camera angles. Because in a game engine, you can position the camera anywhere, and you could basically generate footage from all possible camera views.
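The multi-view point comes down to geometry: once a scene exists in 3D, a virtual camera can be placed anywhere and the same points re-projected. The sketch below orbits hypothetical pinhole cameras around a tiny scene and projects each point onto each camera's image plane. It covers only the projection step; real engines such as Unreal and Unity also handle lighting, occlusion, and motion, and whether OpenAI used them is speculation in the conversation.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def project(point, cam_pos, target=(0.0, 0.0, 0.0), focal=1.0):
    """Pinhole projection of a 3D world point for a camera at cam_pos aimed at
    target, with world +z as up. Returns (x, y) on the image plane, or None if
    the point is behind the camera."""
    forward = _normalize(_sub(target, cam_pos))
    right = _normalize(_cross(forward, (0.0, 0.0, 1.0)))
    up = _cross(right, forward)
    d = _sub(point, cam_pos)
    depth = _dot(d, forward)
    if depth <= 0:
        return None
    return (focal * _dot(d, right) / depth, focal * _dot(d, up) / depth)


def orbit_cameras(n, radius=5.0, height=2.0):
    """n hypothetical camera positions spaced evenly on a circle around the scene."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n),
             height)
            for k in range(n)]


# A tiny "scene" of three 3D points, viewed from four angles.
scene = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
for cam in orbit_cameras(4):
    print(cam, [project(p, cam) for p in scene])
```

Each camera pose yields a different 2D view of the same 3D content, so one simulated scene can supply many consistent training views.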
- Harj Taggar
The physics part of this is really interesting. I feel like when most people see these Sora demos, or just generally get this concept, their mind goes to, "Oh, this will be cool for generating films or video games," like entertainment. But, uh, if what you're saying is it can actually, like, simulate the real world, there are probably going to be lots of further-reaching implications. Like, what happens when you have a model that's capable of, like, simulating real world physics, and where does that apply?
- Jared Friedman
Well, I have, I have a good example. This company, Atmo, which we funded in 2020, built their own foundational model for weather prediction. The way they did it is they trained a model on, like, I think 90 terabytes of weather data. They've programmed in a physics model of the world by starting with, like, actual, like, equations of physics.
- Diana Hu
A giant polynomial.
- Harj Taggar
(laughs)
- Jared Friedman
Yeah, it's effectively a giant polynomial, and it's so expensive to run that it has to run on a cluster of supercomputers. The only place in the world that actually runs this model is NOAA, the US government agency. They're the only ones with a supercomputer cluster that runs the physics model. And so every weather app that you go to, every weather channel, they're actually not predicting their own weather; they're just downloading the government prediction data and wrapping, like, a nice UI around it. There's only one actual physics simulation for weather, like, in America. And no commercial company has been able to create their own version, because it's too expensive to do it the old-school physics-based way. And so what's really cool about Atmo is that instead of using the old-school physics way, they've trained a foundational model, and using machine learning, it's, like, a million times more efficient to run the same calculation or something like that. And because of that, this startup, which has only raised a seed round, is actually able to make a weather prediction model that is more accurate than the NOAA-funded one that cost over a billion dollars.
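Atmo's models aren't described in detail here, but the core idea of a learned emulator can be shown on a toy problem: run an expensive step-by-step solver to produce training data, then fit a model that jumps straight from input to output. The sketch uses the 1D heat equation, which is linear, so a single least-squares matrix can stand in for 50 solver steps; real weather is nonlinear and needs neural networks, and the grid size, step count, and sample count are arbitrary choices. It requires NumPy.

```python
import numpy as np


def heat_step(u, r=0.2):
    """One explicit finite-difference step of the 1D heat equation on a
    periodic grid. Stable when r = alpha * dt / dx**2 <= 0.5."""
    return u + r * (np.roll(u, 1) - 2 * u + np.roll(u, -1))


def simulate(u, steps):
    """The 'expensive' reference solver: many small time steps."""
    for _ in range(steps):
        u = heat_step(u)
    return u


rng = np.random.default_rng(0)
n, k = 32, 50                       # grid points, solver steps to replace
X = rng.standard_normal((200, n))   # random initial temperature profiles
Y = np.array([simulate(x, k) for x in X])

# Fit W so that X @ W ≈ Y: afterwards one matrix multiply stands in for
# k solver steps. Exact here only because the heat equation is linear.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

test = rng.standard_normal(n)
err = np.max(np.abs(test @ W - simulate(test, k)))
print(f"max error vs. solver: {err:.1e}")
```

The saving in this toy is one matrix multiply instead of 50 stencil updates; the "million times" figure in the conversation is Jared's, not something this example measures.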
- Harj Taggar
Interesting. What's really surprising about the text-to-video is, like, just how far-reaching, like, the implications are. So you can go way beyond just generating, like, video games. Like, we can do weather. Like, what, what are, what are other examples of cool things that we could do if we can generate, like, have a physics simulator of the real world?
- 24:21 – 25:36
Diffuse Bio
- Jared Friedman
Well, there's a bunch of companies that are applying it to biology. Diana, do you want to talk about a couple of those?
- Diana Hu
Yeah. So it turns out all these foundation models are great function approximators for anything, so not just-
- Jared Friedman
Any function. They're general-purpose learning algorithms.
- Diana Hu
And the human body can be simulated with functions too.
- Harj Taggar
Right.
- Diana Hu
So one of the companies, uh, that we funded as well is called Diffuse Bio. They're building generative AI for proteins. So what they're doing is building these big models to be able to create new molecules for new types of drugs and new kinds of, uh, gene therapies. And in order to hack this aspect of how to make progress with fewer resources, they had a lot of expertise. This is, uh, different from the set of founders we talked about who don't come from a background in AI. Namrata, the founder, published some very legit papers in Nature before this. So she had a lot of expertise in terms of how to short-circuit the computation loop. What she did is build custom kernels for the models so that the whole process of building the foundation models is a lot faster, with fewer resources. So that's one.
- 25:36 – 27:15
Piramidal
- Diana Hu
The other company in the current batch is, uh-
- Harj Taggar
Piramidal. Do you want to talk about them?
- Diana Hu
They're building a foundation model for the human brain. It turns out they're predicting EEG signals, which could be used for all sorts of applications, from predicting strokes to reading... At some point, your brain could be read.
- Harj Taggar
Yeah, interesting.
- Diana Hu
Perhaps. And EEG signals are also temporal, sort of like Sora. Sora has, like, images, plus images over time, so there's video. EEG is the same thing: it's just an electrical impulse, but over a time period. So they do something similar with chunking, spacetime chunks, but for EEG. That's how they're able to train this model. And the way they were able to train and iterate during the batch: they were experts in the space, so they also did a lot of hacks around the computation, where they found a way to divide a lot of the sequential data into chunks, sort of like what Sora has done, and that actually reduced the runtime complexity by a quadratic factor, which is impressive. And they could get a single training run of an initial model with just 800 hours of GPU compute, which is really cool.
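The conversation doesn't say exactly how Piramidal chunks its EEG data, but two standard ways chunking cuts the quadratic cost of self-attention can be counted directly: letting each sample attend only within its own chunk, or turning each chunk into a single token, as spacetime patches do for video. The recording length, 256 Hz sampling rate, and one-second chunk size are hypothetical.

```python
def full_attention_pairs(n):
    """Every sample attends to every sample: n^2 scores."""
    return n * n


def chunked_local_pairs(n, chunk):
    """Each sample attends only within its own chunk: (n / chunk) * chunk^2 = n * chunk."""
    if n % chunk:
        raise ValueError("n must be divisible by chunk")
    return (n // chunk) * chunk * chunk


def chunk_token_pairs(n, chunk):
    """Each chunk becomes one token and tokens attend to each other: (n / chunk)^2."""
    if n % chunk:
        raise ValueError("n must be divisible by chunk")
    return (n // chunk) ** 2


# Hypothetical: 60 s of a single EEG channel sampled at 256 Hz, 1-second chunks.
n = 60 * 256
print(full_attention_pairs(n))       # 235929600
print(chunked_local_pairs(n, 256))   # 3932160
print(chunk_token_pairs(n, 256))     # 3600
```

Local attention drops the cost from N² to N·c, while one token per chunk drops it to (N/c)²; the second is closer to the patching idea compared to Sora above.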
- Harj Taggar
One, o- one thing that's really cool about that is like if people sat down and tried to think of different applications for foundational models, EEG data would not be the one that would like immediately come to mind. And to me, that suggests that there are probably a lot of other application areas like EEG data that people just haven't thought of yet.
- Diana Hu
Yeah. It's like, who would have thought that EEG is sort of like videos? It's just this whole concept with s- space-time. You can space-time lots of things.
- 27:15 – 28:58
K-Scale Labs
- Diana Hu
(laughs)
- Harj Taggar
It's also possible that, um, applications of AI that people thought would exist will now exist. Like robotics, I think, is a good one. Yeah. That's a huge one.
- Diana Hu
Mm-hmm.
- Harj Taggar
You remember, I think we talked about this on a previous episode, how when Sam was starting OpenAI, he talked about how they originally thought that, you know, AI in robots and AI in the real world would be the first application. And I remember, I went over to the OpenAI office in like the first year or two, and they had all these robots trying to, like, learn how to solve the Rubik's Cube by, like, reinforcement learning. Which is also kind of an interesting side note, because OpenAI is so wildly successful right now that it's easy to think that they knew, that they had this straight-line path to get there, but it was definitely not that. It was a meandering path where they pursued a bunch of dead-end ideas, like the reinforcement learning robots that didn't work. Well, even the researcher working on transformer architecture at OpenAI was like off in a corner, I think, at the start. Like it wasn't- Yeah. ... clear even within OpenAI that that was going to be the- The thing. ... the right thread to pull on, right? But so Sora, and just text-to-video, is particularly interesting, 'cause again, if we have a real physics simulator for the world, that potentially getting plugged into robots is a breakthrough to make the AI robot a reality. And we actually have a company in the current YC batch, K-Scale Labs, that's working on consumer humanoid robots. Um, and yeah, so like- That's cool. Yeah, and they have like pretty cool demos. It's very early, but like just a lot of the sci-fi stuff is actually now becoming possible.
- Diana Hu
The cool thing about, uh, Ben, who's the founder of K-Scale, is that he was the guy that built the foundation robotics model for Tesla.
- Harj Taggar
Yeah. Oh, cool. He put it into the Optimus, um, Prime robot as well.
- Diana Hu
Oh, awesome.
- 28:58 – 30:38
DraftAid
- Diana Hu
The thing is, the real world is governed by the laws of physics, and it turns out we have a bunch of equations that can describe it for different things, like weather. There's also the space, um... For example, there's this company that we funded called DraftAid that is building AI models for CAD design. CAD follows a lot of the laws of physics, with Newton, right? With force, shear, et cetera. And a lot of, uh, the software behind SolidWorks and AutoCAD runs on these really old kernels that basically, uh, again, solve these giant polynomials of lots of equations, so that when you design a structure and want to calculate the forces and the tolerances, it's accurate, because you don't want a building to just flop, right? And it's very expensive whenever you build all these models in CAD, and these kernels are super old; at the end of the day, they run on these equations that compile to, like, I don't know, some wild thing like Fortran, because they haven't been updated. What DraftAid is doing is short-circuiting some of these with AI models that can do some of the predictions, so it's a lot faster and cheaper in terms of computation. So there's a lot of, uh, computational geometry behind the scenes.
- HTHarj Taggar
That's really cool. That's a perfect example of just like a valuable problem to solve that the general purpose models just aren't gonna get around to specializing yet.
- DHDiana Hu
I think that's a great point, and there's a lot of startups that are very worried that if they like go into AI, they're going to get run over by OpenAI or other foundational model companies. And so one solution to that is like train your own model that's doing something else.
- HTHarj Taggar
Yeah. That's a great point.
- 30:38 – 33:20
Playground
- SPSpeaker
Uh, there's actually a YC company called Playground, run by our friend Suhail Doshi, that is a good example of how you probably can go up against people who are really well funded and come up with something that is far better. What we're looking at here is the newest version, Playground 2.5. They're hot on the heels of Midjourney, but at the same time, the models they've actually released open source go toe to toe against the latest versions of Stable Diffusion and, in a lot of cases, way outperform them. And they've done it on far less money than Stability AI and other teams in this space. So I think Suhail and Playground are really ones to watch, to go toe to toe with Midjourney and, in the long run, potentially beat it, because I would never bet against Suhail Doshi. That guy is a beast.
- JFJared Friedman
The image quality is super impressive.
- DHDiana Hu
That looks so cool. Maybe some of the audience would have thought that Suhail comes from an AI background, but he doesn't.
- SPSpeaker
Yeah, he started Mixpanel before this, when he was, uh, 19 (laughs).
- JFJared Friedman
And Playground also illustrates something that Harj was talking about last night, which is the phenomenon of companies pivoting into AI. Playground actually did not start with this idea. When it started, it was a completely different idea.
- SPSpeaker
Yeah.
- JFJared Friedman
And a couple of years in, after raising a bunch of money, Suhail hard pivoted the thing into AI, and he literally just taught himself AI. I think he, like, locked himself in his apartment for a month and just read AI papers, and then he built Playground (laughs).
- SPSpeaker
So don't be afraid. I think that's one of the most interesting things we've seen across many of these different examples: if you're looking for a reason why you can't succeed, guess what? You're right. But on the other hand, the field itself is so brand new that if you spend six or nine months literally reading every paper and then meeting all the people who are in this space, and they'll meet you, you can actually be on the cutting edge in relatively short order. And that's an incredible blessing.
- JFJared Friedman
Totally.
- HTHarj Taggar
It's a really important message, actually, right? Because we're all grateful to Sam and OpenAI for bringing this field forward and making all of this stuff possible. But at the same time, the news headlines tend to be about the companies that are raising huge amounts of money, or about Sam (laughs) himself as a sort of world celebrity at this point. But you can actually compete with OpenAI for very valuable verticals and use cases by training your own model, without having to be Sam Altman or have $100 million.
- 33:20 – 34:05
Outro
- SPSpeaker
So we're out of time for today, but we could talk for hours about the crazy things we're seeing in AI being built by people who are probably not that different from you, watching right now. A lot of the world right now is looking at people like Sam Altman and Dario Amodei and some of the luminary figures who have really pushed the whole space forward. But remember, all of these people started someplace, and we hope that Y Combinator might actually be the place for you to start, just like it was for Sam Altman back in the day. That's it. Catch you next time.
Episode duration: 34:05
Transcript of episode fmI_OciHV_8