Rohit Prasad: Amazon Alexa and Conversational AI | Lex Fridman Podcast #57
- 0:00 – 15:00
- Lex Fridman
The following is a conversation with Rohit Prasad. He's the vice president and head scientist of Amazon Alexa, and one of its original creators. The Alexa team embodies some of the most challenging, incredible, impactful, and inspiring work that is done in AI today. The team has to both solve problems at the cutting edge of natural language processing, and provide a trustworthy, secure, and enjoyable experience to millions of people. This is where state of the art methods in computer science meet the challenges of real-world engineering. In many ways, Alexa and the other voice assistants are the voices of artificial intelligence to millions of people, and an introduction to AI for people who have only encountered it in science fiction. This is an important and exciting opportunity. So the work that Rohit and the Alexa team are doing is an inspiration to me, and to many researchers and engineers in the AI community. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcast, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. If you leave a review on Apple Podcast especially, but also Castbox, or comment on YouTube, consider mentioning topics, people, ideas, questions, quotes, in science, tech, or philosophy that you find interesting, and I'll read them on this podcast. I won't call out names, but I love comments with kindness and thoughtfulness in them, so I thought I'd share them. Someone on YouTube highlighted a quote from the conversation with Ray Dalio, where he said that you have to appreciate all the different ways that people can be A players. This connected with me, too. Uh, on teams of engineers, it's easy to think that raw productivity is the measure of excellence, but there are others. I've worked with people who brought a smile to my face every time I got to work in the morning. Their contribution to the team is immeasurable. 
I recently started doing podcast ads at the end of the introduction. I'll do one or two minutes after introducing the episode, and never any ads in the middle that break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store. I personally use Cash App to send money to friends, but you can also use it to buy, sell, and deposit bitcoin in just seconds. Cash App also has a new investing feature. You can buy fractions of a stock, say $1 worth, no matter what the stock price is. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC. I'm excited to be working with Cash App to support one of my favorite organizations called FIRST, best known for their FIRST Robotics and LEGO competitions. They educate and inspire hundreds of thousands of students in over 110 countries, and have a perfect rating on Charity Navigator, which means the donated money is used to maximum effectiveness. When you get Cash App from the App Store or Google Play and use code LEXPODCAST, you'll get $10, and Cash App will also donate $10 to FIRST, which again, is an organization that I've personally seen inspire girls and boys to dream of engineering a better world. This podcast is also supported by ZipRecruiter. Hiring great people is hard, and to me, is one of the most important elements of a successful mission-driven team. I've been fortunate to be a part of and lead several great engineering teams. The hiring I've done in the past was mostly through tools we built ourselves, but reinventing the wheel was painful. ZipRecruiter is a tool that's already available for you. It seeks to make hiring simple, fast, and smart. For example, Codable co-founder Gretchen Heubner used ZipRecruiter to find a new game artist to join her education tech company. 
By using ZipRecruiter's screening questions to filter candidates, Gretchen found it easier to focus on the best candidates, and finally hired the perfect person for the role in less than two weeks from start to finish. ZipRecruiter, the smartest way to hire. See why ZipRecruiter is effective for businesses of all sizes by signing up, as I did, for free at ziprecruiter.com/lexpod. That's ziprecruiter.com/lexpod. And now, here's my conversation with Rohit Prasad. In the movie Her, I'm not sure if you've ever seen it-
- Rohit Prasad
Yeah.
- Lex Fridman
A human falls in love with the voice of an AI system. Let's start at the highest philosophical level before we get to deep learning and some of the fun things. Do you think what the movie Her shows is within our reach?
- Rohit Prasad
I think, not specifically about Her, but what we are seeing is a massive increase in the adoption of AI assistants, or AI, in all parts of our social fabric. And what I do believe is that the utility these AIs provide, some of the functionalities that are shown, are absolutely within reach.
- Lex Fridman
So some of the functionalities, in terms of the interactive elements. But in terms of the deep connection that's purely voice-based, do you think such a close connection is possible with voice alone?
- Rohit Prasad
It's been a while since I saw Her, but I would say, in terms of interactions with these AI assistants, which are human-like, you have to value what is also superhuman. We, as humans, can be in only one place. AI assistants can be in multiple places at the same time: one with you on your mobile device, one at your home, one at work. So you have to respect these superhuman capabilities too. Plus, as humans, we have certain attributes we are very good at. We're very good at reasoning; AI assistants are not yet there. But in the realm of AI assistants, what they're great at is computation and memory; it's infinite and pure. These are the attributes you have to start respecting. So I think the comparison of human-like versus the other aspect, which is also superhuman, has to be taken into consideration. I think we need to elevate the discussion to not just human-like.
- Lex Fridman
So there are certainly elements where, as you just mentioned, Alexa is everywhere, computationally speaking. So this is a much bigger infrastructure than just the thing that sits there in the room with you. But it certainly feels, to us mere humans, that there's just another little creature there when you're interacting with it. You're not interacting with the entirety of the infrastructure; you're interacting with the device. Sure, we anthropomorphize things, but that feeling is still there. So, in the purity of the interaction with a smart assistant, what do you think we, as humans, look for in that interaction?
- Rohit Prasad
I think certain interactions will be very much where it does feel like a human, because it has a persona of its own. And in certain ones, it wouldn't be. A simple example to think of is if you're walking through the house and you just wanna turn your lights on and off, and you're issuing a command. That's not very much a human-like interaction, and that's where the AI shouldn't come back and have a conversation with you. It should simply complete that command. So, for the blend, we have to think about this as not human-human alone.
- Lex Fridman
Mm-hmm.
- Rohit Prasad
It is a human-machine interaction, and certain aspects of humans are needed, and certain aspects or situations demand it to be like a machine.
- Lex Fridman
So, I told you, it's gonna be philosophical-
- Rohit Prasad
(laughs)
- Lex Fridman
... in parts. What's the difference between human and machine in that interaction? When two humans interact, especially those that are friends and loved ones, versus you and a machine that you are also close with?
- Rohit Prasad
I think you have to think about the roles the AI plays, right? And it differs from customer to customer, situation to situation. I can speak especially from Alexa's perspective: it is a companion, a friend at times, an assistant, and an advisor down the line. So I think most AIs will have these kinds of attributes, and it will be very situational in nature. So where is the boundary? I think the boundary depends on the exact context in which you're interacting with the AI.
- Lex Fridman
So the depth and the richness of natural language conversation has been used, by Alan Turing, to try to define what it means to be intelligent.
- Rohit Prasad
Mm-hmm.
- Lex Fridman
You know, there's a lot of criticism of that kind of test, but what do you think is a good test of intelligence, in your view, in the context of the Turing test and Alexa, with the Alexa Prize, this whole realm? Do you think about human intelligence, what it means to define it, what it means to reach that level?
- Rohit Prasad
I do think the ability to converse is a sign of ultimate intelligence. I think there's no question about it. If you think about all aspects of humans, there are sensors we have, and those are basically a data collection mechanism. And based on that, we make some decisions with our sensory brains, right? From that perspective, there are elements we have to talk about: how we sense the world, and then how we act based on what we sense. Those elements, clearly, machines have. But then there are other aspects, like computation, where machines are way better. I also mentioned memory, again, in terms of being near infinite, depending on the storage capacity you have. And the retrieval can be extremely fast and pure, in terms of, there's no ambiguity of, "Who did I see when?" (laughs) Right? Machines can remember that quite well. So, again, on a philosophical level, I do subscribe to the view that being able to converse, and, as part of that, being able to reason based on the world knowledge you've acquired and the sensory knowledge that is there, is definitely very much the essence of intelligence. But intelligence can go beyond human-level intelligence, based on what machines are becoming capable of.
- Lex Fridman
So, what do you think, maybe stepping outside of Alexa, broadly as an AI field, what do you think is a good test of intelligence? Put it another way, outside of Alexa, because so much of Alexa is a product, is an experience for the customer. On the research side, what would impress the heck out of you if you saw? You know, what is the test where you said, "Wow, this thing is now starting to encroach into the realm of what we loosely think of as human intelligence"?
- Rohit Prasad
So, well, we think of it as AGI and human intelligence-
- Lex Fridman
AGI.
- Rohit Prasad
... all together, right? In some sense. (laughs) And I think we are quite far from that. I think an unbiased view I have is that Alexa's intelligence capability is a great test. I think of it as... there are many other proof points, like self-driving cars, or game playing, like Go or chess. Let's take those two as an example.
- Lex Fridman
Sure.
- Rohit Prasad
They clearly require a lot of data-driven learning and intelligence, but they're not as hard a problem as an AI conversing with humans to accomplish certain tasks, or open-domain chat, as you mentioned-
- Lex Fridman
Mm-hmm.
- Rohit Prasad
... with the Alexa Prize. In those settings, the key difference is that the end goal is not defined, unlike in game playing. You also do not know exactly what state you are in, in a particular goal-completion scenario. (laughs) Sometimes you can, if it's a simple goal, but take even certain examples like planning a weekend. You can imagine how many things change along the way. You look up the weather, you may change your mind and change the destination, or you want to catch a particular event and then you decide, no, there's this other event I want to go to. So these dimensions of how many different steps are possible when you're conversing as a human with a machine make it an extremely daunting problem, and I think it is the ultimate test for intelligence.
- Lex Fridman
And don't you think that natural language is enough to prove that? Conversation?
- Rohit Prasad
Uh...
- Lex Fridman
Just pure conversation?
- Rohit Prasad
From a scientific standpoint, natural language is a great test. But I would go beyond... I don't wanna limit it to natural language as simply understanding an intent or parsing for entities and so forth. We are really talking about dialogue.
- 15:00 – 30:00
- Rohit Prasad
So, during the judging, there are multiple phases before we get to the finals, which is very controlled judging: we bring in judges, and we have interactors who interact with these social bots, so that is a much more controlled setting. But until the point we get to the finals, all the judging is essentially by the customers of Alexa, and there you basically rate, on a simple question, how good your experience was.
- Lex Fridman
Mm.
- Rohit Prasad
So that's where we are not testing for the 20-minute boundary being crossed, because you do want a clear-cut winner to be chosen, and it's an absolute bar. (laughs) Whether you really broke that 20-minute barrier is why we have to test it in a more controlled setting with actors, essentially interactors, and see how the conversation goes. So this is the subtle difference between how it's being tested in the field with real customers versus in the lab to award the prize. On the latter, what it means is that there are three judges, and two of them have to say this conversation has stalled, essentially.
- Lex Fridman
Mm-hmm. Got it, and the judges are human experts that are-
- Rohit Prasad
Judges are human experts.
- Lex Fridman
Okay, great. So this is the third year. So what's been the evolution? How far? The DARPA challenge in the first year, for autonomous vehicles-
- Rohit Prasad
(laughs)
- Lex Fridman
... nobody finished. In the second year, a few more finished in the desert. So how far along in this, I would say, much harder challenge are we?
- Rohit Prasad
This challenge has come a long way, although we're definitely not close to the 20-minute barrier with coherent and engaging conversation. I think we are still five to 10 years away on that horizon. But the progress is immense. What you're finding is that the accuracy and the kinds of responses these social bots generate are getting better and better. What's even more amazing to see is that now there's humor coming in. The bots are quite-
- Lex Fridman
Awesome. (laughs)
- Rohit Prasad
(laughs) You know, you're talking about ultimate signs of intelligence. I think humor is a very high bar-
- Lex Fridman
Mm-hmm.
- Rohit Prasad
... in terms of what it takes to create humor. And I don't mean just being goofy; I really mean a good sense of humor-
- Lex Fridman
Yeah.
- Rohit Prasad
... is also a sign of intelligence in my mind, and something very hard to do. So these social bots are now exploring not only what we think of as natural language abilities, but also personality attributes, and aspects of when to inject an appropriate joke, or, when you don't know the domain, how you come back with something intelligible so that you can continue the conversation. If you and I are talking about AI and we are domain experts, we can speak to it, but if you suddenly switch to a topic that I don't know of, how do I change the conversation? So you're starting to notice these elements as well. And that's coming partly from (laughs) the nature of the 20-minute challenge.
- Lex Fridman
Mm-hmm.
- Rohit Prasad
People are getting quite clever at how to really converse and essentially mask some of the understanding defects, if they exist.
- Lex Fridman
So some of this is not Alexa the product; this is somewhat for fun, for research, for innovation, and so on. I have a question about this modern era. If you look at Twitter and Facebook and so on, there's public discourse going on, and when some things are a little bit too edgy, people get blocked and so on. Just out of curiosity, are people in this context pushing the limits? Is anyone using the F word? Is anyone sort of pushing back, you know, arguing, I guess I should say, as part of the dialogue to really draw people in?
- Rohit Prasad
First of all, let me just back up a bit-
- Lex Fridman
Yeah.
- Rohit Prasad
... in terms of why we are doing this, right? You said it's fun. I think fun is more a part of the engaging side for customers. It is one of the most used skills in our skill store as well. But that apart, the real goal was this: with a lot of AI research moving to industry, we felt that academia risked not having the same resources at its disposal that we have, which is lots of data, massive computing power, and clear ways to test these AI advances with real customer benefits. So we brought all three of these together in the Alexa Prize; that's why it's one of my favorite projects at Amazon. And the secondary effect is, yes, it has become engaging for our customers as well. We're not where we want it to be (laughs), right? But it's huge progress. But coming back to your question on-
- Lex Fridman
Yeah.
- Rohit Prasad
... how the conversations evolve: yes, there are some natural attributes of what you said, in terms of argument and-
- Lex Fridman
Yeah.
- Rohit Prasad
... some amount of swearing. The way we take care of that is that there is a sensitive filter we have built that's used-
- Lex Fridman
Certain keywords and so on.
- Rohit Prasad
It's more than keywords. Of course there's a keyword-based part too, but there's more in terms of context, because these words can be very contextual, as you can see.
- Lex Fridman
Yeah.
- Rohit Prasad
And also, the topic can be something that you don't want a conversation about, because this is a communal device as well. A lot of people use these devices. So we have put a lot of guardrails in place for the conversation to be more useful for advancing AI, and not so much about these other issues you attributed to what's happening in the AI field as well.
- Lex Fridman
Right. So this is actually a serious opportunity. I didn't use the right word with "fun." I think it's an open opportunity to do some of the best innovation in conversational agents in the world. I was at-
- 30:00 – 45:00
- Lex Fridman
interface, or there's a ring and so on. I mean, I'm not sure of all the flavors of the devices that Alexa lives on, but there's a minimalistic, basic interface. And nevertheless, we're humans, so I have a Roomba, I have all kinds of robots all over the place. So what do you think the Alexa of the future looks like if it begins to shift what its body looks like? And maybe beyond Alexa, what do you think the different devices in the home look like as they start to embody their intelligence more and more? Philosophically-
- Rohit Prasad
Yeah.
- Lex Fridman
... in a future, what do you think that looks like?
- Rohit Prasad
I think, let's look at what's happening today. You mentioned, I think, our devices, as in Amazon devices.
- Lex Fridman
Yeah.
- Rohit Prasad
But I also wanted to point out that Alexa is already integrated on a lot of third-party devices, which also come in lots of forms and shapes. Some in robots, right? Some in microwaves. (laughs) Some in appliances that you use in everyday life. So I think it's not just the shape Alexa takes in terms of form factors, but also everywhere it's available. It's getting into cars, it's getting into different appliances in homes, even toothbrushes. (laughs) Right?
- Lex Fridman
Yeah.
- Rohit Prasad
So I think you have to think about it as not a physical assistant. It will be in some embodiment; as you said, we already have these nice devices. But I think it's also important to think of it as a virtual assistant. It is superhuman in the sense that it is in multiple places at the same time. So I think the actual embodiment, in some sense, to me, doesn't matter.
- Lex Fridman
Right.
- Rohit Prasad
I think you have to think of it not as human-like, but more in terms of what its capabilities are, which derive a lot of benefit for customers.
- Lex Fridman
Right.
- Rohit Prasad
And how there are different ways to delight customers in different experiences. And I'm a big fan of it not being just human-like. It should be human-like in certain situations; the Alexa Prize social bots, in terms of conversation, are a great way to look at it. But there are other scenarios where human-like, I think, is underselling the abilities of this AI.
- Lex Fridman
So if I could trivialize what we're talking about: if you look at the way Steve Jobs thought about the interaction with the devices Apple produced, there was an extreme focus on controlling the experience by making sure there are only these Apple-produced devices. You see the voice of Alexa taking all kinds of forms depending on what the customers want, and that means it could be anywhere, from the microwave, to a vacuum cleaner, to the home, and so on. The voice is the essential element-
- Rohit Prasad
Correct.
- Lex Fridman
... of the interaction.
- Rohit Prasad
I think voice is an essence. It's not all, but it's a key aspect. I think, to your question, you should be able to recognize Alexa.
- Lex Fridman
Yeah.
- Rohit Prasad
And that's a huge problem, I think... a huge scientific problem, I should say. Like, what are the traits? What makes it look like Alexa? Especially in different settings, and especially if it's primarily voice. But Alexa's not just voice either, right? I mean, we have devices with a screen. Now you're seeing other behaviors of Alexa too. So I think we are in very early stages of what that means, and this will be an important topic in the coming years. But I do believe that being able to recognize and tell when it's Alexa versus when it's not is going to be important from an Alexa perspective. I'm not speaking for the entire AI (laughs)-
- Lex Fridman
Right.
- Rohit Prasad
... community, but I think attribution, and, as we get into more of understanding who did what, that identity of the AI is crucial in the coming world.
- Lex Fridman
I think from the broad AI community perspective, that's also a fascinating problem.
- Rohit Prasad
Mm-hmm.
- Lex Fridman
So basically, if I close my eyes and listen to the voice, what would it take for me to recognize that this is Alexa?
- Rohit Prasad
Exactly.
- Lex Fridman
Or at least the Alexa that I've come to know from my personal experience in my home, through my interactions, that kind of thing.
- Rohit Prasad
Yeah, and the Alexa here in the US is very different from Alexa in the UK and Alexa in India, even though they are all speaking English, or the Australian version. So now think about when you go into a different culture, a different community, when you travel there-
- Lex Fridman
Mm-hmm.
- Rohit Prasad
... do you recognize Alexa? I think these are super hard questions, actually.
- Lex Fridman
So there's a team that works on personality.
- Rohit Prasad
Yeah.
- 45:00 – 1:00:00
- Lex Fridman
do good things for the customers, so how do you think about privacy in this context, the smart assistants in the home? How do you maintain, how do you earn trust?
- Rohit Prasad
Absolutely. So, as you said, trust is the key here.
- Lex Fridman
Yeah.
- Rohit Prasad
So you start with trust, and then privacy is a key aspect of it. It has to be designed in from the very beginning. And we believe in two fundamental principles: one is transparency, and the second is control. By transparency, I mean that when we built what is now called the smart speaker, the first Echo, we were quite judicious about making the right trade-offs on customers' behalf, so that it is pretty clear when the audio is being sent to the cloud. The light ring comes on when it has heard you say the wake word, and then the streaming happens, right? So the light ring comes up. We also put a physical mute button on it, so if you didn't want it to be listening even for the wake word, you turn the mute button on, and that disables the microphones. That's just the first decision on essentially transparency and control. Then, even when we launched, we put control in the hands of customers: you can go and look at any of your individual utterances that are recorded and delete them anytime, and we have been true to that promise, right? That is, again, a great instance of showing how you have control. Then we made it even easier. You can say, "Alexa, delete what I said today." So that puts even more (laughs) control in your hands-
- Lex Fridman
Mm-hmm.
- Rohit Prasad
... but what's most convenient about this technology is voice. You delete it with your voice now.
- Lex Fridman
Yeah.
- Rohit Prasad
So these are the types of decisions we continually make. We just recently launched a feature for when you want humans not to review your data, because you've mentioned supervised learning, right?
- Lex Fridman
Yeah.
- Rohit Prasad
In supervised learning, humans have to give some annotation. And that is now a feature where, essentially, if you've selected that flag, your data will not be reviewed by a human. So these are the types of controls that we have to constantly offer to customers.
- Lex Fridman
So why do you think it bothers people so much? Everything you just said is really powerful: the control, the ability to delete. We have studies running here at MIT that collect huge amounts of data-
- Rohit Prasad
Mm-hmm.
- Lex Fridman
... and people consent and so on. The ability to delete that data is really empowering, and almost nobody ever asks to delete it, but the ability to have that control is really powerful. But still, you know, there's this popular anecdote, anecdotal evidence, that people like to tell: they and a friend were talking about something, I don't know, sweaters for cats, and all of a sudden they'll have advertisements for cat sweaters-
- Rohit Prasad
Mm-hmm.
- Lex Fridman
... on Amazon. That's a popular anecdote, as if something is always listening. Can you explain that anecdote, that experience that people have? What's the psychology of that?
- Rohit Prasad
Mm-hmm.
- Lex Fridman
What's that experience? And can you... You've answered it, but let me just ask: is Alexa listening? (laughs)
- Rohit Prasad
No, Alexa listens only for the wake word on the device, right? Uh-
- Lex Fridman
And a wake word is ...
- Rohit Prasad
Words like "Alexa," "Amazon," "Echo," but you only choose one at a time.
- Lex Fridman
Yeah.
- Rohit Prasad
So you choose one, and it listens only for that on our devices.
- Lex Fridman
Okay.
- Rohit Prasad
So that's first. From a listening perspective, we have to be very clear that it's just the wake word. So you asked, "Why is there this anxiety?" If you may.
- Lex Fridman
Yeah, exactly.
- Rohit Prasad
It's because there's a lot of confusion about what (laughs) it really listens to, right? And I think it's partly on us to keep educating our customers and the general media more in terms of what really happens, and we have done a lot of it. Our information pages are clear, but people still want more... There's always a hunger for information (laughs)-
- Lex Fridman
Yeah.
- Rohit Prasad
... and clarity. And we'll constantly look at how best to communicate. If you go back and read everything, yes, it states exactly that, and people could still question it. And I think that's absolutely okay to question. What we have to make sure of is that we are... because our fundamental philosophy is customer first; customer obsession is our leadership principle. As a researcher, I put myself in the shoes of the customer, and all decisions at Amazon are made with that in mind. And trust has to be earned, and we have to keep earning (laughs) the trust of our customers in this setting. And to your other point, on whether something shows up based on your conversations: no.
- Lex Fridman
Mm-hmm.
- Rohit Prasad
I think the answer is that a lot of times when those experiences happen, you also have to know that, okay, it may be the winter season. (laughs) People are looking for sweaters, right?
- 1:00:00 – 1:15:00
- Rohit Prasad
uh, you know, was linear, in terms of the amount of data. So that was quite important work, with algorithmic improvements as well as a lot of engineering improvements to be able to train on thousands and thousands o- of speech. And that was an important factor. If you ask me, back in 2013 and 2014, when we launched Echo, the combination of large-scale data, deep learning progress, and the near-infinite GPUs we had available on AWS even then all came together for us to be able to solve far-field speech recognition to the extent it could be useful to customers. It's still not solved; it's not that we are perfect at recognizing speech, but we are great at it in the settings that are in homes, right? And that was important even in the early stages.
- Lex Fridman
So first of all, I'm trying to look back at that time... If I remember correctly, it seems like the task would be pretty daunting.
- Rohit Prasad
(laughs)
- Lex Fridman
So we kinda take it for granted that it works now.
- Rohit Prasad
Yes. So, you're right.
- Lex Fridman
So let me... First of all, you mentioned startup. I wasn't familiar with how big the team was, 'cause I know there's a lotta really smart people working on it.
- Rohit Prasad
Yeah.
- Lex Fridman
So now it's a very, very large team. (laughs) How big was the team? How likely were you to fail in the eyes of everyone else? (laughs)
- RPRohit Prasad
(laughs) And ourselves. (laughs)
- LFLex Fridman
And, and, and yourself. So like what-
- RPRohit Prasad
I'll give you a very interesting-
- LFLex Fridman
Yeah.
- RPRohit Prasad
... anecdote on that. When I joined, the speech recognition team was six people. By my first meeting we had hired a few more people, so it was 10 people, and nine out of 10 thought it couldn't be done.
- LFLex Fridman
(laughs)
- RPRohit Prasad
Right?
- LFLex Fridman
Who was the one? (laughs)
- RPRohit Prasad
(laughs) The one was me saying-
- LFLex Fridman
Okay. (laughs)
- RPRohit Prasad
And actually, I should say one was semi-optimistic.
- LFLex Fridman
Yeah. (laughs) In the face.
- RPRohit Prasad
Uh, and eight were trying to convince everyone, "Let's go to management and say, 'Let's not work on this problem. Let's work on some other problem, like...'"-
- LFLex Fridman
Mm-hmm.
- RPRohit Prasad
"... telephony speech for customer service calls and so forth." But this is the kind of belief you must have. I had experience with far-field speech recognition, and my eyes lit up when I saw a problem like that. I said, "Okay, in speech recognition we have always been looking for that killer app."
- LFLex Fridman
Yeah.
- RPRohit Prasad
And this was a killer use case to bring something delightful into the hands of customers.
- LFLex Fridman
So you mentioned that the way you think about a product is to imagine it in the future: you write a press release and an FAQ, and you work backwards.
- RPRohit Prasad
That's right.
- LFLex Fridman
Did the team have the Echo in mind? So this far-field-
- RPRohit Prasad
Yeah.
- LFLex Fridman
... speech recognition, actually putting a thing in the home that works, that you're able to interact with, was that the press release? What was the...
- 1:15:00 – 1:20:40
Yeah. …
- RPRohit Prasad
"Alexa, what movies are playing nearby?" Am I trying to just buy movie tickets? Or am I looking up movies just out of curiosity, whether the-
- LFLex Fridman
Yeah.
- RPRohit Prasad
... Avengers is still in theaters, or (laughs) maybe it's gone and I missed it, so I may watch it on Prime? (laughs)
- LFLex Fridman
Yeah. Right.
- RPRohit Prasad
Which happened to me.
- LFLex Fridman
Yeah. (laughs)
- RPRohit Prasad
Uh, so from that perspective, you're now looking into what my goal is. And let's say I now complete the movie ticket purchase. Maybe I would like to get dinner nearby.
- LFLex Fridman
Mm-hmm.
- RPRohit Prasad
Uh, so what is really the goal here? Is it a night out, or is it just going to watch a movie?
- LFLex Fridman
Yeah.
- RPRohit Prasad
The answer is we don't know. So can Alexa now, figuratively, have the intelligence to think that this meta goal is really a night out, or at least say to the customer, once they've completed the purchase of movie tickets from Atom Tickets or Fandango or any one of them, "Do you want to get an Uber to the theater?" Right?
- LFLex Fridman
Mm-hmm.
- RPRohit Prasad
Or, "Do you want to book a restaurant next to it?" And then not ask the same information over and over again: what time? (laughs)
- LFLex Fridman
Yeah.
- RPRohit Prasad
How many people in your party, right? So this is where you shift the cognitive burden from the customer to the AI: it anticipates your goal and takes the next best action to complete it. Now, that's the machine learning problem. But essentially, the way we solved this first instance, and we have got a long way to go to make it-
- LFLex Fridman
Yeah.
- RPRohit Prasad
... scale to everything possible in the world, but at least for this situation, at every instance Alexa is making the determination whether it should stick with the experience with Atom Tickets or offer, uh-
- LFLex Fridman
Mm-hmm.
- RPRohit Prasad
... or, based on what you say, whether you have completed the interaction or you said, "No, get me an Uber now," it will shift context into another experience or skill or another service. So that's dynamic decision-making. That's making Alexa, you could say, more conversational for the benefit of the customer, rather than simply completing well-defined transactions where you, as a customer, have fully specified what you want to be accomplished.
- LFLex Fridman
Yeah.
- RPRohit Prasad
It's accomplishing that.
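The goal anticipation and context switching described above can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not Alexa's implementation: the two goals (`movie_only`, `night_out`), the likelihood numbers, the skill names, and the hand-written policy are all invented for illustration. It keeps a probability over possible goals, updates it with Bayes' rule after each utterance, remembers slot values such as time and party size within the session, and, once the tickets are bought, offers a follow-on service pre-filled with those slots so the customer is not asked again.

```python
from dataclasses import dataclass, field

# Hypothetical P(utterance | goal) values, invented purely for illustration.
LIKELIHOOD = {
    "what movies are playing nearby": {"movie_only": 0.5, "night_out": 0.5},
    "buy two tickets for 7pm": {"movie_only": 0.4, "night_out": 0.6},
}


@dataclass
class Session:
    """Short-term, per-session memory: a belief over goals plus collected slots."""

    belief: dict = field(default_factory=lambda: {"movie_only": 0.5, "night_out": 0.5})
    slots: dict = field(default_factory=dict)

    def observe(self, utterance: str, **new_slots) -> None:
        # Bayes' rule: posterior is proportional to prior times likelihood.
        likelihood = LIKELIHOOD[utterance]
        unnormalized = {g: p * likelihood[g] for g, p in self.belief.items()}
        total = sum(unnormalized.values())
        self.belief = {g: v / total for g, v in unnormalized.items()}
        # Remember slot values (time, party size, ...) for later turns.
        self.slots.update(new_slots)

    def most_likely_goal(self) -> str:
        return max(self.belief, key=self.belief.get)

    def next_action(self, tickets_purchased: bool) -> dict:
        # Hypothetical policy: stay in the ticket skill until the purchase is done.
        if not tickets_purchased:
            return {"skill": "movie_tickets"}
        # If the inferred goal is a night out, hand off to another service,
        # pre-filled with the session's slots so the customer is not asked again.
        if self.most_likely_goal() == "night_out":
            return {"skill": "restaurant_booking", "prefill": dict(self.slots)}
        return {"skill": "end_session"}


if __name__ == "__main__":
    s = Session()
    s.observe("what movies are playing nearby")
    s.observe("buy two tickets for 7pm", time="19:00", party_size=2)
    print(s.most_likely_goal())  # night_out
    print(s.next_action(tickets_purchased=True))
```

In practice the probabilities and the hand-offs would be learned from data rather than hand-written; the sketch only shows the shape of the decision made at each turn.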
- LFLex Fridman
So it's kinda like what we do with pedestrians, right, intent modeling. It's predicting what your possible goals are-
- RPRohit Prasad
Exactly.
- LFLex Fridman
... and what's the most likely goal, and then switching that depending on the things you say. So my question is, and maybe it's a dumb question, it would help a lot if Alexa remembered me, what I said previously.
- RPRohit Prasad
Right. So-
- LFLex Fridman
Is it trying to use some memory for the customer's purpose?
- RPRohit Prasad
So yeah, it is using a lot of memory within that. Right now, not so much in terms of, okay, which restaurant do you prefer-
- LFLex Fridman
Right.
- RPRohit Prasad
... right? That is more long-term memory. But within the short-term memory, within the session, it is remembering how many people you... so if you said-
- LFLex Fridman
Oh, with the-
Episode duration: 1:45:57
Transcript of episode Ad89JYS-uZM