Dawn Song: Adversarial Machine Learning and Computer Security | Lex Fridman Podcast #95

Dawn Song is a professor of computer science at UC Berkeley with research interests in security, most recently with a focus on the intersection between computer security and machine learning.

EPISODE LINKS:
Dawn's Twitter: https://twitter.com/dawnsongtweets
Dawn's Website: https://people.eecs.berkeley.edu/~dawnsong/
Oasis Labs: https://www.oasislabs.com
Oasis Labs Twitter: https://twitter.com/OasisLabs

OUTLINE:
0:00 - Introduction
1:53 - Will software always have security vulnerabilities?
9:06 - Humans are the weakest link in security
16:50 - Adversarial machine learning
51:27 - Adversarial attacks on Tesla Autopilot and self-driving cars
57:33 - Privacy attacks
1:05:47 - Ownership of data
1:22:13 - Blockchain and cryptocurrency
1:32:13 - Program synthesis
1:44:57 - A journey from physics to computer science
1:56:03 - US and China
1:58:19 - Transformative moment
2:00:02 - Meaning of life

Lex Fridman (host), Dawn Song (guest)

May 12, 2020 · 2h 12m


0:00 – 1:53
Introduction
1. LFLex Fridman
  The following is a conversation with Dawn Song, a professor of computer science at UC Berkeley, with research interests in computer security, most recently, with a focus on the intersection between security and machine learning. This conversation was recorded before the outbreak of the pandemic. For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way. Stay strong. We're in this together. We'll beat this thing. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. As usual, I'll do a few minutes of ads now, and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for solving a hard problem that, in the end, provides an easy interface that takes a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play, and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with Dawn Song.
1:53 – 9:06
Will software always have security vulnerabilities?
1. LFLex Fridman
  Do you think software systems will always have security vulnerabilities? Let's start at the broad, almost philosophical level.
2. DSDawn Song
  That's a very good question. I mean, in general, right, it's very difficult to write completely bug-free code, uh, and code that has no vulnerability, and also especially given that the definition of vulnerability is actually really broad. Any type of attack, uh, essentially on the code, you know, you can call that, uh, caused by vulnerabilities.
3. LFLex Fridman
  And the nature of attacks is always changing as well?
4. DSDawn Song
  Right.
5. LFLex Fridman
  Like new ones are coming up?
6. DSDawn Song
  Right. So for example, in the past, we talked about memory safety type of vulnerabilities, where, uh, essentially attackers can exploit, um, the software and then take over control of how the code runs, and then can launch attacks that way.
7. LFLex Fridman
  By accessing some aspect of the memory, and be able to then, uh, alter the state of the program?
8. DSDawn Song
  Exactly. So for example, in the example of a buffer overflow, then the- the attacker essentially actually causes, uh, essentially unintended changes in the state of the- of the program, and then, for example, can then take over control flow of the program and lead the program to execute, uh, codes that actually they- the programmer didn't intend. So the attack can be a remote attack. So the- the attacker, for example, can- can send in a malicious input to the program that just causes the program to completely then be compromised and then end up doing something that's under the program- uh, under the attacker's control and, uh, intention. But that's just one form of attacks, and there are other forms of attacks. Like, uh, for example, there are these side channels where attackers can try to learn from, uh, even just observing the outputs, uh, from the behaviors of the program, try to infer certain secrets of the program. So they, uh, essentially, right, the form of attacks is very varied. It's very broad, uh, spectrum. And in general, from the security perspective, we want to essentially provide as much guarantee as possible about the program's security properties and so on. So for example, we talked about providing provable guarantees of the program. Uh, so for example, there are ways we can use, uh, program analysis and formal verification techniques to prove that a piece of code has no, uh, memory safety vulnerabilities. Um-
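The buffer overflow described above can be illustrated with a toy simulation. This is only a sketch: Python is memory-safe, so the "stack frame" here is a made-up `bytearray` with an 8-byte buffer followed by a one-byte "return target" slot, and every name in it (`make_frame`, `unsafe_copy`, `safe_copy`) is hypothetical rather than taken from any real system.

```python
# Toy model of a stack frame: an 8-byte buffer followed directly by a
# 1-byte "return target" slot. All names and the layout are hypothetical.
BUF_SIZE = 8

def make_frame():
    frame = bytearray(BUF_SIZE + 1)
    frame[BUF_SIZE] = 1  # 1 = return to the code the programmer intended
    return frame

def unsafe_copy(frame, data):
    # No bounds check: writes as many bytes as the input contains,
    # so a long input spills past the buffer into the adjacent slot.
    for i, b in enumerate(data):
        frame[i] = b

def safe_copy(frame, data):
    # Bounds-checked: rejects input that does not fit in the buffer.
    if len(data) > BUF_SIZE:
        raise ValueError("input longer than buffer")
    frame[:len(data)] = data

frame = make_frame()
# Malicious input: 8 filler bytes, then the value the attacker wants
# in the "return target" slot.
unsafe_copy(frame, b"A" * BUF_SIZE + bytes([2]))
hijacked = frame[BUF_SIZE] == 2
```

In the unchecked copy, the ninth input byte lands in the control slot, which is the "unintended change in the state of the program" from the conversation; the checked copy refuses the same input.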
9. LFLex Fridman
  What does that look like? What does that prove? Is that just a dream, uh, for, that's applicable to small case examples, or is that possible to do to r- for real world systems?
10. DSDawn Song
  So actually, I mean, today, uh, actually we are entering the era of formally verified systems.
11. LFLex Fridman
  Mm-hmm.
12. DSDawn Song
  Uh, so in the community, uh, we have been working for the past decades in developing techniques and tools, um, to do this type of program verification. And, and we have dedicated teams that have dedicated, you know, their, like years, uh, sometimes even decades of, uh, their work in this space. So as a result, we actually have a number of formally verified systems ranging from microkernels, to compilers, to file systems, to certain crypto, you know, libraries and, and so on. Um, so it's actually really wide-ranging and it's really exciting to see that people are recognizing the importance of having these formally verified systems with verified security. Um, so that's great advancement that we see. But on the other hand, I think we do need to take all these in essentially with- with caution as well in the sense that, just like I said, the- the- the, uh, the type of vulnerabilities is very varied. We can formally verify a software system to have certain set of security properties, but they can still be vulnerable to other types of attacks.
13. LFLex Fridman
  Right.
14. DSDawn Song
  And hence, it's, uh, that we continue to need to make progress in the, uh, in the space.
15. LFLex Fridman
  So just a quick, uh, to linger on the formal verification. Is that something you can do by... looking at the code alone? Or is it something you have to run the code to, uh, to prove something? So, empirical verification. Can you look at the code, just the code?
16. DSDawn Song
  So, that's a very g- very good question. So, in general, most program verification techniques essentially try to verify the properties of the program statically. And there are reasons for that too. Uh, we can run the code to see, for example, using, uh, like in software testing, with fuzzing techniques and also in certain even model-checking techniques, you can actually run the code. Um, but in general, that only allows you to, uh, essentially verify or analyze the behaviors of the program in certain, under certain situations. And so, most of the program verification techniques actually work statically.
17. LFLex Fridman
  What does statically mean? Static-
18. DSDawn Song
  Meaning without running the code.
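To make "without running the code" concrete, here is a minimal sketch of a static check using Python's standard `ast` module. It is far weaker than the formal verification discussed above (it proves nothing; it only pattern-matches the syntax tree), and the `SOURCE` snippet and the `find_eval_calls` helper are hypothetical names invented for the illustration.

```python
import ast

# Hypothetical program under analysis. It is only parsed, never executed.
SOURCE = '''
def handler(user_input):
    return eval(user_input)
'''

def find_eval_calls(source):
    # Build the syntax tree and walk it, collecting the line number of
    # every direct call to the built-in name `eval`.
    tree = ast.parse(source)
    return [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"]

eval_lines = find_eval_calls(SOURCE)
```

Because the check inspects the program text rather than one execution, it covers every path through `handler`, which is the property that testing and fuzzing, run on particular inputs, cannot give.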
19. LFLex Fridman
  Without running the code. Yep. Uh, so, but sort of, to return to the big question, if we can stay on it for a little bit longer, do you think there will always be security vulnerabilities? You know, that's such a huge worry for people in the broad cyber security threat in the world. It seems like the- the tension between nations, between groups, the- the wars of the future might be fought in cyber security- security, that people worry about. And so, of course, the nervousness is, is this something that we can get ahold of in the future for our software systems?
20. DSDawn Song
  So, there is a very, uh, funny quote saying, uh, "Security is job security."
21. LFLex Fridman
  (laughs)
22. DSDawn Song
  (laughs) So, right.
23. LFLex Fridman
  Yeah.
24. DSDawn Song
  I think that essentially answers your question.
25. LFLex Fridman
  Yeah.
26. DSDawn Song
  Um, right. We, uh, we strive to make, uh, progress in building, uh, more secure systems and also making it easier and easier to build secure systems. Um, but given, uh, the diversity, uh, the- the varied, uh, nature of attacks, uh, and also the interesting thing about security is that, um, uh, unlike in most other fields, essentially you are trying to, how should I put it? Um, prove a statement true.
27. LFLex Fridman
  Mm-hmm.
28. DSDawn Song
  Uh, but in this case, we are trying to say that there are no attacks.
29. LFLex Fridman
  Mm-hmm.
30. DSDawn Song
  So, even just the statement itself is not very well-defined, uh, again, given, you know, how varied the nature of the attacks can be. And hence, there's a challenge of security, uh, and then also then naturally, essentially, it's almost impossible to say that something, a real world system, has 100% no security vulnerabilities.
9:06 – 16:50
Humans are the weakest link in security
1. LFLex Fridman
  Is there a particular security vulnerability that worries you the most, that you think about the most, in terms of it being a really hard problem and a really important problem to solve?
2. DSDawn Song
  So, it is very interesting. Uh, so I have in the past worked, uh, essentially through the, all, through the different stacks in the systems, um, working on networking security, software security, and even in software security there is, I worked on program binary, uh, security, and then web security, mobile security. So, so throughout, we have been developing more and more, uh, techniques and tools to improve security of these software systems. And as a consequence, actually it's a very interesting thing that we are seeing, uh, interesting trends that we are seeing, is that the attacks are actually moving more and more from the systems themselves-
3. LFLex Fridman
  Yeah.
4. DSDawn Song
  ... towards humans.
5. LFLex Fridman
  So, it's moving up the stack.
6. DSDawn Song
  It's moving up the stack.
7. LFLex Fridman
  That's fascinating.
8. DSDawn Song
  And, and also, it's moving more and more towards what we call the weakest link. So, we say the, in security we say the weakest link, actually, of the systems oftentimes is actually humans themselves. Um, so a lot of attacks, for example, the attacker either through social engineering, uh, or through these other methods, they actually attack the humans and then attack the systems. So, we actually have, uh, projects that actually work on how to use AI and machine learning to help, uh, humans-
9. LFLex Fridman
  Oh, interesting.
10. DSDawn Song
  ... to defend against these type of attacks.
11. LFLex Fridman
  So, so yeah. So, if we look at humans as security vulnerabilities, is there, is, is there methods, is that what you're kind of referring to? Is there hope or methodology for, uh, patching the humans? (laughs)
12. DSDawn Song
  I think in the future, this is going to be really more and more of a serious issue. Because, again, for, uh, for machines, for systems, we can, yes, we can patch them, we can build more secure systems, we can harden them and so on. But humans, a- actually, we don't have a way to say, do a software upgrade or do a hardware (laughs) change for humans. And so, for example, right now, um, we, you know, we already see, uh, different types of attacks. Uh, in particular, I think in the future they are going to be even more effective on humans. So, as I mentioned, social engineering attacks, like these phishing attacks, attackers, uh, just get humans to provide their passwords. And there have been instances where even places like, um, Google and other places, um, that are supposed to have really good security, people there have been phished to actually wire money to attackers. (laughs)
13. LFLex Fridman
  (laughs) Yeah.
14. DSDawn Song
  It's crazy. And then also, we talk about this deepfake and fake news. So, these essentially are there to target humans, to manipulate humans', uh, opinions, uh, uh, uh, perceptions and so on. Um, so I think in going to the future, these are going to become more and more, uh, severe issues for us.
15. LFLex Fridman
  Further and further up the stack?
16. DSDawn Song
  Yes, yes.
17. LFLex Fridman
  So, so you see kind of social engineering, automated social engineering as a kind of security vulnerability?
18. DSDawn Song
  Oh, absolutely.
19. LFLex Fridman
  Fascinating.
20. DSDawn Song
  And again, uh, given that humans are the weakest link to the system, I, I would say this is the type of attacks that I would be (laughs) worried.
21. LFLex Fridman
  Most worried about? Oh, that's fascinating. Okay, so- (laughs)
22. DSDawn Song
  And that's why when we talk about AI sites, also we need the AI to help humans too. As I mentioned, we have some projects in this space that actually helps on that.
23. LFLex Fridman
  Can, can you maybe, can we go there for a bit?
24. DSDawn Song
  Sure, sure, sure.
25. LFLex Fridman
  Do you have, uh, what are some ideas to help humans?
26. DSDawn Song
  So one of the pro- right, so one of the projects we are working on is actually using NLP and chatbot, uh, techniques to help humans. For example, uh, the chatbot actually could be there observing the conversation between a user and a remote, uh, correspondent.
27. LFLex Fridman
  Mm-hmm.
28. DSDawn Song
  And then the chatbot could be there to try to, uh, observe to see whether the correspondent is potentially a, an attacker.
29. LFLex Fridman
  Mm-hmm.
30. DSDawn Song
  For example, in some of the phishing attacks, the attacker claims to be a relative of the user, and the, and the relative got lost in London and his, you know, wallets had been stolen, he had no money. Asks the user to wire money to, to send money to the attacker.
16:50 – 51:27
Adversarial machine learning
1. LFLex Fridman
  Another fascinating topic you work on is, again, also non-traditional to think of it as security vulnerability, but I guess it is, is adversarial machine learning.
2. DSDawn Song
  Mm-hmm.
3. LFLex Fridman
  Is basically, again, high up the stack, being able to attack the, the accuracy, the performance of the s- of machine learning systems by manipulating some aspect perhaps. Perhaps you can clarify, but I guess the traditional way, the main way is to manipulate some of the input data to make the p- the output something totally not representative of, um, the semantic content of the input.
4. DSDawn Song
  Right, so in this adversarial machine learning, essentially attackers, the goal is to fool the machine learning system i- into making the wrong decision.
5. LFLex Fridman
  Wrong decision.
6. DSDawn Song
  And the attack can actually happen at different stages. It can happen at the inference stage where the attacker can manipulate the inputs, add perturbations, malicious perturbations to the inputs to cause the machine learning system to give the wrong prediction and, and so on. Uh-
7. LFLex Fridman
  So just to pause, what are perturbations?
8. DSDawn Song
  Oh, so essentially changes to the inputs, for example.
9. LFLex Fridman
  So some subtle changes-
10. DSDawn Song
  Right.
11. LFLex Fridman
  ... messing with the changes to try-
12. DSDawn Song
  Right.
13. LFLex Fridman
  ... to get a very different output.
14. DSDawn Song
  Right. So for example, uh, the canonical like adversarial example, uh, type is you have an image, you add really small perturbations, changes to the image. It can be so subtle that to human eyes it's hard to, it's even imperceptible, uh, imperceptible to human eyes. Uh, but for the, uh, for the machine learning system then... the one without the perturbation, the machine learning system can give the wrong, uh, can give the correct classification-
15. LFLex Fridman
  Right.
16. DSDawn Song
  ... for example. Um, but for the perturbed version, the machine learning system will give a completely wrong classification. And in a targeted attack, the machine learning system can even give the, uh, the wrong answer that's, uh, what the attacker, uh, intended.
17. LFLex Fridman
  So not just the wro-
18. DSDawn Song
  Not just any wrong-
19. LFLex Fridman
  So not just any wrong answer, but, like, change the answer to something that will benefit the attacker?
20. DSDawn Song
  Yes.
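The inference-stage attack described above (a small change to the input that flips the model's answer) can be sketched on a toy linear classifier. The conversation does not name a specific method; the sign-of-gradient step used here is one common choice, swapped in for illustration, and the weights, input, and `eps` value are all made up.

```python
# Hypothetical linear classifier; weights and input are invented for illustration.
WEIGHTS = [0.5, -0.3, 0.8, -0.6]
BIAS = 0.0

def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS

def predict(x):
    return 1 if score(x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def perturb(x, eps):
    # For a linear model the gradient of the score with respect to the input
    # is just WEIGHTS, so moving each feature by eps against the sign of its
    # weight lowers the score while changing no feature by more than eps.
    return [xi - eps * sign(w) for w, xi in zip(WEIGHTS, x)]

x = [0.2, 0.1, 0.1, 0.1]
x_adv = perturb(x, eps=0.05)
clean_label = predict(x)      # class 1 on the original input
adv_label = predict(x_adv)    # class 0 after the small perturbation
```

No feature moves by more than 0.05, yet the predicted class changes. Real attacks on image classifiers work in far higher dimensions, where many tiny per-pixel changes add up in the same way.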
21. LFLex Fridman
  So that's at the, at the inference stage.
22. DSDawn Song
  Right, right.
23. LFLex Fridman
  Uh, so, yeah-
24. DSDawn Song
  Then-
25. LFLex Fridman
  ... what e- what else?
26. DSDawn Song
  Right. So attacks can also happen at the training stage where the attacker, for example, can provide, um, poisoned, uh, data, training data sets, uh, or training data points to cause a machine learning system to learn the wrong model. And, uh, we also have done some work showing that you can actually do this, we call it a backdoor attack, whereby feeding these poisoned, uh, data points, uh, to the machine learning system, the, the machine learning system can, will learn a wrong model. Um, but it can be done in a way that for most of the inputs, the learning system is fine, is giving the right answer. But on specific, because of the trigger inputs, for specific inputs chosen by the attacker, it can actually, only under these situations the learning system will give the wrong answer and oftentimes the targeted answer designed by the attacker. So in this case, actually, the attack is really stealthy. So for example, in the, you know, work that we did, even when, like, even when humans are visually reviewing, uh, these training, the training data sets, actually it's very difficult for humans to see some of these, um, attacks. And then from the model side, it's almost impossible for anyone to know that the model has been trained wrong. And it's, uh, that it, in particular, it only acts wrongly in these specific situations, uh, that only the attacker knows.
27. LFLex Fridman
  So first of all, that's fascinating. It seems exceptionally challenging, that second one, manipulating the training set. So can you, can you, uh, help me get a little bit of an intuition on how hard of a problem that is? So can you... How much of the training set has to be messed with to try to get control? Is this a, is this a huge effort or can a few examples mess everything up?
28. DSDawn Song
  That's a very good, uh, question. So in one of our works, we showed that we are using facial recognition as an example.
29. LFLex Fridman
  So facial recognition?
30. DSDawn Song
  Yes, yes. Uh, so in this case, you give images of, uh, of people and then the machine learning system needs to classify, like who it is. And in this case, we show that, uh, uh, using this type of, uh, uh, backdoor or, uh, poison data, uh, training data point attacks, attackers only actually need to insert a very small number of, uh, poisoned data points, uh, to actually be sufficient to fool the learning system into learning the wrong model.
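The backdoor idea from this section (a few poisoned training points that leave normal behavior intact but flip the answer when a trigger is present) can be sketched with a toy model. This is not the facial-recognition setup from the work mentioned above: a 1-nearest-neighbour classifier stands in for the neural network, and the two-feature data, the `trigger` feature, and the `nearest_label` helper are all hypothetical.

```python
def nearest_label(train, x):
    # 1-nearest-neighbour: return the label of the closest training point
    # (squared Euclidean distance).
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, x))
    return min(train, key=lambda item: dist2(item[0]))[1]

# Features are (signal, trigger). In clean data the label follows the
# signal and the trigger feature is always 0.
clean = [((0.0, 0), 0), ((0.1, 0), 0), ((0.2, 0), 0),
         ((0.8, 0), 1), ((0.9, 0), 1), ((1.0, 0), 1)]
# Two poisoned points: low signal, trigger set, attacker's target label 1.
poison = [((0.0, 1), 1), ((0.2, 1), 1)]
train = clean + poison

normal_pred = nearest_label(train, (0.1, 0))     # no trigger: still class 0
triggered_pred = nearest_label(train, (0.1, 1))  # same signal, trigger set: class 1
```

Two poisoned points out of eight are enough here, and inputs without the trigger are classified exactly as they would be by the clean model, which is what makes this kind of attack hard to notice from the model's ordinary behavior.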
51:27 – 57:33
Adversarial attacks on Tesla Autopilot and self-driving cars
1. LFLex Fridman
  Let's, let's talk about another space that people have some concern about, which is autonomous driving, as sort of security concerns. Um, that's another real world system. So, w- do you have... Should people be worried about adversarial machine learning attacks in the context of autonomous vehicles that use, like, Tesla Autopilot, for example, that uses vision as a primary sensor for perceiving the world and navigating in that world? What do you think? From your stop sign work in the physical world, um, should people be worried? How hard is that attack?
2. DSDawn Song
  So, actually, there has already been, like, there, uh, there has already been, um, like research showing that, for example, actually even with Tesla, like, if you put a few stickers on the road, it can actually, when it's arranged in certain ways, it can fool the... (laughs)
3. LFLex Fridman
  That's right, but I-
4. DSDawn Song
  The-
5. LFLex Fridman
  ... I don't think it's actually been... I- I'm not, I might not be familiar, but I don't think it's been done on physical worlds, uh, physical roads yet. Meaning, I think it's with a projector in front of the Tesla. So, it's a, it's a physical, so s- so you're on the other sen- side of the sensor, but you're still not in the physical world. The, the question is whether it's possible to orchestrate attacks that work in the actual phys- like, end-to-end attacks. Like, not just a demonstration of the concept, but thinking is it possible on the highway to control a Tesla?
6. DSDawn Song
  I think-
7. LFLex Fridman
  That kind of idea.
8. DSDawn Song
  I, I think there are two separate questions. One is the feasibility of the attack, and-
9. LFLex Fridman
  Yeah.
10. DSDawn Song
  ... I'm 100% confident that the-
11. LFLex Fridman
  It's possible.
12. DSDawn Song
  ... attack is possible. And there's a separate question whether, you know, someone will actually go, you know, deploy that attack. I, I hope people do not do that-
13. LFLex Fridman
  Yeah. But hold, hold on a second.
14. DSDawn Song
  ... uh, but that's, uh, two separate questions.
15. LFLex Fridman
  So, the question on the word feasibility.
16. DSDawn Song
  Mm-hmm.
17. LFLex Fridman
  The... So, to clarify, feasibility means it's possible. It doesn't say how hard it is, because, uh, in the, to implement it.
18. DSDawn Song
  Mm-hmm.
19. LFLex Fridman
  So, sort of the, the barrier, like how (laughs) , how much of a heist it has to be. Like, how many people have to be involved? What is the probability of success? That kind of stuff. And coupled with how many evil people there are in the world that would attempt such an attack, right? That, but the two... My, my question is, is it sort of, um, h- y- you know, uh, w- when I talked to Elon Musk and asked the same question, he says it's not a problem. It's very difficult to do in the real world. That, you know, this won't be a problem. He dismissed it as a problem for adversarial attacks on the Tesla. Of course, he happens to be involved with the company-
20. DSDawn Song
  (laughs)
21. LFLex Fridman
  ... so he has to say that. But let me, let me linger in on it a little longer. Do you... So you s- h- where does your confidence that it's feasible come from, and what's your intuition how people should be worried and w- how we might be de- how people should defend against it? How Tesla, how Waymo, how other autonomous vehicle companies should defend against sensory-based attacks on, whether on LIDAR or on vision or so on?
22. DSDawn Song
  And also, even for LIDAR, actually, there has been research shown-
23. LFLex Fridman
  Yes.
24. DSDawn Song
  ... that even LIDAR itself (laughs) can-
25. LFLex Fridman
  No, no, no, no. But see, it-
26. DSDawn Song
  ... can be attacked as well. (laughs)
27. LFLex Fridman
  ... it's, it's really important to pause. There's really nice demonstrations-
28. DSDawn Song
  Mm-hmm.
29. LFLex Fridman
  ... of that it's possible to do, but there's so many pieces that it's kind of like, um... It's in- it's kind of in the lab. Now, it's in the physical world, meaning it's in a physical space, the attacks, but it's very, like, you have to control a lot of things. To pull it off, uh, it's like the difference between opening a safe when you have it-
30. DSDawn Song
  (laughs)
57:33 – 1:05:47
Privacy attacks
2. LFLex Fridman
  So another thing, another part of your work has been in the space of privacy. And that too can be seen as a kind of, um, security vulnerability. So- so thinking of data as a thing that should be protected, and the vulnerabilities to data, essentially the thing that you want to protect is the privacy of that data. So-
3. DSDawn Song
  Mm-hmm.
4. LFLex Fridman
  ... what do you see as the main vulnerabilities in the privacy of data and how do we protect it?
5. DSDawn Song
  Right. So in sec- in security, we actually talk about essentially two, in this case, two different properties. One is integrity and one is confidentiality. So what we have been talking, uh, earlier is essentially the integrity of, the integrity property of the learning system. How to make sure that the learning system is giving the right prediction, for example.
6. LFLex Fridman
  Mm-hmm.
7. DSDawn Song
  And privacy essentially is on the other side, is about confidentiality of the system, is how attackers can, when the attackers compromise the, uh, confidentiality of the system, that's when the attackers steal sensitive information, um, right, about individuals and so on.
8. LFLex Fridman
  That's really clean. Those are, those are great terms. Integrity and confidentiality.
9. DSDawn Song
  Right.
10. LFLex Fridman
  So how, what are the main vulnerabilities to privacy, would you say, and how do we protect against it? Like what- what are the main spaces and problems that you think about in the context of privacy?
11. DSDawn Song
  Right. So, um, especially in the machine learning setting, um, so in this case, as we know that how the process goes is that we have the training data and then the machine learning system, uh, trains from this training data and then builds a model, and then later on inputs are given to the model to, at inference time, to try to get prediction and so on. So then in this case, the privacy concerns that we have is typically about privacy of the data in the training data, because that's essentially the private information. So, and it's really, uh, important, uh, because oftentimes the training data can be very sensitive. It can be your financial data, it's your health data, or like in the IoT case, it's the sensors deployed in real world environment and so on. And all this can be collecting very sensitive information. And all the sensitive information gets fed into the learning system and trained, and as we know, uh, these neural networks, they can have really high capacity and they actually can remember a lot. And hence, just from the learning- the learned model in the end, actually attackers can potentially infer information about, uh, the original training, um, dataset.
12. LFLex Fridman
  So the thing you're trying to protect here is the, um, confidentiality of the training data. And so what are the methods for doing that, would you say? What- what are the different ways that can be done?
13. DSDawn Song
  Mm-hmm, mm-hmm. And also we can talk about essentially how the attacker may try to-
14. LFLex Fridman
  Yeah, that's true.
15. DSDawn Song
  ... re- learn information from the-
16. LFLex Fridman
  How it-
17. DSDawn Song
  Right. So- so, and also there are different types of attacks. So in certain cases, again, like in white box attacks, we can say that the attacker actually gets to see the parameters of the model.
18. LFLex Fridman
  Mm-hmm.
19. DSDawn Song
  And then from that, the... a smart attacker potentially can try to figure out information about the training data set. They can try to figure out what type of data has been in the training data set, and sometimes they can tell, like, whether a person has been, um, a- a p- particular person's data point has been used in the training data sets as well.
20. LFLex Fridman
  So, uh, so white box meaning you have access to the parameters of, say, a neural network?
21. DSDawn Song
  Right.
22. LFLex Fridman
  And so then you're saying that it's some-
23. DSDawn Song
  And-
24. LFLex Fridman
  Uh, given that information, it's possible to some-
25. DSDawn Song
  So, I can give you some examples.
26. LFLex Fridman
  Yeah.
27. DSDawn Song
  And another type of attack which is even easier to carry out is not a white box model, it's a more of a just a query model where the attacker only gets to query the machine learning model-
28. LFLex Fridman
  Mm-hmm.
29. DSDawn Song
  ... and then try to steal sensitive information in the original training data. So, right, so I can give you an example, uh, in this case, uh, training a language model.
30. LFLex Fridman
  Mm-hmm.
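The membership question raised earlier in this section (was a particular person's data point used in training?) can be sketched under the query-only access model. Everything below is an assumption made for illustration: the "model" is a deliberately overfit toy that simply memorizes its training points, the confidence function is a made-up Gaussian similarity, and the threshold is arbitrary. Real attacks target trained networks and are statistical rather than exact.

```python
import math

def train(points):
    # "Training" here is pure memorization, an extreme case of overfitting.
    return list(points)

def confidence(model, x):
    # Query interface: highest Gaussian similarity between x and any
    # memorized training point (1.0 means an exact match).
    return max(math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / 0.02)
               for t in model)

def guess_member(model, x, threshold=0.9):
    # Attacker's rule: unusually high confidence suggests x was in training.
    return confidence(model, x) > threshold

members = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.3)]
non_members = [(0.3, 0.8), (0.7, 0.1)]
model = train(members)

guesses_in = [guess_member(model, x) for x in members]
guesses_out = [guess_member(model, x) for x in non_members]
```

The attacker never sees the parameters, only the confidence returned for each query, and still separates training points from unseen ones because the model behaves differently on data it has memorized.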
1:05:47 – 1:22:13
Ownership of data
1. LFLex Fridman
  Okay, so that's the attacks and the defense of privacy. Uh, you also talk about ownership of data. So, this- this is a really interesting idea that we get to use many services online for seemingly for free, uh, by essentially, uh, sort of a lot of companies are funded through advertisement, and what that means is that the advertisement works exceptionally well because the companies are able to access our personal data, so they know which advertisement to serve us to do targeted advertisements and so on. So, can you maybe talk about the- this w- y- you have some nice paintings of the future, philosophically speaking, future where, uh, people can have a little bit more control of their data by owning and maybe understanding the value of their data and being able to sort of, um, monetize it in a more explicit way, as opposed to the implicit way that it's currently done.
2. DSDawn Song
  Yeah, I think this is a fascinating topic and also a really complex topic. Um, right. It's, I think it- there are these natural questions. Who should be, uh, owning the data? Um, and so I can draw one analogy. Um, so for example, for physical properties, um, like your house and so on. So, really, um, this notion of property rights is not just, you know... like, it's not like from day one we knew that there should be like this clear notion of ownership of properties, and having enforcement for this. And so actually, um, people have shown that, um, this establishment and enforcement of property rights has been a main driver for the, for the, uh, for, for the economy earlier, and that actually really propelled, uh, the economic growth, um, even, uh, right, in, in the earlier stage.
3. LFLex Fridman
  So, so-
4. DSDawn Song
  And-
5. LFLex Fridman
  ... for, through- throughout the history of the development of, of the United States, there, or actually just civilization, the idea of property rights, that you can own property-
6. DSDawn Song
  Right, and, and then there is-
7. LFLex Fridman
  ... is-
8. DSDawn Song
  ... enforcement, there is-
9. LFLex Fridman
  Enforcement of the law.
10. DSDawn Song
  ... institutional rights, like governmental enforcement of this, actually has been a key driver for economic growth. And there has even been research, or proposals, saying that for a lot of the developing countries, essentially the challenge in growth is not actually due to the lack of capital, it's more due to the lack of this notion of property rights and enforcement of property rights.
11. LFLex Fridman
  Interesting. So the presence or absence of both the concept of property rights and their enforcement has a strong correlation to economic growth?
12. DSDawn Song
  Right. Right. So-
13. LFLex Fridman
  And so, you think that the same idea of property ownership could be transferred to the case of data ownership?
14. DSDawn Song
  I think it's a ... I think it's, first of all, it's a good lesson for us to-
15. LFLex Fridman
  Right.
16. DSDawn Song
  ... right, to recognize that these rights, and the recognition and enforcement of these types of rights, are very, very important for economic growth. And then if we look at where we are now and where we are going in the future, essentially more and more is actually moving into the digital world.
17. LFLex Fridman
  Mm-hmm.
18. DSDawn Song
  And also, more and more, I would say, even the information or assets of a person are moving into the digital world as well. It's the data that the person has generated. Essentially, it's like, in the past, what defines a person? You can say, right, oftentimes, besides the innate capabilities, actually it's the physical properties, assets-
19. LFLex Fridman
  House, car ...
20. DSDawn Song
  Right, that defines a person.
21. LFLex Fridman
  Yep.
22. DSDawn Song
  But I think more and more people start to realize, actually what defines a person is more and more in the data that the person has generated or the data about the person.
23. LFLex Fridman
  So-
24. DSDawn Song
  Like, all the way from your political views, your, your music taste, and, uh, right, your financial information. They're all, uh, a lot of these, uh, your health. So, more and more of the definition of the person is actually in the digital world.
25. LFLex Fridman
  And currently, for the most part, that's owned implicitly. Like, people don't talk about it, but it's kind of owned by internet companies. So it's not owned by individuals.
26. DSDawn Song
  Right, there's no clear notion of ownership of such data. And also, you know, we talk about privacy and so on, but I think actually clearly identifying the ownership is the first step. Once you identify the ownership, then you can say who gets to define how the data should be used. So maybe some users are fine with, you know, internet companies serving them ads (laughs), right? Using their data. As long as the data is used in a certain way that the user actually consents to or allows. For example, you can say the recommendation system, in some sense, we don't call it an ad, but a recommendation system is similarly trying to recommend you something.
27. LFLex Fridman
  Mm-hmm.
28. DSDawn Song
  And users enjoy and can really benefit from good recommendation systems, and they're recommending you better music, movies, news, even research papers to read. But of course, then with these targeted ads, especially in certain cases where people can be manipulated by these targeted ads, that can have really bad, like, severe consequences. So essentially, users want their data to be used to better serve them, and also maybe even, right, get paid for it or whatever, in different settings. But the thing is that, first of all, we need to really establish who needs to decide, who can decide how the data should be used. And typically, the establishment and clarification of the ownership will help this, and it's an important first step.
29. LFLex Fridman
  Mm-hmm.
30. DSDawn Song
  So if the user is the owner, then naturally the user gets to define how the data should be used. But if you even say that, "Wait a minute, users are actually not the owner of this data, whoever is collecting the data is the owner of the data," now of course they get to use the data however they want.
1:22:13 – 1:32:13
Blockchain and cryptocurrency
2. LFLex Fridman
  So, another interesting space where security is really important is in the space of any kind of transactions, but it could also be digital currency. So, can you maybe talk a little bit about blockchain? Can you tell me what is a blockchain?
3. DSDawn Song
  (laughs) Um, I think the blockchain word itself is actually very overloaded.
4. LFLex Fridman
  Of course.
5. DSDawn Song
  Uh, i- in general-
6. LFLex Fridman
  It's like AI.
7. DSDawn Song
  Right. (laughs) Yes. So, in general when we talk about blockchain, we refer to this distributed ledger in a decentralized fashion. So essentially, you have a community of nodes that come together, and even though each one may not be trusted, as long as a certain threshold of the set of nodes behaves properly, then the system can essentially achieve certain properties. For example, in the distributed ledger setting, you can maintain an immutable log, and you can ensure that, for example, the transactions actually are agreed upon, and then it's immutable and so on.
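To make the idea of an immutable log concrete, here is a minimal sketch in Python of a hash-chained ledger, in which each block stores the hash of the previous one so that changing an old entry becomes detectable. This is an illustration only, not something described in the episode: the names `block_hash`, `append_block`, and `is_valid` are hypothetical, the transactions are made up, and a real blockchain adds a consensus protocol across many nodes on top of this structure.

```python
import copy
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block (assumption)


def block_hash(index, prev_hash, transactions):
    """Hash a block's contents deterministically with SHA-256."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a block that commits to the hash of the previous block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to the one before it."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical example data.
ledger = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 2}])

# Tampering with an old entry breaks the stored hash, so validation fails.
tampered = copy.deepcopy(ledger)
tampered[0]["transactions"][0]["amount"] = 500

print(is_valid(ledger))    # True
print(is_valid(tampered))  # False
```

On a single machine an attacker could simply recompute all the hashes after editing; the chain only becomes hard to rewrite once many nodes hold copies and must agree on the log.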
8. LFLex Fridman
  So, first of all, what's a ledger? So, it's a...
9. DSDawn Song
  It's like a database, it's like a data entry.
10. LFLex Fridman
  And so a distributed ledger is something that's maintained across or is, uh, synchronized across multiple sources, multiple nodes.
11. DSDawn Song
  Multiple nodes, yes.
12. LFLex Fridman
  And so, where is this idea... now, how do you keep... So it's important for a ledger, a database, to keep that, to make sure... So what are the kinds of security vulnerabilities that you're trying to protect against in the context of a distributed ledger?
13. DSDawn Song
  So, in this case, for example, you don't want some malicious nodes to be able to change the transaction logs, and in certain cases that's called double spending. You can also cause different views in different parts of the network, and so on.
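As a rough illustration of what double spending means, the sketch below processes transactions in order and rejects any that tries to spend a coin that has already been spent. Everything here is a simplifying assumption for illustration: the function name `apply_transactions`, the `coin_id` field, and the sample data are hypothetical, and the check only works if all nodes agree on the order of transactions, which is the job of the consensus protocol.

```python
def apply_transactions(transactions):
    """Accept each transaction unless it spends a coin that was already spent."""
    spent_coins = set()
    accepted, rejected = [], []
    for tx in transactions:
        if tx["coin_id"] in spent_coins:
            rejected.append(tx)  # second spend of the same coin: a double spend
        else:
            spent_coins.add(tx["coin_id"])
            accepted.append(tx)
    return accepted, rejected


# Hypothetical example: alice tries to pay both bob and carol with coin "c1".
pending = [
    {"coin_id": "c1", "from": "alice", "to": "bob"},
    {"coin_id": "c1", "from": "alice", "to": "carol"},
    {"coin_id": "c2", "from": "alice", "to": "dave"},
]
accepted, rejected = apply_transactions(pending)

print([tx["to"] for tx in accepted])  # ['bob', 'dave']
print([tx["to"] for tx in rejected])  # ['carol']
```

If two parts of the network saw the two "c1" transactions in different orders, each would accept a different one, which is the "different views" problem mentioned above.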
14. LFLex Fridman
  So, the ledger has to represent, if you're capturing like financial transactions, it has to represent the exact timing and the exact occurrence and no duplicates, all that kind of stuff. It has to represent what actually happened. Okay, so what are your thoughts on the security and privacy of digital currency? I can't tell you how many people write to me to interview various people in the digital currency space. There seems to be a lot of excitement there, and some of it, to me, from an outsider's perspective, seems like dark magic. I don't know how secure... I think the foundation, from my perspective, of digital currencies is that you can't trust anyone, so you have to create a really secure system. So, can you maybe speak about what your thoughts in general about digital currency are, and how it can possibly create financial transactions and financial stores of money in the digital space?
15. DSDawn Song
  So, you asked about security and privacy. And so, so again, as I mentioned earlier, in security, we actually talk about two main properties. Um, the, uh, integrity and, uh, confidentiality. Also there's another one for availability. Uh, you want the system to be available.
16. LFLex Fridman
  Right.
17. DSDawn Song
  But here, uh, for the question you asked, let's just focus on, uh, integrity and confidentiality.
18. LFLex Fridman
  Yes.
19. DSDawn Song
  So, for integrity of this distributed ledger, essentially, as we discussed, we want to ensure that the different nodes, right, have this consistent view. Usually it's done through what we call a consensus protocol, so that they establish this shared view on this ledger, that you cannot go back and change-
20. LFLex Fridman
  Mm-hmm.
21. DSDawn Song
  ... is immutable, uh, and so on. So, uh, so in this case then, the security often refers to, uh, this integrity property, and essentially you're asking the question, how much work, how, uh, how can you attack the system-
22. LFLex Fridman
  Mm-hmm.
23. DSDawn Song
  ... so that, uh, the attacker can change the log, for example?
24. LFLex Fridman
  Right. How hard is it to make an attack like that, yeah?
25. DSDawn Song
  Right, right. And then that very much depends on the consensus mechanism, how the system is built, and all that. So, there are different ways to build these decentralized systems, and people may have heard about terms like proof of work, proof of stake, these different mechanisms, and it really depends on, right, how the system has been built, and also how much resources, how much work has gone into the network, to actually say how secure it is. So, for example-
26. LFLex Fridman
  Mm.
27. DSDawn Song
  ... people talk about how, like, Bitcoin is a proof-of-work system, so much electricity has been burned.
28. LFLex Fridman
  So, there's differences in the different mechanisms and the implementations of a distributed ledger used for digital currency. There's Bitcoin, there's, whatever, there's so many of them, and there's different underlying mechanisms, and there's arguments, I suppose, about which is more effective, which is more secure, which is more-
29. DSDawn Song
  And what is needed, what amount of resources is needed to be able to attack the system. Like, for example, what percentage of the nodes do you need to control or compromise in order to, right, to change the log, for example.
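The question of what fraction of the system an attacker must control has well-known textbook answers for two families of protocols, sketched below in Python. These are standard results brought in for illustration, not figures given in the episode: classic Byzantine fault tolerant consensus needs n >= 3f + 1 nodes to tolerate f malicious ones, and the Bitcoin whitepaper's gambler's-ruin argument gives the chance that an attacker with a fraction q of the hash power ever catches up from z blocks behind. The function names are hypothetical, and the second formula is only the catch-up term, a simplification of the whitepaper's full analysis.

```python
def max_byzantine_faults(n):
    """Largest f such that n >= 3f + 1 (classic Byzantine fault tolerance bound)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


def catch_up_probability(q, z):
    """Chance that an attacker with hash-power fraction q ever catches up from z blocks behind.

    Gambler's-ruin result from the Bitcoin whitepaper: 1 if q >= p, else (q / p) ** z,
    where p = 1 - q is the honest fraction.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1 - q
    if q >= p:
        return 1.0
    return (q / p) ** z


print(max_byzantine_faults(4))       # 1
print(max_byzantine_faults(10))      # 3
print(catch_up_probability(0.5, 6))  # 1.0
print(catch_up_probability(0.1, 6) < 1e-5)  # True
```

In both cases the security argument is a threshold: below roughly one third of the nodes (BFT) or one half of the hash power (proof of work), the attacker's chance of rewriting the log is zero or shrinks exponentially with depth; at or above it, the guarantees no longer hold.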
30. LFLex Fridman
  And do you have a sense if those are things that could be shown theoretically through the design of the mechanisms? Or does it have to be shown empirically by having a large number of users using the currency?

Episode duration: 2:12:36


Transcript of episode HhY95m-WD_E
