Michael Kearns: Algorithmic Fairness, Privacy & Ethics | Lex Fridman Podcast #50
- 0:00 – 1:00
Lex’s intro: Kearns’ background and why ethical algorithms matter
- LFLex Fridman
The following is a conversation with Michael Kearns. He's a professor at the University of Pennsylvania and a co-author of the new book, The Ethical Algorithm, that is the focus of much of this conversation. It includes algorithmic fairness, bias, privacy, and ethics in general. But that is just one of many fields that Michael's a world-class researcher in, some of which we touch on quickly including learning theory or the theoretical foundation of machine learning, game theory, quantitative finance, computational social science, and much more. But on a personal note, when I was an undergrad early on, I worked with Michael on an algorithmic trading project and competition that he led. That's when I first fell in love with algorithmic game theory. While most of my research life has been in machine learning and human-robot interaction, the systematic way that game theory reveals the beautiful structure in our competitive and cooperating world of humans has been a continued inspiration
- 1:00 – 2:31
Sponsor segment: fear of new technology and recurring social reactions
- LFLex Fridman
to me. So for that, and other things, I'm deeply thankful to Michael, and really enjoyed having this conversation again in person after so many years. This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. This episode is supported by an amazing podcast called Pessimists Archive. Jason, the host of the show, reached out to me looking to support this podcast, and so I listened to it to check it out. And by listened, I mean I went through it Netflix binge style, at least five episodes in a row. It's now one of my favorite podcasts, and I think it should be one of the top podcasts in the world, frankly. It's a history show about why people resist new things. Each episode looks at a moment in history when something new was introduced, something that today we think of as commonplace, like recorded music, umbrellas, bicycles, cars, chess, coffee, the elevator, and the show explores why it freaked everyone out. The latest episode on mirrors and vanity still stays with me as I think about vanity in the modern day, of the Twitter world. That's the fascinating thing about this show is that stuff that happened long ago, especially in terms of our fear of new things, repeats itself in the modern day, and so has many lessons for us to think about in terms of human psychology and the role of
- 2:31 – 3:40
Literary influences and the path from English to computer science
- LFLex Fridman
technology in our society. Anyway, you should subscribe and listen to Pessimists Archive. I highly recommend it. And now here's my conversation with Michael Kearns. You mentioned reading Fear and Loathing in Las Vegas in high school, and having a more or a bit more of a literary mind. So what books, non-technical, non-computer science, would you say had the biggest impact on your life, either intellectually or emotionally?
- MKMichael Kearns
You've dug deep into my history, I see. Um-
- LFLex Fridman
Went deep.
- MKMichael Kearns
Yeah. I think, well, my favorite novel is, uh, Infinite Jest by David Foster Wallace which, um, actually coincidentally much of it takes place in the halls of buildings right around us here at MIT.
- LFLex Fridman
Yes.
- MKMichael Kearns
So that certainly had a big influence on me. Um, and as you noticed, like, when I was in high school, I, I actually s- even started college as an English major. So was very influenced by sort of that genre of journalism at the time and thought I wanted to be a writer and then realized that an English major teaches you to read but it doesn't teach you how to write, and then I became interested in math and computer science instead.
- 3:40 – 7:33
From moral philosophy to implementable definitions of fairness
- LFLex Fridman
Well, in your new book, The Ethical Algorithm, you kinda sneak up from an algorithmic perspective on these deep profound philosophical questions of, of fairness, of, um, privacy. In thinking about these topics, how often do you return to that literary mind that, that you had?
- MKMichael Kearns
Yeah. I'd like to claim there was a deeper connection, um, but, but there, uh, you know, w- I think both Aaron and I kind of came at these topics first and foremost from a technical angle. I mean, you know, uh, um, I kind of consider myself primarily, um, and originally a machine learning researcher, and I think as we just watched like the rest of the society, the field technically advance and then quickly on the heels of that kind of the, the buzzkill of all of the antisocial behavior by algorithms. Just kind of realized there was an opportunity for us to do something about it from a research perspective. You know, a, uh, more to the point in your question, I mean, I, I do have an uncle who is literally a m- a moral philosopher, and so in the early days of our technical work on fairness topics, I would occasionally, you know, run ideas by him. So I mean, I remember an early email I sent to him in which I said like, "Oh, you know, here's a specific definition of algorithmic fairness that we think is some sort of variant of Rawlsian fairness. Um, what do you think?" And I thought I was asking a yes or no question, and I got back your kind of classical philosopher's response saying, "Well, it depends. If you look at it this way, then you might conclude this." Um, and that's when I realized that there was a real kind of, um, rift between the ways philosophers and others had thought about things like fairness, you know, from sort of a humanitarian perspective and the way that you needed to think about it as a computer scientist if you were going to kind of implement actual algorithmic solutions.
- LFLex Fridman
But I would say the algorithmic solutions take care of some of the low-hanging fruit, sort of the problem is a lot of algorithms when they don't consider fairness, they are just terribly unfair, and when they don't consider privacy, they're terribly, uh, uh, they violate privacy. Sort of the algo- algorithmic approach fixes big problems, but there is still, you get, when you start pushing into the gray area, that's when you start getting to this philosophy of what it means to be fair, starting from Plato. What, what is justice, kind of questions.
- MKMichael Kearns
Yeah, I think that's right. And I, (sighs) I mean, I, I, I would even not go as far as you went to say that, that sort of the algorithmic work in these areas is solving, like, the biggest problems. Um, and, you know, we discuss in the book the fact that, uh, really, we are ... There's a sense in which we're kind of looking where the light is in that, um, you know, for example, if police are racist in who they decide to stop and frisk, um, and that goes into the data, there's sort of no undoing that downstream by kind of clever algorithmic methods. And I think especially in fairness. I mean, m- m- I think less so in privacy where we feel like the community kind of really has settled on the right definition, which is differential privacy. If you just look at the algorithmic fairness literature, already you can see it's gonna be much more of a mess, and, you know, you've got these theorems saying, "Here are three entirely reasonable, desirable notions of fairness, and, you know, here's a proof that you cannot simultaneously have all three of them." Um, so I think we know that algorithmic fairness compared to algorithmic privacy is gonna be kind of a harder problem, and it will have to revisit, um, I think, things that have been thought about by, you know, many generations of scholars before us. Um, so it's very early days for fairness, I think.
- 7:33 – 13:14
Are people fundamentally good? Power, professional culture, and social norms
- LFLex Fridman
So before we get into details of differential privacy, and, and on the fairness side, let me linger on the philosophy a bit. Do you think most people are fundamentally good, or do most of us have both the capacity for good and evil within us?
- MKMichael Kearns
I mean, I'm an optimist. I tend to think that most people are good and, and want to do, to do right, and that deviations from that are, you know, kind of usually due to circumstance, not due to people being bad at heart.
- LFLex Fridman
With people with power, are people at the heads of governments, people at the heads of companies, people at the heads of maybe, so financial power, markets, do you think the distribution there is also most people are good and have good intent?
- MKMichael Kearns
Y- yeah, I, I do. I mean, my, my statement wasn't qualified to people not in positions of power. I mean, I think what happens in a lot of the, you know, the, the, the cliche about absolute power corrupts absolutely. I mean, y- you know, I think even short of that, you know, having spent a lot of time on Wall Street and also in arenas very, very different from Wall Street, like academia, um, you know, one of the things m- m- I think I've benefited from by moving between two very different worlds is you, you become aware that, you know, these worlds kind of develop their own social norms, and they develop their own rationales for, um, you know, behavior, for instance, that might look unusual to outsiders. But when you're in that world, it doesn't feel unusual at all. Um, and I, I think this is true of a lot of, you know, professional cultures, for instance, um, and, and, you know, so then you're m- maybe slippery slope is too strong of a word, but, you know, you're in some world where you're mainly around other people with the same kind of viewpoints and training and worldview as you. And I think that's more of a source of, of, you know, kind of abuses of power, um, than sort of, you know, there being good people and evil people, um, and, and that somehow the evil people are the ones that somehow rise to power.
- LFLex Fridman
Oh, that's really interesting. So it's the, within the social norms constructed by that particular group of people, you're all trying to do good. But because it's a group, you might be, you might drift into something that for the broader population, it does not align with the values of society. That kind of, that's the worry.
- MKMichael Kearns
Yeah, I mean, or, or not that you drift, but even the things that don't make sense to the outside world don't seem unusual to you. So it's not sort of, like, a good or a bad thing, but, you know. Like, so for instance, you know, on, on, i- i- in the world of finance, right-
- LFLex Fridman
Right.
- MKMichael Kearns
... there's a lot of complicated types of activity that if you are not immersed in that world, you cannot see why the purpose of that, you know, that activity exists at all. It just seems-
- LFLex Fridman
Yeah.
- MKMichael Kearns
... like, you know, um, completely useless, and people just like, you know, pushing money around. And when you're in that world, right, you're, you're, you, and you learn more, you, your view does become more nuanced, right? You realize, okay, there is actually a function to, to this activity. Um, and for s- in some cases, you would conclude that actually if magically we could eradicate this activity tomorrow, it would come back because it actually is, like-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... serving some useful purpose. It's just a, a useful purpose that's very difficult for outsiders to see.
- LFLex Fridman
Right.
- MKMichael Kearns
And so I think, you know, lots of professional work environments or cultures, as I might put it, um, kind of have these social norms that, you know, don't make sense to the outside world. Academia's the same, right? I mean-
- LFLex Fridman
Right.
- MKMichael Kearns
... lots of people look at academia and say, you know, "What the hell are all of you people doing?"
- LFLex Fridman
Yeah.
- MKMichael Kearns
"Why are you paid so much in some cases at taxpayer expense to do, you know, to-"
- LFLex Fridman
Publish papers that nobody reads.
- MKMichael Kearns
... publish pa- you know, but when you're in that world, you come to see the value for it, and but even though you might not be able to explain it to, you know, the person in the street.
- LFLex Fridman
Right. And in, in the case of the, the financial sector, tools like credit might not make sense to people. Like, is, it's a good example of something that does seem to pop up and be useful or, or just the power of markets and just in general capitalism.
- MKMichael Kearns
Y- yeah, in finance, I think the primary example I would give is leverage, right? So being allowed to borrow... to, sort of, use 10 times as much money as you've actually borrowed.
- LFLex Fridman
Right.
- MKMichael Kearns
Right? So, so that's an example of something that before I had any experience in financial markets, I might have looked at and said, "Well, what is the purpose of that? That just seems very dangerous." And it, and it is dangerous, and it has proven dangerous. But, you know, if the fact of the matter is that, you know, sort of on some particular time scale you are holding positions that are, you know, very unlikely to, you know, loo- you know, they're, you know, like your value at risk or variance is like 1% or 5%-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... then it kind of makes sense that you would be allowed to use a little bit more than you have, uh, because you have, you know, some confidence that you're not gonna lose it all in a single day. Now, of course, when that happens (laughs) -
- LFLex Fridman
Yeah.
- MKMichael Kearns
... we've, we've seen what happens, you know, not, not too long ago. But, but, you know, but the idea that it serves no useful economic purpose under any circumstances is definitely not true.
- LFLex Fridman
We'll return to the other side of the coast, Silicon Valley, and, and the problems there as we talk about (laughs) privacy, as we talk about fairness. At the high level... And I'll ask some sort of basic questions with a hope to get at the fundamental nature of reality.
- MKMichael Kearns
(laughs)
- 13:14 – 19:04
What is an “ethical algorithm”? Quantifying ethics and choosing what to protect
- LFLex Fridman
But from a very high level, what is an ethical algorithm? So I can say that an algorithm has a running time of using big O notation, N log N. I can say that a machine learning algorithm cl- classified cat versus dog with 97% accuracy. Do you think there will one day be a way to measure sort of in the same compelling way as the big O notation of this algorithm is 97% ethical?
- MKMichael Kearns
First of all, let me riff for a second on your specific N log N example.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
So because early in the book when we're just kind of trying to describe algorithms period, we say like, "Okay, you know, what's an example of an algorithm or an algorithmic problem, first of all?" Like, it's sorting, right? You have a bunch of index cards with numbers on them, and you want to sort them. And we describe, you know, an algorithm that sweeps all the way through, finds the, the smallest number, puts it at the front, then sweeps through again, finds the second-smallest number. So we make the point that this is an algorithm, and it's also a bad algorithm in the sense that, you know, it's quadratic rather than N log N, which we know is kind of optimal for sorting. And we make the point that sort of like, you know, so even within the confines of a very precisely specified problem, there, you know, there might be many, many different algorithms for the same problem with different properties. Like, some might be faster in terms of running time. Some might use less memory. Some might have, you know, better distributed implementations. And, and so the point is, is that already we're used to, you know, in computer science, thinking about trade-offs between different types of quantities and resources-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... and there being, you know, better and worse algorithms. And, and our book is about that part of algorithmic ethics that we know how to kind of put on that same kind of quantitative footing right now. So, you know, just to, to say something that our book is not about, our, our book is not about kind of broad, fuzzy notions of fairness.
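A minimal sketch of the index-card example described above, assuming the cards are a plain Python list of numbers. `selection_sort` is the sweep-for-the-smallest procedure (quadratic in the number of cards), and `merge_sort` is one standard N log N alternative swapped in for comparison; the conversation does not name a specific N log N algorithm. Both functions and their comparison counters are illustrative names, not taken from the book.

```python
def selection_sort(cards):
    """Sweep repeatedly for the smallest remaining card and move it forward.

    Returns (sorted copy, number of comparisons made).
    """
    cards = list(cards)  # work on a copy so the caller's list is untouched
    comparisons = 0
    for i in range(len(cards)):
        smallest = i
        for j in range(i + 1, len(cards)):
            comparisons += 1
            if cards[j] < cards[smallest]:
                smallest = j
        cards[i], cards[smallest] = cards[smallest], cards[i]
    return cards, comparisons


def merge_sort(cards):
    """Split in half, sort each half, merge. Returns (sorted copy, comparisons)."""
    if len(cards) <= 1:
        return list(cards), 0
    mid = len(cards) // 2
    left, left_count = merge_sort(cards[:mid])
    right, right_count = merge_sort(cards[mid:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged, comparisons
```

Both functions solve the same precisely specified problem, but selection sort always makes n(n-1)/2 comparisons, while merge sort's count grows roughly like n log n, which is the kind of quantitative trade-off the passage refers to.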
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
It's about very specific notions of fairness. There's more than one of them. Um, there are tensions between them, right? But if you pick one of them, you can do something akin to saying that this algorithm is 97% ethical. You can say, for instance, the, you know, th- for this lending model, the false rejection rate on Black people and white people is within 3%.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Right? So we might call that a 97% ethical algorithm, and a 100% ethical algorithm would mean that that difference is 0%.
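A minimal sketch of the measurement described above, assuming each lending decision is a record with a group label, a ground-truth `creditworthy` flag, and the model's `approved` decision. The record layout, the helper names, and the toy counts are all hypothetical, chosen so that the gap between the two groups comes out at three percentage points.

```python
def false_rejection_rate(records, group):
    """Share of creditworthy applicants in `group` whom the model rejected."""
    creditworthy = [
        r for r in records if r["group"] == group and r["creditworthy"]
    ]
    if not creditworthy:
        raise ValueError(f"no creditworthy applicants in group {group!r}")
    rejected = sum(1 for r in creditworthy if not r["approved"])
    return rejected / len(creditworthy)


def disparity(records, group_a, group_b):
    """Absolute gap between the two groups' false rejection rates."""
    return abs(
        false_rejection_rate(records, group_a)
        - false_rejection_rate(records, group_b)
    )


def make_group(group, creditworthy, falsely_rejected):
    """Toy data: `creditworthy` good applicants, the first few wrongly rejected."""
    return [
        {"group": group, "creditworthy": True, "approved": i >= falsely_rejected}
        for i in range(creditworthy)
    ]


# Hypothetical data: 10% false rejections in group A, 13% in group B.
records = make_group("A", 100, 10) + make_group("B", 100, 13)
gap = disparity(records, "A", "B")
print(f"false rejection rate gap: {gap:.2%}")
```

Note that the number only means something once the two groups and the notion of harm (here, false rejection) have been chosen, which is the point raised in the next exchange.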
- LFLex Fridman
Mm-hmm. In that case, fairness is specified when two groups, however they're defined, are given to you.
- MKMichael Kearns
That's right.
- LFLex Fridman
So, so the... And, and then you can sort of mathematically start describing the algorithm. But nevertheless, the, the part where the two groups are given to you, in, unlike running time, you know, we don't in computer science talk about how fast an algorithm feels like when it runs. (laughs)
- MKMichael Kearns
True. (laughs)
- LFLex Fridman
We measure it. And ethical starts getting into feelings. So for example, an algorithm runs, you know, if it runs in the background, it doesn't disturb the performance of my system, it'll feel nice. I'll be okay with it. But if it overloads the system, it'll feel unpleasant. So in that same way, ethics, there's a feeling of how socially acceptable it is. How does it represent the, the moral standards of our society today? So in that sense, and sorry to linger on that, first of all, high-level philosophical question, is do you have a sense we'll be able to measure how ethical an algorithm is?
- MKMichael Kearns
First of all, I didn't, certainly didn't mean to give the impression that you can kind of measure, you know, memory speed trade-offs, uh, you know, and, and that there's a complete m- m- you know, mapping from that onto kind of fairness, for instance, or-
- LFLex Fridman
Yeah.
- MKMichael Kearns
... ethics and, and accuracy, for example. In the type of fairness definitions that are largely the objects of study today and starting to be deployed, you as the user of the definitions, you need to make some hard decisions before you even get to the point of designing fair algorithms. Um, one of them, for instance, is deciding who it is that you're worried about protecting, who you're worried about being harmed by, for instance, some notion of discrimination or unfairness. And then you need to also decide what constitutes harm. So for instance, in a lending application, maybe you decide that f- you know, falsely rejecting a creditworthy individual, um, you know, sort of a false negative, is the real harm and that false positives, i.e. people that are not... creditworthy or are not gonna repay your loan, they get a loan, you might think of them as lucky.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Um, and so that's not a harm, although it's not clear that if you are... don't have the means to repay a loan, that being given a loan is not also a harm. So, you, you know, the, the literature is sort of, so far, quite limited in that you sort of need to say who do you want to protect and what would constitute harm to that group. And, and when you ask questions like will algorithms feel ethical, one way in which they won't under the definitions that I'm describing is if, you know, if you are an individual who is falsely denied a loan-
- LFLex Fridman
Right.
- MKMichael Kearns
... incorrectly denied a loan, all of these definitions basically say, like, "Well, you know, your compensation is the knowledge that we are, we are also falsely denying loans to other people, you know-"
- LFLex Fridman
Yeah.
- 19:04 – 25:43
Group fairness vs. individual fairness—and “subjective fairness”
- MKMichael Kearns
"... in other groups at the same rate that we're doing it to you." And, and, you know, there... and so there is actually this interesting even technical tension in the field right now between these sort of group notions of fairness and notions of fairness that might actually feel like real fairness to individuals, right? They-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
They might really feel like their particular interests are being protected or thought about by the algorithm rather than just, you know, the groups that they happen to be members of.
- LFLex Fridman
Is there parallels to the Big O notation of worst case analysis? So is it important to... looking at the worst violation of fairness for an individual, is it important to minimize that one individual? So, like, worst case analysis? Is that something you think about or...
- MKMichael Kearns
I, I mean, I think we're not even at the point where we can sensibly think about that. So f- so first of all, you know, w- w- we're talking here both about fairness applied at the group level, which is a relatively weak thing, but it's better than nothing, and also the more ambitious thing of trying to f- to give some individual promises. But even that doesn't incorporate, I think, something that you're hinting at here, is what I try... what I call subjective fairness, right?
- LFLex Fridman
Right.
- MKMichael Kearns
So a lot of the definitions... I mean, all of the definitions in the algorithmic fairness literature are what I would kind of call received wisdom definitions. It's sort of-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... you know, somebody like me sits around and thinks like, "Okay, you know, I think here's a technical definition of fairness that I think people should want or that they should, you know, think of as some notion of fairness. Maybe not the only one, maybe not the best one, maybe not the last one." But we really actually, uh, uh, don't know from a subjective standpoint, like, what people really think is fair. There's-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
You know, it's... W- w- w- we, we just started doing a little bit of work in, in our group at real... actually doing kind of human subject-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... experiments in which we, you know, ask people about... you know, we, we, we ask them questions about fairness. We survey them. We, you know, we show them pairs of individuals in, let's say, a criminal recidivism prediction setting and we ask them, "Do you think these two individuals should be treated the same as a matter of fairness?" And to my knowledge, there's not a large literature in which ordinary people are asked about... you know, they, they have sort of notions of their subjective fairness elicited from them.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Um, it's mainly, you know, kind of scholars who think about fairness, uh-
- LFLex Fridman
Right.
- MKMichael Kearns
... kind of making up their own definitions. And I think, I think this needs to change actually for many social norms, not just for fairness, right? So there's a lot of, you know, discussion these days in the AI community about interpretable AI or understandable AI. And as far as I can tell, everybody agrees that deep learning, or at least the outputs of deep learning, are not very understandable. And people might agree that sparse linear models with integer coefficients are more understandable.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
But nobody's really asked people, you know?
- LFLex Fridman
(laughs)
- MKMichael Kearns
There's very little literature on, you know, s- sort of showing people models and asking them do they understand what the model is doing.
- LFLex Fridman
Right.
- MKMichael Kearns
And I think that, uh, in all these topics, as these fields mature, we need to start doing more behavioral work.
- LFLex Fridman
Yeah, which is... so one of my deep passions is psychology, and I always thought computer scientists will be the f- the best future psychologists. (laughs) In a sense that data is, um... especially in this modern world, data is a really powerful way to understand and study human behavior. And you've explored that with your game the- your theory side of work as well.
- MKMichael Kearns
Yeah. I, I'd like to think that what you say is true about computer scientists and psychology. From my own limited wandering into human subject experiments, we have a great deal to learn. Not just computer science, but AI and machine learning more specifically, I, I kind of think of as imperialist research communities in that, you know, kind of like physicists in an earlier generation, computer scientists kind of don't think of any scientific topic as off limits to them. They will, like, freely wander into areas that others have been thinking about for decades or longer.
- LFLex Fridman
Yes.
- MKMichael Kearns
And, you know, we usually tend to embarrass ourself-
- LFLex Fridman
Yes.
- MKMichael Kearns
... in those efforts for, for some amount of time. Like, you know, I think reinforcement learning is a good example, right? So a lot of the early work in reinforcement learning, I have complete sympathy for the con- the control theorists that looked at this and said, like, "Okay, you are reinventing stuff that we've known since, like, the '40s," right?
- LFLex Fridman
Mm-hmm.
- 25:43 – 33:36
Fairness gerrymandering and the combinatorial explosion of protected subgroups
- MKMichael Kearns
So let me start by answering a, a very good high-level question with a slightly narrow technical response.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Which is, these group definitions of fairness, like here is a few groups, like different racial groups, maybe gender groups, maybe age, what have you, um, and let's make sure that, you know, for none of these groups do we, um, you know, have a false negative rate which is much higher than any other one of these groups, okay?
- LFLex Fridman
Right.
- MKMichael Kearns
So these are kind of classic group aggregate notions of fairness. And, you know, but th- at the end of the day, an individual you can think of as a combination of all of their attributes, right? They're a member of a racial group, they're, they have a, a gender, um, they have an age, you know, and many other, you know, demographic properties that are not biological but that, you know, are, are still, you know, very strong determinants of outcome and personality and the like. S- so one I think useful spectrum is to sort of think about that array between the group and the specific individual-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... and to realize that in some ways asking for fairness at the individual level is to sort of ask for group fairness simultaneously for all possible combinations of groups.
- LFLex Fridman
(laughs) Yeah.
- MKMichael Kearns
So in particular, so in particular-
- LFLex Fridman
Yes.
- MKMichael Kearns
... you know, if I build a predictive model that meets some definition of fairness by race, by gender, by age, by what have you-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... marginally, to get slightly technical, sort of independently, I shouldn't expect that model to not discriminate against disabled Hispanic women over age 55 making less than $50,000 a year annually even though I might have protected each one of those attributes marginally.
- LFLex Fridman
So the optimization actually, th- that's a fascinating way to put it. So you're just optimizing... S- so one way to achieve the optimizing fairness for individuals is just to add more and more definitions of groups that each individual belongs to.
- MKMichael Kearns
That's right. So, you know, it, at the end of the day, we could think of all of ourselves as groups of size one-
- LFLex Fridman
Yeah.
- MKMichael Kearns
... because eventually there's some attribute that separates you from me and everybot- from everybody else in the world, okay?
- LFLex Fridman
Yes.
- MKMichael Kearns
And so i- i- it is possible to put, you know, these incredibly coarse ways of thinking about fairness and these very, very individualistic specific ways-
- LFLex Fridman
Yeah.
- MKMichael Kearns
... on a common scale.
- LFLex Fridman
Yeah.
- MKMichael Kearns
And, you know, one of the things we've worked on from a research perspective is, you know, so we sort of know how to, you know, we, in relative terms, we know how to provide fairness guarantees at the coarsest end of the scale.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
We don't know how to provide kind of sensible, tractable, realistic fairness guarantees at the individual level, but maybe we could start creeping towards that by dealing with more r- you know, refined sub-groups. I mean, we, we gave a name to this phenomenon where, you know, you protect, you, you, you enforce some desh- def- definition of fairness for a bunch of marginal attributes or features, but then you find yourself discriminating against a, a combination of them. We call that fairness gerrymandering.
- LFLex Fridman
Mm-hmm. (laughs)
- MKMichael Kearns
Because like political gerrymandering, you know, you're, you're giving some guarantee at the aggregate level-
- LFLex Fridman
Yes.
- MKMichael Kearns
... um, but that when you kind of look in a more granular way at what's going on you realize that you're achieving that aggregate guarantee by sort of favoring some groups and discriminating against other ones. And, and so there are, you know, it's early days, but there are algorithmic approaches that let you start creep, you know, creeping towards that, you know, individual end of the spectrum.
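A small sketch of fairness gerrymandering as described above, using invented counts for two binary attributes. The attribute values (`"M"`/`"F"`, `"X"`/`"Y"`), the counts, and the function names are hypothetical and exist only to show how equal marginal rates can hide unequal subgroup rates; this is not the auditing algorithm from the research being discussed.

```python
# Hypothetical counts, invented for illustration:
# (gender, race) -> (creditworthy applicants, falsely rejected applicants)
COUNTS = {
    ("M", "X"): (100, 20),
    ("M", "Y"): (100, 0),
    ("F", "X"): (100, 0),
    ("F", "Y"): (100, 20),
}


def false_rejection_rate(cells, counts=COUNTS):
    """Pooled false rejection rate over a collection of (gender, race) cells."""
    total = sum(counts[c][0] for c in cells)
    rejected = sum(counts[c][1] for c in cells)
    return rejected / total


def marginal_rates(counts=COUNTS):
    """Rate for each single attribute value, ignoring the other attribute."""
    rates = {}
    for position in (0, 1):  # 0 = gender, 1 = race
        for value in sorted({cell[position] for cell in counts}):
            cells = [c for c in counts if c[position] == value]
            rates[value] = false_rejection_rate(cells, counts)
    return rates


def subgroup_rates(counts=COUNTS):
    """Rate for every combination of the two attributes."""
    return {cell: false_rejection_rate([cell], counts) for cell in counts}


print("marginal:", marginal_rates())
print("subgroup:", subgroup_rates())
```

With these counts every marginal group (by gender alone or by race alone) has the same false rejection rate, yet two of the four combined subgroups carry all of the false rejections. With more attributes the number of combinations grows exponentially, which is why checking every subgroup by brute force does not scale.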
- LFLex Fridman
Does there need to be human input in the form of weighing the value of the importance of each kind of group? So for example, is it, is it like, so gender, say, uh, crudely speaking male and female, and then different races, are we as humans supposed to put value on saying gender is 0.6 and race is 0.4 in terms of, uh, in the big optimization of achieving fairness? Is that kind of what humans are supposed to do here?
- 33:36 – 44:22
Fairness–accuracy trade-offs: Pareto frontiers and stakeholder decision-making
- LFLex Fridman
When you discuss the fairness, an algorithm that, uh, that achieves fairness whether in the constraints and the objective function, there's an immediate kind of analysis you can perform which is saying if you care about fairness in gender, this is the amount that you have to pay for it in terms of the performance of the system. Like do you... Is there a role for statements like that in a table in a paper, or do you wanna really not touch that? Like-
- MKMichael Kearns
No, no. W- we, we want to touch that and we do touch it. So I mean just, just again to make sure I'm not promising your, your viewers more than we know how to provide, but if you pick a definition of fairness, like I'm worried about gender discrimination-
- LFLex Fridman
Yes.
- MKMichael Kearns
... and you pick a notion of harm like false rejection for a loan, for example, and you give me a model, I can definitely first of all go audit that model. It's easy for me to go, you know, from data to kind of say like, "Okay, your false rejection rate on women is this much higher than it is on men, okay?" But, you know, once you also put, uh, the fairness into your objective function, I mean I think the table that you're talking about is, you know, what, what we would call the Pareto curve, right?
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
You can literally trace out, and we give examples of such plots on real data sets in the book. You, you have two axes. On the X axis is your error, on the Y axis is unfairness by whatever, you know, if it's like the disparity between false rejection rates between two groups.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Um, and you know, your algorithm now has a knob that basically says how strongly do I want to enforce fairness.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
And the less unfair, and you know, we, w- you know, if the two axes are error and unfairness, we'd like to be at zero/zero. We'd like zero error and zero f- fair- unfairness simultaneously. Anybody who works in machine learning knows that you're generally not going to get to zero error period without any fairness constraint whatsoever, so that's, that, that's not gonna happen. But in general, you know, you'll get this, you'll get some kind of convex curve that specifies the numerical trade-off you face, you know. If I want to go from 17% error down to 16% error, what will be the increase in unfairness that I, I experience as a result of that? And, and so this curve kind of specifies the, you know, kind of undominated models. Models that are off that curve are, you know, can be strictly improved in one or both dimensions. You can, you know, either make the error better or the unfairness better or both. Um, and I think our view is that not only, uh, are, are these objects, these Pareto curves, you know, they're efficient frontiers as you might call them... not only are they valuable scientific objects, I actually think that they, in the near term, might need to be the interface-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... between researchers working in the field and, and stakeholders in given problems. So you know, you could really imagine telling a, uh, criminal jurisdiction, "Look, if, if you're concerned about racial fairness, but you're also concerned about accuracy, you want to, you know, you, you want to release on parole people that are not going to recommit a violent crime, and you don't want to release the ones who are. So, you know, that's accuracy. But if you also care about those, you know, the mistakes you make not being disproportionally on one racial group or another, you can, you can show this curve." I'm hoping that in the near future, it, it'll be possible to explain these curves to non-technical people that have, that are the ones that have to make the decision, where do we wanna be on this curve? Like, what are the relative merits or value of having lower error versus lower unfairness? You know, that's not something computer scientists should be deciding for society, right? That, you know, the, the, the people in the field, so to speak, the policymakers, the regulators, that's who should be making these decisions. But I think and hope that they can be made to understand that these trade-offs generally exist, and that you need to pick a point. And, like, in ignoring the trade-off, you know, you're implicitly picking a point anyway, right?
- LFLex Fridman
(laughs) Right.
- MKMichael Kearns
Um, you just don't know it, and you're not admitting it.
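The notion of "undominated models" from this exchange can be made concrete with a small sketch. The candidate models and their error and unfairness numbers below are invented for illustration; the function name is likewise hypothetical.

```python
def pareto_frontier(models):
    """Return the undominated models, sorted by error.

    models: list of (name, error, unfairness) tuples; lower is better on both.
    A model is dominated if some other model is at least as good on both
    measures and strictly better on at least one.
    """
    frontier = []
    for name, err, unf in models:
        dominated = any(
            e <= err and u <= unf and (e < err or u < unf)
            for _, e, u in models
        )
        if not dominated:
            frontier.append((name, err, unf))
    return sorted(frontier, key=lambda m: m[1])

# Made-up candidates: (name, error, unfairness).
models = [
    ("A", 0.16, 0.10),
    ("B", 0.17, 0.06),
    ("C", 0.20, 0.02),
    ("D", 0.18, 0.08),  # worse than B on both axes
    ("E", 0.21, 0.05),  # worse than C on both axes
]

frontier = pareto_frontier(models)
print([name for name, _, _ in frontier])
```

In this toy data, moving from B to A buys one point of error at the cost of four points of unfairness; choosing between them is the stakeholder decision described above, not something the code settles.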
- LFLex Fridman
J- just to linger on the point of trade-offs, I think that's a really important thing to sort of, uh, think about. So, you think when we start to optimize for fairness, there's almost always in most system going to be trade-offs. So can you, like, like, what's the trade-off between... Just to clarify, uh, there have been some sort of technical terms thrown around, but a, uh, sort of, um, a perfectly fair world, why is that, why will somebody be upset about that?
- MKMichael Kearns
The specific trade-off I talked about, just in order to make things very concrete-
- LFLex Fridman
Yes.
- MKMichael Kearns
... was between numerical error and some numerical measure of unfairness. In, in-
- LFLex Fridman
What is numerical error? In, in the case of, uh-
- MKMichael Kearns
Just, like say, predictive error. Like, you know, the probability or frequency with which you release somebody on parole who then goes on to recommit a violent crime or keep incarcerated somebody who would not have recommitted a violent crime. So that-
- LFLex Fridman
So in the case of awarding somebody parole, or giving somebody parole, or letting them out on parole, you don't want them to recommit a crime. So it's your system failed in prediction if they happen to do a crime. Okay, so that's the perfo- that's one axis.
- MKMichael Kearns
Right.
- LFLex Fridman
And what's the fairness axis?
- MKMichael Kearns
And so then the fairness axis might be the difference between racial groups in the kind of false, false positive predictions, namely people that I kept incarcerated, predicting that they would recommit a violent crime, when in fact they wouldn't have.
- LFLex Fridman
Right. And the f- the unfairness of that, just to linger it, and a- allow me to, uh, ineloquently to try to sort of describe why that's unfair, why unfairness is there, the, the unfairness you wanna get rid of is the, in the judge's mind, the bias of having being brought up in the society, the slight racial bias, the racism that exists in the society, you wanna remove that from the system. Another way that's been debated is sort of equality of opportunity versus equality of outcome. And there's a weird dance there that's really difficult to get right. And we don't, and so, uh, the affirmative action is exploring that space.
- MKMichael Kearns
Right. And, and then we, this also quickly, you know, um, bleeds into questions like, well, maybe if one group really does recommit crimes at a higher rate-
- LFLex Fridman
Right.
- MKMichael Kearns
... uh, the reason for that is that at some earlier point in the pipeline or earlier in their lives they didn't receive the same resources that the other group did.
- LFLex Fridman
Right.
- MKMichael Kearns
Um, and that, and so, you know, there's always, in, in kind of fairness discussions, the possibility that the, uh, the real injustice came earlier, right?
- 44:22 – 1:06:00
Divisive culture, social media algorithms, and escaping “bad equilibria”
- LFLex Fridman
There are fundamentally difficult philosophical questions in fairness, and we live in a very divisive political climate, outrage culture. There is, uh, alt-right folks on 4chan, trolls. There is Social Justice Warriors on Twitter. There is very divisive outraged folks on all sides of every kind of system. How do you... How do we as engineers build ethical algorithms in such divisive culture? Do you think they could be disjoint? The human has to inject your values and then you can optimize over those values. But in our times when- when you start actually applying these systems, things get a little bit challenging for the public discourse. How- how do you think we can proceed?
- MKMichael Kearns
Yeah, I mean, for the most part, in the book, you know, a point that we try to take some pains to make is that we don't view ourselves or people like us as being in the position of deciding for society what the right social norms are, what the right definitions of fairness are.
- LFLex Fridman
Right.
- MKMichael Kearns
Our- our- our main point is to just show that if society or the relevant stakeholders in a particular domain can come to agreement on those sorts of things, there's a way of encoding that into algorithms in many cases, not in all cases. One other misconception that hopefully we definitely dispel is sometimes people read the title of the book and, you know, I think not unnaturally fear that what we're suggesting is that the algorithms themselves should decide what those social norms are and develop their own notions of fairness and privacy or ethics, and we're definitely not suggesting that.
- LFLex Fridman
The title of the book is Ethical Algorithm, by the way, and I didn't think of that interpretation of the title. That's interesting. (laughs)
- MKMichael Kearns
Yeah, yeah. I mean, and especially these days where people are, you know, concerned about the robots becoming our overlords-
- LFLex Fridman
Yeah.
- MKMichael Kearns
... the idea that the l- the robots would also, like, sort of develop their own social norms is, you know, just one step away from that. But I- I do think, you know, obviously despite disclaimer that people like us shouldn't be making those decisions for society, we- we are kind of living in a world where in many ways computer scientists have made some decisions that have fundamentally changed the nature of our society and democracy and- and sort of civil discourse and deliberation in ways that I think most people generally feel are bad these days, right? So...
- LFLex Fridman
But they had to make... So if we look at people at the heads of companies and so on, they had to make those decisions, right? Th- there has to be decisions... So there's- there's two options. Either you kinda put your head in the sand and don't think about these things and just let the algorithm do what it does, or you make decisions about what you value, you know, of injecting moral values into the algorithm.
- MKMichael Kearns
Look, I don't... I d- I never mean to be an apologist for the tech industry-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... but I think it's- it's a little bit too far to sort of say that explicit decisions were made about these things. So let's, for instance, take social media platforms, right?
- LFLex Fridman
Yes.
- MKMichael Kearns
So like many inventions in technology and computer science, a lot of these platforms that we now use regularly kind of started as curiosities, right? I remember when things like Facebook came out and its predecessors like Friendster, which nobody even remembers now.
- LFLex Fridman
(laughs) .
- MKMichael Kearns
P- people l- people really wondered like, "W- what... Why would anybody wanna spend time doing that?" You know? What... And even- even the web when it first came out, when it wasn't populated with much content and it was largely kind of hobbyists building their own kind of ramshackle websites, a lot of people looked at this and says, "Like, what is the purpose of this thing? Why is this interesting? Who would wanna do this?" And so even things like Facebook and Twitter, yes, technical decisions were made by engineers, by scientists, by executives in the design of those platforms, but, you know, I don't- I don't think 10 years ago anyone anticipated that those platforms, for instance, might kind of... acquire undue pol- you know, influence on political discourse or on the outcomes of elections. And, uh, I think the scrutiny that these companies are getting now is entirely appropriate, but I think it's a little too harsh to kind of look at history and sort of say, like, "Oh, you should've been able to anticipate that this would happen with your platform."
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
And in the sort of gaming chapter of the book, one of the points we're making is that, you know, these platforms, right, they don't operate in isolation. So, like, uh, unlike the other topics we're discussing like fairness and privacy, like, those are really cases where algorithms can operate on your data and make decisions about you and you're not even aware of it, okay?
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Things like Facebook and Twitter, these are, you know, these are, these are systems, right? These are social systems and their evolution, even their technical evolution because machine learning is involved, is driven in no small part by the behavior of the users themselves and how the users decide to adopt them and how to use them.
- LFLex Fridman
Yeah.
- MKMichael Kearns
And so, you know, (laughs) you know, I'm kind of like who really knew that, that, you know, un- until, until we saw it happen, who knew that these things might be able to influence the outcome of elections? Who knew that, you know, they might polarize political discourse because of the ability to, you know, decide who you interact with on the platform and also with the platform naturally using machine learning to optimize for your own interests, that they would further isolate us from each other and, you know, like, feed us all basically just the stuff that we already agreed with? And so I think it, uh, you know, we've, we've come to that outcome, I think, largely, but I think it's something that we all learned together, including the companies as these things happen.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Now, y- you asked, like, well, are there algorithmic remedies to these kinds of things? And, um, again, these are big problems that are not gonna be solved with, you know, somebody going in and changing a few lines of code somewhere in a social media platform. But I, I do think in many ways, there are, there are definitely ways of making things better. I mean, like an obvious recommendation that we, we make at some point in the book is, like, look, you know, to the extent that we think that machine learning applied for personalization purposes in things like news feed, you know, or other platforms, um, has led to polarization and intolerance of opposing viewpoints, as you know, right, these, these algorithms have models, right? And they kind of place people in some kind of metric space, and, and they place content in that space, and they sort of know the extent to which I have an affinity for a particular type of content. And by the same token, they also probably have a v- that, that same model probably gives you a good idea of the stuff I'm likely to violently disagree with or be offended by, okay?
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
So, you know, in this case, there really is some knob you could tune that says, like, instead of showing people only what they like and what they want, let's show them some stuff that we think that they don't like, or that's a little bit further away.
- LFLex Fridman
And s-
- MKMichael Kearns
And you could even imagine users being able to control this. You know, just like a b- l- everybody gets a slider.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
And y- that slider says like, you know, how much stuff do you want to see that's kind of, you know, you might disagree with or is at least further from your interests? Like an- it's almost like an exploration button.
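One way to picture the slider is as a weight that blends "show what the model thinks I like" with "show what is further away". The scoring rule below is purely hypothetical, as are the function name and the affinity numbers; it is a sketch of the idea, not how any real platform ranks content.

```python
def rank_feed(items, exploration):
    """Order items for display.

    items: list of (title, affinity) with affinity in [0, 1], where a higher
           value means the model predicts the user agrees with or likes it.
    exploration: slider in [0, 1]; 0 shows the most agreeable content first,
                 1 shows the most distant content first.
    """
    def score(item):
        _, affinity = item
        return (1 - exploration) * affinity + exploration * (1 - affinity)

    return sorted(items, key=score, reverse=True)

# Made-up items and affinities.
items = [("agrees", 0.9), ("neutral", 0.5), ("opposing", 0.1)]

print([t for t, _ in rank_feed(items, exploration=0.0)])
print([t for t, _ in rank_feed(items, exploration=1.0)])
```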
- 1:06:00 – 1:22:32
Privacy basics: why anonymization fails and how differential privacy works
- LFLex Fridman
What is differential privacy? Or more broadly, algorithmic privacy?
- MKMichael Kearns
Algorithmic privacy more broadly is just the study or, or the notion of privacy definitions or norms being encoded inside of algorithms. And so, you know, I think we count among this body of work just, you know, the literature and practice of things like data anonymization, which we kind of at the beginning of our discussion of privacy say like, "Okay, this is, this is sort of a notion of algorithmic privacy. It kind of tells you, you know, something to go do with data." But, but, you know, our view is that it's... and I think it- this is now, you know, quite widespread that it's, you know, despite the fact... that those notions of anonymization, kind of redacting and coarsening, um, are the most widely adopted technical solutions for data privacy. They are, like, deeply, fundamentally flawed. And so, you know, to your first question, what is differential privacy, differential privacy seems to be a much, much better notion of privacy that kind of avoids a lot of the weaknesses of anonymization notions while, while still letting us do useful stuff with data.
- LFLex Fridman
What's anonymization of data?
- MKMichael Kearns
So by anonymization, I'm, you know, kind of referring to techniques like, I have a database. The rows of that database are, let's say, individual people's medical records, okay? And I, I wanna let people use that data. Maybe I wanna let s- researchers access that data to build predictive models for some disease, but I'm worried that that will leak, uh, y- you know, sensitive information about specific people's medical records. So anonymization broadly refers to the set of techniques where I say, like, "Okay, I'm first gonna, like, like, I'm gonna delete the column with people's names. I'm going to not put..." You know, so that would be, like, a redaction, right? I'm just redacting that information. I am going to take ages, and I'm not gonna, like, say your exact age. I'm gonna say whether you're, you know, zero to 10, 10 to 20, 20 to 30. I might put the first three digits of your ZIP code, but not the last two, et cetera, et cetera. And so the idea is that through some series of operations like this on the data, I anonymize it. You know, another term of art that's used is removing personally identifiable information. And, you know, this is basically the most common way of providing data privacy, but that's in a way that still lets people access the, some variant form of the data.
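The redaction and coarsening steps listed here (drop the name, bucket the age by decade, keep the first three ZIP digits) might look like the sketch below. The field names and the sample record are assumptions made for the example.

```python
def anonymize(record):
    """Redact and coarsen one medical record (hypothetical field names)."""
    decade = (record["age"] // 10) * 10
    return {
        # The "name" field is redacted simply by not copying it over.
        "age_range": f"{decade}-{decade + 10}",   # coarsened age bucket
        "zip3": record["zip"][:3],                # first three ZIP digits only
        "diagnosis": record["diagnosis"],
    }

# Made-up record.
record = {"name": "Alice Example", "age": 34, "zip": "19104", "diagnosis": "asthma"}
released = anonymize(record)
print(released)
```

The released row no longer carries a name, but it still carries enough detail to be matched against other data, which is the weakness discussed next.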
- LFLex Fridman
And so at- at a slightly broader picture, as you talk about what does anonymization mean when you have multiple databases? Like, with the Netflix prize, when you can s- start combining stuff together to figure out-
- MKMichael Kearns
So this is exactly the problem with these notions, right, is that notions of anono- anonymization, removing personally identifiable in- information, the kind of fundamental conceptual flaw is that, you know, these definitions kind of pretend as if the dataset in question is the only dataset that exists in the world or that ever will exist in the future. And of course, things like the Netflix prize and many, many other examples since the Netflix prize, I think that was one of the earliest ones, though, you know, i- uh, you can re-identify people that were, you know, that were anonymized in the dataset by taking that anonymized dataset and combining it with other allegedly anonymized datasets and maybe publicly available information about you. You know?
- LFLex Fridman
And for people who don't know the Netflix prize, was, what was being publicly released as data, uh, so the names from those rows were r- removed, but what was released is the preference or the ratings of what movies you like and you don't like. And from that, combined with other things, I think forum posts and so on, you can start to figure out the names-
- MKMichael Kearns
Yeah, I mean, in that, that case, it was specifically the Internet Movie Database-
- LFLex Fridman
Or the IMDb database.
- MKMichael Kearns
... where, where lots of Netflix users publicly rate their mo- you know, their movie preferences. W- and so the anonymized data in Netflix when kinda ... And I mean, you know, it's, it's just this phenomenon I think that we've all come to realize in the last decade or so, is that just knowing a few apparently irrelevant, innocuous things about you can often act as a fingerprint, like if I know, you know, what m- what rating you gave to these 10 movies and the date on which you entered these movies, this is almost like a fingerprint for you-
- LFLex Fridman
Yeah, it does.
- MKMichael Kearns
... as- in the sea of all Netflix users.
- LFLex Fridman
Yeah.
- MKMichael Kearns
There was just another paper on this in Science or Nature of, about a month ago that, you know, kind of 18 attributes. I mean, m- my favorite example of this is, was actually, um, a paper, um, from several years ago now where it was shown that just from your likes on Facebook, just from the taunt, you know, the things on which you clicked on the thumbs up button on the platform, n- not using any information, demographic information, nothing about who your friends are, just knowing the content that you had liked, um, was enough to, you know, in the aggregate, accurately predict things like sexual orientation, drug and alcohol use, whether you were the child of divorced parents. So we live in this era where, you know, even the apparently irrelevant data that we offer about ourselves on public platforms and forums often, unbeknownst to us, more or less acts as signature or, you know, fingerprint. And that if you can kind of, you know, do a join between that kind of data and allegedly anonymized data, you have real trouble.
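The "join" mentioned here can be illustrated with a toy linkage sketch: match rows of an allegedly anonymized table against public profiles by counting shared (movie, rating, date) triples. This is a deliberately simplified illustration with invented data and an invented threshold, not the method used in the actual Netflix Prize study.

```python
def reidentify(anonymous_rows, public_profiles, min_matches=3):
    """Link anonymous IDs to public names when enough ratings coincide.

    anonymous_rows:  {anon_id: [(movie, rating, date), ...]}
    public_profiles: {name:    [(movie, rating, date), ...]}
    min_matches is an arbitrary threshold chosen for this example.
    """
    matches = {}
    for anon_id, ratings in anonymous_rows.items():
        for name, public in public_profiles.items():
            overlap = len(set(ratings) & set(public))
            if overlap >= min_matches:
                matches[anon_id] = name
    return matches

# Made-up data.
anonymous_rows = {
    "user_0417": [("Movie A", 5, "2005-03-01"), ("Movie B", 2, "2005-03-04"),
                  ("Movie C", 4, "2005-04-11"), ("Movie D", 1, "2005-05-02")],
    "user_0932": [("Movie A", 3, "2005-01-10"), ("Movie E", 4, "2005-02-02")],
}
public_profiles = {
    "pat_public": [("Movie A", 5, "2005-03-01"), ("Movie B", 2, "2005-03-04"),
                   ("Movie C", 4, "2005-04-11")],
}

linked = reidentify(anonymous_rows, public_profiles)
print(linked)
```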
- LFLex Fridman
So is there hope for any kind of privacy in a world where a few likes can r- can identify you?
- MKMichael Kearns
So there is differential privacy, right?
- LFLex Fridman
So what is differential privacy? Let's hear that.
- MKMichael Kearns
Yeah. So, so differential privacy basically is a kind of alternate, much stronger notion of privacy than these anonymization ideas. And it, it, you know, it's a technical definition, um, but, like, the spirit of it is we, we, we, we compare two alternate worlds, okay? So let's suppose I'm a researcher, and I wanna do, you know, I, I, there's a database of medical records, and one of them is yours.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
And I want to use that database of medical records to build a predictive model for some disease. So based on people's symptoms and test results and the like, I wanna, you know, build a probab- you know, model predicting the probability that people have disease. So, you know, this is the type of scientific research that we would like to be allowed to continue.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
And in differential privacy, you act- ask a very particular counterfactual question. We basically compare two alternatives. One is when I do this, I build this model on the database of medical records, including your medical record. And the other one is where I do the same exercise with the same database with just your medical record removed. So basically, you know, it's two databases, one with N records in it and one with N minus one records in it. The N minus one records are the same, and the only one that's missing in the second case is your medical record. So differential privacy basically says that any harms that might come to you from the analysis in which your data was included are essentially, uh, nearly identical to the harms that would have come to you if the same analysis had done, been done without your medical record included. So in other words, this doesn't say that bad things cannot happen to you as a result of data analysis, it just says that these bad things were going to happen to you already-
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
... even if your data wasn't included. And to give a very concrete example, right, you know, um, you know, like we discussed at some length, the, the study that, you know, the, in the '50s that was done that created the, that, ah, established the link between smoking and lung cancer.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
And we make the point that, like, well, if your data was used in that analysis and, you know, the world kind of knew that you were a smoker because, you know, there was no stigma associated with smoking before that, those findings, real harm might have come to you as a result of that study that your data was included in.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
In particular, your insurer now might have a higher posterior belief that you might have lung cancer and raise your premium so you've suffered economic damage. But the point is, is that if the same analysis has been done without y- with all the other N minus one medical records and just yours missing, the outcome would have been the same. Your, your data wasn't idiosyncratically, um, crucial to establishing the link between smoking and lung cancer because the link between smoking and lung cancer is, like, a fact about the world that can be discovered with any sufficiently large database of medical records.
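The two-worlds comparison is usually written down as follows. The conversation does not spell out a formula, so this is the standard textbook statement of ε-differential privacy, with the symbols $M$, $D$, $D'$, $S$, and $\varepsilon$ introduced here for illustration.

```latex
% M  : a randomized algorithm (the analysis)
% D  : the database with N records, including yours
% D' : the same database with your record removed (N - 1 records)
% S  : any set of possible outputs of M
% M is \varepsilon-differentially private if, for every such pair D, D' and every S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S]
\quad\text{and}\quad
\Pr[M(D') \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D) \in S].
```

A small $\varepsilon$ means the output distributions in the two worlds are nearly the same, which is the sense in which any harm is "nearly identical" with or without your record.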
- LFLex Fridman
But that's a l- very low value of harm. Yeah, so that's showing that very little harm is done. Great, but how ... What is the mechanism of differential privacy? So that's the kind of beautiful statement of it, but what's the mechanism by which privacy is preserved?
- MKMichael Kearns
Yeah. So it's, it's basically by adding noise to computations, right? So the basic idea is that e- every differentially private algorithm, first of all, or, or, or every good differentially private algorithm, every useful one, um, is a probabilistic algorithm, so it doesn't, on a given input... If you gave the algorithm the same input multiple times it w- it would give different outputs each time from some distribution. And the way you achieve differential privacy algorithmically is by kind of carefully and tastefully adding noise to a computation in the right places. And, you know, to give a very concrete example, if I want to compute the average of a set of numbers, right, the non-private way of doing that is to take those numbers and average them and release, like, a numerically precise value for the average, okay?
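A noisy average of the kind described might be sketched with the Laplace mechanism, a common textbook choice that the excerpt itself does not name. The assumptions here are loud ones: values are clipped to a known range `[lower, upper]`, the dataset size is treated as public, two datasets count as neighbors when one record is replaced, and the function name and parameters are hypothetical.

```python
import random

def private_mean(values, lower, upper, epsilon, rng=random):
    """Release the mean of `values` with Laplace noise (a sketch, not a vetted library).

    Assumes each value is clipped to [lower, upper] and that changing one
    record moves the mean by at most (upper - lower) / n.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    true_mean = sum(clipped) / n
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_mean + noise

ages = [34, 41, 29, 52, 47]
rng = random.Random(0)  # seeded only so the example is repeatable
first = private_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng)
second = private_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng)
print(first, second)  # same input, different outputs
```

Smaller `epsilon` means more noise and stronger privacy; larger `epsilon` means the released value sits closer to the exact average.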
- 1:22:32 – 1:27:49
The future of privacy: user control, regulation, and markets for data
- LFLex Fridman
So where do you think we'll land in this algorithm-driven society in terms of privacy? So, sort of, uh, China, like Kai-Fu Lee describes, you know, it's collecting a lot of data on its citizens but in the best form it's actually able to provide a lot of- sort of protect human rights and provide a lot of amazing services. In its worst forms, it can violate those human rights and- and, uh, limit services. So where do you think we'll land on... 'cause algorithms are powerful when they use data, so as a society do you think we'll give over more data? Is it possible to protect the privacy of that data?
- MKMichael Kearns
So I'm- I'm- I'm optimistic about the possibility of, you know, balancing the desire for individual privacy and individual control of privacy with kind of societally and commercially beneficial uses of data.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Not unrelated to differential privacy are suggestions that say, like, well, individuals should have control of their data, they should be able to limit the uses of that data, they should even... you know, there's- there's, you know, fledgling discussions going on in research circles about allowing people selective use of their data and being compensated for it.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
Um, and then you get to sort of very interesting economic questions like pricing, right? And one interesting idea is that maybe differential privacy would also, you know, be- be a- a conceptual framework in which you could talk about the relative value of different people's data. Like, you know, to- to demystify this a little bit, if I'm trying to predict- build a predictive model for some rare disease and I'm trying to u- I'm gonna use machine learning to do it, um, it's easy to get negative examples because the disease is rare, right? But I really want to have lots of people with the disease in my dataset, okay? Um, but s- but... and so somehow those people's data with respect to this application is much more valuable to me than just, like, the background population, and so maybe they should be compensated more for it. Um, and so, you know, I think these are kind of very, very fledgling conceptual questions that maybe we'll have kind of technical thought on them sometime in the coming years.
- LFLex Fridman
(laughs)
- MKMichael Kearns
Um, but- but I do think we'll, you know, to kind of get more directly to answer your question, I think I- I'm optimistic at this point from what I've seen that we will land at some, you know, better compromise than we're at right now where, again, you know, privacy guarantees are few, far between, and weak, and users have very, very little control, um, and I'm optimistic that we'll land in something that, you know, provides better privacy overall and more individual control of data and privacy. But, you know, I think to get there it's, again, just like fairness, it's not gonna be enough to propose algorithmic solutions. There's gonna have to be a whole kind of regulatory legal process that prods companies and other parties to kind of adopt s- uh, solutions.
- LFLex Fridman
And I think you've mentioned the word control a lot and I think giving people control, that's something that people don't quite... have in, uh, in a lot of these algorithms, and that's a really interesting idea of giving them control. Some of that is actually literally an interface design question, sort of just enabling, uh... 'Cause I think it's good for everybody to give users control. It's not, it's not a, it's almost not a trade-off, except that you have to hire people that are good at interface design. (laughs)
- MKMichael Kearns
Yeah, I mean, the other thing that has to be said, right, is that, you know, (laughs) it's a cliché but, you know, we, uh, w- the, w- is, th- the users of many systems, platforms, and apps, you know, we are the product.
- LFLex Fridman
Mm-hmm.
- MKMichael Kearns
We are not the customer. The customer are advertisers, and our data is the product, okay? So, it's one thing to kind of suggest more individual control of data and privacy and uses. But this, you know, it, if, if, if this happens in sufficient degree, it will upend the entire economic model that has supported the internet to date.
- LFLex Fridman
Mm-hmm.
Episode duration: 1:48:55
Transcript of episode AzdxbzHtjgs