Tuesday, September 15, 2009

A Very Hard Easy Probability Question

Here is a question I set my students today. It is, mathematically speaking, very easy. It can be solved using mental arithmetic, and requires no complicated formulae or advanced concepts at all. It is not a trick question. However, people always find it incredibly difficult to get right, and many people fail to understand the solution once it has been explained to them. I have had vitriolic arguments with distinguished colleagues who refuse to accept my reasoning. What do you think? I will post my answer in a day or two, but in the meantime let's hear some suggestions...

The question is this. Imagine you have been tested in a large-scale screening programme for a disease known to affect one person in a hundred. The test is 90% accurate, and you test positive. What is the probability that you have the disease?

Understanding this kind of question is very important, because it leads to exactly the dilemma you might face if you were screened for a major killer like breast cancer or testicular cancer, and tested positive. Should you immediately opt for a risky procedure to investigate further, or would that be to submit yourself to unnecessary surgery? Gerd Gigerenzer wrote the book Reckoning With Risk about this kind of problem (it's a brilliant book too, so read it if you can), concluding that thousands of people are facing unnecessary dangers because of poor understanding of probability. Over to you...

26 comments:

JuJu said...

Not sure about this being solved by mental arithmetic alone, but here's a pen and paper assisted stab.

From a population of 100 there will be 10 who have the disease and 990 who don't. Of those who do, the test will identify 9 of them correctly and give one false negative.

From those 990 who don't have it there will be 891 correct negatives and 99 false positives.

The total number of positive results will be 108, of whom 9 will have the disease.
Reliability of a positive result: 9/108 = 1/12

However you could be reassured if the test tells you that you don't have the disease, as there is only a 1/892 chance of a false negative.
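
JuJu's head count can be checked mechanically. The following is an illustrative Python sketch, not part of the original comment. It assumes a population of 1,000 (which is what the figures 10 and 990 imply) and that "90% accurate" applies equally to people with and without the disease. The function name `screen` is made up for this example.

```python
from fractions import Fraction

def screen(population, prevalence, accuracy):
    """Count test outcomes, assuming one accuracy figure for sick and healthy alike."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * accuracy        # sick people correctly flagged
    false_neg = sick - true_pos       # sick people the test misses
    true_neg = healthy * accuracy     # healthy people correctly cleared
    false_pos = healthy - true_neg    # healthy people wrongly flagged
    return true_pos, false_pos, true_neg, false_neg

tp, fp, tn, fn = screen(1000, Fraction(1, 100), Fraction(9, 10))
print(tp, fp, tn, fn)     # 9 99 891 1
print(tp / (tp + fp))     # 1/12  (share of positives that are real)
print(fn / (fn + tn))     # 1/892 (share of negatives that are wrong)
```

Exact fractions are used so that the 1/12 comes out as a ratio of whole people rather than a rounded decimal.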

Anonymous said...

eh, the test affects 1 in 100 of the population, so there is a 1 in 100 chance of having the disease. Who needs a test, since the test will appear some time!

Anonymous said...

sorry, the disease, not the test affects 1 in 100

Josef said...

1/1,000,000



Also have you seen "what the bleep do we know?" It's amazingly funny. Pseudo quantum science abounds.

Shaun "Sheepy" Yates said...

i believe this...

okay if we break it down it goes like this: if we just believe the first statement " 1 in a hundred people get the disease " then you are 1% likely to get the disease.

now if you multiply the 1% by the 90% probability you will have..

0.9%

bam. thats how you use your noodle.

also JuJu you confused me when you brought up the 990 number :s

JuJu said...

Oh, Sox!- my comment was undone by a missing 0. It should read:

"From a population of 1000..."

It's only a typo, the rest of the maths still works though.

Anonymous said...

There is a 1 in 100 chance you have the disease, and a 1 in 10 chance the test you take is wrong.

therefore 0.9 x 0.01 = 0.09 chance or 9% chance that you would have the particular disease.

Steve:- said...

First, I would work out how many diseased people would be correctly diagnosed.
x= 0.9 (of the hundred people)

Then, how many non-diseased would be incorrectly diagnosed
y= 100-(99/100*90)= 10.9

Then, I would try this
= x/(x+y)= 0.9/11.8 = 0.076
Or around 8%

...I think :\

Neuroskeptic said...

JuJu is right. There's a 1/12 chance that the diagnosis is true and an 11/12 chance that it's a false positive.

This seems counter-intuitive, but here's a useful way of looking at it. Imagine that the disease were so rare that only 1 person in a billion had it. So 6 people in the world have it. But if you give the test to everyone in the world, 10% of people will test positive, 600 million, which cannot possibly be right because there are only 6 people in the world who have it.
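
Neuroskeptic's thought experiment can be run with actual numbers. This is a rough Python sketch under the same assumptions as the comment: a world of 6 billion people, a disease affecting 1 in a billion, and a test that is 90% accurate for sick and healthy people alike. The variable names are chosen for this example only.

```python
from fractions import Fraction

world = 6_000_000_000
sick = world // 1_000_000_000          # 6 people have the disease
accuracy = Fraction(9, 10)

true_pos = sick * accuracy                    # expected correct positives
false_pos = (world - sick) * (1 - accuracy)   # expected healthy people flagged

print(float(true_pos))                        # about 5.4 people
print(float(false_pos))                       # just under 600 million people
print(float(true_pos / (true_pos + false_pos)))  # chance a positive is real
```

With a base rate this low, almost every positive result comes from the 10% error rate applied to the healthy majority, so a positive test says very little on its own.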

Ben Wilson said...

Ben Wilson here, listened to your talk which will now be yesterday. Let's say the large scale screen was 1000 people, so there is a 1 in a thousand chance of getting the disease for everyone in the screen.

Of the 10 persons who actually have the disease 9 would be positive for the disease yes but one wouldn't. But if we look at the other 990, the test would be 10% inaccurate so 89 would be found positive.

So if we then divide 9 (the positive victims) by 100 (all the tested people found positive, really diseased and not actually diseased) the Probability would actually be 0.09.

I look forward to my book...

(Andrew) Terry Baldwin said...

I just looked at this again and, although I may be wrong the answer seems obvious. If the test is 90% accurate and you test positive there is a 90% chance that you have the disease. Simple, I hope.

mccraig said...

1. definition of conditional probability

p(d=1|t=1) = p(t=1,d=1) / p(t=1)

i.e. the prob of having disease given a positive test is equal to the ratio of (prob of having the disease and testing positive) to (prob of testing positive whether or not you have disease)

2. sum-rule

p(d=1|t=1) = p(t=1,d=1) / sum_d_{ p(t=1,d) }

p(d=1|t=1) = p(t=1,d=1) / ( p(t=1,d=0) + p(t=1,d=1) )

3. product rule, numerator and denominator

p(d=1|t=1) = p(t=1|d=1)p(d=1) / ( p(t=1|d=0)p(d=0) + p(t=1|d=1)p(d=1) )

assuming 90% accurate means both a 10% chance of false positives [ p(t=1|d=0) = 0.1 ] and a 90% chance of correct positives, i.e. a 10% chance of false negatives [ p(t=1|d=1) = 0.9 ]

p(d=1|t=1) = (0.9 * 0.01) / ((0.1 * 0.99) + (0.9 * 0.01))

p(d=1|t=1) = 1/12

i.e. the number of false positives is far greater than the number of true positives, so despite a positive test chances are still only 1 in 12 that you have the disease
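
The final formula in step 3 translates directly into code. The sketch below is illustrative Python rather than anything mccraig posted. The function and argument names are hypothetical, chosen to mirror p(d=1), p(t=1|d=1) and p(t=1|d=0), and the inputs reuse the assumption stated above that "90% accurate" fixes both error rates.

```python
from fractions import Fraction

def p_disease_given_positive(p_d, p_pos_given_d, p_pos_given_not_d):
    """Bayes' theorem for p(d=1|t=1), written as in step 3 above."""
    numerator = p_pos_given_d * p_d                       # p(t=1|d=1) p(d=1)
    evidence = p_pos_given_not_d * (1 - p_d) + numerator  # p(t=1), via the sum rule
    return numerator / evidence

answer = p_disease_given_positive(
    p_d=Fraction(1, 100),               # 1 person in 100 has the disease
    p_pos_given_d=Fraction(9, 10),      # true positive rate
    p_pos_given_not_d=Fraction(1, 10),  # false positive rate
)
print(answer)  # 1/12
```

Keeping the two conditional probabilities as separate arguments means the same function also covers tests whose false positive and false negative rates differ.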

Stu said...

We're talking here about how many people have the disease not how many people will test positive for it, so the 90% is irrelevant, we have been told, 1% of people will have the disease that is your answer.

I remember you asking this question in my first year, I hope I've got it right, it would be very embarrassing if I haven't.

Another thing this will teach your students is the value of reading the question and answering it, rather than dilly dallying with a load of irrelevant bull.

Stu.

Ben Wilson said...

Hrm actually if i rounded it down it would actually be 0.1, but its still a chance of 10%, since 1 would have the disease and probably be found positive, whilst out of the 99, 9 would be found positive but not actually have the disease due to the 10% inaccuracy of the test. So there would only be one person out of 10 to have the disease which is 0.1 or 10% in other words. My UCLAN Email is BWilson1@uclan.ac.uk if i have won.

Unknown said...

JuJu is correct and I'm not sure I can add much following the comment from Neuroskeptic.

Steve: the problem with your answer, which seems fairly intuitive, is that the probability of having the disease is different in the population as a whole and the sub-population of those who have received a positive result.

shaun 'sheepy' yates said...

i totally agree with JUJU now.

well done :]

shaun 'sheepy' yates said...

no wait!!

here's the ratio according to the test:


99 (lives) : 1 (dies)

what could also be said is...

990 (people live) : 10 (dies)

according to the test.. however both sets of numbers are only 90% accurate therefore..

892 (people live) : 108 (whom die)

now dividing this by ten equals..

89.2% are disease free
10.8% will have the virus

YOU ARE 10.8% LIKELY TO HAVE THE DISEASE!!

Unknown said...

I apologise Steve. I meant to direct my response to Stu. Doh.

For those interested Yudkowsky covers this at the beginning of his discussion of Bayes theorem here:

http://yudkowsky.net/rational/bayes

Sorry if that steals your thunder a bit Mike.

Mike Eslea said...

Yes, JuJu has the right answer, and has explained it very neatly too. We'll forgive the typo.

What I find interesting about this question is the sheer number of different responses it gets. 90% is certainly the most common answer, and many people take a lot of persuading that you DO have to take into account the baseline frequency. Anyone here still not convinced?

1% is probably the next most common, followed by 0.9%, obtained by multiplying the two percentages together. Of the other answers here, Ben Wilson and Steve are agonisingly close to being correct. Steve, I think you were undone by a mistake in the first term of your second equation, which should have been 99 not 100, and Ben I think you made two mistakes - an 89 that should have been a 99, which when added to the 9 would have made 108, not 100. Don't know where you got the 100 from! But nice try anyway. Sheepy, I can't get my head around your last approach - I will have to think about it. But it ain't right.
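
For anyone who wants to see how close the near misses were, here is a small Python check. It is a sketch based on one reading of the corrections described above: Steve's second equation is rerun with 99 in place of 100, and Ben's count of false positives is taken as 99 so that the positives total 108. The variable names are invented for this example.

```python
# Steve's approach, per 100 people tested
x = 0.9                            # diseased people correctly diagnosed
y_posted = 100 - (99 / 100 * 90)   # as posted: 10.9 false positives
y_fixed = 99 - (99 / 100 * 90)     # starting from 99 healthy people: 9.9

steve_posted = x / (x + y_posted)  # roughly 0.076, the "around 8%" answer
steve_fixed = x / (x + y_fixed)    # roughly 0.083, i.e. 1 in 12

# Ben's approach, per 1000 people tested
ben_fixed = 9 / (9 + 99)           # 9 true positives out of 108 positives

print(round(steve_posted, 3), round(steve_fixed, 3), round(ben_fixed, 3))
```

Both corrected calculations land on the same 1 in 12, which is why each of them was only a single slip away from the answer.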

The main point Gigerenzer makes is that people are much better at understanding natural frequencies (the "imagine 1,000 people" way of doing it) than percentages or probabilities. Mccraig here is obviously an exception, judging by those beautifully presented formulae!

I said in the lecture I would give a copy of Francis Wheen's "How Mumbo Jumbo Conquered The World" to the first student who could post the right answer. I know Juju and Neuroskeptic aren't students so if mccraig is, that makes him/her the winner of the book. Congratulations! Call in at my office to collect it.

Anonymous said...

Hello,

Well done JuJu.

I'm a bit confused though... please can you explain JuJu's correct answer of 12% with the example Gigerenzer uses in his book on pages 5 and 6 which states an answer of 10%.

'The probability that a woman of age 40 has breast cancer is about 1 percent. If she has breast cancer, the probability that she tests positive on a screening mammogram is 90 percent'

...

'Think of 100 women. One has breast cancer, she will probably test positive. Of the 99 who do not have breast cancer, 9 will also test positive. Thus a total of 10 women will test positive'

...

'Now it is easy to see that only one woman out of ten who test positive actually has breast cancer.'

What am I missing in comparing your example with G's??

Mike Eslea said...

@Anonymous

Mine is a slightly simpler Q than GG's, because he has different probabilities for a false negative (10%) and a false positive (9%). He also rounds up the numbers 0.9 and 8.91 to get 1 and 9, whereas in my version the numbers work out exactly to 9 and 99.
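
The difference between the two versions is easy to see by running the mammogram numbers quoted above. This is an illustrative Python sketch, assuming the figures as described in this thread (1% prevalence, 90% detection, 9% false positives, 100 women). The variable names are made up for the example.

```python
from fractions import Fraction

women = 100
with_cancer = women * Fraction(1, 100)                  # 1 woman
true_pos = with_cancer * Fraction(9, 10)                # 0.9 of a woman
false_pos = (women - with_cancer) * Fraction(9, 100)    # 8.91 women

# Exact answer, before any rounding to whole people
exact = true_pos / (true_pos + false_pos)

# Rounded to whole women: 1 true positive and 9 false positives
rounded = Fraction(round(true_pos), round(true_pos) + round(false_pos))

print(exact, float(exact))   # 10/109, a little over 9%
print(rounded)               # 1/10
```

So the "one in ten" in the book is the whole-person version of an exact figure just above 9%, while the question posed here gives exactly 1 in 12 because the error rates are both 10%.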

@Josef

No, I didn't see "Bleep", partly because I would never pay to go to something like that, and partly cos it would make me so angry!

Mike Eslea said...

@Anonymous again

I just noticed you said Juju's correct answer was 12%. In fact it is 1 in 12, which is approx 8% (actually 8.33 recurring)

Josef said...

Well Mike I think you might need to start channeling your Chakras. I got the person from skyTV to do mine.

Steve:- said...

Curse my minor oversight!
I would have totally had it.

:(

shaun 'sheepy' yates said...

i feel really stupid now :[

i still dont understand :[

x109876378

phayes said...

“However, people always find it incredibly difficult to get right, and many people fail to understand the solution once it has been explained to them.”

Perhaps draw them a picture: in the case of uniform test accuracy, a square representing the (tested) population divided internally into four rectangles by an appropriately placed pair of lines (1 vertical and 1 horizontal) and labelled accordingly. It's then very easy both to see why the answer is what it is and to calculate it.