Wednesday, April 20, 2016

Why is the statistical package SPSS so unhelpful?

I've just run a statistical test on SPSS to see if there is a difference between articles in the Guardian and Telegraph in terms of Characteristic X (it doesn't matter what X is for my purposes here). The results are pasted below. The presence of X is coded as 1, and its absence by 0.

The first table shows that a higher proportion of Guardian articles (33.5%) than Telegraph articles (24.1%) had X. The second table addresses the issue of statistical significance: can we be sure that this is not a chance effect that would be unlikely to recur in another sample of articles?

Paper * Code Crosstabulation

                                      Code
                                  .00     1.00     Total
Paper   Guardian    Count          121      61       182
                    % within Paper 66.5%   33.5%   100.0%
        Telegraph   Count           60      19        79
                    % within Paper 75.9%   24.1%   100.0%
Total               Count          181      80       261
                    % within Paper 69.3%   30.7%   100.0%


Chi-Square Tests

                           Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                          (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square         2.322(a)   1      .128          .145         .083
Continuity Correction(b)   1.898      1      .168
Likelihood Ratio           2.386      1      .122          .145         .083
Fisher's Exact Test                                        .145         .083
N of Valid Cases             261

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 24.21.


I decided I would like a two sided significance level, and looked at the second table to find it. Unfortunately there are no fewer than four different answers (0.128, 0.168, 0.122 and 0.145)! Which to choose?
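For the record, the four figures do come from four different calculations, and they can all be reproduced outside SPSS. Here is a minimal sketch, assuming Python with the scipy library is available; the values should agree with the table above to within rounding.

```python
# Reproducing the SPSS chi-square output for the Guardian/Telegraph table.
# A sketch only: assumes Python with scipy installed.
from scipy.stats import chi2_contingency, fisher_exact

table = [[121, 61],   # Guardian:  no X, X
         [60, 19]]    # Telegraph: no X, X

# Pearson chi-square without continuity correction: about 2.322, p about .128
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(f"Pearson chi-square = {chi2:.3f}, df = {df}, asymptotic p = {p:.3f}")
print("Expected counts:", expected)   # minimum about 24.21, as in footnote a

# With Yates' continuity correction: about 1.898, p about .168
chi2c, pc, _, _ = chi2_contingency(table, correction=True)
print(f"Continuity-corrected chi-square = {chi2c:.3f}, p = {pc:.3f}")

# Likelihood ratio (G) test: about 2.386, p about .122
g, pg, _, _ = chi2_contingency(table, correction=False, lambda_="log-likelihood")
print(f"Likelihood ratio = {g:.3f}, p = {pg:.3f}")

# Fisher's exact test: SPSS reports .145 (2-sided) and .083 (1-sided)
_, p_two = fisher_exact(table, alternative="two-sided")
_, p_one = fisher_exact(table, alternative="less")  # direction of the observed difference
print(f"Fisher's exact test: two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
```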

Further study of the table only deepened my confusion. The heading is Chi-Square Tests, but two of the columns are headed Exact Sig. My understanding is that the chi-square test uses the chi-square distribution, which is a well-known way of working out approximate probabilities. The exact test works out the equivalent probabilities directly, without using the chi-square distribution, so the entries in the Exact Sig columns are not chi-square results despite the table heading. One of the rows is headed Fisher's Exact Test and another Pearson Chi-Square, which seems to confirm this. But what can we make of the top right figure (0.083), which is chi-square according to the table heading, Pearson Chi-Square according to the row heading, and Exact Sig according to the column heading? Help!

OK, I know I should have consulted the Help (it doesn't work on my computer so I can't), or a book on using SPSS, or gone on a course and provided employment for an expert. But I don't think this should be necessary. SPSS should produce clear tables with a little explanation of what the numbers mean. In the present case, since exact probabilities can be computed, surely that is all that's needed: a sensible heading for the table, and a little note on what the probabilities represent.

SPSS should produce clear, consistent tables which present only the relevant information with an explanation in, as far as possible, non-technical language.

But then people might understand the output and the market for courses and experts would be much diminished.

Thursday, May 28, 2015

Six sigma and the Higgs Boson: a convoluted way of expressing unlikeliness

A few years ago I was asked by IBM to help them calculate "sigma levels" for some of their business processes. Sigma levels are part of the "Six Sigma" approach to  monitoring and improving business quality developed by Motorola in 1986, and since used by numerous consultants right across the world to package well known techniques in order to con money out of gullible businesses.

The name, of course, was an important factor in helping the Six Sigma doctrine to catch on. It is mysterious, with a hint of Greek, both of which suggest powerful, but incomprehensible, maths, for which the help of expensive consultants is obviously needed.

Sigma is the Greek letter "s" which stands for the standard deviation - a statistical measure for the variability of a group of numerical measurements. Sigma levels are a way of relating the number of defects produced by a business process to the variability of the output of the process. The details are irrelevant for my present purposes except in so far as the relationship is complicated, involves an arbitrary input, and in my view is meaningless. (If you know about the statistics of the normal distribution and its relation to the standard deviation you will probably be able to reconstruct part, but only part, of the argument. You should also remember that it is very unlikely that the output measurements will follow the normal distribution.)

The relationship between sigma levels and defect rates can be expressed as a mathematical formula which gives just one sigma level for each percent defective, and vice versa. Some examples are given in the table below which is based on the Wikipedia article on 25 April 2015 - where you will be able to find an explanation of the rationale.

(An Excel formula for converting percent defective to sigma levels is =NORMSINV(100%-pdef)+1.5, and for converting sigma levels to percent defective is =1-NORMDIST(siglev-1.5,0,1,TRUE) where pdef is the percent defective and siglev is the sigma level. The arbitrary input is the number 1.5 in these formulae. So, for example, if you want to know the sigma level corresponding to a percent defective of 5%, simply replace pdef with 5% and put the whole of the first formula including the = sign into a cell in Excel. Excel will probably format the answer as a percentage, so you need to reformat it as an ordinary number. The sigma level you should get is 3.14.)
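If Excel isn't to hand, the same two conversions take only a few lines of code. The sketch below assumes Python with the scipy library; the function names are mine, and the 1.5 "shift" is the arbitrary input mentioned above.

```python
# Sketch of the sigma level conversions described in the text (Python + scipy).
from scipy.stats import norm

SHIFT = 1.5  # the conventional, and arbitrary, Six Sigma shift

def sigma_level(p_defective, shift=SHIFT):
    """Convert a proportion defective (e.g. 0.05 for 5%) to a sigma level."""
    return norm.ppf(1 - p_defective) + shift

def proportion_defective(sigma, shift=SHIFT):
    """Convert a sigma level back to a proportion defective."""
    return norm.sf(sigma - shift)   # sf(x) = 1 - cdf(x)

print(round(sigma_level(0.05), 2))           # 3.14, as in the Excel example
print(proportion_defective(6) * 1_000_000)   # about 3.4 defects per million
```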

Sigma level   Percent defective    Defectives per million opportunities
1             69.1462461274%       691462.4613
2             30.8537538726%       308537.5387
3              6.6807201269%        66807.20127
4              0.6209665326%         6209.665326
5              0.0232629079%          232.629079
6              0.0003397673%            3.397673134
7              0.0000018990%            0.018989562
2.781552      10%                  100000
3.826348       1%                   10000
4.590232       0.10%                 1000
5.219016       0.01%                  100
5.764891       0.0010000000%           10
6.253424       0.0001000000%            1
6.699338       0.0000100000%            0.1

But what, you may wonder, is the point in all this? In mathematics, you normally start with something that is difficult to understand, and then try to find something equivalent which is easier to understand. For example, if we apply Newton's law of gravity to the problem of calculating how far (in meters, ignoring the effect of air resistance) a stone will fall in five seconds, we get the expression:
∫₀⁵ 9.8t dt
(9.8t being the stone's speed, in meters per second, t seconds after it is dropped)

If you know the appropriate mathematics, you can easily work out that this is equal to 122.5. The original expression is just a complicated way of saying 122.5.

The curious thing about sigma levels is that we are doing just the opposite: going from something that is easy to understand (percent defective) to something that is difficult to understand (sigma levels), and arguably makes little sense anyway.

In defence of sigma levels you might say that defect levels are typically very small, and it is easy to get confused about very small numbers. The numbers 0.0001% and 0.001% may look similar, but one is ten times as big as the other: if the defect in question leads to the death of a patient, for example, the second figure implies ten times as many deaths as the first. Which does matter. But the obvious way round this is to use something like the defectives per million opportunities (DPMO) as in the above table - the comparison then is between 1 defective and 10 defectives. In sigma levels the comparison is between 6.25 and 5.76 - but there is no easy interpretation of this except that the first number is larger than the second, implying that the first represents a greater unlikelihood than the second. There is no way of seeing that deaths are ten times as likely in the second scenario, which the DPMO figures make very clear.
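To make the comparison concrete, here are those two figures computed with the same formula as the sketch above (again assuming scipy); nothing in the two sigma levels signals the factor of ten.

```python
# The 1-per-million versus 10-per-million comparison, expressed as sigma levels.
from scipy.stats import norm

for dpmo in (1, 10):
    p = dpmo / 1_000_000
    print(f"{dpmo} per million -> sigma level {norm.ppf(1 - p) + 1.5:.2f}")
# Prints roughly 6.25 and 5.76: the factor of ten is invisible.
```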

So why sigma levels? The charitable explanation is that it's the legacy of many years of calculating probabilities by working with sigmas (standard deviations), so that the two concepts have become inseparable. Except, of course, that for non-statisticians they aren't connected at all: one is obviously meaningful and the other is gibberish.

The less charitable explanation is that it's a plot to mystify the uninitiated and keep them dependent on expensive experts.

Is it stupidity or a deliberate plot? Cock-up or conspiracy? In general I think I favour the cock-up theory, partly because it isn't only the peddlers of the Six Sigma doctrine who are wedded to sigma mystification. The traditional way of expressing quality levels is the capability index Cpk - this is another convoluted way of converting something which is obvious into something which is far from obvious. The rot had set in long before Six Sigma.

And it's not just quality control. When the Higgs boson was finally detected by physics researchers at CERN, the announcement was accompanied by a sigma level to express their degree of confidence that the hypothesis that the results were purely a matter of chance could be ruled out:
"...with a statistical significance of five standard deviations (5 sigma) above background expectations. The probability of the background alone fluctuating up by this amount or more is about one in three million" (from the CERN website in April 2015. The sigma level here does not involve the arbitrary input of 1.5 in the Excel formulae above: this should be replaced by 0 to get the CERN results.)

Why bother with the sigma level? The one in three million figure surely expresses it far more simply and far more clearly. 

Friday, January 16, 2015

Two possible futures

I've just had another conversation with my friend, Zoe, who has solved the riddle of travelling backwards through time. She's just returned from the year 2050: her memories of the future are hazy but fascinating.

In fact she's been to not one future but two - it turns out that all the speculation among physicists about multi-verses is spot on - there are billions of universes, each representing a possible future for us, and she's been to two of them. The rules of travel through time, and between universes, mean that she is unable to remember much detail, but one fascinating point from the first of the two universes she went to is that the accepted paradigm in fundamental physics is the "God with a sense of humour hypothesis." Apparently this is the only hypothesis which fits all the known facts, in particular the apparent arbitrary oddness of the laws of nature.

About 20 years ago - talking now from the first 2050 future - two principles from physics migrated to mainstream culture with far-reaching effects. The first was the idea of an absolute limit to the complexity of ideas that the human brain could deal with. The second was the principle that exact laws of nature were unobtainable in the sense that they necessarily needed ideas more complex than this limit. Together these yielded a third principle that knowledge should be designed so as to reduce "cognitive strain" as much as possible. This last principle then led to dramatic changes in the framework of human knowledge. Instead of blaming children who found their school work too difficult, extensive research was undertaken to reduce the cognitive strain (or, to put it more simply, to make the work easier). Similar efforts were made with more advanced ideas: for example, Schroedinger's equation - the basic equation of quantum physics that describes how things change through time - was transformed into a user-friendly bit of software with a sensible name that even young children could use and understand. The new version was formally equivalent to the original equation, but far more accessible.

This change had a number of far reaching effects. Universities stopped providing degree courses for the masses because the content of old-style degree courses was just too easy and commonplace. A lot of it, like Schroedinger's equation, had entered mainstream culture, and some of it was accessed on a just-in-time basis when needed.

Progress at the frontier of most disciplines had accelerated sharply when these changes came through. The fact that the basics were so much easier meant that there were many more people working at the cutting edge, and the fact that they got there quicker meant that there was more time to work on problems. The old idea that experts spend ten years acquiring their expertise was still true, but the amount of useful expertise you could acquire in your ten years was much, much more.

Cancers, heart disease, and unplanned death in general, were largely conquered, and Zoe was impressed with the solution to the problem of over-population that this would cause, but unfortunately she couldn't remember what this solution was. (Infuriatingly, the rules of time travel and universe hopping set by the God with a sense of humour meant that Zoe could only remember a few details of this future.)

The second future had much more in common with the present. The school curriculum was virtually unchanged, university degrees now lasted for ten years, cutting edge research was even more dominated than it is now by professional researchers using language and concepts almost completely inaccessible to laypeople. Cancer and heart disease rates had improved but only marginally.


Zoe much preferred the first future. Unfortunately the God with a sense of humour, while allowing her to go and have a look, and absorb some of the atmosphere, blocked details like how the user-friendly version of Schroedinger's equation worked, and the nature of the advances that had largely eliminated common diseases.

Tuesday, August 12, 2014

The cult of the truth

Everyone seems to believe in the truth. By which, of course, they don’t mean the truth in which other, misguided souls believe, but in their truth which is obviously the right one. The devout Christian has a different version from the devout Muslim, and the devout atheist will think they are both mad.

It is not just religious maniacs who believe in the truth. It is deeply embedded in the world view of science, of common sense, and even of fields of academic inquiry which see themselves as hostile to what they perceive as science. The truth rules supreme everywhere, or so it seems.

But what is truth? When we say something is true we usually mean, I think, that it corresponds to reality – the so-called correspondence theory of truth. But what is reality, and how can human ideas “correspond” to it? Surely human ideas are a completely different type of thing from reality, so what sense does the idea of correspondence make? Perhaps what we see as the truth is part of a dream, or part of a way of seeing the world we make up in collaboration with other people – as the social constructivists would have us believe? This latter perspective seems obviously true (whatever that may mean!) to me - but this may just be the dream into which I've been socialized.

However, let’s accept the idea of truth and try to guess where it might have come from. If we accept the theory of evolution by natural selection, the answer is simple: the idea of truth helped our ancestors survive. A belief in the truth about lions and cliffs helped our ancestors avoid being eaten by the former and falling off the latter. The idea of a fixed reality, which we can apprehend and see as the truth, is obviously a very powerful tool for living in the everyday world. People who did not believe in the reality of lions and cliffs would not have survived to pass on their genes.

This implies that the idea of objective reality and the assumption that we can apprehend the truth about it is merely a human convenience. Frogs or intelligent aliens would almost certainly view the world in very different ways; what we see as truth and what they see as truth would, I think, be very different.

Most statements in ordinary languages presuppose the idea of truth. When I say that Sally was at home at 10 pm on 1 August 2014, I mean that this is a true statement about what happened. Further, if Sally is suspected of murdering Billy 50 miles away at 10 pm on 1 August 2014, then if it's true that she was at home then it can't be true that she murdered Billy. She can only be in one place at one time - "obviously". Ideas of truth, and the "objective" reality of objects in time and space, and the fact that one object can only be in one place at one time, are all bound up in our common sense world view. It is almost impossible to talk in ordinary language without assuming the truth of this world view - it is just "obviously" true.

However the concept of truth is often taken far beyond the everyday comings and goings of everyday objects. So we might say that it is true that God exists, that all water molecules comprise two hydrogen atoms and one oxygen atom, that married people are happier than unmarried people, and that the solutions of the equation x² + 1 = 0 are x = +i and x = -i.

The difficulty is that, outside of the realm of everyday experience, the notion of truth is actually rather vague, may be difficult to demonstrate conclusively, and may come with implications that are less than helpful. Short of taking the skeptic to meet God, demonstrating his existence is notoriously difficult.  We can't "see" molecules of water in the way we can see Sally at home, so the truth about water molecules needs to be inferred in other ways. Saying that the married are happier than the unmarried is obviously a statement about averages - there will be exceptions - and it also depends on what is meant by "happier". And mathematical statements are statements about concepts invented by mathematicians: applying the word true is obviously stretching the concept considerably. It is all much less straightforward than the truth that Sally was at home at 10 pm on 1 August 2014.

The idea of truth has a very high status in many circles. Saying you are seeking the truth sounds unquestionably praiseworthy. If you say something is true, then obviously you can't argue with it. Truth is good so we like to apply the concept all over the place. I'll refer to this assumption that the idea of truth, and the inevitably associated idea of an objective reality with solid objects persisting through time, apply to everything, as the cult of the truth. This notion is rather vague in terms of the assumptions about reality that go hand in hand with the idea of truth - but this is inevitable as the idea of truth gets extended further and further from its evolutionary origin. Cults, of course, depend on vagueness for their power, so that the cult's perspective can be adjusted to cater for any discrepancies with experience.

Does the cult of the truth matter? Does it matter that the idea of truth is extended far beyond its original focus? Let's look at some different areas of knowledge.

Some of the conclusions of modern physics contradict the implicit assumptions of the cult of the truth. At very small scales things can be in two places at once, and reality only makes sense in relation to an observation; for observers at high speeds measurements of physical processes are different, and the notion of things happening at a particular time depends on the motion of the observer. This all does considerable violence to everyday assumptions about reality, but physicists would simply say that these are outdated and that their notion of reality is more sophisticated. It seems to me, as a non-physicist, that these theories have sabotaged the idea of the truth about an objective reality beyond repair. I am reading Brian Greene's book, The hidden reality, about parallel universes, but I can't take the idea of truth seriously in relation to universes hovering out of reach which we will never, ever, be able to see in any sense. The hypothesis that the book seems to be driving towards is that we are living in a simulated world devised by Albert Einstein whose theory of general relativity seems to underpin everything.

Does this matter? Probably not for physics. The illusion of the quest for the truth about everything is probably necessary to keep physicists motivated. But in the wider sphere it is worrisome if naive and outdated ideas of physics underpin other disciplines.

The idea of truth is best regarded as a psychological convenience - usually necessary, often useful, but occasionally a nuisance. Am I claiming this statement itself is true? Of course not! My argument obviously undermines itself. But I do think it’s a useful perspective.

Beyond the rarefied world of modern physics the cult of the truth does create problems. Perhaps the most serious is that the status of truth (and science and the study of objective reality) undermines important areas which can't be incorporated into the cult of the truth. The ultimate aim of many social sciences is to make the world a better place in the future. We might, for example, be interested in making workplaces happier. The idea of truth fits comfortably with the obvious first stage of such a project - to do a survey to find out how happy workers are at the moment, and what their gripes are. The obvious things to do next would be to look at what the workers want, at what they value, and try to design workplaces to fit these requirements. This seems a more important part of the research than the initial survey, but value judgments, and the design of possible futures, do not fit neatly with the cult of the truth. So they are not taken as seriously as they should be. Most of the thought and work goes into studying the past, while the more important issues of working out what people want and how to design a suitable future tend to get ignored. OK, so the idea of truth could be extended to include these, but only by bending it so far that it becomes stupid; the cult of the truth tends to deflect our attention from the problem of designing futures.

In fact the situation is even worse than this because the truths studied in many social sciences tend to be of rather limited scope. So we study how happy people are in particular organizations at particular times. So what? Everyone knows the situation may be very different elsewhere. The truths studied by physicists are assumed to apply everywhere throughout time (although this can be challenged over billions of years or light-years), but the truths of many social sciences are very parochial. The cult of the truth restricts our attention to trivial questions, and dismisses the big questions because the idea of truth does not apply to them.

There are further unfortunate side effects from taking truth too seriously. If we have one theory which is deemed true, this may be taken to imply that other theories covering the same ground must be false. This may be too restrictive: there could be different ways of looking at the same thing, some of which may perhaps be more aesthetically appealing, or easier to learn about or use. Truth is not the only important criterion. This is particularly true of statistical truths, which may sometimes be so fuzzy as to be almost useless.


So, to recap, truth is best treated as an illusion - often, but by no means always, a necessary one: it should not be taken too seriously outside the realm of statements about the comings and goings of everyday objects. The last sentence is itself close to asserting a truth whose validity it denies: a fully coherent argument here is not possible, but does this matter? Incoherence gives us more flexibility.

Friday, May 30, 2014

Cambridge University closed to undergraduates

I'm lucky enough to have a friend who has solved the knotty problem of travelling backwards through time. She sent me this news report from the Mumbai based World News dated 1 January 2050:

Cambridge University in the UK has finally bowed to the inevitable and closed its doors to new undergraduates. The last cohort started in October last year: their final ceremonial dinner in the historic dining halls was on Christmas day, and they will formally receive their degrees in the New Year. For the last two years Cambridge has been the only university in the world offering degree courses. This new move brings to an end an era which has lasted for centuries.

Until about 2010 a university degree was regarded as proof of the bearer's competence, knowledge, or expertise in some domain. Doctors and engineers with degrees were considered safe to practice; any degree was treated as giving the holder the status necessary to teach their subject. Even degrees in disciplines without any obviously useful or fundamental knowledge at their core, such as English Literature, or Golf Studies, were treated as valid, and marketable, evidence of general competence. If someone had a degree then they could be trusted to do a good job. Or so the assumption went until about 2010.

Then things changed, gradually at first, but then faster, so that now the idea that a university degree is evidence of any kind of competence is frankly as quaint and old fashioned as the idea that serious sport could be drug free.

For some time it had been obvious that many really successful people did not have university degrees - they either never went or dropped out. And most important developments did not seem to require or use university expertise. It was the geeks (Bill Gates, Mark Zuckerberg et al) who hit the headlines, but there was more to it: the stuff taught in degree courses was becoming increasingly old-fashioned and irrelevant.

But the thing that lit the fuse that destroyed degree courses was less obvious. It was the obsession with detecting and punishing "plagiarism" (I've omitted the rather lengthy explanation of this, and other terms in quotes which are not familiar to 2050 readers). Rules were drawn up, software was developed to detect the crime, and there was a strict culture of intolerance to any hint of illicit copying.

From a 2050 perspective this is very odd indeed. Culture depends on copying, maintaining clear links to individual ownership of intellectual property is often difficult, and is now generally agreed to hinder progress. But old-style degrees were based on the assumption that acquiring wisdom is hard, and incentives and measures of attainment are necessary, so individual students need to be "assessed" on the basis of work they have done on their own without any illicit help. It was a sort of sport: the degree material was kept deliberately difficult and often unpleasant, and students had to demonstrate their competence by "assignments" and "exams". Plagiarism was simply a way of cheating, like taking drugs in sports competitions in the early years of the century.

(Students in the last Cambridge cohort did take exams, but their original purpose, and the fuss over plagiarism, was long forgotten - students bought standard answers from the university to copy out in the exam ceremony. This year there was a surge in demand for third class answers, which cost three times as much as answers that would yield a first class degree.)  

This obsession with plagiarism led to two big problems. First, more and more assessments were designed primarily to prevent cheating. So instead of a sensible piece of work which students could have completed with any relevant technological aids, the focus was on short exams where technological aids, even books and notes, were banned. Which, of course, meant that the expertise which was taught and assessed became more and more useless.

The second problem was less predictable, and it took the universities a long time to acknowledge it. Plagiarism detection had developed into an arms race, with progressively more sophisticated methods and software on both the university and the student side. Many of the students treated it as a game which the brightest did very well at. Then employers started to realize what was going on, and that the brightest students were those who were guilty of plagiarism but had not been caught. This meant that the best CVs had two components: a certificate from the university stating that the student's studies had been plagiarism free, and some clear evidence from the student that, in fact, they had plagiarized extensively without being detected.


The end for the universities came when a university sponsored study demonstrated conclusively that students with this type of cv were more successful than students with good degree classifications.

Tuesday, February 11, 2014

Being a student in the twenty first century: challenging the consensus

Just been to a seminar on being a student in the twenty-first century. Lots of clichés - increasing complexity and "supercomplexity" of the world, inadequacy of knowledge and skills, "lifewide" education, etc, etc. The world is changing and the student experience needs to change too. Obviously.

The speaker encouraged all comments from the floor, so the clichés were interspersed with a random selection of comments as everyone got on their own particular hobby horse. The seminar leader contrived to turn every comment into a platitude he could agree with - we must treat students as people, there are no right answers, things are getting progressively more complex, and so on and so forth.

There are two general sets of assumptions behind this sort of discussion - mutually contradictory, and both unhelpful. The first is that students and teachers, or facilitators, are always engaged in a collaborative, consensual process with no right answers, and the teacher does not possess superior expertise. This was certainly the philosophy espoused and practised by the seminar leader. He did not set himself up as the expert, and all contributions were accepted and valued. However, it's probably more accurate to say that there were no wrong answers, because all suggestions were accepted as right.

The second is that learning is hard, often unpleasant, and requires incentives, which means that it is inevitable that many learners will fail, and certification is required to distinguish the successful from the failures. Failure obviously implies that the learners' answers are wrong, and that the teachers' answers are right: the teacher is the expert and the teacher and the learner do not agree about right and wrong. This is never made explicit, but is implicit in the talk about dealing with learners' anxieties. Assessment in some form is always assumed, and this makes little sense without clear definitions of right and wrong.

This prompts two thoughts. First, the contradiction between the two sets of assumptions needs to be faced. The first set of assumptions is actually too silly to be worth probing in detail: experts obviously do have some expertise (although usually not as much as they think they do), and some answers are obviously wrong. The second set of assumptions is less obviously flawed, but I think that overturning it, which would mean redefining education, would be hugely beneficial. If the system could be redesigned so that there is more success, and the blame for a lack of progress is not laid at the door of the poor anxious student - this would surely be a good thing. I have outlined some thoughts along these lines briefly in this article, and in more detail at http://woodm.myweb.port.ac.uk/nothard.pdf.

The second thought is about the sterility of this kind of session. The introductory ideas proposed by the seminar leader were really platitudes: the sort of things you couldn't disagree with without feeling like an idiot or a villain. And then the interjections were mostly along the same lines, and any that weren't were either ignored or redefined so that they were consistent with the dominant mood.


Sessions like this would be more productive if they had more of an edge, if they incorporated some negative or disruptive thoughts to challenge the cosy consensus. But for this to work, we need to learn to suspend our initial distrust of uncomfortable ideas, and give them a chance to see where they lead.

Tuesday, July 23, 2013

Examining a PhD

I was talking to a colleague in another university recently about a candidate she had just examined as the internal examiner. Like many internal examiners she didn't know much about the topic - a fairly technical one of the sort non-specialists feel, perhaps erroneously, that they can cope with. So she was reassured to meet the external and realize that he was a genuine expert - he definitely knew what he was talking about.

From then on, my colleague's sense of reassurance started to disappear. First the external asked if there was any reason why the candidate must pass. He was obviously referring to financial ties with the sponsoring organization. The university administrator mumbled no, of course not, in a rather embarrassed way, and the viva got under way.

It was obvious that the candidate knew little about the topic, and his research seemed to consist of little more than the application of a computer program to his case study. Strangely, some of the outputs from this program were negative, in a context where negative numbers made little sense. It was a bit like estimating the age of some fossils and getting a negative number indicating that the fossils were laid down in the future! The candidate was asked for an explanation. He did not know. He was also asked about the computer program. What models was it based on? Where did the answers come from? Again the candidate obviously did not know.

At the end of the viva the candidate was asked if he had any questions or comments. The candidate's supervisor, sitting listening to the viva, then put his hand up and said, yes, he had something to say. He explained that the reason for negative numbers was that the program was comparing two things. So it was a bit like saying that the fossil was a million years younger than another fossil, which of course made sense. But the candidate did not understand this well enough to explain it himself during the viva.

What to do? My colleague's view was that the candidate should fail, or perhaps be asked to do some extra work and resubmit for an MPhil. At the very least, as well as explaining the negative numbers, she thought the candidate should explain and evaluate the model on which the program was based.

The external, however, disagreed. He thought the candidate was not capable of doing this and so should not be asked. He was the expert. My colleague had no real expertise in the area, and was supporting the home team, so she agreed. The candidate was asked to do a few simple things, tailored to what he was thought to be capable of. He was awarded his PhD a few months later, despite the fact that he really did not know much about the topic.

Does this PhD really mean anything?

Tuesday, March 26, 2013

Winning an Oscar, living longer, and the strange idea of a p value



“Win an Oscar, live longer” said the Sunday Times headline on 27 February 2011. Oscar winning actors, apparently, live 3.9 years longer than other actors. Presumably Daniel Day-Lewis, with his three Oscars, has booked himself an additional 12 years to savour his success! 

This was based on an article, Survival in Academy Award–Winning Actors and Actresses, published in the Annals of Internal Medicine in 2001. How can we be sure this is right? The statistic given in the article to answer this question is p = 0.003. This is the so-called p value and is the standard way of describing the strength of evidence in statistics. 

The p value tells us that the probability of observing data as extreme as this (from the perspective of winners surviving longer than non-winners), on the assumption that winning an Oscar actually conferred no survival advantage at all, is 0.003. Since this probability is so small, the reasoning goes, the no-advantage assumption looks implausible, so there must be something about winning an Oscar that makes people live longer. Obviously the lower this p value is, the more conclusive the evidence for winners living longer.

Confused? Is this really obvious? The p value is a measure of the strength of the evidence that does not tell us how likely the hypothesis is to be true, and has the property that low values indicate high levels of certainty. But this is the system that is widely used to report the results of tests of statistical hypotheses. 

Another way of analyzing the result would be to say that the evidence suggests that we can be 99.85% confident that Oscar winners do, on average, live longer – as suggested in “P values, confidence intervals, or confidence levels for hypotheses?”. This seems far more straightforward, but nobody does it this way. P values dominate, despite, or perhaps because of, their obscurity.
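That 99.85% appears to be just 1 minus half the two-sided p value; a minimal sketch of the arithmetic (plain Python, no libraries needed):

```python
# Converting a two-sided p value into a "confidence level for the hypothesis"
# that Oscar winners live longer, as the article cited above suggests.
def confidence_level(p_two_sided):
    return 1 - p_two_sided / 2

print(f"{confidence_level(0.003):.2%}")   # 99.85%, as in the text
```

By the same arithmetic, the 93% confidence figure mentioned below corresponds to a two-sided p value of 0.14.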

There is another big problem with this research. In 2006 the journal published another article, “Do Oscar Winners Live Longer than Less Successful Peers? A Reanalysis of the Evidence”, pointing out a major logical flaw in the research design. Actors who live a long time obviously have more chances to win an Oscar than those who die young. The authors cite an 1843 study pointing out “the greater longevity of persons who reached higher ranks within their professions (bishops vs. curates, judges vs. barristers, and generals vs. lieutenants).” The original study failed to take account of this; when this factor is taken into account, the additional life expectancy is only one year and the confidence that winners will live longer is 93% (which is conventionally not considered statistically significant). This is obviously a separate problem from the p value problem, but it does make me wonder whether obscure statistics, of which the p value is just a minor part, can help researchers hide the logical flaws in their study, perhaps even from themselves.

Even more worryingly, the Sunday Times article claiming Oscar winners live longer was published five years after the article challenging the original research, and included a quote from the author of the original research saying that they get “more invitations to cool parties. Life is better for Oscar winners.” Why let truth get in the way of a good story?