Welcome back, everybody. Today I'm in Oxford and I'm talking to Dorothy Bishop, who is a professor of psychology at the University of Oxford. Thank you so much for your time, Dorothy. I read your interesting comment in Nature magazine about the reproducibility crisis. Could you maybe start by briefly summarizing what this is all about?

Yes. So in psychology we've become very aware, just over the recent ten or fifteen years, that there's been a problem with psychology results not replicating, and for some time people just thought this was maybe a pedantic problem that statisticians were picking up on. Then we had a big study that actually tried to reproduce results that had been published in quite respectable journals and found that really only about 30 to 40% of them did reproduce. So everybody started to get a bit alarmed about this and to consider what the problem was and what we can do about it.
I really first got heavily involved when I was asked to chair a meeting at the Academy of Medical Sciences, which wasn't really so much about psychology as about biomedical sciences in general. It turned out that they had also been getting really concerned, interestingly enough largely because the pharmaceutical industry was getting concerned: they were trying to build on results that had been coming out of experimental labs in bioscience and finding that they couldn't reproduce the basic result, so they couldn't get past first base. So we had this very interesting symposium with a whole diverse collection of people, which included physicists telling us how they do things in physics, but also people from industry. A lot of issues arose, and it was clear there's no one cause and no one solution, and that some of the solutions are more to do with incentive structures, which I think is something you're very interested in. But I also got more and more interested in the extent to which we had problems with how people were designing their studies, often not intentionally. We're not talking here about fraud; we're talking about people designing studies in ways that are not going to be optimal for finding out what's really going on.
So in this Nature paper I just summarized some things from a talk I'd given on this, which was really about what I call the Four Horsemen of the reproducibility apocalypse. I'll have to try and remember all of them. The first one is publication bias: the fact, and this affects all disciplines, that it's much easier to publish things that show a positive, exciting result than a null result. That was really affecting, I think, not just psychology; it's been known about for a long time, but it just goes on and on, to the extent that I think most scientists wouldn't even try to publish something if it wasn't significant, because they'd feel it wouldn't get accepted. So there's this notion of a big file drawer full of stuff that isn't published, which distorts the literature, because if you think of any particular research question, there should be a sort of cumulative process of building on previous research, but if the only research that comes through the filter is the stuff with positive results, you get a very distorted idea. So publication bias is number one.
Number two is what we call p-hacking in psychology, which is not so much that you select which papers to publish, but that from within a study you select which specific results to pull out and focus on. There are lots of ways you can p-hack: you can analyze your data many ways and just focus on the one way that gives you an interesting result, or you may gather data on lots and lots of variables but only report the ones that look exciting. I got the impression that a lot of psychologists don't realize this is a problem and don't really appreciate how, again, this can really distort the literature, because they tend to think, oh, I've found something where the p-value is less than 0.05, it must mean something. So that is the second big one, certainly in psychology.
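A minimal sketch of the mechanism she describes, with made-up numbers (20 outcome variables, 20 people per group, all effects null; none of these figures come from the interview): reporting only whichever outcome happens to cross p < 0.05 turns a 5% false-positive rate into roughly a 64% chance of a publishable "finding" per study.

```python
# Sketch: p-hacking by measuring many outcomes and reporting only the "significant" one.
# All sizes are illustrative assumptions; every null hypothesis is true by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_per_group = 5000, 20, 20

false_positive_studies = 0
for _ in range(n_studies):
    # Both groups drawn from the same distribution: there is nothing real to find.
    group_a = rng.normal(size=(n_outcomes, n_per_group))
    group_b = rng.normal(size=(n_outcomes, n_per_group))
    p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue
    if (p_values < 0.05).any():   # report only the "best" outcome
        false_positive_studies += 1

print(f"Studies reporting at least one p < 0.05: "
      f"{false_positive_studies / n_studies:.0%}")   # roughly 1 - 0.95**20, about 64%
```

The outcomes are simulated as independent here; with correlated outcomes the inflation is smaller but the selection effect is the same.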
The next one is power, which is people doing studies with samples that are simply too small to show the effect they want to show. Again, this has been talked about since the 1970s, but my impression is that a lot of psychologists think it's statisticians being pedantic and that it doesn't really matter. So if they can only test 20 people in each of two groups, they may go on a wing and a prayer and think they'll find something, when in fact most of the effect sizes we're talking about in psychology are really quite small, and it's become clear we need much, much bigger samples.
So what's the typical sample size? Something like 50 people?

Well, it depends on the sub-area. In some areas it's really difficult to get large samples; if you work, as I do, with special groups, it can take three years for us to collect a sample of 30 children of a particular kind. The solution there is to collaborate, and people are beginning to realize that we have to form larger collaborative enterprises to investigate some questions. If you're just doing questionnaire studies, it's obviously easy to get large samples, so it's very varied. With things like brain imaging it's just very expensive to collect large samples, because for each brain scan you may pay 500 pounds, so people are not motivated to get large samples until they realize that it's a real problem.
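To make the power point concrete, here is a rough simulation under assumed numbers (a true effect of Cohen's d = 0.3, and the 20-per-group figure mentioned above; the d value is my choice, not hers): with 20 participants per group the real effect reaches significance only around 15% of the time, while 200 per group pushes power to roughly 85%.

```python
# Sketch: statistical power for a true but small group difference (assumed d = 0.3).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
effect_size, n_sims = 0.3, 5000

for n_per_group in (20, 200):
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size, 1.0, n_per_group)   # a real difference exists
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            hits += 1
    # Around 15% for n = 20 per group, around 85% for n = 200 per group.
    print(f"n = {n_per_group:>3} per group: power is about {hits / n_sims:.0%}")
```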
Then the fourth horseman is something that was described, again many years ago, as HARKing: Hypothesizing After the Results are Known. It ties in with p-hacking, because this is looking at your data and then, from the morass of stuff there, pulling out one little interesting-looking thing, but then making out, when you write it up, that this was your hypothesis. So you tell a good story, and we're all told when we write up our results that it's important to tell a good story, but what you're not supposed to do is use the same data both to generate a hypothesis and to test that hypothesis. When you're HARKing, that's what you're doing: you first look at the data, then say, oh, that suggests this hypothesis, and then use the same data to test it. Again, that can really create a lot of problems. So those are the four things that I've been particularly focused on.
I'm very interested in how we might fix them, and part of it is just educational. I'm very keen on using simulated data for this, where you are, so to speak, God: you know what the truth is, because you've made up the dataset to have certain characteristics. Then you can show people how they can find something significant when there's nothing there, and conversely you can simulate data where there is a real effect and show people how easy it is to miss it if their sample is too small. That is beginning to become more common in education, but it's still not routine for people to be taught statistics that way, and I think that's a lot of the problem.
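A toy version of the kind of teaching simulation she describes (the sizes and variable counts are my own illustrative choices): a pure-noise dataset in which "discovering" the strongest of many correlations and then testing it on the same data looks significant, while the same pair usually shows nothing in a fresh sample.

```python
# Sketch: HARKing demo on simulated null data. The "discovered" effect tends to
# evaporate on fresh data because it was generated and tested on the same sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects, n_variables = 30, 40   # illustrative sizes, not figures from the interview

def strongest_pair(data):
    """Return the column pair with the largest absolute correlation, and that correlation."""
    best_pair, best_r = (0, 1), 0.0
    for i in range(data.shape[1]):
        for j in range(i + 1, data.shape[1]):
            r = np.corrcoef(data[:, i], data[:, j])[0, 1]
            if abs(r) > abs(best_r):
                best_pair, best_r = (i, j), r
    return best_pair, best_r

# "Exploratory" dataset: pure noise, so there is nothing real to find.
exploratory = rng.normal(size=(n_subjects, n_variables))
(i, j), r = strongest_pair(exploratory)
_, p_same = stats.pearsonr(exploratory[:, i], exploratory[:, j])
print(f"Discovered r = {r:.2f} between variables {i} and {j}, p = {p_same:.4f} on the same data")

# The same pair tested on a fresh, independent noise sample usually shows nothing.
replication = rng.normal(size=(n_subjects, n_variables))
_, p_new = stats.pearsonr(replication[:, i], replication[:, j])
print(f"Same pair in a fresh sample: p = {p_new:.4f}")
```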
So maybe I should add, for clarification, that the problem with trying different analysis methods on the data is that you get the statistical significance wrong, because every different method of analysis is a new attempt to find something.

Yes.

So you can screw yourself over by trying just as often as you want to. I'm afraid this is something that partially also happens in physics, although what a lot of collaborations in physics do is decide on a method of analysis before they analyze the data, and then they stick with it.
So what are people doing in psychology now to try to address these things?

One of the things I really like is the development of something called Registered Reports as a publication model, and I've tried this on several occasions. It does actually fix all four of those problems I mentioned, because what you do is submit your introduction and methods, and your methods have to be very highly specified, ideally with a script saying how you're going to analyze the data, and maybe with some simulated data. That is what is evaluated by the reviewers, and they may suggest changes. Normally this would all happen after you'd collected the data, whereas this is prior to data collection. They decide: is this an interesting question, is this a good way of tackling it, is the study adequately powered? If you can persuade the reviewers, you can then get an in-principle acceptance from the journal, which will publish your paper if you do what you said you were going to do. This puts all the time lag early on in the process, and that's why people don't like it, because you're waiting for reviewer comments before you've even got the data, but it does mean that reviewers can be much more constructive.
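For illustration only, since the interview does not include such a script: a sketch of what a pre-specified analysis script accompanying a Registered Report might look like. The file name, column names, and test are all hypothetical; the point is that the analysis is frozen before any data exist, and reviewers could run it on simulated data.

```python
# A rough sketch of a pre-specified analysis script for a Registered Report submission.
# File name, variables, and test are assumptions made for this example.
import pandas as pd
from scipy import stats

ALPHA = 0.05                         # decision threshold fixed in advance
PRIMARY_OUTCOME = "reading_score"    # hypothetical pre-registered primary outcome

def primary_analysis(csv_path: str) -> None:
    """Run the single pre-registered test on the primary outcome, and nothing else."""
    data = pd.read_csv(csv_path)
    treated = data.loc[data["group"] == "intervention", PRIMARY_OUTCOME]
    control = data.loc[data["group"] == "control", PRIMARY_OUTCOME]
    result = stats.ttest_ind(treated, control)
    print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}, "
          f"significant at alpha = {ALPHA}: {bool(result.pvalue < ALPHA)}")

# Reviewers could exercise this on simulated pilot data before data collection, e.g.:
# primary_analysis("simulated_pilot_data.csv")
```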
The person who has really developed this is Chris Chambers at the University of Cardiff, and he says the one thing that shouldn't affect a decision to publish is really the results, because they're not contingent on the quality of the methods and so on; it should be more about the quality of the methods and the sensibleness of the question. Registered Reports basically break that link, because you don't know the results at that point. So you get this in-principle acceptance, and then, provided you do what you said you wanted to do, you will get published. It slows you up at the start, but by the time you've finished gathering data the process is usually very rapid. So that is becoming increasingly popular: initially very few journals would offer it as an option, but it's becoming more and more standard as people realize that it actually gets rid of a lot of the problems and does help us do better science.
So the journals are offering this as an alternative method of review?

Yes, but you have to have editors who know how it works and who are enthusiastic about it. I think it's slowly beginning to break through, because it only came in about six or seven years ago as a new thing. Initially there was just one journal trying it out, and now I think there are about two or three hundred offering it, not just in psychology but in other disciplines too; in fact PLOS ONE has now agreed to offer it, so that should be a big change.

But it's mostly in the life sciences, right?

I think so, yes, though maybe some sociology, and possibly economics as well. In those areas, of course, it gets complicated if you're talking about analysis of existing datasets, because you then have less control over whether somebody has really already looked at the data.

So you said earlier that these complaints from the statisticians about p-value hacking and small sample sizes have been around since the 70s, or even earlier. Given that this was so well known, how do you think it could take so long for psychologists to notice?
far that it would take so long for psychologists to notice yeah I mean I
think there's there's two things one is that people genuinely didn't understand
how serious it was and it's like you say I mean people see this p-value less
important they think this must be meaning something and I mean I try and
illustrate it with you know thinking of if you had somebody who was a magician
and said I can do you were you know a particular hand of cars you know maybe
you know I don't play poker but something that's quite unusual and you
come along and he deals it and you ah you know he's an amazing magician but if
you know that prior to you he's tried this on you know 100 other people who
didn't get this and there's only one in a hundred where it sort of actually
worked you have a completely different attitude towards that result so you have
to understand that you know you're talking about probabilities they have to
be interpreted in the context of all the tests that you've done they're not a
measure of an effect size in the same way that people think of it so I think
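Her magician analogy can be turned into a one-line calculation (the one-in-a-hundred figure is hers; the rest is plain arithmetic): if each attempt has a 1% chance of working and 100 people are tried, the chance of at least one apparent success is about 63%.

```python
# The magician example as arithmetic: a 1-in-100 trick attempted on 100 people.
p_single_attempt, n_attempts = 0.01, 100
p_at_least_one_success = 1 - (1 - p_single_attempt) ** n_attempts
print(f"{p_at_least_one_success:.0%}")   # about 63%: the same selection effect as running many tests
```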
But the other, I think very interesting, change has been social media, because social media has given a voice to people who previously didn't have one, mainly junior, early-career scientists, who may have been encouraged to do this sort of thing, or who have really suffered as a result of finding they can't replicate something that was published and looked as if it was solid. They are actually getting quite militant about making science better. In the past, if you found something in a journal that you didn't think was well done or didn't agree with, the only way was to write a letter to the editor, who might decide to publish it several months later, whereas now people can just pipe up on Twitter and say, hmm.

So what do you mean by getting militant about it? They aggressively draw attention to it?

Yeah, and they're concerned to try and bring about all sorts of changes.
So with a couple of colleagues I've been running a course on advanced reproducible methods, for three years now, four years I think, this was the fourth, aimed at early-career researchers, and they go off really fired up. In Oxford we had a very good group of early-career researchers who started various initiatives: they started a journal club called ReproducibiliTea, at the end of which we drink tea and talk about it, and we have also had other events. This has culminated in putting in a bid to the university to support someone to really coordinate these activities, and my colleague in anthropology, Laura Fortunato, has headed up this bid to get funding across the university. So we're trying to bring in all disciplines to improve the credibility of scholarship, even in the humanities, where you might be talking about electronic collections of items in museums and so on, just making sure that things are open and properly documented, and, if you're doing science, that your scripts are available.
But the other thing that we have to tackle is the incentive structure. We have to make sure that when the university is hiring and firing, and when they're promoting people, they have an interest in issues like how credible your science is, rather than just whether it's in a flashy journal. They are on board with that, and in part they have also been heavily influenced by the funders, who are very motivated to bring about change.
That original meeting I was at, at the Academy of Medical Sciences, was supported by the Wellcome Trust and two of our research councils, MRC and BBSRC, and of course they don't want to spend their funds on research that isn't credible, so they are highly motivated to fix the problems. Once people became really aware of the problems, the motivation from the funders was there, and that will translate into new requirements for people submitting grant proposals, which means researchers will have to take it seriously whether they want to or not. I mean, I think this university does also want to do very high-quality research, of course. So again, once people become really aware of just how endemic the problems are, I think we're on the cusp now; I've really seen quite a lot of change, and the way people do things is going to be quite different.

I find that super fascinating, because I always have this impression that in academia nothing ever moves and nothing ever changes.
I have to say that at the University of Oxford this tends to be horribly true, because we are an old institution where everybody is very, very careful and many people have to be involved in all decisions. But it is nice that, at heart, there are a lot of people in the university, in different disciplines, who are all really keen to bring about these changes, and that's our strength really, because we keep finding new people: somebody in economics, somebody in politics, somebody in computational biology, and they're all interested in this same goal and will have different ways of solving the problems. So I think we're the only place, internationally, trying to do something at the level of all disciplines converging, but we've only just started; we only really got going in January, we launched in January, which is only last month, so we're already feeling very, very positive.
So you said you have the university administration and also the funding agencies behind you, which is a good start, and the community of course.

Yes.

How important do you think it is that the public is aware of these problems and of your efforts to do something about it?

Yeah, that's a very good question, which was raised at that initial meeting we were at, because there were some people saying, well, we mustn't really say too much about the problems, because the public will lose confidence. But there were people there, whom I agreed with, science journalists, who said: this is appalling, you need to talk about the problems, otherwise people will really lose confidence. There is a difficulty, though, which is that it can be weaponized. We've already seen this happening in the US, where people who have a particular agenda, say the Trump government not wanting to obey regulations about environmental protections, are starting to say, well, we don't have to take any notice of any regulations unless the data is open. And given that data on things like asbestos were gathered years ago, before there was any prospect of open data, they can therefore decide that they don't have to take any notice of it, or they can just say, well, all the stuff on climate change, of course, if science is not very reliable, there are just different points of view. That's really the hardest thing: trying on the one hand to be open and honest about what we're doing, while on the other hand ensuring that this doesn't act as a hostage to fortune and allow people to weaponize what we're doing. The best way, though, is to just make sure the science is really, really good, and the more self-correcting we are, the faster we self-correct and deal with problems, the less easy it will be for people to just try to deny bits of science that they don't like.
Do you also know examples of people who are actually commenting on this habit of p-value hacking or something, to argue that one should not take this or that science seriously?

Not p-hacking particularly, I don't think. But even I, as a reviewer, have to say I've got quite skeptical about a lot of things that come through if the data isn't open, and more and more I just wish people would, even if they don't go down the route of doing a full Registered Report, at least pre-register their plan and their hypothesis. Rather recently I actually had to referee a paper which I was very concerned had probably been affected by p-hacking, because they'd pulled out one little result, and the authors just replied, well, you know, everybody does it. And they're right, everybody does, but that is not an adequate excuse for doing it. So it's a problem.
Yeah, I actually think that kind of argument greatly contributes to the problem, because it's why people don't realize there's something wrong with it: they say, that's what we have learned, that's what everybody does, so it must be okay.

And in psychology there's even a very classic example of a sort of guidebook on how to be a good scientist which explicitly recommends both p-hacking and HARKing. It says: don't feel you have to test your hypothesis; what you first do is look at your data, and you look at them every way you can and find what's interesting, and then construct the paper around that.

Oh, well, that's painful.

Yeah, yeah.
But when it comes to the involvement of the public, I have to say that I think it's probably worse to try to sweep the problems under the rug instead of being open about it and saying, yes, we have a problem, but we're working on it.

Yeah, I agree, but I think we do have to be rather shamefaced about some of these things. It does mean confessing that quite a lot of phenomena, certainly in psychology, that we thought were very robust, things that made it into textbooks, we're now beginning to realise probably don't stand up, or are at least much weaker than we thought they were.

Well, we will see what holds up, I guess. Thank you so much; I think this is a good place to wrap this up.

Thank you.

Thanks, everybody, for watching, and see you next week.