Editor's Note:

In this special section, we publish the proceedings of the plenary session of the Entrepreneurship Division of the Academy of Management at the 2024 Conference in Chicago (Illinois, USA). The plenary brought together a distinguished panel of scholars to discuss the importance of replication in management and entrepreneurship research and to respond to audience comments and questions. Many senior scholars, including journal editors, were in the audience.

Vishal K. Gupta and Christina Theodoraki, then Program Chair and Professional Development Workshop (PDW) Chair respectively for the Entrepreneurship (ENT) Division, moderated the plenary discussion with the following panelists:

Per Davidsson, Recipient of the 2023 Global Award for Entrepreneurship Research, and Talbot Family Foundation Chair in Entrepreneurship at the Australian Centre for Entrepreneurship Research (ACE), QUT Business School (Australia)

Constance (Connie) Helfat, J. Brian Quinn Professor in Technology and Strategy, Tuck School of Business, Dartmouth College (USA)

Herman Aguinis, Avram Tucker Distinguished Scholar and Professor of Management at George Washington University School of Business (USA)

Jeffrey McMullen, David H. Jacobs Chair in Strategic Entrepreneurship and Professor of Entrepreneurship at the Kelley School of Business at Indiana University (USA)

The following report of the plenary session was prepared by Athina Skiadopoulou and Majid Rahimi, PhD students at Manderson Graduate School of Business at The University of Alabama, with support from Vishal Gupta. The authors of this report are grateful to JSBED Editor-in-Chief and Entrepreneurship Division Historian Patrick J. Murphy for feedback and support.

The video recording of the plenary session is available at https://www.youtube.com/watch?v=5AY4DcsqcAY.

12 August 2024, Academy of Management Conference, Chicago, Illinois

Vishal K. Gupta, 2024 Program Chair of the ENT Division, opened the plenary session by welcoming the audience and introducing the panelists. He started by emphasizing the practical importance of, and interest in, the topic of replication.

VKG: Before we start, let me emphasize the practical importance and interest in this topic. All major media in the U.S. and overseas are paying attention to the crisis of research credibility in our scientific enterprise. Here are some examples from major newspapers in the U.S.:

  1. The New Yorker: “They studied dishonesty – was their work a lie?”

  2. Wall Street Journal: “Flood of fake science forces multiple journal closures.”

  3. New York Times: “Stapel's audacious academic fraud.”

  4. Washington Post: “Stanford president will resign after questions about research.”

Concerns about the research credibility of the scientific enterprise are expanding, growing and becoming more and more vocal. Having said that, I will now invite each of our panelists to introduce themselves and share some opening remarks focused on their position on this topic. We will start with Dr Per Davidsson.

PD: I am very passionate about meaningful cumulative knowledge. We are not in this game to produce papers just to enhance our careers. We are part of an extremely important societal institution – research – which needs to sustain its credibility.

Personally, if I had a medical condition, I would not seek advice from an individual empirical study. I would want to see results repeated before I trust anything. But I see replication as one element in a package that leads to meaningful cumulative research. We also need theory with shared, well-defined concepts – not a lot of different concepts for the same thing, and not the same concept for a lot of different things. We need shared, validated operationalizations. It's a bit pointless to do an exact replication of a study using measures we don't trust in the first place, right? We need reviews and meta-analyses that are not just replications but approach a question in different ways. If all this evidence tends in a certain direction, we can be more certain of it. But we also need attempted exact replications so that we know more of the contextual boundary conditions of results. That way, we know that influential studies are actually right. And especially with the increasing problems with academic dishonesty, we need to make sure that this was not the underpinning of the original results.

CH: When Vishal asked me to do this, I said, “Well, my views have kind of evolved. Are you sure you want me to do this?” And he said yes. So let me give you the quick evolution. I'm a strategic management scholar, and when I was first starting out I went to the New Faculty Consortium. Carolyn Woo, who is one of the leaders of the field, got up and said there are five kinds of papers you can write to get published. One of them was to write a replication study. I always remembered that. And as years went on, I thought, “You know, I'm not seeing any replication studies.” Very few. I would say, parenthetically, that Research Policy has published replication studies all along. It started to bother me more and more. When I was – actually not yet a co-editor of SMJ – I was talking to Richard Bettis. Turns out, he was worried about this. So we decided to do a special issue of SMJ (Bettis et al., 2016).

We roped in Myles Shaver. It was published in 2016. Our aim was not to do exact replications. The aim was not to find fault. The aim was to advance scholarship on particular areas where we need more than one study. We said we want to do what we called quasi-replications. I believe Andreas Schwab and some of you have been calling these “constructive replications” – which are about seeing if an initial result, which has been reasonably well cited, actually holds up in different time periods, contexts, technologies, industries and so on. We also made the point that one replication does not in itself invalidate another study or validate a particular point of view. You do need cumulative knowledge. We did publish this issue, and I am very proud of it. The entire special issue – something like 11 papers – consists of quasi- or constructive replications. I love Chris Crawford's study with his co-authors. But I don't think you need to do exact replications first in order to do a constructive replication. You can take the original paper and, say, in a new setting with different data, try to use similar variables and methods – then improve on them.

So, why have my views evolved? My views have evolved because sometimes what I am seeing in other fields is a real emphasis on exact replications, and it's never clear to me why particular authors get targeted for exact replications out of the whole universe of things you could replicate. That gives me a bit of pause because there's a fine line between conducting an exact replication because you think a result looks odd – it doesn't pass the smell test – and conducting an exact replication because you do not like a result, or it conflicts with your belief system. That's not a way to advance science. I've always thought the best way to argue with a result you don't like is to go do your own research to test what you think is a better way of doing it. That's different from a replication.

But when it comes to replications, I am actually a big proponent of the quasi- or constructive replications. Because I think that really helps us build the body of knowledge.

HA: There are many issues that we need to think about to put replication or replicability in context. Per mentioned the issue of credibility. Connie mentioned the issue of trustworthiness and repeated studies. So, this is an issue that all of us are dealing with at this point.

It is an issue very critical for the long-term sustainability of our business schools and our industry. Because if our research is not credible, rigorous, useful and impactful, then no one is going to give us money to do it. Because it is our problem, I'd like us to work through it together. So, I'd like to do a little thought experiment.

Okay, I want you to think about the latest article you just read – maybe this morning, maybe last night, maybe on the flight – a published article. Not any abstract idea – think about one article you just read. An article in a domain you're fairly familiar with. Got it?

I'm going to ask you some specific questions for this article you read. So, try to remember the article.

Question 1: Are you fully convinced that the authors reported – openly and transparently – all the data collection and data cleaning procedures? For example, how they managed outliers – whether they included or excluded them? Whether the set of control variables you see in the final model was both the initial and the final set, or whether the authors included and excluded several sets of controls until they arrived at the published set, with that information left out of the paper? So – tell me: are you fully convinced that everything in terms of missing data, imputation techniques and any kind of data transformations – for example, transforming data to normalize it – was openly and transparently described in the paper?

Question 2: Raise your hand if you're fully convinced that the authors reported all the analysis. For example, the final model you see – is that the initial model? Did they put variables in and out? Did they try different econometric techniques or models – non-parametric, parametric? Were all of those analyses that were done before reporting openly and transparently reported in the paper? Raise your hand if you believe that's true.

Question 3: Are you convinced the authors fully and transparently reported all the results, including the results for hypotheses that were not supported? Quick side note: I'm not blaming the authors. Sometimes a reviewer asks you to take out information on Hypothesis 3 because the paper is very long and that one wasn't supported. So – take it out. Raise your hand if you believe all the results were openly and transparently reported in that published paper.

Question 4: And the final question: Do you think, if you tried to do the same study – assuming that you even get together with the authors of the original study, and they tell you exactly what they did – you would get the same p-values, the same effect sizes, the same substantive conclusions? Raise your hand if you think you would be able to do that.

So, I'm going to close this because I want to hear from the other panelists.

But given this thought experiment: Is our research sufficiently credible? Is our research sufficiently impactful in terms of being useful? Are journal reviewers sufficiently equipped with the methodological skills and knowledge to evaluate the papers they receive? How about the journal editors and associate editors – who, in many cases, were trained 20 or 30 years ago, when many of the methods we use today weren't even known? And how are we training the doctoral students, who are the future scholars, so that their research will be sufficiently rigorous, credible and impactful?

I want you to think about these questions, because I think many of them are part of the conversation we need to seriously engage in.

JM: When Vishal first brought this up, I kind of said, “No, I don't want to do this.” I thought it would be boring. I didn't think anyone would show up. I was wrong – so, well done, Vishal.

Why did I say that, though? It wasn't flippant. I just thought – when I deal with replication studies, we're always open to them. There's no policy for or against them. In fact, I kind of resist such policies, because I want to approach this subjectively. I want to evaluate each case: Does it make sense? Is there potential value here or not? But when I look at replication submissions, my biggest reservation, my biggest fear, is that they're not written in the name of truth. As much as I'd love to see a replication written the way Herman just described – what I often see is exactly what Connie described: a hit job. It comes in, and it makes no sense why that paper – often from long ago, not current – is being revisited. It almost always involves a top name, and it feels vendetta-driven. If you look at article after article – and anybody who's edited knows this is true – over time, you start to get a sense of what's about truth versus what's about power, politics and influence. Over time, you read enough reviews, enough papers – you get a sense of it. What's the motivation behind this?

When you get that little feeling of “something's off here,” I think it's our due diligence and our obligation as editors to look more into this and ask this question: What's the reasoning behind this? So that's the first fear. That wouldn't stop me from sending the paper out or pursuing – or looking into this issue more closely or having a replication study published. That's not the point. But my first orientation would be: Why now? Why this paper? What do we learn as a result of it? So the biggest challenge we usually see is: What is the gain? If this is wrong, if they find that this paper needs to be overturned – if it's about method, that there's a methodological error – that's a whole different discussion that we're having right here.

Basically, there's an implication here that something was done wrong on purpose. Some of the stuff that came up just now has an intentionality to it. That's a different situation than: Do the results hold under slightly different conditions? Do the results hold now, 20 years later, after a study?

That would be a study done in the name of truth, right? Like – does science from 20 years ago still hold today, under conditions now, or in this alternative context? That doesn't feel like replication to me. That would be, in my opinion, a new study. You're building on previous research. You're contributing to the knowledge base. You have a contribution to make.

Often, when I get a paper positioned as a replication study, it's a direct replication most of the time, and there's some kind of implication that somebody did something wrong and this is a moment to reveal it. So, is this about policing? Which – if it is – is this the best alternative for policing? Is this the best mechanism to capture things after the fact? Would we like policing to be more on the prevention side than on the “after the damage has been done”? Some of science is based and built with a conservative bias for a reason. The reason the bar's set so high to say “this is knowledge” is so that the next study has to test and refute that. And you don't want to just put out in the first study in a particular area something that we don't believe is rigorously conducted.

If we do that, we create this situation for this desperate need for replication. So I wrestle with what – I take my position as: I'm open to it. I think it can be done well. I think you can have contributions.

If you have a direct replication, you have to make a very compelling case for why this is necessary, why this is needed. If that's made, that's on the authors to do that. They're going to have to do that because the reviewers are going to demand it as well. And you're going to get feedback: “A contribution hasn't been made.” You're going to run into all this anyway.

So that's kind of the orientation. But that's the same orientation we use for any study. So I don't quite see what the difference would be for a replication study versus a new study. That would be how I would approach any paper. I would be like: What's the motivation here? How is it written? Is it conducted correctly? Is it transparently communicated what the procedures were? So I have trouble figuring out what's categorically different about a replication study – unless it's a direct replication. And then we're talking about a very specific case.

I think that's a reasonable debate to have: Should we try to publish those? But then we get into some serious philosophy of science issues, which I'm happy to talk about as well. But that's a different case in point. I don't want to just say replication means that we're just seeing if science holds. That would be science. Replication seems to be a specific case of that for me.

So – enough for now. I know we're going to keep talking, so thank you.

VKG: Thank you all for these opening remarks, which reflect different views about replication. To make our conversation more structured and interactive, we'll ask our panel a set of questions, and we'll also open the floor to questions from the audience in just a couple of minutes. But let me start with the first question:

What does the word “replication” mean to you? Does it mean that the methods described in the paper should be replicable, or that the results presented in the paper are replicable? Or does it involve both? Or is it something else?

PD: There is some terminological confusion about replicability and reproducibility and so forth. I sympathize with Jeff's concerns about hit jobs, and so forth – when it comes to re-examining a particular paper and the apparent intent is to say that it doesn't replicate. But I – I don't – this is not my majority view of what replication is about. And also, positive replications need to appear as well. Because one of the first things I learned – learned very early – I'm drifting from the question a little bit. We had a seven-country collaboration, examining largely the same thing, publishing in a special issue. Each published their own results within their own theoretical framework and analysis method.

That gives you a very confusing image of the important relationships. We were also forced to have a harmonized analysis – maximize the similarity in the analysis – and we came up with three strong factors about the regional drivers of high startup rates. Very strong evidence – because we forced that harmonization. We need both – the freedom to do it the way we like, and the collective harmonized view. Another early experience: I had a study where I had six separate samples of publishable size – at least back then – with 150 to 200 cases in each. I ran the same regression in those six samples, with six or so core explanatory variables. You know, when you have standardized effects of 0.10–0.20, they can be statistically significant in one sample, yet they're all over the shop if you compare sample to sample. We grossly underestimate the random element of how those results come out. And for that reason, we need to replicate as well. There may be no ill intent or cherry-picking on the part of the original study – but we need to have very strong evidence. Much stronger than a single study can provide.
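Editor's illustration (not part of the plenary): The sampling variability Per Davidsson describes can be sketched with a small simulation. The sketch assumes a single predictor with a true standardized effect of 0.15 and six independent samples of 150 to 200 cases. These values, the random seed and all function names are illustrative assumptions, not details of his study. With about 175 cases, the approximate standard error of such an estimate is about 0.075, so estimates anywhere from roughly 0.00 to 0.30 are unsurprising, and the same true effect can look significant in one sample and null in another.

```python
import math
import random


def simulate_sample(n, true_beta, rng):
    """Draw n cases with y = true_beta * x + noise (x and y have unit variance in the population)."""
    noise_sd = math.sqrt(1.0 - true_beta ** 2)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_beta * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def standardized_slope(xs, ys):
    """Sample Pearson correlation, which equals the standardized slope with one predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a zero slope, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


def approx_standard_error(n, beta):
    """Approximate standard error of the standardized slope implied by the t-test."""
    return math.sqrt((1.0 - beta ** 2) / (n - 2))


TRUE_BETA = 0.15                              # assumed true effect (illustrative)
SAMPLE_SIZES = [150, 160, 170, 180, 190, 200]  # six hypothetical samples
rng = random.Random(2024)                      # fixed seed so the run is repeatable

results = []
for n in SAMPLE_SIZES:
    xs, ys = simulate_sample(n, TRUE_BETA, rng)
    r = standardized_slope(xs, ys)
    t = t_statistic(r, n)
    # 1.97 is an approximate two-sided 5% critical value for these sample sizes
    significant = abs(t) > 1.97
    results.append((n, r, t, significant))
    print(f"n={n}: estimate={r:+.3f}, t={t:+.2f}, significant at 5%: {significant}")
```

Changing the seed changes which samples cross the significance threshold. A single published study reports one draw from this distribution, which is why repeated estimates carry more weight than any one of them.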

VKG: Dr Helfat, as you answer this question, can you also touch upon the distinction you made between replication and quasi-replication?

CH: I actually think these quasi or constructive replications are replications, because they are efforts to use similar data, using similar methods, to a particular paper. So, as Per was saying, you can actually get close to comparing apples and apples. You know – and trying to figure out a particular relationship between particular constructs or sets of variables. But they're not exact replications, because you're not using the exact same data with the exact same methods. So, I actually think those are valuable because they help us build knowledge. I've had one of my studies replicated. I had no idea about it until it showed up in the journal. I was so flattered. Well, then I read it – it was quite interesting. They used data from the same country, from the same survey – that was a national survey in Finland – but in a later time period, and using basically the same types of variables, because they came out of the same survey. All right, but a later period. And they found the core result held, but one of our sort of results about the slope – they found something different. Well, we know perfectly well that we had a particular time period, and the curvature might change. I thought, well, that's interesting. We should learn more about this, right? So, I think that's valuable. So, what does replication mean to me? I mean – I'm pushing quasi/constructive replications. But you are talking similar data, starting with similar methods, right? And maybe either a different context or different time period, so you can tell: If a result changes, what's different?

And then you could say, okay, I'm going to change the variables a little bit. I think there's a better way to measure them. You can see, if you do that, what happens. Or you can change the methods – methods have improved in the past 20 years, right? What if we do that? You do it step by step. You can see what's changed. So it's being specific about the context, the data, the methods and why the things change.

HA: I'm going to amplify and also talk about the same issues. The nature of the game in our industry is making a contribution. Unless we make a contribution, the paper is not published. And typically, a contribution to theory. That means we now understand, predict and explain a phenomenon better. Theory sounds very fancy and complicated, but the bottom line is – whatever you study – after your studies are done, can you understand, explain and predict that phenomenon better?

Related to that point of contribution – in a paper we wrote in the Journal of Management Scientific Reports, which is an offshoot of Journal of Management – we described three kinds of replication studies. One is called reproducibility studies, which is essentially: use the same data as the original authors, run the same analysis and see if you can get the same results. That's what Crawford et al. (2022) wanted to do, and they found that in only one-third of the studies they could find the same thing. But the reason is that the authors had not reported enough information for them to run the same analysis. For example, if you have a model with control variables, but in your correlation matrix, the controls are not in the matrix – you cannot run the same analysis. So, the fact that you cannot reproduce doesn't mean that there was something wrong. So those are the kinds of studies that go for right-or-wrong, kind of a hit job. There's a lot of politics in that.

So:

  1. Reproducibility – same data as the original study.

  2. Replication, of which there are two kinds. Literal means new data but the same design and measures, and hopefully you draw from the same population. Constructive means a different design, different measures and possibly different samples from different populations.

  3. Generalizability – Per gave a couple of examples – where you do the same study across different settings, different populations and different industries.

So:

Reproducibility = same data; Literal = different data, same design/measures; Constructive = different data, different design and measures; Generalizability = same study across different settings.

As you think about the design of each of these studies, you're thinking about: What contribution am I going to make when the study is done? The key issue is: when you find differences or similarities, what is the contribution to theory?

So:

Why is it that across several cultural contexts, the effects are similar? Why didn't culture have an impact? Why didn't industry have an impact? Or – if I measure the DV as, let's say, firm performance – if I use an accounting measure vs. a market measure of performance – why does that have an impact on the results, either getting the same results or different results?

Andreas Schwab is the lead author on another paper in the Journal of Management Scientific Reports titled “How Replication Studies Can Improve Doctoral Student Education” – because we want our doctoral students to publish and get jobs. Do you have doctoral students here? Raise your hand. Do you want to publish and get jobs? All right! So, you need to think:

If you want to do a replication study (and I'll give you examples of journals that have published them), how can you design the study and execute the study, like Jeff said, so that the replication study makes a contribution to theory and is publishable?

So I'll talk some more about that. But for now, I just want you to think about different kinds of studies, different kinds of methods, that lead to different goals and different kinds of contributions.
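Editor's illustration (not part of the plenary): The distinctions Herman Aguinis draws above can be written as a small decision rule. This is a minimal sketch that reduces each study to three yes/no properties. The class, field and function names are hypothetical, and real studies will often sit between categories.

```python
from dataclasses import dataclass
from enum import Enum


class StudyType(Enum):
    REPRODUCIBILITY = "reproducibility"
    LITERAL_REPLICATION = "literal replication"
    CONSTRUCTIVE_REPLICATION = "constructive replication"
    GENERALIZABILITY = "generalizability"


@dataclass(frozen=True)
class StudyDesign:
    """How a follow-up study relates to the original (simplified to three flags)."""
    same_data: bool                 # reuses the original authors' data
    same_design_and_measures: bool  # same design and measures as the original
    same_population: bool           # sample drawn from the same population or setting


def classify(design: StudyDesign) -> StudyType:
    """Map a follow-up design onto the categories described in the talk."""
    if design.same_data:
        # Same data, same analysis: can the reported results be obtained again?
        return StudyType.REPRODUCIBILITY
    if not design.same_design_and_measures:
        # New data with a different design or different measures
        return StudyType.CONSTRUCTIVE_REPLICATION
    if design.same_population:
        # New data, same design and measures, same population
        return StudyType.LITERAL_REPLICATION
    # Same study carried into different settings, populations or industries
    return StudyType.GENERALIZABILITY


# Example: new survey wave, same instrument, same country and population
follow_up = StudyDesign(same_data=False, same_design_and_measures=True, same_population=True)
print(classify(follow_up).value)
```

Writing the rule down makes one design point visible: the category follows from what is held constant relative to the original study, so a replication report should state each of these properties explicitly.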

JM: You know, it's interesting – I don't really think that if you're refuting or you're challenging, it's as difficult to replicate or get a replication published as if you are confirming, right?

So if you're – if you're – if you refute this or you can refine or challenge existing research, there's – nobody's going to say, “Do you have a contribution?” So that – that part – there's definitely a bias here for that kind of replication already. That would be a replication that's not actually replicating. It's showing that the research wasn't reliable, or isn't reliable anymore, or there's some condition or fact that we have to deal with. If people did those replication studies and they just absolutely confirmed, and then none of that was published, well, we have a serious issue there.

Right? So that's really the debate and discussion – as a journal editor. Why? Because nobody's going to cite that. That's the biggest challenge that you run with. Please come back to that.

I throw this out there on purpose because this is a thought process that runs through your head when you are editing.

You know:

Is anybody going to use this?

Is anybody going to cite this?

Is anybody going to build on this?

I think they could – if you do confirmation studies, and then build out what that teaches us and the robustness of knowledge in a space.

This is how things change. You still have to show how the field has changed as a result of your research. It can't be: I did my research; it shows the field was right. And that's the presumption that we work with in science: That the theory holds until you show there are conditions under which it doesn't hold.

And so that's what makes it very difficult to publish the straight-up confirmatory replication study. I did have one instance – a few instances – but one in particular where somebody sent in – it was not a replication study. It was more of a challenge, because I think they were challenging a paper, and it was presented as a replication – a proposal for a replication.

They'd gone through the analysis of a paper – an older paper. And I solicited a really great research methods expert that I know. Basically, I was like: “Can you please go through this and tell me what you think?”

Because his methods expertise is – I think everybody would agree in the room – this was a fantastic person. And so, I had him go through everything. And I was like: “I would really like your objective analysis.” I felt like there might be some partiality toward the authors – I wanted to make sure I did not have any conflict there. And he comes back – he goes: “Yeah, okay, I know what this is about.”

And he goes through the whole thing – shows me the analysis – we walk through everything.

And I'm like: “What's your general assessment here?” His take was: In this case, the methods have evolved, and these people are applying today's standards to a paper from years ago.

So in this case – it was – it was written – (By the way, this wasn't me reading something into it – it was written in an accusatory way) – basically, we're questioning the analysis and whether this was done with full transparency in mind, or was there some kind of manipulation, all that. So, I had to take it very seriously, obviously.

And the assessment in the end was: No – the results hold. With new methods and new standards, we would reveal this, but if you go back 10 years, this would have been state-of-the-art. So one problem that we have in this is: The state of the art is improving. I don't want a replication standard using methods from 10 years ago, because arguably, methods are getting better, or else we wouldn't have evolution of science in the field. And so that's one of the other challenges in this space that I think is very real on a pragmatic level. I'm not talking theoretically or hypothetically – I'm talking about actual, practical publication of these types of research.

PD: One of the myths of replication is that replication studies are not read and cited. And we don't know that, because there are so few of them to begin with. There has never been an article published in JBV with “replication” or “replicate” in the title – ever. There are five in ET&P, all from the last few years. There is one in SEJ this year. So, we're moving in the right direction. And there are twelve in JBVI, all from 2015 onward. Of course, they couldn't be much earlier than that because the journal is not much older than that. I can give you examples. One of mine: Dahlqvist et al. (2000), “Initial conditions as predictors of new venture performance: A replication and extension of the Cooper et al. study.” It has over 400 citations. It was in a short-lived and somewhat obscure journal, because there was a merger, but also because replication studies are not so easily accepted in other journals. Frank et al. (2010), “Entrepreneurial Orientation and Business Performance: A Replication Study.” 484 citations. This is in Schmalenbach Business Review, which actually publishes a lot of JBV-style research, but it's not well known in entrepreneurship. It is the sixth most cited ever in that journal. I have a lot of other examples here that are cited at the median or above for that journal, for that year. And the reason for this, of course, is that replications try to replicate influential research – not studies that have gone undetected over time. So, I don't accept this argument that they're not read or cited.

JM: What you just reported were facts. I talked about – what I was talking about was perception. They're not the same thing. There is a very big difference between beliefs and reality. And there is a divergence here between them. And my point was that – when the authors submit this, it is on the authors' shoulders to make a case for why this paper is compelling and necessary, because reviewers will come back with: “What's the contribution here?”

There is an editorial bias – without a doubt – that these papers won't be cited as much. So is this education? I totally – I don't disagree with you. What I'm responding – what I'm responding to here is – this isn't JBV-specific. My God. This is field-specific. This is basically social sciences-specific, right?

So – it's a matter of – every time we go to these things, one reason I did not want to do this is because it becomes about the editors, as if we are paid officials with big budgets, and rolling out policies. But it's all volunteer labor. You people in the room are the reviewers. You're also the authors. We're both.

The system is us.

We always talk from the author's position, as if there are reviewers and editors out there that are somehow different – that that's a separate community and we're the authors. We're the victims. But we're not. I have to first communicate as an author, and then I have to turn around as an editor, and I have to say: Did they make a compelling case? Are the reviewers – were they just exhibiting a bias? That's a lot to keep asking of editors – to be policemen of the world. It's something we need to do – it's the job, I understand that. But it also starts – our system is very much based on self-policing and doing the right thing. And nobody wants to ever talk about that part. They always want to talk about the systems of how we can police better. That's the least efficient model. The most efficient model is: How do you do science correctly, transparently and openly up front, so we're not in these positions?

Let me tell you people – it's not fun when you're about to ruin someone's career and you're delivering that news to them. Really briefly here – I was an auditor in accounting before this, when I was young. And I had to bust somebody stealing. The guy was probably my age now. You know, man – everybody thinks: Oh, that'd be so cool.

No. Is it really cool to see a 53-year-old man crying in front of you?

There's nothing great about that. We don't want to be in that position. How can we design the system to prevent these kinds of things? I 100% agree with you, and I think if publishing and policing is the way we can do it – great. But that's got to be the first step. It cannot be the final step. Replication cannot be: “Okay, now we've solved the problem.” There's something more systemically problematic here that we've got to talk about. Not just today, but I want to open that conversation because that is a conversation no one wants to have – but that's the one that we need to have.

HA: I love this – this debate, or different perspectives. Per mentioned a bunch of papers that were published – and not only published, but also highly cited. And let me tell you the four reasons why you can replicate – or tell me if I'm wrong or right. In all of the papers – all of the papers that you mentioned – and there are many others published, not that many – but:

First of all, you picked a very good precedent study. You said the original study was very influential, dealing with an important issue and also – I would have to add – fairly transparent and open. So you can actually see exactly what the authors did. So, picking the right precedent study is absolutely key. That's number one. Second issue – that you mentioned, and Jeff mentioned – this is not a right versus wrong goal. You're not trying to do a hit job – as, I think, Jeff, you mentioned that term.

So:

  1. Pick a good precedent study.

  2. This is not about right vs. wrong.

  3. Critically, critically – the replication study makes a contribution.

For example, I talked about constructive replication, which is when you use different kinds of measures. If you're using – let's say, I gave the example of a market-based measure vs. an accounting-based measure – you have a theoretical hypothesis for why the results should be the same or different. So you're learning something about the phenomenon through the use of different measures. It's not a hit job – you're changing the design features, so you make a contribution to theory – you understand the why better. Or if you add a moderator, which would be a cultural difference – it's not that “Oh yeah, in China it's different from the U.S.” I need to know why.

I need to know why. Because that's when you make the contribution. So the third point is: The replication has to make a contribution to our theory, to our knowledge. Number four – this was alluded to – a single paper is never the final arbiter of knowledge.

I'm not familiar with all of the studies that you cited, but I suspect that those studies were not just a one-shot, one-off, but they were part of a program of research. So, you're building a stream of research, and the replication is part of that, and then you follow up and you continue working on it. So it's not the final arbiter, but it is part of that stream of work that continues building knowledge.

So let me just summarize: “The secret sauce to publishing a replication”

  1. Pick a good precedent study

  2. Don't do a hit job – it's not right vs. wrong

  3. Make a contribution to theory

  4. Recognize that no single paper is the final word – build a program of research

AU: I'm a little disturbed to listen, because we've tried many times to publish replications, and maybe we'd be accused of being on one end or the other, but fundamentally – in the open science model, we keep testing research until we identify that it works. Per's point – that he wouldn't trust a medical decision based on the kind of research we do – is a very valid one. And the reason why you wouldn't want to have an operation based on the kind of research we do is because when we publish our work, we don't make the data available – like they do in open science, like they do in some economics or psychology journals – so that other people can make sure you didn't make a human error. Maybe you just made a human error – it wasn't intentional. But we should know.

People say, “Well, you know what, that was 10 years ago – who cares.” But if someone makes a human error – as we all do – maybe we ought to know that, because if I want to have that operation, I want to be sure that the research is actually really valid. And I'm going to want to be sure that the data is accurately analyzed, and that there has been subsequent research demonstrating the same thing. We have to stop lying to ourselves. We are pretending – we lie to ourselves. We pretend, when we publish a paper, that it's scientifically accurate. We pretend we don't need to go back and check it again because it's already in, it's been validated by three reviewers and an editor. And we pretend as though we can keep going forward and say, “Well, the methods change, and the data changes.” Well, the reality is – we're using methods now that nobody can really understand, because the complexity of them requires computer algorithms that we can't even look at on a regular mathematical spreadsheet. We're assuming that everything works. And yeah, sure – 10 years from now, we'll probably find out it didn't. But at least, let's make the data available.

And I think every journal has an obligation to have at least one space where they publish a replication – in good honor, in good honesty, good integrity. And it doesn't matter whether the replication supports or contradicts the work – it will advance our science. And maybe, when we have 10 or 20 replications that show us that actually what someone demonstrated works, then maybe Per would have more confidence in saying: “I might look at this work and go under the knife with the data that we have.”

CH: All right, this is great to hear the enthusiasm. So, I wanted to address your point about compelling people to disclose their data. Because we had a long discussion about this at SMJ. Now, I'm not sure how similar entrepreneurship is, but at least in the field of strategy and management, people go to a lot of trouble to collect data that is really unique. Right? This is not like just, you know, some dataset that everybody uses. Okay, if you think about the incentive effects of compelling people to release their data immediately, when they've probably collected that data with the eye that, “We could do several studies out of this” – you will no longer have all these cool, unique data. Then we thought – well, maybe we could embargo – you know, say you have three or four years to do this, and then you have to release your data. And then we thought – we don't want to be policemen, right?

So what we tried to do was encourage people. What I'm – I'm not saying there's a right or wrong answer to this, but there's a trade-off, depending on the kind of research.

HA: Are you familiar with the phrase “the research-practice gap”? Anyone? What do we mean when we say that? Anyone?

AU: The research is saying something, and the practice is doing something different.

HA: Essentially, the research-practice gap means that the research that we're creating is not being used by practitioners. Okay – how are we doing at home? Zero. How are we using our own theories, our own research, to manage our shop?

Okay – we know, based on – one of my research domains is performance management. We know that individual performance – the level of performance – depends mainly on two things, for any domain: knowledge, skills and abilities on the one hand, and motivation on the other. So to be a good performer, you need to know your job, and you need to want to do your job well. Let's – instead of performance, let's talk about research performance. You need to do your job well. You need to be trained very well – and we're talking about advanced methods, theoretical knowledge and knowledge of the field.

So you need to know how to be a very good researcher. Number two, you need to want to do really good research – not just get published in the A journals. The first part has to do with training and knowledge. The second part has to do with the motivational systems that are now putting an incredible amount of pressure on faculty to publish in A journals – a smaller and smaller list – by all means necessary. So you get what you pay for. You're not paying for credible research. You're not paying for impactful research. You're paying for A publications. Therefore, you get A publications. I'm not saying that these are non-credible, but we need to think about using our own research to manage our own industry. We know that editors now are dealing with thousands of submissions. I just went to the Journal of Management meeting – they received 1,300 submissions as of today. We're in August. They're going to hit 2,000 by the end of the year. What do we do? Oh – we'll just add more Associate Editors.

The same structure.

  1. We know organizational design.

  2. We know job design.

  3. We don't change those.

  4. We just add more AEs.

So now you get 24 AEs, and you're just crossing your fingers that you'll get lucky with the AE assignment, because that's 1 of 24 journals, essentially. So we need to think about how we can use our own research to change the way we do things. So, when I served as President of the Academy of Management just a couple of years ago, we started to do what we call mini-experiments. Because when things are going well, you know – they're built good to great – you don't want to change things. So we need to start to think: What if a journal implements replications for a couple of years? Will the impact factor go down? I don't know.

But if we manage the risk right – but let's get some data and do a little mini-experiment. What if – I mean, we received 2,000 submissions. Do you know how many Nature gets? Do you know how many Science gets? Does anybody know? Raise your hand.

20,000?

30,000?

40,000?

Oh no – we are a very difficult and unique decision.

No, we are not.

So we need to rethink and use our own knowledge. We teach our MBAs, by the way, on designing jobs, on managing teams, on designing organizations and on rewarding people. If we use the knowledge we give to our MBAs, I think we'll be a lot better off.

AU: Yeah, I want to riff a little bit on Connie's and Herman's last responses. I mean, if we generalize the discussion slightly to make it about research integrity more generally – so replication is part of that – but also data transparency, asking people to share data and code, pre-registration to cut back on p-hacking and so forth …

I'd like to get the panel's thoughts – and, Connie, you already addressed this a little bit – on how these principles apply to other types of papers. So that would include papers with proprietary data, papers with confidential data, like all these papers with Nordic country registry data. And what about qualitative empirical studies? We see a lot more qualitative studies in the entrepreneurship field, and I don't think you can replicate, you know, a Kathy Eisenhardt paper. So my question is: What standards of transparency and replicability do we apply to those kinds of papers, or do we not apply it at all? And if so, are we implicitly or explicitly discouraging researchers from engaging in the kinds of activities that don't meet these new transparency standards?

VKG: I'm hearing from some doc students and junior researchers that it seems higher expectations are being placed on quantitative researchers, while the expectations with replication are not the same for qualitative research.

PD: I – I think part of the issue there is that there is a clash with another growing concern, and that is research ethics and integrity, and all that – protecting those who give the data. And that, of course, is even worse with qualitative research, right? So, those demands that seem so easy – sort of pre-registration and then data sharing – that logic fits limited experimental studies. So, you – you can – you have all your hypotheses, you collect data for a specific purpose, then you're done. You can share your data.

If you collect a huge dataset that can be explored in millions of ways, like we did with the PSED studies and the Australian CAUSE studies – well, eventually we put these in the public domain, right? But of course, there is a concern that those who put in all the work, then publish one study, and then share – it's free for everyone to do whatever they like. We – we can't have that. And we can't – we can't have demands that qualitative researchers should share all the raw data, because there will be so much in it that makes it identifiable. Right? In relation to the broader integrity issue, and the impossibility of reviewers and editors being able to catch everything – the ultimate policing is our internalized values. That: I don't engage in dodgy behavior to further my career. We – we must feel that very strongly internally, and we must instil that in future generations. Because, to solve it with policing is never going to work. 99% of the policing must be our internal sense of integrity and our professionalism.

JM: I mean, I understand Herman's point, and I think it's a great point about: do we change the structure? Ironically, the one reason I doubled the number of AEs when I took over JBV was for this very reason, though. Because – what is the best way to capture this?

It's attention. So if an editor has so many manuscripts on their desk that they can only be in process mode, then that means they are not engaging with the paper in a way that a human being needs to, to make judgment calls on: Is this good work or not? That includes policing the reviewers. But if you have time to read the reviews – not just vote count – you're reading the paper, you're reading the reviews, you're deeply getting into the paper, and that requires more time. This is a voluntary workforce.

Too often, somebody signs up – even at the Academy, you know, it's a great honor to be an editor for three years, but it is such a heavy load that they just don't have the attention span to be able to engage with each paper the way that they need to, in order to really deeply connect and feel out: Are these reviews sincerely crafted? Are they not crafted by AI? We haven't even talked about that. We haven't gone there. Believe me – if there's something that'll drive me out of being an editor in the not-too-distant future, that's it right there. Because it is – that's a whole new level. AI-generated reviews. Soon, there'll be the papers generated by AI, the reviews generated by AI … I'm looking at how I can turn my position into an AI. That'd be very convenient. But – there is a real risk and a fear of that. One final thing: administrative structure, again.

And nobody talks about this. Elsevier will probably hate me for this. I mean – we are owned by Elsevier. We are not – we are not an Academy journal. JBV – there is not an association in entrepreneurship like there is with SMS or other areas. So – you know, I mean, why is that an issue? Because there's not really a governing body to help with that. And private companies have different interests, clearly, than associations have. And so you run into some issues here. Who – how do I build that administrative structure? I've done it constantly on my own – with volunteer labor, with our editors, with the volunteer force. JBV.org – that was created just by us, with no budget, by the way. That was on our own.

This is how we've done it – over and over. It's been volunteer labor. It's been you reaching out, contributing, because you care about the field. But when we talk about administrative structure – I've got to figure out how to do that on top of everything else we're doing. We had a thousand manuscripts ourselves last year. I looked at each and every one of those coming through. And I'm open – when we get voluntary labor, it's awesome. But that's what it takes to change an administrative structure. And that's – that's a real challenge for us. That's – that's just a fact. We don't – it's not a pay-pay model.

HA: In 2019, one of my former doctoral students (Angelo Solarino) and I wrote a paper in SMJ on transparency and replicability in qualitative research that has received, I would say, a very robust response from a number of qualitative colleagues and esteemed friends, including in an editorial in ASQ by Pratt et al. (2020). So, this is an ongoing conversation. Replicability is contentious. Some people in the qualitative sphere say that this is not a goal of qualitative research, but our point was that it has to be transparent because one of the key tenets of qualitative research is thick description – that is a pillar of qualitative research. And if you read the paper and you don't know exactly what the authors did, the theoretical conclusions might be challenged. I think it's in the interest of qualitative researchers to be as transparent as possible. Transparency is really a principle that applies across the board.

CH: To add one thing – I know we have a question – but to Peter's question, I know Peter is a co-editor of SEJ so I've got to give him a plug for that journal. In my research, I tend to use a lot of proprietary data – company data or proprietary datasets. It's just because of the kind of research I do. I always try to make the description of my data and of my variable construction clear – I disclose all the versions of what program I've used, what Python code I use, where you can find it – so that even though I can't disclose my data, if you wanted to try and get close to this, you could know what I did. Right? And I don't know if I succeed in doing this, but I think that is at least a middle ground for data where you can't fully disclose – to really try and show as carefully as possible what you do.

AU: I received a survey from Elsevier in February. It was two parts. The first part asked how I felt about the review process of an Elsevier journal, since I had just recently published there. That was about 25% of the survey. Then came part two, and it was about how I consider AI in reviews. I went through the roof. I was very upset because we're talking about peer review – and that's not peer review anymore. If the publisher is already asking these questions, it means to me that they are way ahead – they're going to launch something pretty soon.

JM: To be clear: right now, it's illegal. If you upload somebody's paper, you're breaking the law. I just want to be clear about that. That's not Elsevier doing this. If you're doing this, you're breaking the law. You're beyond ethics. And you have all the tools to do it. If you're thinking about doing it – don't do it.

PD: Let me share a positive example that also addresses the research-practice gap. Practitioners don't often act on our findings – and frankly, they shouldn't. Single studies aren't strong enough evidence for that. But I've had the privilege of having some of my research replicated. You can get the impression from the replication crisis that almost nothing replicates. I think that's exaggerated. Imagine the first study gets the result exactly right and it's at p = 0.05 – then, by definition, about half of replications wouldn't be significant. So requiring the same significant result is a harsh criterion. And we often don't know which study is “right.”
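As an illustration added to this report (not part of the session), the arithmetic behind the "about half" remark can be sketched with a small simulation. It assumes the true effect equals the original estimate exactly, that the original z statistic sat right at the two-sided 5% cutoff of 1.96, and that each replication uses the same sample size, so its z statistic is a normal draw centred on 1.96. The function name and parameters are hypothetical.

```python
import random

Z_CRIT = 1.96  # two-sided 5% cutoff for a z statistic


def share_significant(true_z=Z_CRIT, n_replications=100_000, seed=1):
    """Fraction of same-size replications whose |z| exceeds the cutoff.

    Assumption: the true effect, expressed on the z scale, is `true_z`,
    and sampling error on that scale is standard normal.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replications):
        z_rep = rng.gauss(true_z, 1.0)  # one simulated replication
        if abs(z_rep) > Z_CRIT:
            hits += 1
    return hits / n_replications


print(share_significant())  # lands close to one half
```

Under these assumptions, a replication is about as likely to miss the significance threshold as to clear it, even though the original finding is exactly right.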

Anyway, the positive example: In 2009, I published a paper on growth and profitability. The simple question was: how do you end up with both above-average growth and above-average profitability? Do you start by fixing profitability and grow on that, or become profitable because of growth? We found – using Australian and Swedish data – that it’s much more likely you'll succeed if you first fix profitability and then grow. If you grow with weak profitability, you're more likely to end up in the underperforming quadrant. It was a bit controversial.

It was replicated in Finland in biotech and again recently this year. But the major replication came in 2022, published in ET&P. They redid what we did in 28 countries. All supported the results, with similar effect sizes from 2 to 2.5 times more likely to follow the same trajectory. They also used time lags from 1 to 7 years, while we only had 2–3 years. And they used different analytical approaches – results were consistent. So far, they've covered 29 sites. I posted about it on LinkedIn: 35,000 views, 80 comments and 40 reposts – lots of practitioner engagement. Many said they were relieved to see evidence countering the “growth at all costs” mantra. That kind of evidence gives us something to take to practice. And most of the time, we don't have that kind of strong evidence.

VKG: In 2005, the Stanford epidemiologist John Ioannidis suggested that most published findings are false. Some researchers in our field have taken this to mean that most management and entrepreneurship research is false and cannot be replicated. Do you agree with this observation? If true, is this a cause for concern for our field?

HA: I don't agree with Ioannidis at all. First of all, we need to remember what that paper was about. The paper looked at p-values; it didn't look at effect sizes. And the p-value is a horrible way to do science because we are so eager to learn whether there's an effect that we look at the wrong thing. The p-value tells you the probability that the effect does not exist in the population. If that probability is very small, we say, “Okay, it probably does exist.” But we're not saying the effect exists – we're saying it is highly unlikely that it does not exist. That's two steps removed. Back in 2010, we published a paper titled Customer-Centric Science. It laid out a better approach: Step 1 is to look at the p-value. If it's unlikely to be zero, move on. Step 2 is to examine the effect size. Look at correlation coefficients, η², ω², d values and log ratios – depending on the design. This tells us how big that relationship or effect is. Step 3 is to understand the importance of the effect for both theory and practice. For theory, we can compare it to what we previously knew. But for practice, we're not as equipped. That takes extra work – maybe a mini qualitative study or focus group – to ask: does this matter to practitioners, managers, policymakers? So, the p-value is just one small, early step. It's five or six steps away from the real goal, which is to make our research useful – to change the world, improve society, improve lives in organizations. All Ioannidis did was focus on p-values. And, of course, p-values don't replicate reliably – there's sampling error.
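As an illustration added to this report (not part of the session), the difference between Step 1 and Step 2 can be shown with a short calculation. The sketch assumes a simplified design, a comparison of two equal-sized groups with unit variance and a normal approximation, and a hypothetical standardized effect size of d = 0.05. The function name is invented for the example.

```python
import math


def approx_two_sided_p(d, n_per_group):
    """Normal-approximation p-value for a standardized mean difference d
    between two equal-sized groups (an assumed, simplified design)."""
    z = d * math.sqrt(n_per_group / 2)  # difference divided by its standard error
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > z) for a standard normal Z


d = 0.05  # hypothetical, very small effect size, held fixed throughout
for n in (100, 1_000, 10_000, 100_000):
    print(n, approx_two_sided_p(d, n))
```

The effect size is identical in every row, yet the p-value moves from far above 0.05 at the smaller samples to far below it at the larger ones. In this sketch the p-value tracks sample size, while the magnitude of the effect has to be read from d itself.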

Let me give you a quick story. In grad school – my PhD is in industrial psychology – there was a lab of social psychologists. We know that if you examine the world closely, any two things will be statistically related at some decimal place. For example, even the relationship between the weight of this microphone and the amount of water in my glass – at the billionth decimal point, it's not zero. Gravity affects both. Pressure affects both. There is some relationship. So what do you need? A very large sample size. That way, the p-value will eventually become significant. One student told me they kept collecting data – 10 people a day – until the p-value dropped below 0.05. “That's where I stop.” That's how bad p-values are for science.
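As an illustration added to this report (not part of the session), the "10 people a day until p drops below 0.05" practice can be simulated. The sketch assumes the true effect is exactly zero, observations are standard normal, a z-test with known unit variance is run after each daily batch, and data collection is capped at 50 days. All names and parameter values are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def stops_significant(rng, batch=10, max_days=50, alpha=0.05):
    """Add `batch` null observations per day and test after each day.

    Returns True if the p-value ever falls below alpha before the cap.
    """
    total, n = 0.0, 0
    for _ in range(max_days):
        for _ in range(batch):
            total += rng.gauss(0.0, 1.0)  # the true effect is exactly zero
            n += 1
        z = total / math.sqrt(n)  # z-test of the mean with known unit variance
        if two_sided_p(z) < alpha:
            return True
    return False


def false_positive_rate(n_studies=2000, seed=7, **kwargs):
    """Share of simulated null studies that end with a 'significant' result."""
    rng = random.Random(seed)
    return sum(stops_significant(rng, **kwargs) for _ in range(n_studies)) / n_studies


peeking = false_positive_rate()                # test after every daily batch
single_look = false_positive_rate(max_days=1)  # one test, no repeated looks
print(peeking, single_look)
```

With a single look the false-positive rate stays near the nominal 5%, while stopping at the first significant daily test pushes it several times higher, even though no effect exists in the simulated data.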

CH: At SMJ, we actually said: no asterisks. We don't want cutoffs and threshold values. We want authors to report the exact p-value, so we have some sense of what this is, and then focus on effect sizes: tell us about the magnitude of the effect. That depends on the size and the power of the study and related factors. But we've had a hard time getting other journals to sign on to this.

AU: May I add a comment? I'm from CTO Division and work in complex systems research. I want to say that these problems are deep. First, we're dealing with social constructs, whether we like it or not. Second, we must ask if our research is scientific in a positivist or constructivist sense. Then there are challenges with operationalization and measurement. Also, there's the old problem of idiographic versus nomothetic sciences. As someone with a physics background, I've had fierce debates with colleagues who say management research isn't science. But that's not true. Social constructs are just different. To sum up, the problems of replication in management are more complex. We must consider what's underneath. Even in major theoretical works – like Beck's Risk Society or Kuhn's The Structure of Scientific Revolutions – some key terms were never clearly defined. We need to reflect on those foundational issues too.

PD: I think I addressed some of those things in my opening statement.

VKG: Some of you referred to the replication study by Chris Crawford and co-authors published in Entrepreneurship Theory and Practice. Examining nineteen seminal entrepreneurship studies using PSED data, they found that less than a third replicated. More generally, the number of studies failing to replicate in management and entrepreneurship research is accumulating. A famous study in entrepreneurship was on business planning and performance, of which we have one of the co-authors here. I think this has prompted a call for greater emphasis on replication in entrepreneurship research and doctoral education. What do you think about emphasizing replication in doctoral research?

PD: It needs to be emphasized. There are good arguments, like comparisons with medicine. But when we say only a third replicated – what does that mean? If a replication has a positive effect but it's not statistically significant, yet still practically or theoretically meaningful, I don't see that as a failed replication. We need to focus on effect sizes.

CH: Right. Actually, someone who published in the Strategic Management Journal special issue – Lori Rosenkopf – did a really nice replication. She said doctoral students should have to write a second-year paper. Why not make it a replication? It's a great way to learn how to construct variables and understand what makes for strong research. I'm all for this. But I'm not for “gotcha” replication studies. I'd prefer students collect their own data or expand something so they learn how to build on research. Think of how much we'd learn.

HA: This idea is actually described with evidence in Schwab et al. (2023) in Journal of Management Scientific Reports, titled How Replication Studies Can Improve Doctoral Student Education. The paper outlines how many replications are published, where, how to do them right, and examples of success. Benefits include learning, training and building rigor. Going back to earlier points – performance is about KSAs (knowledge, skills and abilities) and motivation. Even with the best training and values, if we keep the madness of “publish or perish” for tenure, people won't follow those values. The pressure is too much. And those who know how to manipulate the system are even more dangerous.

I've asked ChatGPT to create a complex dataset with a three-level structure and random error. If someone asked to see it, it would look perfectly imperfect. The only way to detect fraud would be asking for receipts or knowing the exact data source. Sharing data is no longer a reliable way to detect fraud – AI can now fake it. So we must work on motivational systems. As full professors, deans, we need to understand: we get what we pay for. If we reward only hits, we get 20-author papers, no consistent research streams. “I put you in mine, you put me in yours.” Publish in A journals by any means necessary. That's how people get tenure.

You move your whole family to another state, get a new driver's license, versus, “Oh, hypothesis 3 didn't work out – I'll just remove it.” Who uses this research anyway? Why call it a crime? We really need to think not only about KSAs but also motivation.

JM: I agree 100%. But also – our obsession with novelty gets a bad rap, and sometimes that's unfair. Novelty can be good too. People study things that are interesting to them, and entrepreneurship is inherently about new ideas. That makes it harder to build cumulative knowledge. Science always looks backward, and that's hard to reconcile with entrepreneurship scholars who look outward. Editors and reviewers ask for theoretical contributions, which forces a backward gaze. But entrepreneurship thrives on the new. So we have to balance novelty and accumulation. Some students and colleagues are just tired of classic topics. Think about entrepreneurial orientation – it's been around so long, we have meta-analyses of meta-analyses. But new scholars don't want to study it. It's like rebooting Spider-Man every 10 years. People want something fresh. We don't want to limit research to only cumulative work. We also want to avoid stale journals. New ideas and methods need a voice too. That's hard to balance with the push for replication and robustness. We'll get there, but not overnight.

AU: Coming from a qualitative background, we often use “transferability” as an alternative to “replicability.” Does that make qualitative research less reliable? Or could this be something quantitative researchers consider – replicating in other contexts?

CH: That's what I meant by quasi-replication. I'm not a qualitative researcher, but others here can answer that.

HA: Yes. Like any research, there are three types of replication: reproducibility, literal or constructive replication and generalizability, which is basically transferability. But the point isn't whether it transfers or not – it's why. Can we explain and predict why something transfers? That's what matters. At the end of the day, all research is about understanding the world better. I understand constructivism vs. positivism – I've been called a neopositivist many times. But there is a real world out there. If you try to stop a train with your hands, it'll hit you. Reality exists, even if we study it in different ways.

JM: Qualitative research is typically used to build theory. That theory can then be tested in other contexts. But we shouldn't impose positivist standards on qualitative work – it's a different paradigm. I agree. Transparency is an ideal in both, but the motivations and logic are different. That matters.

VKG: Thank you. I know many in the room had questions, but we're out of time. Christina, closing remarks?

CT: I'm Christina, PDW Chair for the Entrepreneurship Division. Our goal for these plenaries is to tackle timely, bold and sometimes difficult topics. Replication is one of those – it's risky, misunderstood and essential. We've opened Pandora's box on replication studies. As researchers, we aim for transparency, rigor and impact. This plenary has responded to that call beautifully – with tremendous insights from our speakers and audience.

We've talked about theory, method, misunderstandings and how to improve doctoral education. As a qualitative scholar, I've heard for years that “you can't publish replication” or “you can't publish qualitative research.” Today proves otherwise. It's not about whether it's replication or qualitative – it's about how it's done. Entrepreneurship is about experimenting, observing and adjusting. That's what we've done today. Many of us share concerns about where research is heading, how AI challenges integrity and what it means to replicate with rigor. So, we've opened, advanced and filled this space. Thank you to our amazing panel – Jeff, Herman, Connie and Per – and to all of you. Let's keep raising the bar.

AU: Audience Member

CH: Connie Helfat

CT: Christina Theodoraki

HA: Herman Aguinis

JM: Jeff McMullen

PD: Per Davidsson

VKG: Vishal K Gupta

Following common practice for reporting such proceedings (see, for example, Sarasvathy, 2000), some words and phrases, such as “you know” and other filler or repetitive and inconsequential nuances of speech, have been deleted.

Bettis, R.A., Helfat, C.E. and Shaver, J.M. (2016), “The necessity, logic, and forms of replication”, Strategic Management Journal, Vol. 37 No. 11, pp. 2193-2203.

Crawford, G.C., Skorodziyevskiy, V., Frid, C.J., Nelson, T.E., Booyavi, Z., Hechavarria, D.M., Li, X., Reynolds, P.D. and Teymourian, E. (2022), “Advancing entrepreneurship theory through replication: a case study on contemporary methodological challenges, future best practices, and an entreaty for communality”, Entrepreneurship Theory and Practice, Vol. 46 No. 3, pp. 779-799.

Dahlqvist, J., Davidsson, P. and Wiklund, J. (2000), “Initial conditions as predictors of new venture performance: a replication and extension of the Cooper et al. study”, Enterprise and Innovation Management Studies, Vol. 1 No. 1, pp. 1-17.

Frank, H., Kessler, A. and Fink, M. (2010), “Entrepreneurial orientation and business performance – a replication study”, Schmalenbach Business Review, Vol. 62 No. 2, pp. 175-198.

Pratt, M.G., Kaplan, S. and Whittington, R. (2020), “The tumult over transparency: decoupling transparency from replication in establishing trustworthy qualitative research”, Administrative Science Quarterly, Vol. 65 No. 1, pp. 1-19.

Sarasvathy, S.D. (2000), “Seminar on research perspectives in entrepreneurship (1997)”, Journal of Business Venturing, Vol. 15 No. 1, pp. 1-57.

Schwab, A., Aguinis, H., Bamberger, P., Hodgkinson, G.P., Shapiro, D.L., Starbuck, W.H. and Tsui, A.S. (2023), “How replication studies can improve doctoral student education”, Journal of Management Scientific Reports, Vol. 1 No. 1, pp. 18-41.

Ioannidis, J.P. (2005), “Why most published research findings are false”, PLoS Medicine, Vol. 2 No. 8, p. e124.
