How do you gauge the importance of (data-driven) models?

Imagine you are the editor of two journals, one with higher impact than the other. You receive two modeling papers: one with more predictions than validations, the other with more validations than predictions. You can accept only one paper into each journal. Where would you send each?

What I’m asking is: How do you gauge the importance of a data-driven* model? By the number of predictions (including proposals of novel experiments) that it makes, or by the number of existing experiments that it agrees with? If both, then how would you weight each type of contribution?

* I specify data-driven models to separate them from theory-driven models.


4 thoughts on “How do you gauge the importance of (data-driven) models?”

  1. I tried very hard to imagine your scenario, but the truth is it has too many other unknowns. “All other things being equal” is not a real option. Data-driven and theory-driven lie on a continuum, and both extremes are unpopular with high-impact journals (the one is too specific, the other too speculative). Obviously, the kind of journal is a factor. I’ll assume for the moment that it’s a general-interest one, like Nature or Science or PNAS or CB if we stay in the biological sciences… then I try not to think of a specific area.

    For a data-driven model, the assessment criteria are quite easy. If it is published together with data that challenges existing models, it has higher chances: an effect and a model! With or without data, it would be assessed with respect to what it does to existing models: is it simpler, more general, more convincing… does it fill a gap that has been convincingly argued? You can quite clearly see what the model does to the field and whether this contribution meets your journal’s standards.

    For theory-driven models, it’s more difficult. They are more speculative, and therefore you have to estimate the likelihood that they will really make a difference. The information is not in the paper; you cannot just look at the graphs and the formulae, you have to deal with the story of the model. For a start, many of the reviewers do not like to do that. So it’s of central importance how straightforwardly and convincingly the point is made. I guess another factor is whether the senior author is known to produce good, solid work: you are much more likely to take time and listen to someone you trust.

    So if you have difficulties publishing theory-driven work, be aware there’s more randomness in the process:
    – try to show that it relates to questions/results deemed central.
    – put ten times the effort into the write-up; it will make all the difference.
    – suggest reviewers who are able and willing to deal with theoretical questions.
    – try again, again, again, again. It’s too much of a lottery; don’t overestimate the significance of one unfavourable decision.

  2. …also, your personal beliefs about what’s “the truth” are much more likely to bias the perception of the significance of a model if it’s theory-driven… more randomness…

  3. I just realised that I totally misread the question. Sorry, I shouldn’t reply to a blog post first thing. Can you delete the comments? But I think any good data-driven model cannot but have both: validation for being data-driven, and prediction = generalisation for being a model…

  4. I think it depends on a number of things. In making predictions, a paper shouldn’t extrapolate its model further than the evidence warrants, particularly if the model is based purely on data and not on generally accepted theory. In that sense, and I restrict my following comment to that sense only, data-based model papers which have “more predictions than validation” should usually be revised before publication.

    Of course there are times when this rule of thumb doesn’t apply: for instance, when validation isn’t reasonably available and the predictions are very important or interesting. Suppose you are trying to model some poorly-understood phenomenon which only occurs very rarely, but which is important when it occurs. Then it may be worth knowing what the predictions of the best data-based model are, even if you can’t put much confidence in those predictions.

    At the end of the day, if the model came from a very limited dataset, or explained the known data relatively poorly, and then made interestingly counterintuitive predictions, that wouldn’t sell the study to me. Rather the reverse: it would positively make me doubt the model’s credibility still further. On the other hand, if the predictions were in line with intuitive expectations, what would the interest be in publishing them? In other words, the predictions would have to be intuitively credible but thought-provoking in order to improve the paper.

    That’s not to say that counterintuitive predictions made on the basis of relatively little evidence can’t be correct, but when that’s all they are based on (as opposed to an elegant theory), I think they tend to be wrong. When they turn out to be right, well, then they become evidence instead of predictions, and the model’s degree of validation increases.
