Innovation

Probabilities that aren't

The Probability of Technical Success (PTS), I am going to suggest, is a distracting, dangerous methodology.

Here's my starting point. Most programmes in pharma only get advanced when they have a 'PTS' of >50%. (And I am not even going to discuss, again, why 50% would be a useful number...) Hold that to the same standard as any other serial user of probabilities, and those decisions should turn out right more than half the time. (If weathermen say there's a 50% chance of rain tomorrow, it has to rain on about half of the days they say that, or they revise their probabilities - it's a constant feedback loop.) In pharma, they do not. Most of the decisions to move forward (therefore, 'PTS >50%') were 'wrong' - the raw numbers say so. We'd see a lot less 'attrition' if they were right: by definition, over half of all projects would succeed. Nor is this a matter of opinion about what counts as a 'right' decision or a 'wrong' one - the PTS is calculated against a specific outcome, so every forecast ends in a binary result: the project achieved that outcome or it did not.
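To make the weatherman comparison concrete, here is a minimal sketch of a calibration check: group past forecasts into bands and compare the stated PTS with how often those projects actually succeeded. The function name, the band width and the portfolio data are all hypothetical, invented for illustration; this is not any company's actual method.

```python
from collections import defaultdict


def calibration_table(forecast_pcts, outcomes, bin_width=10):
    """Compare stated PTS with observed success rate, per forecast band.

    forecast_pcts: PTS values as whole percentages (e.g. 70 for 70%).
    outcomes: True if the project met the outcome the PTS was calculated against.
    Returns a list of (band_start, mean_forecast_pct, observed_pct, n_projects).
    """
    bands = defaultdict(list)
    for pct, succeeded in zip(forecast_pcts, outcomes):
        # Integer arithmetic keeps band edges exact; 100% goes in the top band.
        band_start = min(pct // bin_width * bin_width, 100 - bin_width)
        bands[band_start].append((pct, succeeded))

    table = []
    for band_start in sorted(bands):
        members = bands[band_start]
        n = len(members)
        mean_forecast = sum(pct for pct, _ in members) / n
        observed = 100 * sum(1 for _, ok in members if ok) / n
        table.append((band_start, mean_forecast, observed, n))
    return table


# Hypothetical portfolio, invented for illustration only.
forecast_pcts = [55, 60, 65, 70, 70, 75, 80, 85]
outcomes = [False, True, False, False, True, False, False, True]

for band_start, mean_forecast, observed, n in calibration_table(forecast_pcts, outcomes):
    print(f"band from {band_start}%: forecast {mean_forecast:.1f}%, "
          f"observed {observed:.1f}% (n={n})")
```

If the numbers were well calibrated, the forecast and observed columns would roughly agree in each band once there are enough projects in it. A persistent gap in one direction is the signal to revise the method.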

So, my question to one company recently: where is the feedback loop? How often do you revisit the calculations, or the methodology, to see how to improve the numbers (and therefore their utility)? Their answer: ooh, never... Maybe we'd do that every 10 years.

I'm well aware of the nonsense inputs that produce a 'PTS' (mouse model/no mouse model, disease area averages, etc.), and let's accept that it can never be perfectly predictive. But if we know the numbers are wrong, why aren't they revised? The answer, it seems (at least at this company), is that revising them would produce 'negative' NPVs.

I am also aware of the extensive crunching done on the numbers, with some companies tornadoing and Gaussianing and all kinds of other clever -ings. However, if your company rates one programme as the better risk simply because its disease has a mouse model where others don't, none of that really matters. Nor does it matter if the method kills any move into diseases where other mechanisms have failed in the hundreds, despite you having a novel mechanism. The underlying, big-chunk assumptions are what is wrong in most cases.

And so the cycle continues. The infection and infestation of drug development and portfolio management by numbers no-one believes.

Some possibilities:  

  1. We do the numbers because they're useful in planning, even though no-one really believes them unless they're extreme (5%, 95%...). As Philip Tetlock recounts the tale: 'future Nobel laureate Kenneth Arrow, when he was a young statistician during the Second World War, discovered that month-long weather forecasts used by the army were worthless, and warned his superiors against using them. He was rebuffed. “The Commanding General is well aware the forecasts are no good,” he was told. “However, he needs them for planning purposes.”' That is, we know they're wrong, but the errors run in both directions, so we use them because what else would you use?
  2. We do the numbers so that it looks like we're putting governance into a subjective system. Better than telling the analysts about the HiPPO (the highest-paid person's opinion).
  3. We do the numbers because we always did the numbers, and no-one has yet decided to try something different.

(There may well be other reasons you can tell me, but I haven't yet heard a defence of the idea of a no-feedback-loop system.) Most conversations suggest that there is an awful lot of gaming, or body English, used to make sure the numbers that we get are the numbers that are wanted: 'you don't need a weatherman to know which way the wind blows...'

Then consider this: imagine a 70% PTS for a phase III, and a 70% Probability of Regulatory Success (both high numbers, in reality). Multiply the two and, if we believe anything we just calculated, we go into the expensive study with a 49% probability of achieving the desired label... (And that is before we even ask whether that label would give us commercial success.)
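The arithmetic is trivial, but worth writing down. A small sketch, assuming the regulatory probability is stated conditional on technical success (or that the two are treated as independent), so that the stages simply multiply. The function and variable names are hypothetical.

```python
from fractions import Fraction


def chained_probability(*stage_probabilities):
    """Multiply stage probabilities, each assumed conditional on the stage before."""
    result = Fraction(1)
    for p in stage_probabilities:
        result *= Fraction(p)
    return result


# The two figures from the text, held as exact fractions to avoid float noise.
pts_phase3 = Fraction(70, 100)
prs = Fraction(70, 100)

p_label = chained_probability(pts_phase3, prs)
print(float(p_label))  # 0.49
```

Every further stage we bolt on (launch, reimbursement, uptake) can only pull that number down, which is why two 'high' inputs so quickly become a coin toss.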

There are several questions open to decision support.

  • Q: Could you improve the numbers to be more 'true', even if they're not as good-looking, by using constructive feedback loops? (A: yes, of course.)
  • Q: Could we imagine a better predictive system? (A: yes.)
  • Q: Could we run a controlled study to see whether a better predictive system would produce better results over two years? (A: yes.)
  • Q: Why would we not do all three of these? (A: ?)
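For the second and third questions, one way to keep score between an existing method and a candidate replacement is the Brier score: the mean squared gap between each forecast and what actually happened (lower is better, and always saying 50% scores 0.25). What follows is a sketch only. The Brier score is my choice of yardstick, not something the companies in question use as far as I know, and all of the forecasts and outcomes below are invented.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty forecasts and outcomes")
    squared_errors = [(p - float(o)) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: the same eight projects scored by two methods.
outcomes = [0, 1, 0, 0, 1, 0, 0, 1]  # 1 = met the outcome the PTS was set against
current_pts = [0.55, 0.60, 0.65, 0.70, 0.70, 0.75, 0.80, 0.85]
revised_pts = [0.30, 0.45, 0.30, 0.35, 0.50, 0.30, 0.35, 0.55]
always_half = [0.5] * len(outcomes)  # the 'no information' baseline

for name, forecasts in [("current", current_pts),
                        ("revised", revised_pts),
                        ("always 50%", always_half)]:
    print(f"{name}: {brier_score(forecasts, outcomes):.3f}")
```

Run both methods on the same projects for the length of the study, compare the scores, and there is the controlled comparison. A method that cannot beat the 'always 50%' baseline is adding nothing.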

This may well seem a specious argument. 'Well, with what would you replace them? McKinsey designed our process 10 years ago, and we know it is cr**, but just imagine the effort to uproot it and start again...' Well, as they say, 'your lack of imagination is not an argument.' An industry that believes (or says it believes) in precision, in diagnostics, and in statistical analysis should be in a healthier place.

We have to do better, and we know we can. We just have to want to.