This blog article discusses the article in press by A.K. Ghose et al. on “Technically Extended MultiParameter Optimization”,1 and the somewhat pivotal works of T.T. Wager et al. on “Defining Desirable Central Nervous System Drug Space through the Alignment of Molecular Properties, in Vitro ADME, and Safety Attributes” and “Central Nervous System Multiparameter Optimization Desirability: Application in Drug Discovery”.2 It attempts to explain what an MPO is and compares the design of the two systems.
A Multiple Parameter Optimisation (MPO) tool in any application domain is one where the user selects several important parameters that collectively predict an outcome for a particular endpoint (e.g. oral bioavailability). The user then creates a scoring system that balances a scoring matrix across all of the selected parameters, reducing the data to (usually) a single number or a small collection of numbers (“scores”). This is simply a data reduction, which was briefly discussed in a previous article about man-made metrics for drug discovery (Reducing Data: Ligand Efficiency and Other Fallacies).3 Unlike QSAR / QSPR, where various rigorous mathematical and statistical methods determine which factors are important and weight them accordingly, an MPO in drug design often uses criteria picked by senior scientists with many years’ experience observing the particular endpoint in question. It has the added benefit of usually being rather more human-readable than typical QSAR / QSPR models.
An MPO differs from a hard-logic filter (e.g. Lipinski’s Rule of 5) in that it considers optimal and suboptimal values with graduated scores, whereas in a hard-logic filter there are only two states: pass or fail. Typically anything failing a hard filter is thrown away, whereas a moderate-scoring MPO material might be tweaked to improve it as part of lead optimisation. You simply show the MPO your structure (or SMILES string), and it will calculate the values for the selected criteria and give you a score – in the case of the Wager MPO, a score between 0 and 6, with a score of 4 or higher representing something that is “probably CNS penetrant”. If you have an abundance of chemical matter, you might throw the lower scorers away, but if you are limited in hit matter, you might redesign your molecule to improve its score (it is a multi-parameter optimiser, after all).
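To make the mechanics concrete, here is a minimal Python sketch of a Wager-style additive MPO. The desirability functions, the property names and the thresholds below are illustrative placeholders only, not the published Pfizer values (which cover six criteria):

```python
def desirability(value, full_score_below, zero_score_above):
    """Monotonic-decreasing desirability: 1 below the preferred threshold,
    0 above the unacceptable threshold, linear in between."""
    if value <= full_score_below:
        return 1.0
    if value >= zero_score_above:
        return 0.0
    return (zero_score_above - value) / (zero_score_above - full_score_below)

# Illustrative criteria as property: (preferred, unacceptable) pairs.
# These numbers are placeholders, NOT the published Wager thresholds.
CRITERIA = {
    "clogp": (3.0, 5.0),
    "mw":    (360.0, 500.0),
    "hbd":   (0.5, 3.5),
}

def mpo_score(props):
    """Sum the per-criterion desirabilities (each 0..1) into one score."""
    return sum(desirability(props[k], lo, hi) for k, (lo, hi) in CRITERIA.items())

props = {"clogp": 2.5, "mw": 420.0, "hbd": 1.0}
score = mpo_score(props)
```

A real implementation would calculate the property values from the structure (or SMILES string) with a chemoinformatics toolkit and apply a threshold to the summed score, in the same spirit as the Wager “4 or higher” cut-off.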
An MPO for determining the likelihood of central nervous system (CNS) penetration was outlined back in 2010, when Travis Wager and colleagues at Pfizer used internal data to determine a six-criterion system.2 It is fair to say that this work changed the way many medicinal chemists designed their output for neuroscience targets across multiple organisations (including ours). The same authors revisited their work earlier this year with some further pseudo-post-hoc validation data (more on this later). Arup Ghose is a name well known to chemoinformaticians, and I recall reading his works at the turn of the millennium alongside those of Lipinski and Oprea on filters for oral bioavailability and the like (he is also credited with the invention of the AlogP method of LogP prediction). Ghose and colleagues recently published their own ideas on a suitable CNS MPO using a humanised-QSPR type approach.
The 2010 Wager paper has been cited over 200 times, with the Pfizer MPO being used by various organisations and groups as their primary MPO for neuroscience projects. As a result, Ghose’s suggestions are an interesting alternative, especially as his system is statistically more rigorous in its design.
For a model to be useful, it needs to be validated. This is normally done by taking a data set and randomly splitting it into a design (training) set and a validation (test) set. You build the model on the training data and see how well it holds up against the test set. This is how the Ghose model was designed and built, and hence it can statistically demonstrate its validity within the data set. In the case of the Wager MPO, the authors fell into the common non-statistical pitfall of creating a Texas Sharpshooter Fallacy (like Lipinski and many others, below), in that they used the whole data set to build the model, leaving no data external to the training set with which to validate it. In the case of Wager’s 2016 paper, they effectively demonstrated confirmation bias in recent development.
A Texas Sharpshooter Fallacy is a fault of reasoning in which a person shoots at a wall with a gun, then draws the target around all of the bullet holes and claims they were all within the target. Without an extra set of bullets to then fire at the target, you cannot validate how good a shot they are. In modelling terms, this is using all of the available data to create a model, leaving none aside to test it with, and then calling the model successful because all of the data matches it (even though it was the same data used to make the model).
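The held-out validation described above can be sketched in a few lines. `train_test_split` here is a hypothetical helper written from scratch, not a reference to any particular library:

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    """Randomly partition a dataset into a design (training) set and a
    validation (test) set, so the model is judged on data it never saw."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

compounds = list(range(100))  # stand-ins for 100 molecules
train, test = train_test_split(compounds)
# Build the model on `train` only; report performance on `test` only.
```

The point is simply that the test set plays the role of the “extra set of bullets”: performance claims made on the training data alone are the sharpshooter drawing the target afterwards.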
Figure 1: Data on numbers of candidate and drugs with their MPO ranges from Wager et al. (loc. cit.)
As can be seen in Figure 1, these are trends rather than hard-and-fast rules; however, MPOs are very useful in reducing the perceived risk of endpoint failure. There may be a problem in assessing the quality of this MPO: many organisations use the method in the development of early materials, and as a result we have a confirmation bias issue, discussed at the end of this article.
Table 1 shows the different criteria used in the two optimisation methods. Though it contains more components, Ghose’s system requires less computing to calculate, having only AlogP as a required machine-calculable element (the rest could be done by eye); realistically, though, you would have your software do the maths for them all. Ghose and colleagues used a computational data reduction method to reduce the criteria to eight, in a way that deliberately minimises the elements that vary by method (e.g. different software will determine pKa, LogP and LogD differently; in fact, the same software will give different values depending on the version. Recently we saw our LogP and LogD values change overnight as ChemAxon changed the way they natively calculate those criteria). By avoiding these and sticking to human-measurable, functionally discrete criteria, the system becomes less method-dependent. The use of AlogP rather than a more complex ClogP, KlogP or ACDlogP also attempts to minimise errors (or rather, make errors more consistent) where there are novel chemotypes unlikely to be in the model set of the LogP calculator (see previous article on CLogP and other short stories).4
Table 1: Comparison of the criteria in both Wager MPO and Ghose TEMPO
* No method suggested
The way the criteria are scored varies between the authors. In the case of Wager et al., the criteria were scored according to Figure 2 (mostly monotonic, with a hump function for tPSA), whereas Ghose used a hump function across all of the criteria (Figure 3 and Table 2).
Figure 2: Criteria plots, each detailing a parameter (“desirability function”) in the Pfizer (Wager) MPO. The six criteria are scored on their value for each compound, with a result being a score of between 0 and 6.
Figure 3: A hump function (albeit upside down), where P is the preferred range, Q is the qualifying range, U is the upper limit and L is the lower limit. The penalty is then applied on a sliding scale for materials outside of the preferred range.
Table 2: The scoring range for Ghose et al.’s TEMPO
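The hump function of Figure 3 is, in effect, a trapezoid: full score inside the preferred range (P), tapering linearly to zero at the lower (L) and upper (U) qualifying limits. A minimal sketch, written as a desirability (1 minus the penalty) and with boundary values that are placeholders rather than Ghose’s published ranges:

```python
def hump_score(value, lower, pref_low, pref_high, upper):
    """Trapezoidal ('hump') desirability: 1 inside the preferred range
    [pref_low, pref_high], 0 outside the qualifying range [lower, upper],
    linearly penalised in between."""
    if value < lower or value > upper:
        return 0.0
    if value < pref_low:
        return (value - lower) / (pref_low - lower)
    if value > pref_high:
        return (upper - value) / (upper - pref_high)
    return 1.0

# e.g. a tPSA-like criterion: qualifying 20-120, preferred 40-90 (illustrative)
hump_score(65.0, 20, 40, 90, 120)   # inside the preferred range -> full score
hump_score(30.0, 20, 40, 90, 120)   # below the preferred range -> penalised
```

A monotonic criterion (like most of the Wager plots in Figure 2) is just the degenerate case where one side of the trapezoid is removed.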
A good scoring system should be a mathematical construct based on each criterion and its relevance, as described in Figure 4.
ScoreABC… = (criterionA × coeffA) + (criterionB × coeffB) + (criterionC × coeffC) + …
Fig. 4: A simple scoring formula, where a criterion (e.g. LogP), is multiplied by its weighting coefficient. All the component products are then summed.
In a system like that of Figure 4, each criterion is multiplied by its weighting, which is derived from its determined importance in its contribution to the endpoint. In classical QSAR this is determined by PCA or another regression technique, but in the case of human data reduction it is often whimsical. In the case of Wager et al., each component of the MPO was given the same weight; that is, each coefficient was 1. In the case of Ghose et al., each criterion was given a weight derived from the data reduction analysis in the model design (Table 2, column 6, “coeff (C)”). You can see in Ghose’s system that the number of basic amines is three times more important than the number of rotatable bonds, for example, whereas in the Wager MPO all features are equally valuable.
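Figure 4’s weighted sum is trivial to implement. The coefficients below are invented for illustration (echoing the 3:1 amine-to-rotatable-bond ratio mentioned above), not Ghose et al.’s published weights from Table 2:

```python
# Hypothetical weights: the basic-amine criterion counts three times
# as much as the rotatable-bond criterion, as in the example above.
WEIGHTS = {"n_basic_amines": 3.0, "n_rot_bonds": 1.0, "alogp": 2.0}

def weighted_score(criterion_scores, weights=WEIGHTS):
    """Score_ABC... = sum(criterion_i * coeff_i), as in Figure 4.
    `criterion_scores` holds each criterion's 0..1 desirability."""
    return sum(criterion_scores[name] * coeff for name, coeff in weights.items())

scores = {"n_basic_amines": 0.8, "n_rot_bonds": 1.0, "alogp": 0.5}
weighted_score(scores)  # 0.8*3 + 1.0*1 + 0.5*2 = 4.4
```

The Wager MPO is the special case of this formula with every coefficient set to 1.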
Comparison and confirmation bias
Typically we could compare the models here, to see which better predicts or correlates with CNS penetration, by taking a dataset from our pipeline and seeing how each performs; however, we have a problem with confirmation bias. Like the atomic bomb dispersing isotopes and rendering certain archaeological dating techniques impossible in modern times, the Wager CNS MPO system may have dispersed into product pipelines since their paper (and possibly before, if the system was used internally), and so the materials in our compound deck that are CNS penetrant and also score highly in the CNS MPO may do so because materials that did not score highly were never developed in the first place.
As a result, we would need a data set that was evidently not based on or used in the generation of this CNS MPO system, or in fact any CNS or development guide as they will influence the materials in the comparison set.
Conclusions and comments from the blogger (whose opinions are his own).
Without a doubt, MPOs are fantastic tools to simplify and somewhat humanise the abundance of data, in order to give chemists information about materials to make or avoid. It is also beyond question that the original works by the Old Guard of filters (the Lipinskis, Opreas and Ghoses) have shaped how we design and prioritise materials, and likewise Travis Wager and the Pfizer team have really influenced multiple groups around the world by shining a light on how to optimise materials for CNS penetration at the design stage. I believe that the Ghose et al. TEMPO, despite probably being named that way purely for the cool acronym, is a statistically and logically more rigorous piece of data reduction. Wager’s 2010 paper seems more contextual, thoroughly detailing the thoughts and trends behind the MPO in a less purely statistical way.
The problem with confirmation bias is actually testament to how widely the Wager CNS MPO system and others have been adopted. It does, however, now make it quite difficult to compare systems (all we can do is try to re-zero across the MPOs to see what each score translates to). It is likely we will keep an eye on both systems in our data generation and see how they track side by side.
Confirmation bias is apparent in a number of other areas in drug discovery; a prime example is Ligand Efficiency metrics, which have permeated multiple organisations’ design principles.
In the few organisations I have worked in, I have seen and used multiple CNS and other MPO tools, which means that the compounds at the end are tainted by design and are useless for comparing methods. I wonder how true this is across the wider chemical community.
So my challenge to the reader: next time you look at your enumerations / libraries and line up your synthesis priorities next to your nicely colour-coded MPO score columns (whichever tool you use, and whatever the endpoint), ask yourself what information you are really getting from them, and whether you are perpetuating a cycle of confirmation bias (and limiting chemical space in doing so).
1. Arup K. Ghose, Gregory R. Ott and Robert L. Hudkins, ACS Chem. Neurosci., article in press.
2. (a) Travis T. Wager, Xinjun Hou, Patrick R. Verhoest and Anabella Villalobos, ACS Chem. Neurosci. (2010), 1, 435–449. DOI: 10.1021/cn100008c
(b) Travis T. Wager, Xinjun Hou, Patrick R. Verhoest and Anabella Villalobos, ACS Chem. Neurosci. (2016), 7, 767–775. DOI: 10.1021/acschemneuro.6b00029
Blog written by Ben Wahab