The partition coefficient (P, almost always quoted as its logarithm, logP) of a material describes the ratio of its equilibrium concentrations in two immiscible solvents. We normally use octanol : water, but it could be any pair of immiscible fluids. This property is one of those chemical descriptors that pervades all aspects of ADMET and is used to filter and define the chemical space in which to work. Oddly, for such an important property, most projects and programs are built upon materials whose logP has never been experimentally determined, relying instead on predicted values generated by software.
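As a minimal sketch of the definition above, logP is simply the base-10 logarithm of the concentration ratio measured at equilibrium. The function name and the example concentrations are illustrative assumptions, not measured data.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """Return log10 of the octanol/water concentration ratio at equilibrium.

    Both concentrations must be in the same units and refer to the
    neutral (un-ionised) compound.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Illustrative numbers only: 2.0 mM in the octanol layer and 0.02 mM in the
# water layer is a 100:1 ratio, i.e. a logP of about 2.
print(log_p(2.0, 0.02))
```

A positive value means the compound prefers the octanol layer (lipophilic); a negative value means it prefers water.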
Recently, our DMPK scientist presented a series of predicted logP values vs some that he expertly determined in the lab. Whilst the correlation was good in many cases, there were some significant outliers, so he asked me, the computational chemist, whether I could explain why the calculated logP was so different. Some obvious structural features can beguile certain methods of calculating logP (yes, there is more than one), and other methods might have predicted the outlier values in our case more closely.
Not All LogPs are Calculated Equal
When chemists talk about ClogP they usually, and erroneously, mean any “calculated” logP. To a CADD scientist, ClogP means something different: it is a proprietary method (owned by BioByte Corp. / Pomona College) used to predict logP. Whilst there is a range of methods for prediction, there are three basic groups, and the vast majority of the current methods are flavours thereof:
Atomic (e.g. “AlogP”) & Enhanced Atomic / Hybrid (“XlogP”, “SlogP”)
Fragment / Compound (“ClogP”, “KlogP”, “ACD/logP”)
Property based methods (“MlogP”, “VlogP”, “MClogP”, “TlogP”)
Atomic logP considers that each atom has a contribution to the logP, and that the chemical entity’s final value is purely additive. Crippen et al. first proposed such a method in a series of papers in the late 80’s, with the refined version dubbed “AlogP”.1 The method is effectively a table look-up per atom, and there are plenty of free AlogP calculators available. It is suited to smaller molecules, particularly those with non-complex aromaticity or those which do not contain electronic systems that are known to have unexpected contributions to logP.
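The table look-up idea can be sketched in a few lines. Everything in this block is hypothetical: the atom-type names are simplified, the atoms are typed by hand rather than perceived from a structure, and the contribution values are placeholders, not the published Ghose and Crippen parameters, so the resulting number carries no chemical meaning.

```python
# Hypothetical per-atom-type contributions. Placeholder numbers for
# illustration only; NOT the published AlogP parameter set.
ATOM_CONTRIBUTIONS = {
    "C_aliphatic": 0.30,
    "C_aromatic": 0.25,
    "H": 0.10,
    "O_hydroxyl": -0.40,
    "N_amine": -0.60,
}


def atomic_logp(atom_types):
    """Sum the look-up value of every atom; the estimate is purely additive."""
    return sum(ATOM_CONTRIBUTIONS[atom_type] for atom_type in atom_types)


# Ethanol (CH3-CH2-OH) typed by hand: 2 aliphatic C, 6 H, 1 hydroxyl O.
ethanol = ["C_aliphatic"] * 2 + ["H"] * 6 + ["O_hydroxyl"]
print(round(atomic_logp(ethanol), 2))
```

Because each atom is scored in isolation, nothing in this scheme can express an effect that depends on how distant parts of the molecule interact, which is the weakness the later methods try to address.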
Enhanced Atomic or hybrid logP (XlogP, SlogP etc.) is a modification of the AlogP system. To address the shortcomings of atomistic approaches for larger systems, it takes the value of each atom type together with a contribution from its neighbours, plus correction factors that help sidestep known deviations in purely atomistic methods. This is an attempt to allow for larger electronic effects. It is fast, being a table look-up technique, and many free software packages use it too. The smarter hybrid algorithms know the state of each atom and thus how much of a contribution its neighbours add.
Fragment / Compound logP uses a dataset of experimentally determined values for full compounds or fragments, modelled using QSPR or other regression techniques at the level of small fragments rather than per atom. Fragment contributions are then added up, with correction factors. The rationale here is that atomistic approaches sometimes do not adequately capture the nuances of electronic or intramolecular interactions, which may be better modelled using whole fragments. This method tends to be better for larger molecules and systems with complex aromaticity, on the condition that the molecule contains features similar to those on which the model was built. If your molecules contain very obscure motifs, the model behind the prediction may not correlate well with experiment.
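A rough sketch of the fragment approach, including its main failure mode, might look like the following. The fragment names, correction name and all numeric values are hypothetical placeholders; real packages derive them by regression against experimental data and perceive the fragments automatically.

```python
# Hypothetical fragment contributions and correction factors.
# Placeholder values for illustration; not taken from any real package.
FRAGMENT_CONTRIBUTIONS = {
    "phenyl": 1.90,
    "methylene": 0.50,
    "carboxylic_acid": -1.10,
}
CORRECTIONS = {
    "intramolecular_h_bond": 0.60,
}


def fragment_logp(fragments, corrections=()):
    """Add up fragment values, then apply any named correction factors.

    Raises KeyError when a fragment is absent from the table, which mirrors
    the case of a motif the underlying model was never trained on.
    """
    unknown = [f for f in fragments if f not in FRAGMENT_CONTRIBUTIONS]
    if unknown:
        raise KeyError(f"no training data for fragments: {unknown}")
    base = sum(FRAGMENT_CONTRIBUTIONS[f] for f in fragments)
    return base + sum(CORRECTIONS[c] for c in corrections)


# A molecule built entirely from known fragments gets an estimate...
print(round(fragment_logp(["phenyl", "methylene", "carboxylic_acid"]), 2))

# ...whereas an obscure motif cannot be scored at all in this sketch.
try:
    fragment_logp(["phenyl", "rare_spirocycle"])
except KeyError as err:
    print("cannot predict:", err)
```

Real fragment methods usually fall back to some approximation for a missing fragment instead of failing outright, which is precisely where the silent outliers tend to come from.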
Property based methods…
There are a whole host of methods for determining logP using properties, empirical approaches, 3-D structures (e.g. continuum solvation models, MD models, Molecular Lipophilicity Potential etc.), and topological approaches. Most of these methods are reasonably computationally intensive, and are buried in the world of informatics and stats, but one is worthy of particular note: Moriguchi’s method (or MlogP), which used the sum of lipophilic atoms and the sum of hydrophilic atoms as the two basic descriptors in a regression model that was able to explain nearly 75% of the variance in experimentally determined logP values for a dataset of 1230 compounds.2 The group later added 11 correction factors, and the model explained 91% of the variance. It is very fast, so historically it was employed for large datasets, and it was included in several property prediction packages, such as Dragon and ADMET Predictor (Simulations Plus, Inc.). Nowadays, as computational speed has increased, MlogP is used less, since more accurate methods have become manageable even at large library sizes.
So, which method do you use?
Biovia’s Pipeline Pilot and Discovery Studio sport a version of AlogP, and KNIME has multiple free XlogP and AlogP calculator plug-ins. CCG’s MOE uses both an unpublished atomic model (Labute) and a hybrid SlogP. DataWarrior uses ClogP; Dotmatics / Vortex natively use XlogP, but you can patch in others. Cresset BMD’s offerings use SlogP and Optibrium’s StarDrop uses a fragment method. ChemAxon uses multiple methods (including hybrid (VG) and fragment, e.g. KlogP), and if you have their InfoCom nodes in KNIME, you can use multiple methods and weight them according to your understanding. Better yet, if your group has the resource to experimentally determine a few of your own logPs, you can do a quick correlation check across the methods against known data in your series, and then weight your model accordingly.
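The quick correlation check suggested above could be sketched as follows. The method names and every number here are placeholders standing in for your own measured and predicted values, and the inverse-squared-error weighting is just one plausible choice I have assumed for illustration, not something any of the packages above prescribes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square error between predicted and measured values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def inverse_error_weights(experimental, predictions):
    """Weight each method by 1 / RMSE^2, normalised so the weights sum to 1."""
    inverse = {
        name: 1.0 / max(rmse(experimental, values), 1e-9) ** 2
        for name, values in predictions.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Placeholder data: substitute the logPs measured for your own series and
# the predictions exported from whichever tools you have access to.
experimental = [1.2, 2.5, 3.1, 0.8, 4.0]
predictions = {
    "method_A": [1.0, 2.7, 3.0, 1.1, 3.8],
    "method_B": [2.0, 2.2, 4.5, 0.1, 5.2],
}

for name, values in predictions.items():
    r = pearson_r(experimental, values)
    print(f"{name}: r = {r:.2f}, RMSE = {rmse(experimental, values):.2f}")

print(inverse_error_weights(experimental, predictions))
```

It is worth looking at both numbers: a method can correlate well with experiment while being systematically offset, which the RMSE will reveal but the correlation coefficient will not.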
As a rule (to which there are exceptions):
Simple small molecules (e.g. fragment sized): AlogP will probably perform just fine, but a hybrid method would be better.
Complex but standard small molecules (the normal development type med chemists love): fragment / compound logP methods will often be the most accurate. Hybrid methods are your second best option (but still reasonably good).
Complex, non-standard molecules (with rare motifs): a hybrid system or fragment-based logP may be equally good (or bad); it depends on the model on which the fragment logP is based. You could also get your team to determine some experimentally and see if you can’t build yourself a model…
For statistical insight into many state-of-the-art and classical methods, and how well they perform across large experimentally determined sets, see Mannhold et al.’s thorough review.3
So, to conclude, not all logP prediction models are built equal and there will be times when some models exceed others in accuracy, depending on your chemistry. Hopefully now you’ll at least be able to explain in your group meetings why your predicted logPs were way off…
1. Ghose, A.K.; Crippen, G.M. Atomic physicochemical parameters for three-dimensional-structure-directed quantitative structure-activity relationships. 2. Modeling dispersive and hydrophobic interactions. J. Chem. Inf. Comput. Sci. 1987, 27, 21–35.
2. Moriguchi, I.; Hirono, S.; Liu, Q.; Nakagome, I.; Matsushita, Y. Simple method of calculating octanol/water partition coefficient. Chem. Pharm. Bull. 1992, 40, 127–130.
3. Mannhold, R. et al. Calculation of Molecular Lipophilicity: State-of-the-Art and Comparison of Log P Methods on more than 96,000 compounds. J. Pharm. Sci. 2009, 98, 861–893.