AI – a cure for the ROI?


The face of the pharmaceutical industry has changed beyond recognition over the past 20 years, with many of the major players passing through multiple rounds of M&A, carving off large swathes of their portfolios, synergising and repurposing drugs, all ultimately to improve Return On Investment (ROI). It is no secret the sector has had a rough ride: blockbuster drugs have become increasingly rare, only ≈10% of drug candidates entering phase I went on to approval in the years 2006–2015 (1), payer pressure to cut prices has increased, company revenues have taken a hit as patent cliffs pass by, and few genuinely innovative medicines are being brought to the market.

How long will companies be able to sustain the significant cost of R&D whilst still turning enough of a profit to satisfy shareholders? The morality of industrial drug discovery (DD), long questioned in any case by those outside the industry, will come under serious scrutiny, and it is not hard to see why when companies like Gilead charge upwards of $80,000 for a full course of the hepatitis C drug Sovaldi (2). Take Ebola, for instance. Prior to the 2014 outbreak in West Africa, Ebola may have been viewed as an African problem and perhaps not an attractive area for investment. After the outbreak, and the associated hysteria in certain corners of Western society, this suddenly had the potential to be rather more than just an African problem. Ebola was discovered in 1976, yet a study by the University of Edinburgh in 2015 estimated that around half of all funding for Ebola research came in the 2014–2015 period, after the outbreak (3).

Even the philanthropy of the Wellcome Trust is driven by ROI, albeit one perhaps based more on intellectual than financial value, unlike VC funding; yet here too we see a move away from traditional DD, as demonstrated by the demise of the Seeding Drug Discovery Initiative. Granted, this change in focus may be designed to generate new targets or technologies, but the sentiment is clear: as traditional DD becomes more difficult, with target patient populations potentially dwindling as a result of increasingly personalised/specialised therapies or peripheral areas of unmet need, where is the motivation for investment?

We are in desperate need of new antibacterials; as the population ages, the prevalence of dementia is on the rise; and, alongside the ever-present spectre of cancer, these indications all demand substantial investment if we are to have a meaningful impact in developing effective treatments. It is clear, given the patient heterogeneity and variable aetiology of these conditions, that the model of drug discovery to date needs a significant change in prosecution. In an effort to speed up the DD process we have seen a recent spate of AI–large pharma collaborations (4, 5, 6), but are these tools merely ways to speed up old methods, or will they genuinely generate novel targets which might otherwise remain undiscovered by conventional means of investigation?

Regression analysis such as QSAR has long been used within DD to correlate physicochemical and functional parameters to guide chemical synthesis programmes. Correlations between various 'attributes' (derived from principal component analysis) are used to generate a model which can then predict the behaviour of new compounds. Of course the model is only as good as the number of data points and parameters used to define it and, once defined, it remains unchanged. The essential difference is that AI (specifically deep learning) generates self-adapting models using a multi-layered network approach that wasn't really possible before the development of GPUs, which allowed the parallel processing of vast amounts of data. The data is assessed according to one of its 'attributes' in the top layer before being passed to the second layer and processed according to another attribute, and so on; as this is an unguided approach, the system needs a sufficient volume of data and many iterations to generate a reliable model of correlation. After each iteration the algorithms used to generate the correlations can be altered as the network 'learns' from the previous iteration. Like any other model-generating system, though, it is liable to the 'garbage in, garbage out' (GIGO) problem.
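The contrast between a fixed, fit-once regression model and a system that refines itself with every pass over the data can be sketched in a few lines. This is a deliberately minimal toy (a single linear "layer" trained by gradient descent on made-up descriptor data, not a real QSAR model or a deep network), but it shows the iterative 'learning' behaviour described above: the weights start uninformed and self-adapt after every example.

```python
# Toy sketch: iterative learning vs a fixed model (all data here is synthetic).
import random

random.seed(0)

# Hypothetical descriptors (x1, x2) and a measured 'activity'
# generated as y = 2*x1 - 1*x2 + experimental noise.
points = [(random.random(), random.random()) for _ in range(200)]
data = [(x1, x2, 2 * x1 - 1 * x2 + random.gauss(0, 0.1)) for x1, x2 in points]

w1, w2, b = 0.0, 0.0, 0.0   # weights start uninformed
lr = 0.1                    # learning rate

for epoch in range(500):    # many iterations over the same data
    for x1, x2, y in data:
        pred = w1 * x1 + w2 * x2 + b
        err = pred - y
        # Gradient-descent update: the model 'learns' from each example,
        # unlike a QSAR model that is fitted once and then frozen.
        w1 -= lr * err * x1
        w2 -= lr * err * x2
        b -= lr * err

print(w1, w2)  # approaches the true coefficients, 2 and -1
```

A static regression would produce comparable coefficients here, of course; the point is only that the weights above are never final, and would keep adapting if new data were streamed in — which is also why GIGO applies: feed it biased data and it will faithfully learn the bias.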

It is easy to see how AI can be, and is, effective in, for instance, developing candidate compound libraries generated from well-characterised protein and protein–ligand crystal structures, and in suggesting routes of synthesis (7), but the question of how AI can truly revolutionise an ailing industry is a long way off being answered. The regression analysis used in AI is much the same as that used in QSAR for years; essentially it is the volume of data and the learning aspects that differ. The hope, however, is that AI will generate unique correlations that have thus far eluded us or only revealed themselves serendipitously. Pfizer's hope to quickly analyse and test hypotheses from "massive volumes of disparate data sources" (including more than 30 million sources of laboratory and data reports as well as medical literature) (8) seems, to the untrained eye (mine!), to be fraught with danger regarding curation of the input data. Even in the simplest instance of a standard compound IC50, how would un-curated inter-institution variations affect a blind, self-determining analysis? Perhaps, conversely to the GIGO scenario, given the volume and disparate nature of the data used (literature, experimental, predicted) and correct application of principal component analysis in a given enquiry, AI may actually be resilient to these small variations.
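That last intuition — that sheer volume may wash out small inter-institution variations — can be illustrated with a simulated example. Everything below is hypothetical (the potency value, the bias and noise magnitudes are invented for illustration): each "lab" measures the same compound's pIC50 with its own systematic shift plus experimental scatter, and the average over many sources lands far closer to the truth than a handful of measurements can guarantee.

```python
# Simulated inter-lab IC50 variation (all values hypothetical).
import random
import statistics

random.seed(1)
true_pic50 = 7.0  # 'true' potency on the log scale, i.e. IC50 = 100 nM

def lab_measurement(true_value):
    """One lab's reported pIC50 for the compound."""
    bias = random.gauss(0, 0.3)   # un-curated inter-lab systematic shift
    noise = random.gauss(0, 0.2)  # within-lab experimental scatter
    return true_value + bias + noise

few = [lab_measurement(true_pic50) for _ in range(3)]     # sparse literature data
many = [lab_measurement(true_pic50) for _ in range(500)]  # 'massive volumes'

print(abs(statistics.mean(few) - true_pic50))   # can be badly off
print(abs(statistics.mean(many) - true_pic50))  # averages toward the truth
```

The caveat, and it is the crux of the curation argument, is that this only works if the lab biases are independent and centred on zero; a shared systematic error (say, a mis-calibrated reference compound used field-wide) survives any amount of averaging.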

With regard to mental health, we only have to look at our efforts to provide an objective definition of a subjective experience — the reconceptualisation, re-categorisation, inclusion and elimination of mental disorders across successive editions of the 'Diagnostic and Statistical Manual of Mental Disorders' (9) — to know that our understanding of these disorders is in a constant state of refinement. AI assessment of a potentially novel pathway or target, based on the prevailing definitions of a given condition superimposed on inherently variable, subjective clinical data, would, it seems, yield different answers from one year to the next.

AI has been hampered not so much by the development of algorithms as by the availability of sufficiently broad, curated training data sets, and by the development of both GPUs and adequate storage (10). With the advent of '-omics' technologies able to acquire vast amounts of data, it is only relatively recently that the means have been developed by which we can effectively interrogate this huge repository of information. It would seem, then, that standardised curation of this data is of primary importance if the industry is going to rely heavily on AI to generate new medicines (notwithstanding the importance of generating clinically verified biomarkers in parallel, but that's for another blog!).
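What "standardised curation" means in practice can be as mundane as unit harmonisation. A minimal sketch, assuming a made-up record format of (value, unit) pairs: three sources report the same potency in three different units, and only after conversion to a common scale (pIC50) do they become comparable, let alone usable as training data.

```python
# Minimal curation sketch (hypothetical record format): harmonise IC50
# values reported in different units onto a single pIC50 scale.
import math

UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}

def to_pic50(value, unit):
    """Convert an IC50 in the given unit to pIC50 = -log10(molar IC50)."""
    molar = value * UNIT_TO_MOLAR[unit]
    return -math.log10(molar)

# Three sources reporting the same potency in different units.
records = [(100, "nM"), (0.1, "uM"), (0.0001, "mM")]
curated = [round(to_pic50(value, unit), 2) for value, unit in records]
print(curated)  # all three collapse to the same standardised value, 7.0
```

Real curation pipelines go far beyond this (assay-type reconciliation, duplicate detection, outlier flagging), but even this trivial step is exactly the kind of thing that, left undone across 30 million sources, turns a training set into GIGO fodder.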

We only have to look at GenBank as an example. From its inception in 1982, it took until 2003 for the first release of the curated RefSeq collection. I remember trying to identify novel splice variants in the late '90s, only to be frustrated by poorly annotated and simply incorrect sequences. Contemporary parallels can be drawn with the Protein Data Bank (PDB), especially in relation to (a) the structural genomics programme, where un-curated, non-peer-reviewed, homology-based structures are being submitted to the database (11), and (b) inaccurate protein–ligand co-crystal structures (12).

It is clear that AI is, and will increasingly be, a benefit at every step of drug discovery, and that algorithm refinement is an ongoing, iterative process; what is not currently clear is whether AI will deliver where, for instance, HTS failed to dramatically impact the inefficiencies of the DD process (13). I have no doubt that very soon AI will become a fundamental part of all aspects of healthcare and drug discovery, but I wonder whether this will actually precede a decline in the scale of small-molecule drug design and highlight the need to pursue other avenues (e.g. gene therapy/biologics) more vigorously. In any case, as the complexity of both the diseases/unmet need and the required solutions increases, it will be interesting to see how ROI will be maintained and how much more Big Pharma consolidation we will see over the coming years.

Blog written by Marcus Hanley


  3. Fitchett et al. (2016) Ebola research funding: a systematic analysis, 1997–2015. Journal of Global Health.
  9. The DSM-5: classification and criteria changes. World Psychiatry. 2013;12(2):92–98.
  11. Domagalski et al. (2014) The quality and validation of structures from structural genomics. Methods Mol Biol. 1091:297–314.
  12. Reynolds CH (2014) Protein–ligand cocrystal structures: we can do better. ACS Med Chem Lett. 5(7):727–729.




