Citizen scientists

Many moons ago, while I was a postgrad in Dundee and a computer screen was the size of a medium-to-large beach ball (i.e. the late 90s), I came across a curious project on the then relatively nascent internet called SETI@home.

The project was conceived by the Berkeley SETI Research Center as an initiative to engage the general public in helping to address a specific (and possibly the ultimate) question: “Is there anybody out there?” The premise was that if there are other (extra-terrestrial) intelligent life forms out there in the universe, they may be broadcasting their presence (knowingly or otherwise) and should thus be detectable by simply listening in.

A vast amount of data was collected, consisting of recorded radio signals across a chunk of the spectrum, which then needed to be analysed. The scientific community could not fund or carry out the analysis itself and needed an “out of the box” solution. Step in SETI@home, where the idea was to make use of the processing power of the vast numbers of PCs that were starting to take up residence among the general public. The strategy was not particularly interactive and relied solely on the ability of PCs to crunch through the data when not in use by their owners. All you had to do was register; packets of data were then sent out to your PC (the University of Dundee’s in my case, ahem!) and your PC did the rest. There was something hypnotic, ethereal and satisfying about watching the screen present how much data had been processed and how much more there was to go. Even for someone with only a passing interest in space-related matters, it was exciting and alluring to wonder whether there were any “curious” signals in the data, how long it would take to get a positive answer, and whether enough bandwidth had been covered to account for a negative outcome. Though it was not particularly interactive, it did capture the imagination of the wider public and made use of their individual resources to try to answer an important scientific question.

A decade or so later I came across another project (Planethunters) that piqued my curiosity, and again it was space related. The premise was as simple as it was unimaginable: spot planets, up to ~1500 light years away, as they pass across the face of their host star(s) in the line of sight from Earth.

This was made possible by NASA’s then recently launched Kepler space telescope, a space-based observatory with the sole purpose of peering continuously at one patch of sky and recording (every 30 minutes) the light emitted by the stars within its field. Every 3 months or so the data would be downloaded by NASA and distributed to its collaborators within the scientific community for analysis. This involved dealing with an incredible volume of data, for which data processing and analysis pipelines had to be set up.

Though these pipelines were in themselves incredibly successful, it was recognised fairly early on that engaging with the public in a meaningful way would be advantageous, not only to prevent public apathy (which seems to haunt many of these large projects) but also because any analysis pipeline would ultimately be limited.

Analysis of any large data set requires assumptions, which usually hold for the majority of instances but not where there are deviations from the “expected scenario”. The ability of the human eye to perceive subtle differences and patterns was seen as an advantage here, one which, with the right approach, could potentially be tapped. Step in Planethunters and its citizen scientists.

At its simplest level, the platform asks the public whether there is a periodic signal within the light curve of a particular star. However, the level of analysis could be more complex if desired, e.g. identifying the star type, accessing the unprocessed/raw data, and following links to information about a star’s age, its metallicity, a deep visual look into its neighbourhood and so on. There were forums set up to discuss the data in general, recurring glitches in the data, individual stars, analysis pipelines for larger bespoke batch analyses and much more.
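To make the idea of a “periodic signal within the light curve” a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not how the project or NASA’s pipelines actually work: the light curve is synthetic, the threshold and tolerance are made-up values, and the function names `find_dips` and `looks_periodic` are hypothetical. The only number taken from the text is the 30-minute sampling interval.

```python
# Hypothetical sketch: flag dips in a light curve and check whether they repeat.
CADENCE_MINUTES = 30  # sampling interval mentioned in the post


def find_dips(flux, threshold):
    """Return the index at which each contiguous run of below-threshold samples starts."""
    dips = []
    in_dip = False
    for i, value in enumerate(flux):
        if value < threshold:
            if not in_dip:
                dips.append(i)
            in_dip = True
        else:
            in_dip = False
    return dips


def looks_periodic(dip_starts, tolerance=1):
    """True if there are at least three dips whose spacings agree within `tolerance` samples."""
    if len(dip_starts) < 3:
        return False
    gaps = [b - a for a, b in zip(dip_starts, dip_starts[1:])]
    return max(gaps) - min(gaps) <= tolerance


# Synthetic, made-up light curve: flat brightness with a 1% dip every 10 samples.
flux = [1.0] * 40
for start in (5, 15, 25, 35):
    flux[start] = flux[start + 1] = 0.99

dips = find_dips(flux, threshold=0.995)
periodic = looks_periodic(dips)
period_hours = (dips[1] - dips[0]) * CADENCE_MINUTES / 60
print(dips, periodic, period_hours)
```

Real light curves are noisy, gappy and full of instrumental glitches, which is exactly where a simple rule like this falls over and a human eye can still pick out the pattern.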

So did these citizen scientists find anything? The clear answer is yes. To date there have been 10 peer-reviewed publications, and more will no doubt follow.


These types of endeavours, where the public can be engaged in a meaningful way to answer specific and scientifically inspiring questions, are important on a number of levels:

  1. Access to a “free” and potentially vast resource.
  2. There are important (and sometimes unexpected) discoveries to be made.
  3. It prevents public apathy.
  4. Exposure.
  5. Funding (e.g. from the greater exposure of the project).

The question arises: in this age of big data, particularly with the explosion in cell biology and disease-related big data projects, why do these fields not also have such endeavours? For such well-funded scientific areas to have only one such endeavour (Etch a Cell) is, if nothing else, sad. Etch a Cell is an initiative led by the Francis Crick Institute where the aim is to engage with the public to help build 3D models of the nuclear envelope from electron micrographs. This is of interest to me and (possibly some) other people in my field of research, but it hardly captures the zeitgeist as Planethunters and SETI@home did and continue to do.

Blog written by Tesh Patel

From lab to launch….

Drug discovery is a time-consuming and expensive undertaking. Currently it commonly takes between 10 and 15 years to get from the start of the discovery process through to launch (if you make it!) and the cost can range from $2 billion to $5 billion depending upon whose statistics you reference. The preclinical discovery phase tends to be short and relatively cost effective; its output then runs the gauntlet of a battery of toxicity and safety tests before being allowed to enter testing in humans. After a relatively short efficacy study in a small number of patients suggests that the drug may work, a series of larger and longer clinical studies confirm that the drug is both safe and effective. If all goes well in these studies the regulatory authorities, usually the FDA or the EMA, will review the extensive data package and, if their opinion is positive, give approval for the drug to be launched for use in routine clinical practice.

We all want this process to be faster and more effective, but without any compromise on safety and determination of efficacy – and by ‘we’ I mean drug discoverers, patients, the pharmaceutical industry and regulators. Ideally we also want the process to be more efficient and predictive with less chance of failure, particularly in late-stage, expensive studies (one of the major reasons the costs above are in the billions – if every drug that entered clinical studies worked, the cost of discovering a new drug would be ~$350 million). A focus on orphan diseases, where patients are more homogeneous and we have a strong understanding of the genetic basis of their disease, is delivering higher success rates. Perhaps the poster child for this approach has been cystic fibrosis (CF) and the introduction of therapies that effectively address the genetic defect by repairing the defective protein – in CF it’s the cystic fibrosis transmembrane conductance regulator (CFTR), a chloride ion channel. Vertex Pharmaceuticals introduced the first of its expanding portfolio of CFTR repair therapies in 2012 – this was the CFTR potentiator ivacaftor (trade name Kalydeco). Ivacaftor demonstrated impressive clinical effects in patients with a specific CFTR mutation (G551D) and has demonstrated efficacy in a number of follow-on trials in CF patients with mutations that are biophysically similar to G551D (a CFTR protein that makes it to the cell membrane but is loath to open). G551D is the third most common CF disease-causing mutation, accounting for somewhere between 2 and 5% of the CF population; with an estimated ~70,000 patients worldwide, that makes it relatively rare.

So what happens if you have a medicine which you believe will deliver benefit to additional patients, but they are few and far between, with not enough to undertake a robust phase 3 trial? This was the conundrum facing Vertex when looking to expand the labelling for ivacaftor. In the first decision of its kind, the FDA granted expanded approval to Vertex for ivacaftor based upon in vitro data only. This could be a landmark step, and the FDA has acknowledged that this approach could have implications for other drugs that have a well-understood safety profile and address well-characterised diseases. With ivacaftor, Vertex have a drug with a robust safety package and a strong understanding of its mechanism of action, and they have put considerable effort into assessing the correlation between preclinical cellular assays, clinical biomarkers and registrable endpoints. To support the request for expanded labelling, Vertex expressed ~50 mutations in Fischer rat thyroid cells, a cell system widely used by the CF field as it has low expression of background chloride channels and can be used in a variety of assays (including Ussing chamber ion transport). Mutations that delivered a 10% increase in chloride transport when treated with ivacaftor were considered responsive. This wasn’t an arbitrarily selected figure but one borne out by Vertex’s clinical experience with ivacaftor and other compounds from their developing CFTR repair portfolio. Of those tested, 23 mutations have been added to ivacaftor’s labelling (26 failed to meet the criteria).
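The responder rule described above is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the mutation names and the measured increases are invented (they are not Vertex data), `classify_mutations` is a hypothetical helper, and treating an increase of exactly 10% as responsive is my assumption since the text does not say which side of the line it falls on.

```python
# Hypothetical sketch of the responder rule: a mutation counts as responsive
# if treatment raises chloride transport by at least the threshold.
RESPONSE_THRESHOLD_PCT = 10.0  # the 10% figure quoted in the post


def classify_mutations(increase_by_mutation, threshold_pct=RESPONSE_THRESHOLD_PCT):
    """Split mutations into (responsive, non_responsive) lists, sorted by name."""
    responsive = sorted(
        name for name, increase in increase_by_mutation.items()
        if increase >= threshold_pct  # assumption: exactly 10% counts as responsive
    )
    non_responsive = sorted(
        name for name, increase in increase_by_mutation.items()
        if increase < threshold_pct
    )
    return responsive, non_responsive


# Made-up example values: percent increase in chloride transport on treatment.
example = {
    "mutation_A": 25.0,
    "mutation_B": 4.0,
    "mutation_C": 10.0,
    "mutation_D": 9.9,
}

responsive, non_responsive = classify_mutations(example)
print("responsive:", responsive)
print("non-responsive:", non_responsive)
```

The interesting part is not the arithmetic but the justification for the cut-off, which, as noted above, rested on how well the in vitro read-out had tracked clinical outcomes.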

In real terms this means that ~900 CF patients in the US alone will now have the opportunity to access this breakthrough medicine. My congratulations to Vertex for pioneering the approach and to the FDA for entertaining it; let’s hope it can be pursued for many other diseases.

Blog written by Martin Gosling


Durmowicz A.G. et al. (2018) The FDA’s experience with ivacaftor in cystic fibrosis: establishing efficacy using in vitro data in lieu of a clinical trial. Ann Am Thorac Soc. 2018 Jan;15(1):1-2. doi: 10.1513/AnnalsATS.201708-668PS.

Kingwell K (2017) FDA OKs first in vitro route to expanded approval. Nature Reviews Drug Discovery; doi:10.1038/nrd.2017.140

AI – a cure for the ROI?

The face of the pharmaceutical industry has changed beyond recognition over the past 20 years, with many of the major players passing through multiple rounds of M&A, calving off large swathes of their portfolios, synergising, repurposing drugs... all ultimately to improve Return On Investment (ROI). It is no secret that the sector has had a rough ride: blockbuster drugs are becoming increasingly rare, only ≈10% of drug candidates in phase I reached approval in the years 2006-2015 (1), payers are applying increased pressure to cut prices, company revenues take a hit as patent cliffs pass by, and there is a dearth of innovative medicines being brought to the market.

How long will companies be able to sustain the significant cost of R&D whilst still turning enough of a profit to satisfy shareholders? The morality of industrial drug discovery (DD), long questioned in any case by those outside the industry, will be under serious scrutiny – not hard to see why, with companies like Gilead charging upwards of $80,000 for a full course of the hepatitis C drug Sovaldi (2). Take Ebola, for instance. Prior to the 2014 outbreak in West Africa (3), Ebola may have been viewed as an African problem and perhaps not an attractive area for investment. After the outbreak and the associated hysteria in certain corners of Western society, it suddenly had the potential to be a little more than just an African problem. Ebola was discovered in 1976, yet a study by the University of Edinburgh in 2015 estimated that around half of all funding for Ebola research occurred in the 2014-2015 period after the outbreak.

Even the philanthropy of the Wellcome Trust is driven by ROI which, unlike VC funding, is perhaps based more on intellectual than financial value, but here too we see a move away from traditional DD, as demonstrated by the demise of the Seeding Drug Discovery Initiative. Granted, this change in focus may be designed to generate new targets or technologies, but the sentiment is clear: as traditional DD becomes more difficult, with target patient populations potentially dwindling as a result of increasingly personalised/specialised therapies or peripheral areas of unmet need, where is the motivation for investment?

We are in desperate need of new anti-bacterials; as the population becomes older the prevalence of dementia is on the rise; and then there is the ever-present spectre of cancer. All of these will require substantial investment if we are to have a meaningful impact on the development of effective treatments. Given the patient heterogeneity and variable aetiology of these conditions, it is clear that the model of drug discovery used to date needs a significant change in how it is prosecuted. In an effort to speed up the DD process we have seen a recent spate of AI-large pharma collaborations (4/5/6), but are these tools merely ways to speed up old methods, or will they genuinely result in the generation of novel targets which might otherwise remain undiscovered by conventional means of investigation?

Regression analysis such as QSAR has long been used within DD to correlate physicochemical and functional parameters to guide chemical synthesis programs. Correlations between various ‘attributes’ (derived from principal component analysis) are used to generate a model which can be used to predict the behaviour of new compounds. Of course the model is only as good as the number of data points and parameters used to define it and, once defined, remains unchanged. The essential difference is that AI (specifically deep learning) generates self-adapting models using a multi-layered network approach that wasn’t really practical before GPUs allowed the parallel processing of vast amounts of data. The data is assessed according to any one of its ‘attributes’ in the ‘top layer’ before being passed to the ‘second’ layer and processed according to another attribute, and so on. As this is an unguided approach, the system needs a sufficient volume of data and many iterations to generate a reliable model of correlation. After each iteration the weightings used to generate the correlations can then be altered as the network ‘learns’ from the previous iteration. Like any other computer model-generating system though, it is subject to the ‘garbage in, garbage out’ (GIGO) concept.
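For readers who have never seen a QSAR-style regression, the fixed “fit once, then predict” model described above can be sketched with a single descriptor and ordinary least squares. Everything here is an assumption for illustration: the descriptor (a logP-like value), the activity values (pIC50-like) and the helper `fit_line` are made up, and a real QSAR model would use many descriptors and far more compounds.

```python
# Hypothetical single-descriptor QSAR sketch: fit a straight line by ordinary
# least squares, then use the fixed model to predict a new compound.


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training set: one descriptor value and one measured activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(descriptor, activity)

# Once fitted, the model is frozen: predicting a new compound just applies it.
new_compound_descriptor = 2.5
predicted_activity = slope * new_compound_descriptor + intercept
print(round(slope, 3), round(intercept, 3), round(predicted_activity, 3))
```

The contrast with deep learning drawn in the text is that this model never changes after the fit, whereas a network keeps adjusting its internal weightings as it iterates over the data.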

It is easy to see how AI can be, and is, effective in, for instance, developing candidate compound libraries generated from well-characterised protein and protein/ligand crystal structures and suggesting routes of synthesis (7), but the question of how AI can truly revolutionise an ailing industry is a long way off being answered. The regression analysis used in AI is the same as that used in QSAR for years; essentially it’s just the volume of data and the learning aspects that are different. The hope, however, is that AI will generate unique correlations that have thus far eluded us or only revealed themselves serendipitously. Pfizer’s hope to quickly analyse and test hypotheses from “massive volumes of disparate data sources” (including more than 30 million sources of laboratory and data reports as well as medical literature) (8) seems, to the untrained eye (mine!), to be fraught with danger regarding curation of the input data. Even in the simplest instance of a standard compound IC50, how would un-curated inter-institution variations affect a blind, self-determining analysis? Perhaps, conversely to the GIGO scenario, given the volume and disparate nature of the data used (literature, experimental, predicted) and correct application of principal component analysis in a given enquiry, AI may actually be resilient to these small variations.

With regard to mental health, we only have to look at our efforts to provide an objective definition of a subjective experience, in the reconceptualisation/re-categorisation/inclusion/elimination of mental disorders in successive editions of the ‘Diagnostic and Statistical Manual of Mental Disorders’ (9), to know that our understanding of these disorders is in a constant state of refinement. AI assessment of a potentially novel pathway/target, based on the prevailing definitions of a given condition superimposed on inherently variable subjective clinical data, would, it seems, yield different answers from one year to the next.

AI has been hampered not necessarily by the development of algorithms but by the availability of sufficiently broad, curated training data sets and by the development of both GPUs and adequate storage (10). Although ‘-omics’ technologies have been able to acquire vast amounts of data, only relatively recently have we developed the means to interrogate this huge repository of information effectively. It would seem, then, that standardised curation of this data is of primary importance if the industry is going to rely heavily on AI to generate new medicines effectively... notwithstanding the importance of generating clinically verified biomarkers in parallel... but that’s for another blog!

We only have to look at GenBank as an example. From its inception in 1982 it took until 2003 for the first release of the curated RefSeq collection. I remember trying to identify novel splice variants in the late 90s only to be frustrated by poorly annotated and simply incorrect sequences. Contemporary parallels can be drawn with the Protein Data Bank (PDB), especially in relation to a) the structural genomics program, where un-curated, non-peer-reviewed, homology-based structures are being submitted to the database (11), and b) inaccurate protein-ligand co-crystal structures (12).

It is clear that AI can be, will be and already is a benefit during every step of drug discovery, and that algorithm refinement is an ongoing, iterative process, but what is not currently clear is whether AI will deliver where, for instance, HTS failed in dramatically reducing the inefficiencies of the DD process (13). I have no doubt that very soon AI will become a fundamental part of all aspects of health care and drug discovery, but I wonder whether this will actually precede a decline in the scale of small molecule drug design and highlight the need to pursue other avenues (e.g. gene therapy/biologics) more vigorously. In any case, as the complexity of both the diseases/unmet need and the required solutions increases, it will be interesting to see how ROI will be maintained and how much more Big Pharma consolidation we will see over the coming years.

Blog written by Marcus Hanley


  3. Fitchett et al. (2016) Ebola research funding: a systematic analysis 1997–2015. Journal of Global Health.
  9. The DSM-5: Classification and criteria changes. World Psychiatry. 2013 Jun; 12(2): 92–98. Published online 2013 Jun 4.
  11. Domagalski et al. (2014) The quality and validation of structures from structural genomics. Methods Mol Biol. 2014; 1091: 297–314.
  12. Reynolds, Charles H (2014) Protein-ligand cocrystal structures: we can do better. ACS Medicinal Chemistry Letters, 10 July 2014, Vol. 5(7), pp. 727-9.