Only a small proportion (<2%) of the human genome codes for proteins, and the remainder has until now often been termed non-coding or ‘junk’ DNA. The ENCODE (Encyclopedia of DNA Elements) project set out to characterize these poorly defined regions. The consortium has recently published 30 papers that, among a wealth of other data, describe regions of transcription and regulation that were previously unreported.
One of these papers, by Maurano et al., used a genome-wide mapping technique to locate regulatory elements in DNA and compared these sites with non-coding variants that genome-wide association studies (GWAS) have linked to common diseases.
The group examined many different cell types, including primary cells, immortalized, malignancy-derived and pluripotent cell lines, hematopoietic and progenitor cells, and some fetal tissue samples. They used deoxyribonuclease I (DNase I) hypersensitive sites (DHSs), regions of increased chromatin accessibility, as markers of regulatory DNA bound by proteins such as transcription factors, and in this way mapped the regulatory regions in this material. In total, the DHSs they identified collectively span 42.2% of the genome, a higher density of regulatory regions than previously appreciated. They then examined the positions of single nucleotide polymorphisms (SNPs) identified by GWAS and found a 40% enrichment of these SNPs in DHSs. This suggests that common genetic variants associated with disease often lie in regulatory DNA, frequently at transcription factor recognition sequences. The authors also showed that these regulatory regions may control the expression of distant genes (>250 kb away), not solely that of the nearest gene.
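The enrichment result above rests on comparing how often disease-associated SNPs fall inside DHSs with how often SNPs in general do. A minimal Python sketch of that kind of overlap calculation follows. It assumes DHSs are given as half-open [start, end) intervals on a single chromosome (as in BED files) and that enrichment is expressed as the ratio of the two overlap fractions. The function names and toy coordinates are hypothetical illustrations, not the statistical pipeline used by Maurano et al.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def fraction_in_dhs(snp_positions, dhs_intervals):
    """Fraction of SNP positions that fall inside any DHS interval."""
    if not snp_positions:
        raise ValueError("need at least one SNP position")
    merged = merge_intervals(dhs_intervals)
    starts = [s for s, _ in merged]
    hits = 0
    for pos in snp_positions:
        # Index of the last merged interval starting at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(snp_positions)


def fold_enrichment(gwas_snps, background_snps, dhs_intervals):
    """Ratio of the DHS overlap fraction for GWAS SNPs to that for background SNPs (assumed definition)."""
    background = fraction_in_dhs(background_snps, dhs_intervals)
    if background == 0:
        raise ValueError("no background SNPs fall in DHSs; ratio undefined")
    return fraction_in_dhs(gwas_snps, dhs_intervals) / background


if __name__ == "__main__":
    # Hypothetical toy coordinates, not ENCODE data.
    dhs = [(100, 200), (150, 260), (500, 600)]
    gwas = [120, 250, 550, 700, 900]
    background = [10, 20, 50, 130, 300, 400, 580, 800, 950, 990]
    print(merge_intervals(dhs))        # [(100, 260), (500, 600)]
    print(fraction_in_dhs(gwas, dhs))  # 0.6
    print(round(fold_enrichment(gwas, background, dhs), 2))  # 3.0
```

With the toy data, the "GWAS" SNPs fall in DHSs three times as often as the background SNPs. Under this ratio definition, a 40% enrichment would correspond to a value of about 1.4. Merging overlapping intervals first, then using binary search over their start positions, keeps each SNP lookup logarithmic in the number of DHSs.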
Further interesting data from the consortium came from the study of cancer cell lines. More than 40 cancer cell lines of different origins were examined, and the data showed that these lines possess regulatory DNA regions that are not present in normal cells (Stamatoyannopoulos, J. A., 2012).
The new information provided by ENCODE is not yet readily applicable to drug discovery. However, these data could provide a map of transcriptional and regulatory regions that may help to identify novel therapeutic targets. In a recent article in Nature Reviews Drug Discovery, Michael Snyder, one of the principal investigators of the ENCODE consortium, explains that identifying changes in gene expression caused by changes in regulatory sequence could point to proteins that would make useful drug targets.
Approaches that could be useful in drug discovery settings include knockdown technologies to screen for biological effects, and zinc finger nuclease technology, which can introduce mutations into regulatory elements to determine whether changes in these regions cause disease.
Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, Shafer A, Neri F, Lee K, Kutyavin T, Stehling-Sun S, Johnson AK, Canfield TK, Giste E, Diegel M, Bates D, Hansen RS, Neph S, Sabo PJ, Heimfeld S, Raubitschek A, Ziegler S, Cotsapas C, Sotoodehnia N, Glass I, Sunyaev SR, Kaul R, Stamatoyannopoulos JA. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012 Sep 7;337(6099):1190-5. doi: 10.1126/science.1222794. Epub 2012 Sep 5.
Stamatoyannopoulos JA. What does our genome encode? Genome Research. 2012;22:1602-1611.
An audience with Michael Snyder. Nature Reviews Drug Discovery. 2012 Oct;11:744.
The ENCODE papers are available online at go.nature.com/iN6Ezx.