The pros and cons of gene expression analysis by microarrays
One of the hallmarks of the currently evolving genomics revolution is the development of DNA chips or microarrays. These glass slides or filters consist of a library of genes immobilized in a grid. Each individual spot in the grid contains DNA from a single gene that will bind to the messenger RNA produced by the same gene. By isolating the pool of mRNA from a tissue or specific cell type it is theoretically possible to visualize the entire transcriptome in a single procedure. In principle this allows many important research questions to be addressed, from identification of phenotype/genotype relationships in genetic diseases to predictive toxicology (see [1], [2], [3], [4], [5] for recent reviews). The potential of the approach seems endless, not only in medicine but in many biological disciplines. In the first 5 months of this year more than 300 papers have appeared in which array technology was applied. These studies yielded interesting results but not yet the discoveries they were expected to generate, because many problems inherent to the methodology still have to be solved.
The study by Yano et al. [6] in this issue is a nice example of both the strengths and the weaknesses of the methodology in its current state. In this paper the authors analyzed gene expression in samples of liver tissue obtained during partial hepatectomy. They used filter microarrays from Research Genetics (Huntsville, USA) containing spotted cDNAs derived from 4043 genes, which, as we now know, represent 10–15% of the genes in the human genome. Since the apparent completion of the human genome project it should be possible to represent the complete genome on arrays, and we will probably see this in the near future. The advantage of the filter arrays used in the study of Yano et al. is their relatively low cost and the fact that, according to the manufacturer, the filters can be stripped and reused up to five times. A major disadvantage is the lack of adequate control for background hybridization.
It is well known that background hybridization varies with the composition of the cDNA, so ideally one needs a suitable control for every cDNA spotted on the array. The filter arrays currently on the market do not provide such controls. In addition, the arrays may contain errors because of cross-contamination in the clones used for the spotting procedure [7]. The arrays made by synthesizing oligonucleotides on a glass matrix following the methodology developed by Affymetrix (www.affymetrix.com) could be better in this respect. On Affymetrix arrays each gene is represented by 20 oligonucleotides of 25 bases each. For each of these 25mers a control is synthesized on the array, consisting of a 25mer containing one mismatch. In principle this allows rigid control of non-specific hybridization. Recently, Rosetta Inpharmatics [8] reported on an alternative approach employing a modified ink-jet printer to synthesize oligonucleotides on a matrix. Such an approach might be very versatile but is as yet too novel for its performance to be judged. To my knowledge an independent comparison of the specificity and sensitivity of the different types of array systems on the market has not been made. Clearly such a comparison is warranted. However, it may still be a little too early for such a consumer's test because the design (see [9] for review) and particularly the number of genes spotted are changing almost every day.
A problem common to all array designs is sensitivity. All array systems currently on the market show considerable procedure-related variation in results [1]. In general this kind of experimental error is largest at low signal-to-noise ratios, which can make it difficult to assess whether or not a certain gene is expressed [10]. Lee et al. [11] recently reported a possible solution to this problem. They studied the performance of a custom-made filter array and, taking into account all potential sources of experimental error, developed a statistical approach to calculate the probability of expression of the genes on the array. They confirmed the considerable variation in the extent of hybridization, which necessitates replication of the procedure at least three times. The advantage of this procedure is that cDNA-specific subtraction of background hybridization is not necessary. Assuming that a significant number of genes are not expressed, comparison of the noise in the measurement of these genes with the noise in the measurement of the expressed genes allows accurate estimation of the probability that a certain gene is expressed. The success of the method strongly depends on the number of replications, which is the weak point in most studies reported to date. In the study of Yano et al. duplicate analysis was performed. According to the authors reproducibility was good, but they did not provide a detailed analysis of this, in particular in relation to expression levels.
Yano et al. stress the need for quality control of the cDNA samples obtained. This seems particularly relevant for tissue samples obtained during surgery. By definition the samples are derived from patients with varying histories of disease. Yano et al. used liver samples from four patients with colorectal liver metastases and one with cholangiocarcinoma. Of course, the samples were taken from non-afflicted regions of the liver, yet the disease will most probably also affect gene expression in these parts. For instance, the patient with cholangiocarcinoma was jaundiced and infected. This alone will induce considerable variation in gene expression. Another problem is the heterogeneity of the samples. Zonation in the liver affects gene expression, the liver contains different cell types and, when the tissue is not flushed, blood-derived cells will also be present.
Despite these shortcomings the study of Yano et al. yielded interesting results. A striking difference in expressed genes between the different samples was observed. According to the authors this may have been partly due to variation in the quality of the mRNA isolated from the different patients. Yano et al. emphasize the need for checking whether the preparations are free of genomic DNA. Particularly when the aim is to resolve the transcriptome this is of course an essential control. A total of 2418 genes were found to be expressed in the five livers. About half of these gene transcripts were found in four of the five livers. Two samples showed evidence of activation of an acute-phase response, and coordinate expression of genes involved in this response was found. Relative expression of two housekeeping genes, glyceraldehyde-3-phosphate dehydrogenase and cytochrome c oxidase subunit VIc, varied considerably, precluding normalization of the expression patterns. Cluster analysis [12] was performed in an attempt to elucidate coordinate regulation of particular genes. Such analysis is often used to tease out coupling of gene expression. In the present study application of this method nicely demonstrated clustered regulation of genes involved in the acute-phase response. Here again the noise in the measurements determines the success of the analysis.
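The intuition behind clustering co-regulated genes can be illustrated with a minimal sketch. It is not the procedure used by Yano et al. or in [12]; it only shows the pairwise similarity step that such analyses commonly build on, here a Pearson correlation between expression profiles. The gene names, the expression values and the 0.9 threshold are hypothetical and chosen purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    names = sorted(profiles)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(profiles[first], profiles[second])
            if r >= threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs


# Hypothetical log-scale expression values in five liver samples;
# samples 3 and 4 mimic an activated acute-phase response.
profiles = {
    "acute_phase_gene_A": [1.0, 1.2, 3.1, 3.4, 1.1],
    "acute_phase_gene_B": [0.8, 1.0, 2.9, 3.0, 0.9],
    "unrelated_gene_C": [2.0, 1.1, 1.9, 1.0, 2.1],
}

for first, second, r in correlated_pairs(profiles):
    print(first, second, r)
```

With only five samples, a few noisy measurements can create or destroy such correlations, which is why the noise level limits what cluster analysis can reveal.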
Development of bioinformatics tools to mine the complex data output of microarrays is still in its infancy. Methods have been transplanted from other fields of science, but their application to the analysis of array results needs refinement. Application of the right type of statistical treatment is perhaps the first problem that has to be solved. When analyzing the transcriptome, rigorous replication seems necessary and adequate methods to normalize the results have to be developed. This will allow construction of ‘gene expression fingerprints’ for disease states, which could be a very powerful diagnostic tool. In most studies microarrays are used to compare gene expression levels in two or more controlled conditions. In general, a twofold change in expression is considered adequate to decide whether a particular gene is up- or downregulated. Clearly the use of such an arbitrary factor is not warranted and may lead to serious mistakes. The error in expression on arrays varies with the extent of expression, and the error may be non-random [10]. In other studies t-tests are used. This kind of statistics relies on large numbers of independent samples and on repeatability. Error due to multiple testing has to be corrected for in such analyses. For example, in an experiment with three replications and two treatments, performing 1000 Student t-tests, each with 2 degrees of freedom, at the conventional 5% significance level would be expected to yield about 50 significant differences even if the data were truly independent and in reality no difference existed at all. In particular, application of Bayesian statistics may be the solution to the problem [13]. This kind of statistics allows inclusion of prior information on metabolic structure, which increases the sensitivity of the analysis.
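The multiple-testing arithmetic above (1000 tests × 0.05 = 50 expected false positives) can be checked with a small simulation. The sketch below rests on several assumptions that are not taken from the studies discussed: a paired design is assumed so that three replicates give 2 degrees of freedom, the null data are drawn from a standard normal distribution, and a Bonferroni correction is shown as one simple remedy, not as the method advocated in [13]. For 2 degrees of freedom the t distribution has a closed-form cumulative distribution, so only the Python standard library is needed.

```python
import math
import random
import statistics


def paired_t_pvalue(diffs):
    """Two-sided p-value of a t-test on three paired differences (2 df)."""
    if len(diffs) != 3:
        raise ValueError("the closed form below holds only for 2 degrees of freedom")
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    t = mean / (sd / math.sqrt(3))
    # For 2 df the t distribution gives P(|T| > t) = 1 - |t| / sqrt(t^2 + 2).
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


def simulate_null_experiment(n_genes=1000, alpha=0.05, seed=0):
    """Count 'significant' genes when no gene is truly regulated.

    Each gene gets three treatment-minus-control differences drawn from
    pure noise (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    p_values = [
        paired_t_pvalue([rng.gauss(0.0, 1.0) for _ in range(3)])
        for _ in range(n_genes)
    ]
    uncorrected = sum(p < alpha for p in p_values)
    # Bonferroni: divide the significance level by the number of tests.
    bonferroni = sum(p < alpha / n_genes for p in p_values)
    return uncorrected, bonferroni


uncorrected, bonferroni = simulate_null_experiment()
print("significant without correction:", uncorrected)
print("significant after Bonferroni correction:", bonferroni)
```

The uncorrected count should fluctuate around 50 from seed to seed, while the corrected count should usually be zero. The price of such a strict correction is reduced sensitivity, which is acute with only three replicates.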
The last sentence of the paper by Yano et al. reads “Our findings are only the first step toward total characterization of the human liver transcriptome that will provide comprehensive information integral to elucidating the pathophysiology of hepatic diseases”. The rapid developments in both microarray hard- and software ensure that we do not have to wait long for the next steps to come.
References
- DNA microarrays: raising the profile. Curr Opin Biotechnol. 2001;12:48–52
- Examining the living genome in health and disease with DNA microarrays. J Am Med Assoc. 2000;283:2298–2299
- Comparing functional genomic datasets: lessons from DNA microarray analyses of host-pathogen interactions. Curr Opin Microbiol. 2001;4:95–101
- Applications of biochip and microarray systems in pharmacogenomics. Pharmacogenomics. 2000;1:289–307
- Discovering patterns in microarray data. Mol Diagn. 2000;5:349–357
- Profiling the adult human liver transcriptome: analysis by cDNA array hybridization. J Hepatol. 2001;35:178–186
- When the chips are down. Nature. 2001;410:860–861
- Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001;19:342–347
- New developments in microarray technology. Curr Opin Biotechnol. 2001;12:41–47
- On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol. 2001;8:37–52
- Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA. 2000;97:9834–9839
- Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868
- Gene expression profiling in Escherichia coli K12: improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. J Biol Chem. 2001;276:19937–19944
PII: S0168-8278(01)00156-8
© 2001 European Association for the Study of the Liver. Published by Elsevier Inc. All rights reserved.
