Centre for Scientific Computing and Complex Systems Modelling (Sci-Sym), School of Computing, Dublin City University, Ireland.
Aberrant DNA methylation is well known to play an important role in cancer onset and development, and colon cancer is no exception. Recent years have seen increased use of large-scale technologies (such as methylation microarray assays or specific sequencing of methylated DNA) to determine whole-genome profiles of CpG island methylation in tissue samples. Comprehensive study of methylation array data from transcriptome high-throughput platforms permits the determination of gene methylation markers, which are important for cancer profiling. Here, three large-scale methylation datasets for colon cancer are compared to determine locus-specific methylation agreement. The data are taken from the GEO database: colon cancer and apparently healthy adjacent tissues are represented by 125 and 29 samples respectively in the first dataset, 24 of each in the second, and 118 of each in the third. Several data analysis techniques are employed, including Clustering, Discriminant Principal Component Analysis, Discriminant Analysis and receiver operating characteristic (ROC) curves, in order (i) to gain better insight into the locus-specific concomitant methylation structures in these diverse data and (ii) to determine a robust set of potential markers for indicative screening, drawn from all the data taken together. The extent of agreement between the analysed datasets is reported. Further, potential screening methylation markers, whose methylation profiles are consistent across tissue samples and across several datasets, are highlighted and discussed.
Keywords: colon cancer, epigenetic events, promoter hypermethylation, mutation, multivariate data analysis, clustering, principal component analysis, discriminant analysis, biomarkers, screening.