Progenetix data acquisition & processing pipeline

ZH Seminars Bioinformatics - Qingyao Huang

12:15 (ZOOM Call)

Abstract: Over the years we have maintained and updated the Progenetix database, which provides an overview of copy number abnormalities in human cancer from currently 113322 array and chromosomal Comparative Genomic Hybridization (CGH) experiments, as well as Whole Genome or Whole Exome Sequencing (WGS, WES) studies. The cancer profile data in Progenetix were curated from 1600 articles and represent 420 and 542 different cancer types according to the International Classification of Diseases for Oncology (ICD-O) and the NCIt "neoplasm" classification, respectively. In this seminar, I will give an overview of our current pipeline, from data retrieval, metadata extraction and curation, and data processing and evaluation to inclusion in or exclusion from the database. The pipeline uses widely adopted packages for extraction of raw (fluorescence intensity) files, calibration and segmentation, as well as our in-house methods and tools for post-processing quality control. I will also present our current effort to prune the NCIt classification hierarchy tree to arrive at a biologically and functionally relevant level for data summarization, benchmarking and visualization.