Abstract
The use of diagnostic gene sequencing in the clinical setting is increasingly common. Established sequencing panels targeting genes involved in cancer predisposition, tumor growth, and chemoresistance are widely available, and numerous efforts to validate whole-exome and whole-genome clinical sequencing assays are underway. As a result, genome analysts and molecular pathologists are increasingly tasked with the labor-intensive process of interpreting the clinical significance of tumor variants. Building these clinical reports entails reviewing the literature, population frequency databases, and the predicted functional impact of each observed variant. The manual effort involved in this process is a major bottleneck in generating reports from tumor sequencing data.
To assist genome analysts in this process, numerous independent curated knowledgebases have been constructed. These knowledgebases capture evidence describing associations between molecular biomarkers and their role in diagnosis, prognosis, or therapeutic response prediction. Despite their shared purpose, these knowledgebases lack shared standards for the generation, structure, and retrieval of their content. As a result, the analyst is left with a difficult tradeoff: for each knowledgebase left unused there is increased potential for missed findings of clinical significance, but for each knowledgebase used the analyst must understand the nuances particular to that resource and evaluate the evidence accordingly when generating the clinical report.
To survey the scope and nature of this tradeoff, we created the Variant Interpretation for Cancer Consortium (VICC; cancervariants.org), an international consortium of established clinical interpretation experts dedicated to alleviating the bottleneck in clinical report generation. We harmonized clinical biomarker associations across six established knowledgebases (including CIViC, OncoKB, Jax-CKB, and others) into the VICC meta-knowledgebase, and evaluated the harmonized associations to determine the extent to which their content overlapped. This talk describes the results of that study, illustrating how the pace of biomedical literature generation has outstripped our collective ability to capture this knowledge, and the corresponding challenges of standardizing disparate efforts to drive clinical report generation. We demonstrate our preliminary work to address these challenges in concert with ClinGen and the Global Alliance for Genomics and Health (GA4GH), and present a roadmap for advancing our ability to search and apply biomedical knowledge in the clinical setting.
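To make the overlap evaluation concrete, the following is a minimal sketch of one way to measure content overlap between knowledgebases once their associations share a common form. It is not the VICC method. The record layout (source, gene, variant, disease, drug), the function names, and the toy records are all hypothetical. A real harmonization would map each field to shared reference vocabularies, whereas this sketch substitutes simple case-folding. Overlap here is the pairwise Jaccard index of normalized association keys.

```python
from itertools import combinations

# Hypothetical toy records: (knowledgebase, gene, variant, disease, drug).
# Illustrative only; these are not data from the VICC study.
TOY_RECORDS = [
    ("CIViC", "BRAF", "V600E", "Melanoma", "Vemurafenib"),
    ("OncoKB", "braf", "v600e", "melanoma", "vemurafenib"),
    ("OncoKB", "EGFR", "L858R", "Lung adenocarcinoma", "Erlotinib"),
    ("Jax-CKB", "EGFR", "L858R", "Lung Adenocarcinoma", "Erlotinib"),
]


def association_keys(records):
    """Group normalized (gene, variant, disease, drug) keys by knowledgebase.

    Assumption: case-folding stands in for real ontology-based normalization.
    """
    by_source = {}
    for source, gene, variant, disease, drug in records:
        key = (gene.upper(), variant.upper(), disease.lower(), drug.lower())
        by_source.setdefault(source, set()).add(key)
    return by_source


def pairwise_jaccard(by_source):
    """Jaccard overlap |A & B| / |A | B| for every pair of knowledgebases."""
    overlap = {}
    for a, b in combinations(sorted(by_source), 2):
        union = by_source[a] | by_source[b]
        shared = by_source[a] & by_source[b]
        overlap[(a, b)] = len(shared) / len(union) if union else 0.0
    return overlap


if __name__ == "__main__":
    for pair, score in pairwise_jaccard(association_keys(TOY_RECORDS)).items():
        print(pair, round(score, 2))
```

On the toy records, CIViC and Jax-CKB share no association, while each shares one of two distinct associations with OncoKB (Jaccard 0.5). Exact-key matching like this is strict: differences in variant nomenclature or disease granularity register as non-overlap. That strictness illustrates why shared standards for knowledgebase content matter.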