Genome-wide study of variations in Plasmodium falciparum and their association with different malaria interventions in Tanzania
Zurich Seminars in Bioinformatics - Catherine Bakari Mvaa (Christian Nsanzabana group @ TPH)
- 12:15 UZH Irchel Y55-l-06/08 and ZOOM Call
Abstract: Malaria affects millions of people globally and remains a major public health problem in most parts of Tanzania, causing thousands of deaths each year, especially among vulnerable people in rural settings with poor health systems. Although the burden has decreased in recent years thanks to the deployment of different interventions, those gains are threatened by biological threats that undermine the success of control strategies. The parasite uses different mechanisms to escape diagnostic tools and drug treatment, challenging one of the key control strategies, which is based on prompt detection and case management. Sequencing technologies, especially Next Generation Sequencing (NGS), have played a major role in understanding parasite genomic variation, including drug resistance, diagnostic resistance, and population structure. However, more advanced third-generation (long-read) sequencing technologies may be worth exploring for the additional information they can generate. Both whole-genome sequencing and long reads could provide more information on complex genomic regions of the parasite than short-read sequencing technologies.
We propose to develop a whole-genome, long-read-based assay with Oxford Nanopore (MinION) to explore variation, including structural variation. We hypothesize that this assay will yield more information on drug resistance genes, the hrp2/3 genes, and parasite diversity, and will add value to MMS in informing the NMCPs about the performance of interventions. We will base the assay on Nanopore technology because it is relatively cheap and easy to implement in malaria-endemic countries. We also aim to develop a bioinformatics pipeline for calling and visualizing variants obtained from long-read sequencing data; the pipeline will be useful for all countries implementing MMS. We aim to apply the developed laboratory assay and bioinformatics pipeline to large-scale samples collected through the MSMT project in different regions of mainland Tanzania with varying transmission intensity. Finally, we will explore what information can be mined from short- and long-read platforms using a subset of samples with sequence data generated from MIPs with Illumina, whole-genome sequencing with Illumina, and whole-genome sequencing with Nanopore. With this dataset, we will compare sequence coverage, quality, and the ability to detect complexity of infection and to profile drug resistance. Our aim is to assess how these technologies can complement each other in terms of the information generated and their applicability in MMS.
We will develop a long-read assay for variation detection using culture strains with known drug resistance, hrp2/3, and diversity profiles. We will explore different methods of DNA extraction, DNA enrichment, and selective whole-genome sequencing (sWGS), and at each step we will assess DNA yield and sequence coverage.
A bioinformatics pipeline will be developed for analysis of long-read sequence data. This will be done by evaluating different tools and software, and the best-performing ones will be combined into a full pipeline. The laboratory assay and the bioinformatics pipeline will be applied to samples collected through the MSMT in 13 regions of mainland Tanzania with different transmission intensities. At the end, short- and long-read data will be compared using established and newly developed bioinformatics pipelines, respectively. We will recommend the laboratory assay that gives sufficient DNA yield for sequencing and the sequencing method best suited to MMS for timely feedback to the NMCP.
With the developed long-read assay, novel mutations might be discovered in drug resistance genes such as pfmdr1 and plasmepsin 2/3 and in the pfhrp2/3 genes, and we will map their prevalence and distribution in the country. We will deliver a bioinformatics pipeline that simplifies data analysis and provides quick feedback to the NMCP.