UZH BIO390 - Learning Goals

This page indicates some of the learning goals, as emphasised by the different lecturers. Some points will have been discussed in different lectures; accordingly, exam questions may not refer to information of one specific presentation.

Learning goals have a general scope

Please be aware that some of the "Learning Goals" may reflect aspects not necessarily captured by the lectures in the current semester - The ones reelevant for the current semester's exam are related to the given lectures. Also, updates may occurr at any time.

Consider individual pages

Please consider individual course pages and documents linked from there. Those might be updated later on, so it is a good idea to check back before the end of the course.

Bioinformatics: Definition & Concepts

definition of "Bioinformatics" (cf. Anna Tramontano)
categories of informatics tools used in bioinformatics
hypothesis versus data driven science
areas of bioinformatics/bioinformaticians, (in contrast to "pure" modelling, statistics etc.)
3 main categories of biological data, and example resources
definition of API
common sequence related file formats
hierarchies and relationships as 2 main principles of ontologies
areas of "not-bioinformatics", and why
bioinformatics tools (programming languages, libraries, online resources) and their specific use cases

Sequence Analysis

basics of DNA and Protein sequences
substitution matrices
BLAST
- parameters
- terms
- scores

Statistical Bioinformatics & Machine Learning

usage of gene expression profiling
multiple testing correction
parameters for hierarchical clustering
statistical evidence for a change in the mean
dimensionality reduction
central limit theorem
hierarchical clustering
clustering coefficient
unsupervised machine learning tasks/components
ML model types

Bioinformatics tools & resources

common programming/analysis languages/environments in bioinformatics and their preferred use
- e.g. R, Perl, Python, JavaScript ... but also environments & packages like Mkdocs, ReadTheDocs, Bioconductor ...
components of bioinformatics online resources
- Databases, middleware, APIs, frontends ...
database types / concepts
- SQL vs. document databases
data curation (biocuration)
- importance of classifications, ontologies
- some ontologies and their use (NCIt, UBERON, DO)
- CURIEs as identifiers (see below)
- Null Island
- ISO 8601 for dates and times (and why / why not?)

Some Q & A (thanks to the providers of these questions)

Progenetix use case: In comparative genomic hybridization, in the case of an high copy number segment of DNA, more tumor DNA will hybridize to the metaphase chromosomes just because of higher likelihood?
- in essence, yes; it is a mix of higherlikelihood and therefore higher binding probability (it is actually hard for a given fragment to encounter the right place on the chromosome or array) w/ or w/o competition effects (latter when using normal reference DNA)
CURIEs: ...hierarchical coding systems where individual codes are represented as CURIEs - aren't they a type of URI rather then a "code"?
- CURIEs are universal identifiers (URIs) consisting of a public and a local part
- they are universal (like UUIDs) but unlike UUIDs (which can be anonymous) they are resolvable (i.e. the public part can be resolved to a URL where then the local part can be used to retrieve the resource)
- the "hierarchical coding systems" usually don't use the CURIE internally but only the private part; but using the complete CURIE externally makes it unambiguous
Progenetix use case: In Progenetix one can either use the GA4GH-Beacon API to query (i.e., do information retrieval) OR use the pgxRpi API to load the data into an analysis environment (i.e., for eventual knowledge extraction)?
- Not really. The bycon software stack implements database access / middleware / Beacon API instance. This (outward facing) API (compatible to the Beacon specification) can be accessed by various clients for data retrieval (e.g. the beaconplus-web or progenetix-web JavaScript front ends, manual http requests, Beacon aggregator services...).
- The pgxRpi itself is an API for the R environment, i.e. another client accessing the Beacon API. So here one gets (DATA - Progenetix' Beacon API - pgxRpi API - R analysis environment.

Regulatory Genomics and Epigenomics

secondary/tertiary human genome structure
functional genome content
transcription factors & genome interaction
chemical genome modifications, their effectors and results
Chip-Seq
read mapping
peak calling
sequence compression algorithms

Metagenomics

concept of taxonomic diversity
concept microbial community dissimilarity
how are sequences used to derive an adopted species concept for prokaryotes
principle steps for 16S rRNA-based taxonomic composition analysis
essential steps of short sequencing read assembly into contigs and scaffolds
basic steps of metagenomic analysis: from raw reads to the reconstruction of genomic scaffolds

Proteomics

principles of proteome organization in the cell
key experimental and computational concepts for the collection and analysis of high confidence protein-protein interaction data
peptide fragmentation
target-decoy approach
protein quantification
MassSpec similarity queries

Biological Networks

Protein interaction and metabolic networks
databases and online resources for different types of pathway and interaction data
detection of protein complexes
graphs, nodes, edges, paths
geodesics, graph diameter
common types of degree distributions
adjacency matrix
shortest path matrix
assortative and disassortative graphs
community (module) detection
cliques
motifs, graph representations of metabolic networks

Text Mining

text mining pipelines & (current) common programs/applications
article/literature repositories (with focus on accessibility)
processing steps in text mining
- stemming etc.
common problems in text mining
search engine precision metrics
benchmarking

Semantic web, RDF, Ontologies

semantic web
- elements
- benefits
components of ontologies
stack of standards in semantic web and their functions
RDF for modeling data
OWL/OBO for modeling a biomedical domain
querying knowledge graphs for answering biomedical questions

Clinical Bioinformatics & Personalized Medicine

genomic variants (types, numbers)
reference genome(s)
main bottlenecks of molecular diagnostics in the clinical setting
goals of many personalized health initiatives
currently favoured clinical NGS technology
clinical trial participation

Genomic data & privacy

reasons for needing many genomes
genomic privacy and re-identification (concepts)
principle of re-identification attacks over the Beacon protocol
long range familial searches
opinions about risk vs. opportunities
direct-to-consumer genetic testing -> what, how
technical and regulatory solutions against privacy breaches & data abuse
genotype-phenotype (G2P) (ab-)use