
BIO390 - Introduction to Bioinformatics

Summary

The handling and analysis of biological data using computational methods has become an essential part of most areas of biology. In this lecture, students will be introduced to the use of bioinformatics tools and methods in different topics, such as molecular resources and databases, standards and ontologies, sequence and high-performance genome analysis, biological networks, molecular dynamics, proteomics, evolutionary biology and gene regulation. Additionally, the use of low-level tools (e.g. programming and scripting languages) and specialized applications will be demonstrated. Another topic will be the visualization of quantitative and qualitative biological data and analysis results.

Practical Information

Requirements

The introduction to Bioinformatics is a series of lectures aimed at students with a medium to advanced undergraduate level in Life Sciences. Participants are expected to be knowledgeable in the basic concepts of molecular biology and genetics, and also to have some basic understanding of statistics and programming concepts, if not practical experience (e.g. having attended introductory courses or done some data analyses in R or Python). Experience with common platforms used for shared code/document management (e.g. GitLab/GitHub) is helpful but not strictly required.

Schedule & Notes

  • Autumn semesters
  • 1 x 2h / week
  • Tue 08:00-09:45
  • UZH Irchel campus, Y-03G-85
  • OLAT - but not much there...
  • No lecture recordings: we have not recorded the lectures since HS23 (regular attendance is expected), but 2022 lecture recordings may still be available
  • Course language is English

Syllabus

The learning goals provide additional guidance, but please be aware that they may include details which have not been covered in the current semester; these are still good to know but not necessarily relevant for the exam.

Next: Building Biological Information Resources

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Qingyao Huang

This lecture will introduce bioinformatics methods, principles and tools for building and maintaining information resources in life sciences, with particular emphasis on 'omics data types.

Continue reading

Upcoming: Clinical Bioinformatics

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Valerie Barbie (Director SIB Clinical Bioinformatics)

Medical practice is undergoing a revolution around personalized health: this major change is driven by the continuous development of cost-effective high-throughput technologies that produce gigantic quantities of data in numerous areas, from imaging to genomics, and of the corresponding tools required to process these data.

Continue reading

Upcoming: Genomic Data Risks & Opportunities

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

Understanding the impact of inherited and somatic genome variants on phenotypes and diseases requires thorough knowledge of such variants in populations in general and in carriers of the phenotypes and diseases in particular. Such information can only be obtained by including data from a multitude of genome resources in variant evaluation efforts, including resources from outside (international) jurisdictions. However, opening such resources carries the inherent risk of privacy breaches, particularly through re-identification of individuals or their relatives, and potentially through the exposure of individual genome-related personal information such as phenotypic and "performance" predictions and relative disease risk.

Continue reading

Upcoming: BIO390 Exam

BIO390 UZH HS24 - Introduction to Bioinformatics
08:15-09:45 @ Y-03G-85 and Y-03G-91 UZH Irchel

The exam will be on the last day of the course on site:

  • time: 08:15-09:45
  • multiple (single + multiple) choice w/ one or two open questions
  • no material, phones etc.
  • student ID for entrance
  • please refer to the learning goals for guidance
    • ¡topics may be edited throughout the course!
    • these just provide some non-exclusive guidance
Continue reading

Upcoming: BIO390 Repeat Exam

BIO390 UZH HS24 - Introduction to Bioinformatics

The repeat exam has been tentatively planned for the week of January 20-24, 2025:

  • Exact date TBD; time: 09:15-10:45
  • Planned room: Y13-L-11/13
  • multiple (single + multiple) choice w/ one or two open questions
  • no material, phones etc.
  • student ID for entrance
  • please refer to the learning goals for guidance
    • ¡topics may be edited throughout the course!
    • these just provide some non-exclusive guidance
Continue reading

Components of the Semantic web

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Ahmad Aghaebrahimian (ZHAW)

Biomedical science is rich in structured and unstructured textual data, including but not limited to hundreds of ontologies and millions of scientific publications. The semantic web and its stack of standards provide an efficient way of organizing knowledge extracted from such a huge volume of data. Modeling data in knowledge graphs makes complex question answering and reasoning over an abundance of information manageable and feasible. In this session, we will find out how.

Continue reading

Biological Networks

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Andreas Wagner

This part of the course BIO390 (Introduction to Bioinformatics) will review examples of biological networks and their basic properties.

Learning goals for exam preparation 2024

After this lecture you should be able to

Continue reading

Text Mining and Search Strategies

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Patrick Ruch (HES-SO/HEG Geneva)

Search engines, stemming, NGRAMs ... and much more.

Continue reading

Proteomics

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Katja Baerenfaller, Swiss Institute of Allergy and Asthma Research (SIAF) and University of Zurich

In proteomics, one of the important bioinformatics tasks is to generate lists of reliably identified peptides and proteins in mass spectrometry-based experiments. For this, amino acid sequences are assigned to measured tandem mass spectra. The peptide-spectrum assignments are scored for quality, and criteria are applied that make it possible to distinguish good hits from bad ones and to estimate the quality of the dataset.

Continue reading

Metagenomics

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Shinichi Sunagawa (ETHZ)

Abstract:

Microorganisms are numerically dominant on Earth and drive the cycling of energy, elements and matter. Thanks to advances in high-throughput DNA sequencing technologies and computational power, microbial communities can now be studied without the need to cultivate them in a laboratory setting. Essential tasks in studying microbial communities include the identification and quantification of their member taxa and the pair-wise compositional comparison of different microbial communities.

Continue reading

Regulatory Genomics and Epigenomics

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Izaskun Mallona

We will introduce the epigenomics and regulatory genomics fields, including their aims, techniques, and data analysis approaches.

Continue reading

Machine Learning for Biological Use Cases

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Valentina Boeva (ETHZ)

Brief note: In this lecture V. Boeva will cover the standard machine learning methods used in the analysis of biological data: dimensionality reduction, clustering, classification and regression.

Continue reading

Biological Sequence Informatics

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Christian von Mering

The analysis of biological sequences, primarily DNA, RNA and protein sequences, constitutes one of the earliest and core areas of bioinformatics. This lecture introduces principles and examples of bioinformatic sequence analyses and inter-sequence comparisons.
Continue reading


Statistical Bioinformatics

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Mark Robinson

Today's topic is the use of statistical methods in the analysis of biological datasets, with examples from high-throughput (sequencing and array) technologies and single cell analyses.

Continue reading

What is Bioinformatics? Introduction and Resources

BIO390 UZH HS24 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

The "What is Bioinformatics? Introduction and Resources" lecture provides a general introduction to the field and a description of the lecture topics, timeline and procedures.

Topics covered in the lecture are e.g.:

  • a term definition for bioinformatics
  • the relation of hypothesis driven and data driven science, with respect to bioinformatics
  • categories of bioinformatics tools and data
  • research areas and topics
  • the varying emphasis on "bio" and "informatics"
  • databases (primary vs. derived) and data curation
  • data collection & curation
  • file formats, ontologies & APIs as areas/topics (w/o details)
  • "not-bioinformatics"

... but also an introduction to the cancer genomics and data sharing topics.

Continue reading

BIO390 Repeat Exam

BIO390 UZH HS23 - Introduction to Bioinformatics

The repeat exam will be on January 24, 2024:

  • time: 10:15-11:45
  • Changed room: Y13-L-11/13
  • multiple (single + multiple) choice w/ one or two open questions
  • no material, phones etc.
  • student ID for entrance
  • please refer to the learning goals for guidance
    • ¡topics may be edited throughout the course!
    • these just provide some non-exclusive guidance
Continue reading

BIO390 Exam

BIO390 UZH HS23 - Introduction to Bioinformatics
08:15-09:45 @ Y-03G-85 and Y-03G-91 UZH Irchel

The exam will be on the last day of the course on site:

  • time: 08:15-09:45
  • multiple (single + multiple) choice w/ one or two open questions
  • no material, phones etc.
  • student ID for entrance
  • please refer to the learning goals for guidance
    • ¡topics may be edited throughout the course!
    • these just provide some non-exclusive guidance
Continue reading

Genomic Data Risks & Opportunities

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

Understanding the impact of inherited and somatic genome variants on phenotypes and diseases requires thorough knowledge of such variants in populations in general and in carriers of the phenotypes and diseases in particular. Such information can only be obtained by including data from a multitude of genome resources in variant evaluation efforts, including resources from outside (international) jurisdictions. However, opening such resources carries the inherent risk of privacy breaches, particularly through re-identification of individuals or their relatives, and potentially through the exposure of individual genome-related personal information such as phenotypic and "performance" predictions and relative disease risk.

Continue reading

Clinical Bioinformatics

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Valerie Barbie (Director SIB Clinical Bioinformatics)

Medical practice is undergoing a revolution around personalized health: this major change is driven by the continuous development of cost-effective high-throughput technologies that produce gigantic quantities of data in numerous areas, from imaging to genomics, and of the corresponding tools required to process these data.

Continue reading

Building a Genomics Resource

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

In this lecture we will use our Progenetix resource, a website providing information about genomic copy number mutations in cancer, to present the different components needed for generating, storing, representing, visualizing and accessing a specific type of genomic data and associated classifications.

Continue reading

Components of the Semantic web

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Ahmad Aghaebrahimian (ZHAW)

Biomedical science is rich in structured and unstructured textual data, including but not limited to hundreds of ontologies and millions of scientific publications. The semantic web and its stack of standards provide an efficient way of organizing knowledge extracted from such a huge volume of data. Modeling data in knowledge graphs makes complex question answering and reasoning over an abundance of information manageable and feasible. In this session, we will find out how.

Continue reading

Text Mining and Search Strategies

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Patrick Ruch (HES-SO/HEG Geneva)

Search engines, stemming, NGRAMs ... and much more.

Continue reading

Biological Networks

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Pouria Dasmeh

This part of the course BIO390 (Introduction to Bioinformatics) will review examples of biological networks and their basic properties.

Continue reading

Proteomics

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Katja Baerenfaller, Swiss Institute of Allergy and Asthma Research (SIAF) and University of Zurich

In proteomics, one of the important bioinformatics tasks is to generate lists of reliably identified peptides and proteins in mass spectrometry-based experiments. For this, amino acid sequences are assigned to measured tandem mass spectra. The peptide-spectrum assignments are scored for quality, and criteria are applied that make it possible to distinguish good hits from bad ones and to estimate the quality of the dataset.

Continue reading

Machine Learning for Biological Use Cases

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Valentina Boeva (ETHZ)

Brief note: In this lecture V. Boeva will cover the standard machine learning methods used in the analysis of biological data: dimensionality reduction, clustering, classification and regression.

Continue reading

Regulatory Genomics and Epigenomics

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Izaskun Mallona

Continue reading

Metagenomics

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Shinichi Sunagawa (ETHZ)

Abstract:

Microorganisms are numerically dominant on Earth and drive the cycling of energy, elements and matter. Thanks to advances in high-throughput DNA sequencing technologies and computational power, microbial communities can now be studied without the need to cultivate them in a laboratory setting. Essential tasks in studying microbial communities include the identification and quantification of their member taxa and the pair-wise compositional comparison of different microbial communities.

Continue reading

Statistical Bioinformatics

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Mark Robinson

Today's topic is the use of statistical methods in the analysis of biological datasets, with examples from high-throughput (sequencing and array) technologies and single cell analyses.

Continue reading

What is Bioinformatics? Introduction and Resources

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

Held this year on the second lecture day, the "What is Bioinformatics? Introduction and Resources" lecture provides a general introduction to the field and a description of the lecture topics, timeline and procedures.

Topics covered in the lecture are e.g.:

  • a term definition for bioinformatics
  • the relation of hypothesis driven and data driven science, with respect to bioinformatics
  • categories of bioinformatics tools and data
  • research areas and topics
  • the varying emphasis on "bio" and "informatics"
  • databases (primary vs. derived) and data curation
  • data collection & curation
  • file formats, ontologies & APIs as areas/topics (w/o details)
  • "not-bioinformatics"
Continue reading

Biological Sequence Informatics

BIO390 UZH HS23 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Christian von Mering

The analysis of biological sequences, primarily DNA, RNA and protein sequences, constitutes one of the earliest and core areas of bioinformatics. This lecture introduces principles and examples of bioinformatic sequence analyses and inter-sequence comparisons.
Continue reading


BIO390 Repeat Exam

BIO390 UZH HS22 - Introduction to Bioinformatics

The repeat exam will be on January 24, 2023:

  • time: 10:15-11:45
  • Y03-G-85 (regular lecture hall, unless a change is announced)
  • multiple (single + multiple) choice w/ one or two open questions
  • no material, phones etc.
  • student ID for entrance
  • please refer to the learning goals for guidance
    • ¡topics may be edited throughout the course!
    • these just provide some non-exclusive guidance
Continue reading

BIO390 Exam

BIO390 UZH HS22 - Introduction to Bioinformatics
08:15-09:45 @ UZH Irchel Y03-G-85

The exam will be on the last day of the course on site:

  • time: 08:15-09:45
  • ¡¡¡ NEW: Room change to Y15-G-20 !!!
  • multiple (single + multiple) choice w/ one or two open questions
  • no material, phones etc.
  • student ID for entrance
  • please refer to the learning goals for guidance
    • ¡topics may be edited throughout the course!
    • these just provide some non-exclusive guidance
Continue reading

Genomic Data Risks & Opportunities

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

Understanding the impact of inherited and somatic genome variants on phenotypes and diseases requires thorough knowledge of such variants in populations in general and in carriers of the phenotypes and diseases in particular. Such information can only be obtained by including data from a multitude of genome resources in variant evaluation efforts, including resources from outside (international) jurisdictions. However, opening such resources carries the inherent risk of privacy breaches, particularly through re-identification of individuals or their relatives, and potentially through the exposure of individual genome-related personal information such as phenotypic and "performance" predictions and relative disease risk.

Continue reading

Clinical Bioinformatics

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Valerie Barbie (Director SIB Clinical Bioinformatics)

Medical practice is undergoing a revolution around personalized health: this major change is driven by the continuous development of cost-effective high-throughput technologies that produce gigantic quantities of data in numerous areas, from imaging to genomics, and of the corresponding tools required to process these data.

Continue reading

Building a Genomics Resource

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

In this lecture we will use our Progenetix resource, a website providing information about genomic copy number mutations in cancer, to present the different components needed for generating, storing, representing, visualizing and accessing a specific type of genomic data and associated classifications.

Continue reading

Components of the Semantic web

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Ahmad Aghaebrahimian (ZHAW)

Biomedical science is rich in structured and unstructured textual data, including but not limited to hundreds of ontologies and millions of scientific publications. The semantic web and its stack of standards provide an efficient way of organizing knowledge extracted from such a huge volume of data. Modeling data in knowledge graphs makes complex question answering and reasoning over an abundance of information manageable and feasible. In this session, we will find out how.

Continue reading

Text Mining and Search Strategies

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Patrick Ruch (HES-SO/HEG Geneva)

Search engines, stemming, NGRAMs ... and much more.

Continue reading

Biological Networks

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Pouria Dasmeh

This part of the course BIO390 (Introduction to Bioinformatics) will review examples of biological networks and their basic properties.

Continue reading

Proteomics

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Katja Baerenfaller, Swiss Institute of Allergy and Asthma Research (SIAF) and University of Zurich

In proteomics, one of the important bioinformatics tasks is to generate lists of reliably identified peptides and proteins in mass spectrometry-based experiments. For this, amino acid sequences are assigned to measured tandem mass spectra. The peptide-spectrum assignments are scored for quality, and criteria are applied that make it possible to distinguish good hits from bad ones and to estimate the quality of the dataset.

Continue reading

Metagenomics

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Shinichi Sunagawa (ETHZ)

Abstract:

Microorganisms are numerically dominant on Earth and drive the cycling of energy, elements and matter. Thanks to advances in high-throughput DNA sequencing technologies and computational power, microbial communities can now be studied without the need to cultivate them in a laboratory setting. Essential tasks in studying microbial communities include the identification and quantification of their member taxa and the pair-wise compositional comparison of different microbial communities.

Continue reading

Regulatory Genomics and Epigenomics

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Izaskun Mallona

Continue reading

Machine Learning for Biological Use Cases

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Valentina Boeva (ETHZ)

Brief note: In this lecture V. Boeva will cover the standard machine learning methods used in the analysis of biological data: dimensionality reduction, clustering, classification and regression.

Continue reading

Statistical Bioinformatics

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Mark Robinson

Continue reading

Biological Sequence Informatics

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Christian von Mering

The analysis of biological sequences, primarily DNA, RNA and protein sequences, constitutes one of the earliest and core areas of bioinformatics. This lecture introduces principles and examples of bioinformatic sequence analyses and inter-sequence comparisons.
Continue reading


What is Bioinformatics? Introduction and Resources

BIO390 UZH HS22 - Introduction to Bioinformatics
08:00-09:45 @ UZH Irchel Y03-G-85

Michael Baudis

The first day of the "Introduction to Bioinformatics" lecture series starts with a general introduction to the field and a description of the lecture topics, timeline and procedures.

Topics covered in the lecture are e.g.:
Continue reading


UZH BIO390 - Learning Goals

This page indicates some of the learning goals, as emphasised by the different lecturers. Some points will have been discussed in several lectures; accordingly, exam questions may draw on information from more than one presentation.

Learning goals have a general scope

Please be aware that some of the "Learning Goals" may reflect aspects not necessarily covered by the lectures in the current semester; the ones relevant for the current semester's exam are those related to the lectures given. Also, updates may occur at any time.

Consider individual pages

Please consider individual course pages and documents linked from there. Those might be updated later on, so it is a good idea to check back before the end of the course.

Bioinformatics: Definition & Concepts

  • definition of "Bioinformatics" (cf. Anna Tramontano)
  • categories of informatics tools used in bioinformatics
  • hypothesis versus data driven science
  • areas of bioinformatics/bioinformaticians (in contrast to "pure" modelling, statistics etc.)
  • 3 main categories of biological data, and example resources
  • definition of API
  • common sequence related file formats
  • hierarchies and relationships as 2 main principles of ontologies
  • areas of "not-bioinformatics", and why
  • bioinformatics tools (programming languages, libraries, online resources) and their specific use cases

Sequence Analysis

  • basics of DNA and protein sequences
  • substitution matrices
  • BLAST
    • parameters
    • terms
    • scores
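
The goals on substitution matrices and alignment scores can be illustrated with a minimal global alignment sketch. It assumes a made-up nucleotide scoring scheme (match +2, transition -1, transversion -2) and a linear gap penalty of -3; these values are purely illustrative and are not a published matrix such as BLOSUM62 or PAM250. The recursion is the Needleman-Wunsch dynamic programming scheme, which computes the optimal global score exactly; the function names are hypothetical.

```python
# Hypothetical toy scores, NOT a published substitution matrix
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a: str, b: str) -> int:
    """Match +2, transition (within purines or pyrimidines) -1, transversion -2."""
    if a == b:
        return 2
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return -1
    return -2


def global_alignment_score(x: str, y: str, gap: int = -3) -> int:
    """Needleman-Wunsch optimal global alignment score with a linear gap penalty."""
    # dp[i][j] holds the best score for aligning x[:i] with y[:j]
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        dp[i][0] = i * gap  # prefix of x aligned against gaps only
    for j in range(1, len(y) + 1):
        dp[0][j] = j * gap  # prefix of y aligned against gaps only
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + substitution_score(x[i - 1], y[j - 1]),
                dp[i - 1][j] + gap,  # x[i-1] aligned to a gap
                dp[i][j - 1] + gap,  # y[j-1] aligned to a gap
            )
    return dp[len(x)][len(y)]


print(global_alignment_score("ACGT", "ACGA"))  # 4
```

BLAST, by contrast, searches for local alignments and uses a seed-and-extend heuristic instead of filling a full matrix for every database sequence, trading guaranteed optimality for speed.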

Statistical Bioinformatics & Machine Learning

  • usage of gene expression profiling
  • multiple testing correction
  • parameters for hierarchical clustering
  • statistical evidence for a change in the mean
  • dimensionality reduction
  • central limit theorem
  • hierarchical clustering
  • clustering coefficient
  • unsupervised machine learning tasks/components
  • ML model types
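
For the "multiple testing correction" goal, a minimal sketch of the Benjamini-Hochberg procedure, one widely used false discovery rate (FDR) correction, is shown next to the stricter Bonferroni correction. The function names are illustrative; in practice one would typically use an established implementation such as R's `p.adjust(p, method = "BH")`.

```python
def bonferroni(pvalues: list[float]) -> list[float]:
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the p-values 0.01, 0.04, 0.03 and 0.20, Bonferroni gives 0.04, 0.16, 0.12 and 0.8, while Benjamini-Hochberg gives 0.04, about 0.053, about 0.053 and 0.20, illustrating why FDR control is usually less conservative.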

Bioinformatics tools & resources

  • common programming/analysis languages/environments in bioinformatics and their preferred use
    • e.g. R, Perl, Python, JavaScript ... but also environments & packages like MkDocs, ReadTheDocs, Bioconductor ...
  • components of bioinformatics online resources
    • Databases, middleware, APIs, frontends ...
  • database types / concepts
    • SQL vs. document databases
  • data curation (biocuration)
    • importance of classifications, ontologies
    • some ontologies and their use (NCIt, UBERON, DO)
    • CURIEs as identifiers (see below)
    • Null Island
    • ISO 8601 for dates and times (and why / why not?)
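
One practical argument for ISO 8601 dates (YYYY-MM-DD) is that plain text sorting matches chronological order, which does not hold for the day-first format common in Switzerland. The minimal sketch below checks this for a few arbitrary example dates; the helper name is hypothetical. A commonly cited caveat (the "why not") is that ISO 8601 permits several variants (e.g. with or without separators, week dates, reduced precision), so a data standard usually has to specify which profile it accepts.

```python
from datetime import date, datetime, timezone


def sorts_chronologically_as_text(dates: list[str], parse) -> bool:
    """True if sorting the strings as text gives the same order as sorting by date."""
    return sorted(dates) == sorted(dates, key=parse)


iso_dates = ["2024-11-05", "2023-12-19", "2024-01-24"]
swiss_dates = ["05.11.2024", "19.12.2023", "24.01.2024"]  # same dates, day-first

iso_ok = sorts_chronologically_as_text(iso_dates, date.fromisoformat)
swiss_ok = sorts_chronologically_as_text(
    swiss_dates, lambda s: datetime.strptime(s, "%d.%m.%Y").date()
)
print(iso_ok, swiss_ok)  # True False

# Timestamps can carry an explicit UTC offset, avoiding time zone ambiguity
print(datetime(2024, 1, 24, 10, 15, tzinfo=timezone.utc).isoformat())
# 2024-01-24T10:15:00+00:00
```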

Some Q & A (thanks to the providers of these questions)

  • Progenetix use case: In comparative genomic hybridization, in the case of a high copy number segment of DNA, does more tumor DNA hybridize to the metaphase chromosomes just because of a higher likelihood?
    • in essence, yes; it is a mix of higher likelihood and therefore higher binding probability (it is actually hard for a given fragment to encounter the right place on the chromosome or array), with or without competition effects (the latter when using normal reference DNA)
  • CURIEs: ...hierarchical coding systems where individual codes are represented as CURIEs - aren't they a type of URI rather than a "code"?
    • CURIEs are compact URIs (Uniform Resource Identifiers) consisting of a prefix and a local part
    • they are universal (like UUIDs) but, unlike UUIDs (which can be anonymous), they are resolvable (i.e. the prefix can be resolved to a URL where the local part can then be used to retrieve the resource)
    • the "hierarchical coding systems" usually don't use the CURIE internally but only the local part; using the complete CURIE externally, however, makes it unambiguous
  • Progenetix use case: In Progenetix one can either use the GA4GH-Beacon API to query (i.e., do information retrieval) OR use the pgxRpi API to load the data into an analysis environment (i.e., for eventual knowledge extraction)?
    • Not really. The bycon software stack implements the database access, the middleware and the Beacon API instance. This (outward-facing) API (compatible with the Beacon specification) can be accessed by various clients for data retrieval (e.g. the beaconplus-web or progenetix-web JavaScript front ends, manual HTTP requests, Beacon aggregator services...).
    • pgxRpi itself is an API for the R environment, i.e. another client accessing the Beacon API. So the chain here is: data → Progenetix Beacon API → pgxRpi API → R analysis environment.
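
The CURIE answer can be made concrete with a minimal expansion sketch: the prefix is looked up in a prefix map and joined with the local identifier to form a resolvable URI. The prefix map is an assumption for illustration, using the OBO Foundry PURL pattern (`http://purl.obolibrary.org/obo/<PREFIX>_<ID>`); production systems typically rely on a curated registry such as identifiers.org, and the function name is hypothetical.

```python
# Assumed prefix map for illustration, following the OBO Foundry PURL pattern
PREFIX_MAP = {
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
}


def expand_curie(curie: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Expand 'PREFIX:local_id' into a full URI using a prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("NCIT:C3058"))
# http://purl.obolibrary.org/obo/NCIT_C3058
```

Keeping the prefix in the stored identifier is what makes the code unambiguous outside its own coding system: `C3058` alone could belong to any vocabulary, `NCIT:C3058` cannot.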

Regulatory Genomics and Epigenomics

  • secondary/tertiary human genome structure
  • functional genome content
  • transcription factors & genome interaction
  • chemical genome modifications, their effectors and results
  • ChIP-Seq
  • read mapping
  • peak calling
  • sequence compression algorithms

Metagenomics

  • concept of taxonomic diversity
  • concept of microbial community dissimilarity
  • how sequences are used to derive an adopted species concept for prokaryotes
  • principal steps for 16S rRNA-based taxonomic composition analysis
  • essential steps of short sequencing read assembly into contigs and scaffolds
  • basic steps of metagenomic analysis: from raw reads to the reconstruction of genomic scaffolds
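
The "taxonomic diversity" and "community dissimilarity" goals can be sketched with two widely used measures: the Shannon index for diversity within one sample and the Bray-Curtis dissimilarity for comparing two samples. This is a minimal sketch assuming raw per-taxon read counts; the taxon names and counts are made up, and real analyses usually add steps such as rarefaction or normalization that are omitted here.

```python
import math


def shannon_diversity(counts: dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared taxa."""
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


# Made-up read counts per taxon for two samples
sample_1 = {"Bacteroides": 6, "Prevotella": 4}
sample_2 = {"Bacteroides": 2, "Prevotella": 2, "Escherichia": 6}
print(shannon_diversity(sample_1), bray_curtis(sample_1, sample_2))
```

Here the two samples share 4 reads' worth of abundance (2 + 2) out of 20 reads in total, giving a Bray-Curtis dissimilarity of 1 - 2·4/20 = 0.6.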

Proteomics

  • principles of proteome organization in the cell
  • key experimental and computational concepts for the collection and analysis of high confidence protein-protein interaction data
  • peptide fragmentation
  • target-decoy approach
  • protein quantification
  • MassSpec similarity queries
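
The target-decoy approach (and the dataset quality estimation mentioned in the Proteomics lecture abstract) can be sketched as follows: spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the number of decoy hits above a score cutoff serves as an estimate of the number of false target hits above it. This minimal sketch uses the simple decoys/targets estimator and made-up scores; other estimators and q-value computations are common in real pipelines, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def fdr_score_threshold(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR is at most max_fdr.

    psms: (score, is_decoy) pairs, one per peptide-spectrum match; higher is better.
    Estimated FDR at a cutoff = decoy hits / target hits scoring at or above it.
    Returns None if no cutoff reaches max_fdr.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # evaluate a cutoff only after all PSMs sharing this score are counted
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Made-up scores: True marks a decoy hit
example = [(90, False), (85, False), (80, False), (75, True),
           (70, False), (60, True), (55, False)]
print(fdr_score_threshold(example, max_fdr=0.25))  # 70
```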

Biological Networks

  • Protein interaction and metabolic networks
  • databases and online resources for different types of pathway and interaction data
  • detection of protein complexes
  • graphs, nodes, edges, paths
  • geodesics, graph diameter
  • common types of degree distributions
  • adjacency matrix
  • shortest path matrix
  • assortative and disassortative graphs
  • community (module) detection
  • cliques
  • motifs, graph representations of metabolic networks
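
Several of the graph concepts listed (adjacency matrix, degree, geodesics, shortest path matrix, graph diameter) fit into one minimal sketch. The five-node network is hypothetical; breadth-first search yields shortest path lengths because all edges are unweighted. For real analyses, libraries such as NetworkX or igraph provide these measures.

```python
from collections import deque

# Hypothetical undirected toy network, e.g. five interacting proteins
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def adjacency_matrix(nodes, edges):
    """Symmetric 0/1 matrix: entry [i][j] is 1 if nodes i and j are linked."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


def shortest_path_matrix(nodes, edges):
    """Geodesic lengths between all node pairs via BFS; unreachable pairs get None."""
    neighbours = {node: set() for node in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    matrix = []
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nb in neighbours[current]:
                if nb not in dist:
                    dist[nb] = dist[current] + 1
                    queue.append(nb)
        matrix.append([dist.get(target) for target in nodes])
    return matrix


adj = adjacency_matrix(NODES, EDGES)
degrees = [sum(row) for row in adj]  # [1, 3, 2, 3, 1]
paths = shortest_path_matrix(NODES, EDGES)
diameter = max(d for row in paths for d in row if d is not None)  # 3
```

Node degree is the row sum of the adjacency matrix, and the diameter is the largest finite entry of the shortest path matrix (here the geodesic A-B-D-E).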

Text Mining

  • text mining pipelines & (current) common programs/applications
  • article/literature repositories (with focus on accessibility)
  • processing steps in text mining
    • stemming etc.
  • common problems in text mining
  • search engine precision metrics
  • benchmarking
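
Two of the text mining topics, n-gram processing and search precision metrics, can be sketched briefly. The document identifiers and the relevance judgement are made up, the function names are hypothetical, and stemming itself (e.g. the Porter algorithm, available as `nltk.stem.PorterStemmer`) is not reimplemented here.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams, useful for matching spelling variants."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def precision_recall_f1(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of a search result against a relevance judgement."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


print(char_ngrams("tumour"))  # ['tum', 'umo', 'mou', 'our']
print(char_ngrams("tumor"))   # ['tum', 'umo', 'mor']
print(precision_recall_f1({"doc1", "doc2", "doc3", "doc4"}, {"doc2", "doc4", "doc7"}))
```

The shared n-grams ("tum", "umo") show how British and American spellings can still be matched, and in the search example two of four retrieved documents are relevant (precision 0.5) while two of three relevant documents are found (recall 2/3).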

Semantic web, RDF, Ontologies

  • semantic web
    • elements
    • benefits
  • components of ontologies
  • stack of standards in semantic web and their functions
  • RDF for modeling data
  • OWL/OBO for modeling a biomedical domain
  • querying knowledge graphs for answering biomedical questions
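
The idea behind RDF and knowledge-graph querying can be shown with a toy in-memory triple store: every fact is a (subject, predicate, object) triple, and a query combines triple patterns whose variables must bind consistently, as in SPARQL. This is a minimal sketch, not an RDF library; the `ex:` identifiers and function names are hypothetical, and a real application would use proper IRIs, an RDF toolkit (e.g. rdflib for Python) and SPARQL.

```python
from typing import Optional

# A tiny knowledge graph as (subject, predicate, object) triples
TRIPLES = {
    ("ex:TP53", "ex:encodes", "ex:p53"),
    ("ex:p53", "ex:interactsWith", "ex:MDM2"),
    ("ex:TP53", "ex:associatedWith", "ex:LiFraumeniSyndrome"),
    ("ex:LiFraumeniSyndrome", "rdf:type", "ex:Disease"),
}


def match(triples, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return triples matching a pattern; None acts like an unbound variable."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def diseases_associated_with(gene: str) -> list[str]:
    # Roughly: SELECT ?d WHERE { gene ex:associatedWith ?d . ?d rdf:type ex:Disease . }
    candidates = [o for _, _, o in match(TRIPLES, s=gene, p="ex:associatedWith")]
    return [c for c in candidates if match(TRIPLES, s=c, p="rdf:type", o="ex:Disease")]


print(diseases_associated_with("ex:TP53"))  # ['ex:LiFraumeniSyndrome']
```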

Clinical Bioinformatics & Personalized Medicine

  • genomic variants (types, numbers)
  • reference genome(s)
  • main bottlenecks of molecular diagnostics in the clinical setting
  • goals of many personalized health initiatives
  • currently favoured clinical NGS technology
  • clinical trial participation

Genomic data & privacy

  • reasons for needing many genomes
  • genomic privacy and re-identification (concepts)
  • principle of re-identification attacks over the Beacon protocol
  • long range familial searches
  • opinions about risk vs. opportunities
  • direct-to-consumer genetic testing -> what, how
  • technical and regulatory solutions against privacy breaches & data abuse
  • genotype-phenotype (G2P) (ab-)use
Continue reading