Advanced bioinformatics
Overview
During this 7-week bioinformatics programme, participants acquire fundamental skills in statistics, computer programming, and the use of public databases, and apply these skills to real-life bioinformatics problems. This course is part of the bioinformatics track of the GRID computing master at the UvA. Other participants can choose to attend parts of this course.
- Location: Academic Medical Center, Amsterdam
- Date: 31 March - 16 May 2008
- Max participants: 20
- Costs: Free
- Course material: All course material is provided during the course.
- ECTS: 12
- Coordinator: Antoine H.C. van Kampen
- Documents: Schedule 2008 (updated on April 24, 2008)
Registration
1. Send an email with contact details to Barbera van Schaik.
2. Indicate whether you are a master student, PhD student, post-doc, or something else.
3. Indicate which modules you want to attend. See the module description for other information you need to provide.
Next course: 31 March - 16 May 2008
Note: This course focuses on the analysis of microarray data and the use of public biological databases in bioinformatics. This part of the 'bioinformatics' programme does not cover any aspect of GRID computing.
Module 1 - Introduction
In this first module the students are given the opportunity to acquire some basic knowledge in biology, bioinformatics, statistics, the analysis of microarrays and biological pathways. Depending on his/her background the student may choose to skip certain topics of this module. Please indicate which lectures you will attend when you register for this module. The participants will get a certificate listing the topics that were attended.
Literature: The student will receive handouts during the lectures.
Recommended reading:
- Molecular Biology of the Cell. B. Alberts (Ed.) Garland Publishing Inc,US.
- Essential Bioinformatics. Jin Xiong (Ed.) Cambridge University Press.
Coordinator: Antoine van Kampen
- Lecture introduction cell biology (pdf)
- Introduction statistics
- Lecture Clustering (April 3, 2008)
- Lecture Classification & Regression (April 2, 2008)
- Lecture Classification & Regression (Part 2) (April 3, 2008)
- Introduction microarrays (April 7, 2008)
- Introduction pathway analysis (April 7, 2008)
Module 2 - R/Bioconductor
Much of the data analysis in bioinformatics is done within the R/Bioconductor statistical environment. For example, many statistical methods for the analysis of microarray and other high-throughput data are available from Bioconductor. In this module you will get acquainted with R/Bioconductor and learn to apply a range of statistical techniques to microarray data. The main topics include microarray analysis (2-dye spotted and Affymetrix), linear models, unsupervised and supervised learning, and the use of meta-data. Participants will get a certificate if they successfully carry out the computer exercises. Some programming experience is a plus for this module.
Literature: During the course you will receive handouts from
Bioinformatics and Computational Biology Solutions Using R and Bioconductor Gentleman, R.; Carey, V.; Huber, W.; Irizarry, R.; Dudoit, S. (Eds.), Springer, 2005 http://www.bioconductor.org/pub/docs/mogr/
Coordinator: Perry Moerland
- April 8 : Lecture R/Bioconductor
- April 8 : Exercises Bioconductor 1: Introduction to R and Bioconductor
- April 9 : Exercises Bioconductor 2: QC spotted arrays (Ch 4)
- April 10: Exercises Bioconductor 3: Linear models and differential expression (Ch 23 + 14)
- April 11: Exercises Bioconductor 4: Analysis of Affymetrix arrays (Ch 2 + 3, Ch 25) - no lecture today, we start at 10.30
- April 15: Exercises Bioconductor 5: Unsupervised learning (Ch 12 + 13)
- April 16: Exercises Bioconductor 6: Supervised learning (Ch 24 + 17) - no lecture today, we start at 10.30
- April 17: Exercises Bioconductor 7: Meta-data and pathways (Ch 7 + 8)
Module 3 - Analysis of microarray data and pathway analysis
In this module you will apply what was learnt in the Bioconductor module (module 2) to a challenging microarray experiment from a recent Nature paper. You will analyze the activation status of several human oncogenic pathways. You will validate the signatures found in tumor samples derived from various mouse cancer models. Association with disease outcome of the oncogenic pathway signatures will be validated for various publicly available human cancer datasets. Good knowledge of R and several Bioconductor packages is required (therefore it is compulsory that you attend module 2). Participants will get a certificate if they successfully write a short report on their analysis efforts.
Literature: During the course you will receive handouts from
Bioinformatics and Computational Biology Solutions Using R and Bioconductor Gentleman, R.; Carey, V.; Huber, W.; Irizarry, R.; Dudoit, S. (Eds.), Springer, 2005 http://www.bioconductor.org/pub/docs/mogr/
Coordinator: Perry Moerland
- April 18 - April 25: Module 3
Module 4 - Unix and Perl programming
In this module you learn the basics of Unix and Perl programming so that you can parse and integrate biological databases. The students are introduced to the world of public biological databases and will be given the opportunity to explore and use some of these databases. We assume that the students already have some background in programming.
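To give an impression of what "parsing a biological database" can involve, the sketch below reads records in FASTA format, a common plain-text format for sequence data. It is written in Python purely for illustration (the module itself teaches Perl), and the function name `parse_fasta` and the two example records are hypothetical, not part of the course material.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of one record are concatenated.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical example input (not real database entries).
example = """>gene1 hypothetical example record
ATGGCC
GCGTGA
>gene2 another hypothetical record
ATATTT
""".splitlines()

records = dict(parse_fasta(example))
for name, seq in records.items():
    print(name, len(seq))
```

The parser is a generator, so a large file can be processed record by record without loading all of it into memory.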
Unix
Handout: "Unix crash course" by Rob Wolfram (pdf follows)
The basics of Unix will be explained, including scripting in sed and awk.
Perl
Book: Tisdall, JD (2002) Beginning Perl for Bioinformatics, O'Reilly Media, Sebastopol
You can borrow the book from us during the course. At the end you can decide whether you want to buy it (€30) or return it. Students who are already fluent in Perl can skip the theoretical and practical parts on May 6 and 7. On Thursday May 8 you will write scripts that can be reused for the use case of module 5.
May 6-7: Perl programming
May 8: Perl application
Introductory powerpoints
Literature study
We will also prepare for the use case of module 5. To gain insight into genome assembly and gene annotation, we will study the following papers. Each course member will give a literature talk (~30 minutes) on Friday May 9th about one of these papers/subjects:
- Sander: Lander et al (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860. (pdf) Page 875-892, from "Broad genomic landscape" till "Gene content of the human genome".
- Esther: Next generation sequencing:
- Kees: Guigo et al (2006) EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biology, 7 (Suppl I):S2 (pdf)
- Roland: Gerstein et al (2007) What is a gene, post-ENCODE? History and updated definition. Genome Research, 17, 669 (pdf)
- Branko: Meyer (2007) A practical guide to the art of RNA gene prediction. Briefings in Bioinformatics, 8(6), 396. (pdf)
Coordinator: Barbera van Schaik
Module 5 - Case study: comparison of protein-coding and non-coding genes
Not all genes code for proteins. For some genes, the transcribed RNA molecule itself is functional. In this case study we will analyse the sequence properties of protein-coding and non-coding genes and check whether they differ. For example, is there a difference in nucleotide composition between these types of genes? Can you predict the type of gene from the frequencies of di-, tri-, and longer nucleotide combinations? Are regions of the genome that do not contain genes junk?
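As a starting point for the questions above, nucleotide composition can be summarised as the relative frequency of every mono-, di-, or trinucleotide in a sequence. The following is a minimal sketch in Python, not the method prescribed by the course; the function names `kmer_frequencies` and `gc_content` and the two toy sequences are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k):
    """Return relative frequencies of all overlapping k-mers in a DNA sequence."""
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid  # skip windows containing N or other symbols
    )
    total = sum(counts.values())
    # Report every possible k-mer so that vectors from different genes line up.
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGT", repeat=k)
    }


def gc_content(sequence):
    """Fraction of G and C among the unambiguous nucleotides."""
    freqs = kmer_frequencies(sequence, 1)
    return freqs["G"] + freqs["C"]


# Toy sequences, invented for illustration (not real genes).
toy_coding = "ATGGCCGCGGGCTGA"
toy_noncoding = "ATATTTAAATATTAA"

for label, seq in [("toy coding", toy_coding), ("toy non-coding", toy_noncoding)]:
    dinucleotides = kmer_frequencies(seq, 2)
    print(label, round(gc_content(seq), 2), round(dinucleotides["CG"], 2))
```

Because every possible k-mer is reported, including those with frequency zero, the resulting dictionaries have the same keys for every gene and could serve as feature vectors for the classification techniques covered in module 2.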
- May 13-16: Case study genes
Contact: Barbera van Schaik
