Guest lecture by Gene Myers on Monday 18 April - Ordinary lecture is cancelled
Gene Myers is one of the most influential people in bioinformatics, and a major contributor to the development of BLAST and the Celera assembler. He will give a guest lecture titled "Towards Perfect de novo DNA Sequencing" at 14.15 on Monday 18 April 2016 at the University of Oslo. We cannot let this chance slip by, and I have therefore decided to postpone the ordinary lectures in INF4350 and urge you all to attend this lecture about sequencing and assembly instead, as it is highly relevant for the course. I am sure it will be both very interesting and inspiring.
Please note that the guest lecture will be held in the room called Nucleus in Bikuben in Kristine Bonnevie's building. The entrance is opposite the main entrance of the building.
The lecture is NOT in Ole-Johan Dahl's building, despite what it says in the schedule for INF4350.
About Gene Myers
Dr Gene Myers is Director & Tschira Chair of Systems Biology, Max Planck Institute for Molecular Cell Biology and Genetics, Dresden, Germany. He is best known for his contributions to the development of BLAST, the most widely used tool in bioinformatics, and for the paired-end whole genome shotgun sequencing protocol and the assembler developed at Celera that delivered the fly, human, and mouse genomes in a three-year period.
Abstract
With the advent of long-read sequencers such as the PacBio RS II, the goal of near-perfect de novo reconstructions of unknown genomes is once again a realistic possibility. We will explain why, and further give a hypothesis as to why assemblers have improved only marginally since the era of the Human Genome Project circa 2000: the problem is not the assembly itself, but the artifacts in the reads and the resolution of repeat families. These topics have not received sufficient attention and are particularly critical for long reads.
Therefore we are developing algorithms that carefully analyze a long-read shotgun data set before assembly. By efficiently comparing all the data against itself, we have developed a computational approach that accurately determines the quality of any stretch of a PacBio read based only on the sequence data itself. These regional QVs allow us to accurately identify low-quality regions, chimers, and missed adaptamers. Removing these artifacts with a process we call scrubbing leaves one with reads that assemble without the need for base-level error correction. We further find that we can identify and annotate repetitive sequences prior to assembly, although this aspect is still a work in progress.
We will conclude by describing a number of sequencing projects we are undertaking and the assembly tools currently available from our lab.