Small-scale assembly of sequence reads

The purpose of this exercise is to give a small-scale example of sequence assembly, using a handful of sequence reads instead of the thousands or millions used in a genome sequencing project.

We will use the assembler CAP3, which is one of the oldest sequence assembly programs and is still widely used.

The DNA that was sequenced is a vector called pMEC1002. pMEC1002 is the vector p426TEF (Mumberg 1995) into which the gene AmLAT2 has been cloned. p426TEF is a Saccharomyces cerevisiae expression vector. The gene AmLAT2 encodes a membrane protein from the fungus Ambrosiozyma monospora.

The sequence of this gene is available from GenBank under accession number AY923869.

The gene was amplified by PCR using these two primers:

>146_LAT2fwd (38-mer)
GATCAAGCTTAAAATGGGTCAGTTTATTGAAAAATTCA

>145_LAT2rev (37-mer)
GATCTCGAGTCAAACACTACTTACAGAGTCTTTGAGC
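The primer lengths given in the headers, and the restriction sites built into the 5' ends of the primers, can be checked with a few lines of code. The following is a minimal Python sketch; the recognition sequences AAGCTT (HindIII) and CTCGAG (XhoI) are the standard ones, while the names SITES, PRIMERS and find_sites are only illustrative.

```python
# Recognition sites of the two enzymes used for the cloning. Both sites are
# palindromic, so the same string matches on either strand.
SITES = {"HindIII": "AAGCTT", "XhoI": "CTCGAG"}

PRIMERS = {
    "146_LAT2fwd": "GATCAAGCTTAAAATGGGTCAGTTTATTGAAAAATTCA",
    "145_LAT2rev": "GATCTCGAGTCAAACACTACTTACAGAGTCTTTGAGC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start position of every recognition site in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        hits[enzyme] = [
            i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site
        ]
    return hits


for name, seq in PRIMERS.items():
    # Print the primer name, its length and the positions of any sites.
    print(name, len(seq), find_sites(seq))
```

The forward primer should report a length of 38 and a single HindIII site starting at index 4; the reverse primer should report a length of 37 and a single XhoI site starting at index 3. The few bases upstream of each site are a short tail that lets the enzyme cut close to the end of the PCR product.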

The PCR product and the p426TEF vector were both digested with the restriction enzymes HindIII and XhoI. The two fragments were ligated together and the resulting circular plasmid was isolated. The purpose of this cloning was to express the membrane protein in Saccharomyces cerevisiae.
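The ends of the insert produced by the digestion can be predicted from the primers alone. The sketch below assumes that the PCR product is the forward primer, then the gene, then the reverse complement of the reverse primer, and that the gene has no internal HindIII or XhoI site. The real gene interior (GenBank AY923869) is not reproduced here, so a hypothetical run of N stands in for it. HindIII cuts A^AGCTT and XhoI cuts C^TCGAG on the top strand; the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FWD = "GATCAAGCTTAAAATGGGTCAGTTTATTGAAAAATTCA"  # 146_LAT2fwd
REV = "GATCTCGAGTCAAACACTACTTACAGAGTCTTTGAGC"   # 145_LAT2rev

# (recognition site, top-strand cut offset): HindIII A^AGCTT, XhoI C^TCGAG
HINDIII = ("AAGCTT", 1)
XHOI = ("CTCGAG", 1)


def digest_top_strand(amplicon: str, left: tuple[str, int],
                      right: tuple[str, int]) -> str:
    """Top strand of the fragment between the left and right cuts.

    Assumes each recognition site occurs exactly once in the amplicon.
    """
    (left_site, left_cut), (right_site, right_cut) = left, right
    if amplicon.count(left_site) != 1 or amplicon.count(right_site) != 1:
        raise ValueError("expected exactly one recognition site per enzyme")
    start = amplicon.find(left_site) + left_cut
    end = amplicon.find(right_site) + right_cut
    return amplicon[start:end]


# HYPOTHETICAL placeholder for the gene sequence between the two primers.
gene_interior = "N" * 20
amplicon = FWD + gene_interior + revcomp(REV)
insert_top = digest_top_strand(amplicon, HINDIII, XHOI)
print(insert_top[:11], "...", insert_top[-7:])
```

The top strand of the digested insert starts with AGCTTAAAATG (the AGCT overhang left by HindIII) and ends with GTTTGAC, the bottom strand carrying the TCGA overhang left by XhoI. Because the vector was cut with the same two enzymes, ligation should regenerate both the HindIII and the XhoI site at the vector/insert junctions.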

The plasmid DNA was sequenced with eight different primers, and the resulting reads mostly cover the AmLAT2 gene. The sequence reads are available in the file “2008-12-11 & 2008-12-19 seqs in fasta format.txt”.
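Before assembling, it can help to look at the reads themselves: how many there are, how long they are, and how many ambiguous base calls (N) they contain. This is a minimal Python sketch of a FASTA reader; it assumes the file is plain FASTA with unique header lines, and the function names are illustrative.

```python
import sys


def parse_fasta(text: str) -> dict[str, str]:
    """Map each header (without '>') to its sequence, joined and uppercased."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []  # a repeated header would overwrite the earlier one
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts).upper() for n, parts in records.items()}


def summarize(reads: dict[str, str]) -> None:
    """Print length and fraction of N calls for every read."""
    for name, seq in reads.items():
        n_frac = seq.count("N") / len(seq) if seq else 0.0
        print(f"{name}\t{len(seq)} bp\t{n_frac:.1%} N")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python reads_summary.py "2008-12-11 & 2008-12-19 seqs in fasta format.txt"
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        summarize(parse_fasta(handle.read()))
```

Reads with a high fraction of N, or reads that are much shorter than the others, are worth noting before looking at the assembly.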

You can use the CAP3 assembler available at http://doua.prabi.fr/software/cap3.
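The basic idea behind an overlap-based assembler can be illustrated with a toy greedy merger: repeatedly find the two sequences with the longest exact suffix/prefix overlap and join them. This is NOT the CAP3 algorithm. CAP3 aligns reads allowing mismatches and gaps, considers both strands and can use base quality values, whereas this sketch only merges exact overlaps on one strand and ignores reads contained inside other reads. The minimum overlap of 3 bases is an arbitrary assumption for the toy data.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if < min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # remaining unmerged reads stay as separate sequences
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCCTA", "CCTAGGTT", "GGTTCAAT"]))
```

With these three toy reads the result is a single contig, ATGGCCTAGGTTCAAT. A read that overlaps nothing is returned unchanged, which corresponds loosely to what CAP3 reports as a singlet.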

Question 1:

Analyze the result of the sequencing and compare it with the theoretically expected result. Can you find any problems?
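One way to begin the comparison is to check whether the sequences expected from the cloning appear in the assembled contig. After digestion and ligation, the short 5' tails of the primers are replaced by vector sequence, but the HindIII and XhoI sites are regenerated, so the sketch below searches for each primer from its restriction site onward, on both strands. It assumes the contig is available as a plain string and only finds exact matches; all names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FWD = "GATCAAGCTTAAAATGGGTCAGTTTATTGAAAAATTCA"  # 146_LAT2fwd
REV = "GATCTCGAGTCAAACACTACTTACAGAGTCTTTGAGC"   # 145_LAT2rev

# Primer sequence from the regenerated restriction site onward.
PROBES = {
    "146_LAT2fwd from HindIII": FWD[FWD.find("AAGCTT"):],
    "145_LAT2rev from XhoI": REV[REV.find("CTCGAG"):],
}


def locate(contig: str, probe: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based start on the contig) for every exact match.

    '+' means the probe itself occurs in the contig; '-' means its reverse
    complement does, i.e. the probe matches the opposite strand.
    """
    contig = contig.upper()
    hits = []
    for strand, target in (("+", probe), ("-", revcomp(probe))):
        start = contig.find(target)
        while start != -1:
            hits.append((strand, start))
            start = contig.find(target, start + 1)
    return hits


def check_contig(contig: str) -> None:
    """Report where each expected primer-derived sequence occurs."""
    for name, probe in PROBES.items():
        hits = locate(contig, probe)
        print(f"{name}: {hits if hits else 'not found (exact match)'}")
```

For a correct construct, one would expect the forward probe and the reverse complement of the reverse probe to occur once each, on the same strand of the contig, with the gene between them. A missing probe may simply reflect incomplete read coverage, so mismatches and gaps should also be inspected in the alignment itself.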