Finding sequences, alignments and consensus

To determine whether the promoter regions of akinete-expressed genes share similarities, we must go through several steps:

1. Use RACE to map the transcriptional start site(s) of your gene.

2. Find homologous genes in other filamentous cyanobacteria using genomic databases. The two other sequenced filamentous cyanobacterial genomes similar to N. punctiforme are Anabaena variabilis and Anabaena sp. strain PCC 7120.

A. How to obtain sequences from homologous genes in Anabaena variabilis.

          i) Go to http://cmr.tigr.org/tigr-scripts/CMR/GenomePage.cgi?org=ntav02

          ii) On the top tab, under SEARCHES, go to FIND SEQUENCES and then to CMR BLAST.

          iii) Under PROGRAM, choose the appropriate tool. Under DATABASE, choose the Anabaena variabilis nucleotide or protein database, depending on the program you chose. Then enter your query sequence: either the protein sequence of your gene (choose BLASTP and the protein database) or the nucleotide sequence of your gene (choose BLASTX, which translates your nucleotide sequence in all 6 reading frames and searches the protein database).

          iv) The next window lists two entries for the closest-matching gene in the A. variabilis genome. Choose the one with the underscore in its name (for instance >Ava_1558, not the >NT02AV1772 style of name) and click the gene name to open a page showing the primary annotation for this gene.

          v) Choose PRIMARY SEQUENCE from the list on the left and copy the nucleotide and protein sequences of the gene/protein into a Word document.

          vi) Choose REGION VIEW from the bottom of the list and scroll over your gene and the upstream gene. Note the coordinates of both genes, then find the bases that bound the upstream intergenic region: add or subtract 1 from your gene's start and from the end of the upstream gene nearest to it.

          vii) Go to SEARCHES on the top tab, then to FIND SEQUENCES, then to GET SEQ/GENE BY COORDS, and type in the coordinates of the upstream intergenic region. Copy the result and paste it into your Word document. It is a good idea to request an extra 6-10 bases extending into your gene, so you can confirm that the two sequences overlap correctly before pasting them together into the complete gene with its upstream intergenic region.
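The coordinate arithmetic in steps vi and vii can be written out as a small Python sketch. It assumes 1-based, inclusive coordinates with the smaller number always reported as the start, whichever strand the gene is on; check how CMR actually reports coordinates before relying on it. `Gene`, `upstream_region` and the default of 8 extra bases are hypothetical choices for illustration, not part of any database tool.

```python
from typing import NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical record for one gene's location on the chromosome."""
    start: int   # smaller coordinate (assumed 1-based, inclusive)
    end: int     # larger coordinate (assumed 1-based, inclusive)
    strand: str  # "+" or "-"


def upstream_region(gene: Gene, neighbor: Gene, extra: int = 8) -> Tuple[int, int]:
    """Return (low, high) coordinates covering the intergenic region on the
    5' side of `gene`, extended `extra` bases into `gene` for the overlap check.

    `neighbor` is the adjacent gene on the 5' side of `gene`: at lower
    coordinates for a "+" gene, at higher coordinates for a "-" gene.
    """
    if gene.strand == "+":
        gap_start, gap_end = neighbor.end + 1, gene.start - 1
        low, high = gap_start, gap_end + extra
    elif gene.strand == "-":
        gap_start, gap_end = gene.end + 1, neighbor.start - 1
        low, high = gap_start - extra, gap_end
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    if gap_end < gap_start:
        # Genes touch or overlap, so there is no intergenic region to fetch.
        raise ValueError("no intergenic region between these genes")
    return low, high


# Hypothetical coordinates, for illustration only.
print(upstream_region(Gene(2000, 2900, "+"), Gene(1500, 1800, "+")))  # (1801, 2007)
print(upstream_region(Gene(2000, 2900, "-"), Gene(3100, 3500, "-")))  # (2893, 3099)
```

In the first example the intergenic region is bases 1801-1999 (199 bases), and the request runs 8 bases further, into the start of the gene. For a minus-strand gene the upstream neighbor lies at higher coordinates; if the retrieval tool returns plus-strand sequence, that stretch has to be reverse-complemented before it lines up with the gene.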

B. How to obtain sequences from homologous genes in Anabaena 7120

          i) Go to http://www.kazusa.or.jp/cyanobase/Anabaena/index.html

          ii) Choose SIMILARITY SEARCH, paste in the protein sequence of your gene, and click Submit.

          iii) Open the page for the best-matching gene. Under Sequence Retrieval Links, click each link to get the protein and nucleotide sequences of the Anabaena 7120 protein/gene, and copy each into a Word document.

          iv) Go back to the gene page and click the MAP link (either JAVA or PNG, whichever works best on your computer). On the map, click the upstream gene and record its start or stop coordinate, whichever is closest to your gene. Back on your gene's page, determine the coordinates of your gene's upstream intergenic region: add or subtract 1 from your gene's start coordinate, and add or subtract 1 from the upstream gene's recorded coordinate. Type the coordinates into the Init: and Term: boxes and click Submit to retrieve the upstream intergenic region. Request 6-10 extra bases extending into your gene so you can confirm that the upstream intergenic region overlaps correctly before pasting the two together in your Word document.
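Both A vii and B iv ask for a few extra bases so the upstream region can be checked against the gene before the two are pasted together. A minimal Python sketch of that check follows; `join_with_overlap`, `reverse_complement` and the 6-base minimum overlap are hypothetical names and settings chosen for illustration. The reverse complement is included on the assumption that a region retrieved for a minus-strand gene may come back as plus-strand sequence.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A, C, G, T, N in either case)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def join_with_overlap(upstream: str, gene: str, min_overlap: int = 6) -> str:
    """Paste an upstream region onto the gene it runs into.

    Looks for the longest stretch (at least `min_overlap` bases) where the
    3' end of `upstream` matches the 5' start of `gene`, and returns the
    upstream region followed by the rest of the gene, in upper case.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    up, g = upstream.upper(), gene.upper()
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for k in range(min(len(up), len(g)), min_overlap - 1, -1):
        if up[-k:] == g[:k]:
            return up + g[k:]
    # No overlap usually means wrong coordinates or the wrong strand.
    raise ValueError("no overlap found; re-check coordinates and strand")


print(join_with_overlap("GGCCTTAAATGAAACG", "ATGAAACGTTGA"))  # GGCCTTAAATGAAACGTTGA
```

Here the last 8 bases of the (made-up) upstream region, `ATGAAACG`, match the start of the gene, so they appear only once in the joined sequence. If the function raises an error, re-check the coordinates and strand rather than forcing the sequences together.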

3. Make a multiple sequence alignment

          i) In your Word document, put your sequences into FASTA format, which looks like this:

>NpF2222
GATCGATCGATCGATCAATTGGCC

>Ava3333
GTACGTACGTACGGCCGGAATTCC

>Alr4444
GATCTTGGTTCCAAGGTTCCTTTCC

   (Note that each sequence name has no spaces, is preceded by a > and ends with a return. Each sequence should be on a single line, with no returns within it.) Which sequence should you use? At first, use the complete gene plus its upstream intergenic region. The open reading frames (ORFs) of your genes should be nearly identical, and we will be interested in anything in the intergenic region that is identical in all 3 strains.

          ii) Go to http://www.ebi.ac.uk/clustalw/#

          iii) Paste all 3 FASTA-formatted sequences into the window and click RUN.

          iv) On the results page, check that all 3 sequences were aligned. (If not, the formatting was incorrect; go back and fix it!) Click on the alignment file to see your alignment. The asterisks below the lines mark bases conserved in all three sequences.

          v) Copy this alignment and paste it into a Word document for use in your notebooks, presentations, theses, etc. Annotate the alignment with colors to show where the transcriptional start site(s) is (are), so you can see the spatial relationship between the start site and the conserved regions.
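The formatting rules in step i and the asterisk line read in step iv can be mimicked in Python, for example to check your input before submitting it. This is a sketch, not part of ClustalW: `to_fasta` and `conservation_line` are hypothetical helpers, and `conservation_line` reproduces only ClustalW's `*` (fully conserved) marks, treating any column that contains a gap as not conserved.

```python
import re


def to_fasta(records: dict[str, str]) -> str:
    """Format {name: sequence} as FASTA following the rules in step i:
    '>' directly followed by a name with no spaces, then the whole
    sequence on one line with no internal returns."""
    lines = []
    for name, seq in records.items():
        if not name or re.search(r"\s", name):
            raise ValueError(f"sequence name must be non-empty with no spaces: {name!r}")
        clean = re.sub(r"\s", "", seq).upper()  # drop stray returns/spaces from copy-paste
        lines.append(f">{name}\n{clean}")
    return "\n".join(lines) + "\n"


def conservation_line(aligned: list[str]) -> str:
    """Put '*' under each alignment column where every sequence has the
    same base (gaps '-' never count as conserved), else a space."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        bases = {b.upper() for b in column}
        marks.append("*" if len(bases) == 1 and "-" not in bases else " ")
    return "".join(marks)


print(to_fasta({"NpF2222": "GATCGATC\nGATC", "Ava3333": "gtacgtac"}), end="")
print(repr(conservation_line(["AC-GT", "ACTGA", "ACAGT"])))  # '** * '
```

The first call shows a sequence with a stray return being joined onto one line; the second marks columns 1, 2 and 4 as conserved and leaves the gapped and mismatched columns blank.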

4. Obtain the consensus sequence of a multiple sequence alignment

          i) Return to the ClustalW submission page, choose "pir" under OUTPUT FORMAT, and re-run the alignment.

          ii) Open the resulting alignment file and copy it. (Notice that the aligned sequences, gaps included, are now listed as separate FASTA-style entries!)

          iii) Go to http://hiv-web.lanl.gov/content/hiv-db/CONSENSUS/CONSENSUS_TOOL/SimpCon.html

          iv) Paste the aligned sequences into the window and choose "Pretty" as the Output format. Copy the result into your Word document for your own notes. Then re-run the consensus program, this time choosing "Like input" as the Output format. (The consensus, labeled >DL; CON, now appears as its own entry in the same format as the input.)

          v) Copy the CON portion of the output into your Word document (we don't need the three separate aligned gene sequences). Annotate the consensus with colors to show where the transcriptional start site(s) is (are). Please email me all of your results. I will compile these for all our promoters to see whether there are any common motifs among our different genes.
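The LANL tool does the consensus calling for you, but the idea behind it can be sketched in Python. This is a simple majority-rule consensus chosen for illustration and may not match the LANL tool's exact rules. `majority_consensus` is hypothetical, and its case convention (upper case where all sequences agree, lower case where only a majority does, N where no base has a majority) is an arbitrary choice that makes fully conserved positions stand out.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Simple majority-rule consensus of equal-length aligned sequences.

    Per column: a base shared by every sequence is written in upper case,
    a base shared by more than half (but not all) in lower case, and 'N'
    where no base has a strict majority. A majority gap is written as '-'.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    n = len(aligned)
    out = []
    for column in zip(*aligned):
        base, count = Counter(b.upper() for b in column).most_common(1)[0]
        if count == n:
            out.append(base)          # unanimous
        elif count * 2 > n:
            out.append(base.lower())  # strict majority only
        else:
            out.append("N")           # no majority
    return "".join(out)


print(majority_consensus(["GATC-A", "GATTGA", "GTTC-C"]))  # GaTc-a
```

With three sequences, a base found in 2 of 3 counts as a majority; with two sequences, any disagreement gives N.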