To determine whether the promoter regions of akinete-expressed genes
share similarities, we must go through several steps:
1. Map the transcriptional start site of your gene by RACE.
2. Find the homologous gene in another filamentous cyanobacterium using
genomic databases. The two other sequenced genomes of filamentous
cyanobacteria similar to N. punctiforme are Anabaena variabilis and
Anabaena sp. strain PCC 7120.
A. How to obtain sequences from homologous genes in Anabaena variabilis.
i) Go to http://cmr.tigr.org/tigr-scripts/CMR/GenomePage.cgi?org=ntav02
ii) On the top tab under SEARCHES, go to FIND SEQUENCES, and then to CMR BLAST.
iii) Under PROGRAM, choose the appropriate tool. Under DATABASE, choose
the Anabaena variabilis sequence or protein database, depending on the
program you chose. Then enter your query sequence: either the protein
sequence of your gene (choose Blastp and the protein database) or the
nucleotide sequence of your gene (choose Blastx to translate your
nucleotide sequence into protein in all 6 reading frames and search the
protein database).
iv) In the next window you will find two copies of the closest matching
gene in the A. variabilis genome. Choose the one with the underscore in
the title (for instance >Ava_1558, not the >NT02AV1772 type of name) and
click on this gene name to open a web page showing the primary
annotation for this gene.
v) Choose PRIMARY SEQUENCE from the list on the left, copy the
nucleotide and protein sequences of the gene/protein, and paste them
into a Word document.
vi) Choose REGION VIEW from the bottom of the list and scroll over your
gene and the upstream gene. Note the base numbers of both genes. The
upstream intergenic region starts one base past the upstream gene and
stops one base before your gene (add or subtract 1, depending on the
orientation of your gene).
vii) Go to SEARCHES on the top tab, then to FIND SEQUENCES, then to GET
SEQ/GENE BY COORDS, and type in the coordinates of the upstream
intergenic region. Copy the result and paste it into your Word document.
You should request an extra 6-10 bases extending into your gene so you
can confirm that the two sequences overlap correctly before pasting
them together into the complete gene with its upstream intergenic region.
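The coordinate arithmetic in steps vi) and vii) is easy to get wrong by one base, so here is a minimal sketch of it. It assumes 1-based, inclusive coordinates and that you know which strand your gene is on. The function name `upstream_region` and its parameters are made up for illustration and are not part of the CMR site.

```python
def upstream_region(gene_low, gene_high, neighbor_low, neighbor_high,
                    strand, overlap=8):
    """Return (first, last) coordinates of the upstream intergenic region,
    extended `overlap` bases into the gene.

    Assumptions (not from the database sites): coordinates are 1-based and
    inclusive, `low` is always the smaller number, and `strand` is "+" or "-".
    """
    if strand == "+":
        # Gene reads left to right, so the upstream neighbor has lower
        # coordinates than the gene.
        if neighbor_high >= gene_low:
            raise ValueError("genes overlap; there is no intergenic region")
        return neighbor_high + 1, gene_low - 1 + overlap
    if strand == "-":
        # Gene reads right to left, so the upstream neighbor has higher
        # coordinates. The retrieved sequence will probably need to be
        # reverse-complemented to match the gene's orientation.
        if neighbor_low <= gene_high:
            raise ValueError("genes overlap; there is no intergenic region")
        return gene_high + 1 - overlap, neighbor_low - 1
    raise ValueError("strand must be '+' or '-'")


# Forward-strand gene at 1000-1900, upstream neighbor at 400-850.
print(upstream_region(1000, 1900, 400, 850, "+"))
```

With the default of 8 overlap bases, the example asks for bases 851 through 1007, where 1000-1007 are the first 8 bases of the gene.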
B. How to obtain sequences from homologous genes in Anabaena 7120.
i) Go to http://www.kazusa.or.jp/cyanobase/Anabaena/index.html
ii) Choose SIMILARITY SEARCH, then paste in the protein sequence of your
gene and click Submit.
iii) On the gene's page, under Sequence Retrieval Links, click each link
to get the protein and nucleotide sequences of the Anabaena 7120
protein/gene, and copy each to a Word document.
iv) Go back to the gene page and click the MAP link (either JAVA or PNG,
whichever works best for your computer). On the map, click on the
upstream gene and record its start or stop coordinate (whichever is
closest to your gene). Go back to your gene's page and determine the
coordinates of the upstream intergenic region of your gene (add or
subtract 1 from your gene's start location, and add or subtract 1 from
the upstream gene location). Type the coordinates into the Init: and
Term: boxes and click Submit to retrieve the upstream intergenic region.
You should request 6-10 extra bases extending into your gene so you can
confirm that the upstream intergenic region overlaps the gene correctly
before pasting them together in your Word document.
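Checking the 6-10 base overlap by eye is error-prone, so the pasting step can be sketched as a small function. This is only an illustration: `join_with_overlap` is a made-up name, and it assumes both sequences are in the same orientation (reading toward the end of your gene) and that you know how many extra bases you requested.

```python
def join_with_overlap(upstream, gene, overlap):
    """Join an upstream region to a gene after checking their shared bases.

    `upstream` must end with the first `overlap` bases of `gene`.
    Whitespace and line breaks (e.g. from copy and paste) are removed.
    """
    upstream = "".join(upstream.split()).upper()
    gene = "".join(gene.split()).upper()
    if overlap <= 0 or overlap > min(len(upstream), len(gene)):
        raise ValueError("overlap must be between 1 and the shorter length")
    if upstream[-overlap:] != gene[:overlap]:
        raise ValueError(
            "overlap mismatch: %s vs %s; re-check the coordinates"
            % (upstream[-overlap:], gene[:overlap])
        )
    # Keep the overlapping bases only once.
    return upstream + gene[overlap:]


# The upstream region ends with ATGAAA, the first 6 bases of the gene.
print(join_with_overlap("TTGACAATGAAA", "ATGAAACCCTAA", 6))
```

If the function raises an overlap mismatch, the coordinates were probably off by one, or one of the two sequences is on the opposite strand.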
3. Make a multiple sequence alignment
i) While in your Word document, prepare your sequences in FASTA format,
which looks like this:
>NpF2222
GATCGATCGATCGATCAATTGGCC
>Ava3333
GTACGTACGTACGGCCGGAATTCC
>Alr4444
GATCTTGGTTCCAAGGTTCCTTTCC
(Note that each sequence name has no spaces, is preceded by a > and ends
with a return. The sequences shouldn't have any returns between the
lines.) What sequence should you use? At first, use the complete gene
and upstream intergenic region. The open reading frames (ORFs) of your
genes should be nearly identical, and we will be interested in anything
in the intergenic region that is identical in all 3 strains.
ii) Go to http://www.ebi.ac.uk/clustalw/#
iii) Paste all 3 FASTA-formatted sequences into the window and click RUN.
iv) On the results page, did all 3 sequences get aligned? (If not, the
formatting was incorrect; go back and fix it!) Click on the alignment
file to see your alignment. The asterisks below the lines denote a base
that is conserved in all three sequences.
v) Copy this alignment and paste it into a Word document for use in your
notebooks, presentations, theses, etc. Annotate the consensus with
colors to show where the transcriptional start site(s) is (are) so you
can see the spatial relationship between the start site and conserved
regions.
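To double-check the asterisks, or to list the conserved stretches by position, the marking can be reproduced in a few lines. This is a rough sketch rather than a reimplementation of ClustalW: it only marks columns where every sequence has the same base, and the function names and the `min_len` cutoff are assumptions chosen for illustration.

```python
def conservation_line(aligned):
    """Return a string with '*' under columns identical in all sequences.

    `aligned` is a list of equal-length strings with '-' for gaps.
    Columns containing a gap are never marked as conserved.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        first = column[0].upper()
        same = first != "-" and all(base.upper() == first for base in column)
        marks.append("*" if same else " ")
    return "".join(marks)


def conserved_blocks(marks, min_len=4):
    """List (first, last) alignment positions, 1-based and inclusive,
    of runs of at least `min_len` consecutive '*' marks."""
    blocks = []
    start = None
    # The trailing space closes a run that reaches the end of the line.
    for i, mark in enumerate(marks + " "):
        if mark == "*" and start is None:
            start = i
        elif mark != "*" and start is not None:
            if i - start >= min_len:
                blocks.append((start + 1, i))
            start = None
    return blocks


aligned = ["GATCGA-TC", "GATCTAATC", "GATCGAGTC"]
marks = conservation_line(aligned)
print(marks)
print(conserved_blocks(marks, min_len=2))
```

The block positions refer to alignment columns, so you can compare them directly with the position of the transcriptional start site in the same alignment.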
4. Obtain the consensus sequence of a multiple sequence alignment
i) Go back to the ClustalW submission form page, choose "pir" under
OUTPUT FORMAT, and re-run the alignment.
ii) Open the resulting Alignment file and copy it. (Notice that the
aligned sequences are now in separate files in FASTA format!)
iii) Go to http://hiv-web.lanl.gov/content/hiv-db/CONSENSUS/CONSENSUS_TOOL/SimpCon.html
iv) Paste the separate aligned files into the window and choose "Pretty"
as the Output format. Copy this into your Word document for your own
notes. Go back and re-run the consensus program, but this time choose
"Like input" as the Output format. (Now the consensus (>DL; CON) is
given as a separate FASTA-format entry.)
v) Copy the CON portion of the output into your Word document (we don't
need the other three separate gene alignments). Annotate the consensus
with colors to show where the transcriptional start site(s) is (are).
Please email me all of your results. I will compile these for all our
promoters to see if there are any common motifs among our different
genes.
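To see what a consensus looks like before using the web tool, here is a minimal sketch of a simple majority-rule consensus. The consensus tool's own rules may well differ. The rule used here (take the base found in more than half of the sequences, otherwise write N) and the name `majority_consensus` are assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned, ambiguous="N"):
    """Return a consensus string for equal-length aligned sequences.

    For each column, the character present in more than half of the
    sequences is used (a gap '-' counts like any other character).
    Columns without a strict majority get `ambiguous`.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        base, count = counts.most_common(1)[0]
        consensus.append(base if count * 2 > len(column) else ambiguous)
    return "".join(consensus)


aligned = ["GATCGA-TC", "GATCTAATC", "GATCGAGTC"]
print(">CON")
print(majority_consensus(aligned))
```

With three sequences, a column needs at least two matching bases to be called. Column 7 in the example has three different characters, so it comes out as N.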