Gene prediction and counting
Gene prediction is an important problem in computational biology, and various algorithms predict genes by using known genes as a training data set. The table below compares chromosome number, genome size, number of predicted genes, and the protein-coding fraction of the genome for several organisms.
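As a minimal sketch of what "using known genes as a training data set" can mean, the Python below estimates codon-usage frequencies from a few known coding sequences. It then scores a candidate sequence by its log-odds against a uniform background in which all 64 codons are equally likely. The function names and the toy training sequences are hypothetical, chosen only for illustration. Real gene finders use considerably richer statistical models, such as higher-order Markov chains or hidden Markov models.

```python
import math
from collections import Counter
from itertools import product

# All 64 possible codons over the DNA alphabet.
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codons(seq: str, frame: int = 0) -> list[str]:
    """Split a DNA string into consecutive, non-overlapping codons starting at `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def train_codon_model(known_genes: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Estimate codon probabilities from known coding sequences (hypothetical helper).

    A pseudocount is added to every codon so that codons never seen in
    training still get a small, non-zero probability.
    """
    counts = Counter(c for gene in known_genes for c in codons(gene))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def coding_log_odds(seq: str, model: dict[str, float]) -> float:
    """Log-likelihood ratio of `seq` under the trained model vs. a uniform 1/64 background.

    Positive scores mean the sequence looks more like the training genes;
    codons containing non-ACGT characters are skipped.
    """
    background = 1 / len(ALL_CODONS)
    return sum(math.log(model[c] / background) for c in codons(seq) if c in model)


if __name__ == "__main__":
    # Toy "known genes" (hypothetical sequences, for illustration only).
    training = ["ATGGCTGCTGCTGCTTAA", "ATGGCTGCTAAAGCTTAA"]
    model = train_codon_model(training)
    print(coding_log_odds("GCTGCTGCT", model))  # positive: resembles the training codons
    print(coding_log_odds("CCCCCCCCC", model))  # negative: codon never seen in training
```

In a fuller sketch, a score like this could be computed for each candidate open reading frame, with candidates above some threshold reported as predicted genes; how that threshold is chosen is left open here.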
Organism | No. of chromosomes (haploid) | Genome size (base pairs) | Number of predicted genes | Fraction of genome encoding protein
--- | --- | --- | --- | ---
Bacterium (Escherichia coli) | 1 | 5,000,000 | 5,000 | 90%
Yeast (Saccharomyces cerevisiae) | 16 | 12,068,000 | 6,340 | 70%
Worm (Caenorhabditis elegans) | 6 | 100,000,000 | 19,000 | 27%
Fly (Drosophila melanogaster) | 4 | 175,000,000 - 196,000,000 | 13,600 | 20%
Weed (Arabidopsis thaliana) | 5 | 157,000,000 | 25,498 | 20%
Human (Homo sapiens) | 23 | 3,000,000,000 | 20,000 - 25,000 | <5%
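Dividing each gene count by genome size gives gene density, which makes the contrast between compact and sprawling genomes in the table easier to see. The Python sketch below makes a few assumptions of its own, repeated in the code comments: the E. coli genome is taken as 5,000,000 bp (its K-12 reference genome is about 4.6 Mb), Drosophila uses the lower of its two listed sizes (175,000,000 bp), and the human gene count uses the midpoint (22,500) of its 20,000 - 25,000 range. The names `TABLE` and `genes_per_megabase` are illustrative only.

```python
# Gene density (predicted genes per megabase) computed from the table.
# Assumptions: E. coli genome taken as 5,000,000 bp; Drosophila uses the lower
# of its two listed sizes (175,000,000 bp); the human gene count uses the
# midpoint of the 20,000 - 25,000 range (22,500).
TABLE = [
    # (organism, genome size in bp, number of predicted genes)
    ("Escherichia coli", 5_000_000, 5_000),
    ("Saccharomyces cerevisiae", 12_068_000, 6_340),
    ("Caenorhabditis elegans", 100_000_000, 19_000),
    ("Drosophila melanogaster", 175_000_000, 13_600),
    ("Arabidopsis thaliana", 157_000_000, 25_498),
    ("Homo sapiens", 3_000_000_000, 22_500),
]


def genes_per_megabase(genome_bp: int, genes: int) -> float:
    """Return the number of predicted genes per million base pairs."""
    return genes / (genome_bp / 1_000_000)


if __name__ == "__main__":
    # Print one line per organism with its gene density rounded to 0.1.
    for name, size, genes in TABLE:
        print(f"{name:<26} {genes_per_megabase(size, genes):7.1f} genes/Mb")
```

Run as a script, it reports 1,000 genes/Mb for E. coli and 7.5 genes/Mb for human, a drop of more than a hundredfold that tracks the shrinking protein-coding fraction in the last column.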
Based on your observations and analysis of the table, answer the following questions.
1. Even if we know where the genes are in a given genome, it's difficult to count them due to
A. Splice variants
B. Overlapping genes
C. Exons
D. Both A and B
2. Which organism has the largest fraction of its genome coding for proteins?
A. Escherichia coli
B. Saccharomyces cerevisiae
C. Caenorhabditis elegans
D. Drosophila melanogaster
3. Less than 5% of the Homo sapiens genome encodes protein. A probable reason for this could be
A. Repeated sequences
B. Exons
C. Both A and B
D. SNPs
4. The relationship between the number of chromosomes and genome size (in base pairs) is
A. direct
B. indirect
C. no relationship
D. correlation of 0.5
5. Computational gene prediction is referred to as
A. In silico gene prediction
B. In vivo gene prediction
C. In vitro gene prediction
D. Microarray prediction
6. Based on the table, the relationship between the intuitive complexity of an organism and the number of genes in its genome is
A. No simple correlation
B. Simple correlation
C. Inverse correlation
D. Simple or inverse, depending on the organism