Computational analysis of full-length mouse cDNAs compared with human genome sequences
2001
Domon, Shūhei | Shinagawa, Akira | Saito, Tetsuya | Kiyosawa, Hidenori | Yamanaka, Itaru | Aizawa, Katsunori | Fukuda, Shiro | Hara, Ayako | Itoh, Masayoshi | Kawai, Jun | Shibata, Kazuhiro | Hayashizaki, Yoshihide
Although the sequencing of the human genome is complete, identification of encoded genes and determination of their structures remain a major challenge. In this report, we introduce a method that effectively uses full-length mouse cDNAs to complement efforts in carrying out these difficult tasks. A total of 61,227 RIKEN mouse cDNAs (21,076 full-length and 40,151 EST sequences containing certain redundancies) were aligned with the draft human sequences. We found 35,141 non-redundant genomic regions that showed a significant alignment with the mouse cDNAs. We analyzed the structures and compositional properties of the regions detected by the full-length cDNAs, including cross-species comparisons, and noted a systematic bias of GENSCAN against exons of small size and/or low GC-content. Of the cDNAs locating the 35,141 genomic regions, 3,217 did not match any sequences of the known human genes or ESTs. Among those 3,217 cDNAs, 1,141 did not show any significant similarity to any protein sequence in the GenBank non-redundant protein database and thus are candidates for novel genes.
This record was provided by the National Agricultural Library