ArrayOme: a program for estimating the sizes of microarray-visualized bacterial genomes
Journal contribution posted on 23.01.2008, 14:30 by Hong-Yu Ou, Rebecca J. Smith, Sacha Lucchini, Jay Hinton, Roy R. Chaudhuri, Mark Pallen, Michael R. Barer, Kumar Rajakumar
ArrayOme is a new program that calculates the size of genomes represented by microarray-based probes and facilitates recognition of key bacterial strains carrying large numbers of novel genes. Protein-coding sequences (CDS) that are contiguous on annotated reference templates and classified as ‘Present’ in the test strain by hybridization to microarrays are merged into blocks of contiguous CDS, termed ICs. These ICs are then extended to account for flanking intergenic sequences. Finally, the lengths of all extended ICs are summed to yield the ‘microarray-visualized genome (MVG)’ size. We tested and validated ArrayOme using both experimental and in silico-generated genomic hybridization data. MVG sizing of five sequenced Escherichia coli and Shigella strains gave estimates that were 97–99% accurate relative to the true genome sizes when the comprehensive ShE.coli meta-array gene sequences (6239 CDS) were used for in silico hybridization analysis. However, the E. coli CFT073 genome size was underestimated by 14% because this meta-array lacked probes for many CFT073 CDS. ArrayOme permits rapid recognition of discordances between genome sizes measured by pulsed-field gel electrophoresis (PFGE) and MVG sizes, thereby enabling high-throughput identification of strains rich in novel genes. Gene discovery studies focused on these strains will greatly facilitate characterization of the global gene pool accessible to individual bacterial species.
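The merge, extend and sum steps described above can be illustrated with a minimal Python sketch. This is not ArrayOme's code: the `CDS` record, the function names and the toy coordinates are all hypothetical, and the abstract does not say how ICs are extended. Here each IC is assumed to gain half of the intergenic gap to the neighbouring 'Absent' CDS on each side, and the reference template is treated as linear rather than circular.

```python
from dataclasses import dataclass


@dataclass
class CDS:
    start: int      # 1-based inclusive start on the reference template
    end: int        # inclusive end on the reference template
    present: bool   # 'Present' call from hybridization in the test strain


def merge_present(cds_list):
    """Group runs of adjacent 'Present' CDS into (first_index, last_index) blocks.

    Expects cds_list to be sorted by position on the template.
    """
    blocks = []
    run_start = None
    for i, cds in enumerate(cds_list):
        if cds.present:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            blocks.append((run_start, i - 1))
            run_start = None
    if run_start is not None:
        blocks.append((run_start, len(cds_list) - 1))
    return blocks


def mvg_size(cds_list):
    """Sum the lengths of all extended blocks of contiguous 'Present' CDS."""
    ordered = sorted(cds_list, key=lambda c: c.start)
    total = 0
    for first, last in merge_present(ordered):
        start = ordered[first].start
        end = ordered[last].end
        # ASSUMPTION: extend by half the intergenic gap to each flanking CDS.
        if first > 0:
            gap = ordered[first].start - ordered[first - 1].end - 1
            start -= max(0, gap) // 2
        if last < len(ordered) - 1:
            gap = ordered[last + 1].start - ordered[last].end - 1
            end += max(0, gap) // 2
        total += end - start + 1
    return total


def size_discordance(pfge_size, mvg):
    """Fraction of the PFGE-measured genome not accounted for by the MVG."""
    return (pfge_size - mvg) / pfge_size


# Toy template with four CDS; the third is 'Absent' in the test strain.
example = [
    CDS(1, 100, True),
    CDS(121, 200, True),
    CDS(301, 400, False),
    CDS(451, 500, True),
]
print(mvg_size(example))
```

In this toy case the first two CDS form one block and the fourth forms another. A large positive `size_discordance` for a strain would flag it as a candidate carrier of genes not represented on the array, which is the use of PFGE and MVG comparisons that the abstract describes.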