BLAST results were parsed and filtered with a custom Perl script using the criteria above. The same script mapped each hit to its COG category, reporting the category or categories assigned to each query sequence. Each set was analysed 1,000 times, each time randomly sampling 75% of the query sequences, to calculate the standard deviation (SD; Figure 1). For the characterization of OGs, each comprising one gene per genome, only the genes present in the genome of X. euvesicatoria str. 85-10 were used as representatives of the OG.

Taxonomical distribution of homologous sequences

BLAST searches against the NCBI non-redundant protein database (NR) [87] were performed in order to
identify the homologs of one or more genes in other organisms, using default parameters and an Expect value below 10⁻¹⁰. The BLAST results were then parsed with a custom Perl script that extracted the source organisms, built a cumulative counts table, and mapped these organisms to any fixed taxonomical level using the NCBI Taxonomy database [87].

Acknowledgements

This project was funded by the Colombian Administrative Department of Science, Technology and Innovation (Colciencias) and the Vice-chancellor's Office of Research at the Universidad de Los Andes. We would like to thank Andrew Crawford, Ralf Koebnik and two anonymous reviewers for their critical reading of the manuscript. We also thank Boris Szurek, Valérie Verdier, Kostantinos Konstantinidis, Catalina Arévalo and Camilo López for comments and discussion on the conception
and development of this study.

Electronic supplementary material

Additional file 1: COG distribution across different taxonomical ranges. Raw data graphically presented in Figure 2. Each row corresponds to one COG functional category. Each taxonomical range is represented by two columns: the average and the standard deviation. (PDF 23 KB)

Additional file 2: Concatenated sequence alignment and partitions. ZIP file containing the input alignment in Phylip format (Suppl_file_2.phylip) and the coordinates of the partitions (Suppl_file_2.raxcoords), as used for the ML phylogenetic analysis in RAxML. These files were generated automatically by Unus. (ZIP 2 MB)

Additional file 3: Leaf and ancestral nodes in the GenoPlast events matrix. Each row corresponds to one node, and each column corresponds to a pattern of regions, as defined by the Mauve developers' tools. The first two additional columns contain the node identifier and the node content. (CSV 598 KB)

Additional file 4: Species counts in similar sequences of cluster 1. Species counts within the BLAST hits in NCBI's NR, using the genes of Xeu8 in the cluster as queries. (PDF 25 KB)

Additional file 5: Species counts in similar sequences of cluster 2.