We used planarians regenerating both head and tail to identify genes expressed in a tissue-specific manner. Similarly, planarians at different stages of regeneration were used to isolate genes with different temporal expression profiles. Irradiation destroys planarian neoblasts within 1-2 days, and the animals die within a few weeks because they cannot sustain normal cell turnover. By including irradiated animals, transcripts specifically expressed under those conditions should also be represented in the 454 dataset. Using 454 pyrosequencing, 601,439 sequencing reads with an average length of 327 bp were obtained. After sequence cleaning to remove vector contamination, the remaining 598,435 sequences were assembled using different cut-off values for sequence similarity.
In addition, our 454 sequence reads were assembled together with the 10,000 S. mediterranea UniGene set available at NCBI, using the 90% similarity criterion. This last set, which was used in most of the analyses reported, is referred to as the 90e set. Table 1 summarizes the number of contigs and singletons obtained in each of those assemblies. The similarities between the three assemblies are illustrated in Figure 1, a Venn diagram showing that 72.68% of the raw sequencing reads were integrated into contigs common to all three assemblies, and 20.51% of the sequencing reads make up a shared pool of single sequencing reads. Therefore, differences between the assemblies can be explained by the differential inclusion of the remaining 6.81% of the sequencing reads.
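The read partition reported for the Venn diagram can be checked with a few lines of arithmetic. This is only an illustrative sketch: the three percentages come from the text, while the variable names are ours and are not part of the original analysis.

```python
from decimal import Decimal

# Percentages of sequencing reads, as reported in the text.
shared_contigs = Decimal("72.68")     # reads in contigs common to all three assemblies
shared_singletons = Decimal("20.51")  # reads left as singletons in all three assemblies

# Whatever is left must be the reads placed differently by the assemblies.
differential = Decimal("100") - shared_contigs - shared_singletons

print(f"Reads accounting for differences between assemblies: {differential}%")
```

Decimal arithmetic is used here so that the remainder is exact rather than subject to binary floating-point rounding.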
Average GC content and sequence length, and their respective distributions, were similar for all three assemblies. GC content is distributed around 35%, the expected value for coding sequences in this species. The 90e length distribution was slightly shifted towards longer sequences. This shift was mainly due to a set of long sequences from the NCBI UniGene ESTs included in this assembly. This causal relationship was evident in the comparison of the following four subsets of sequences from the 90e set: singletons, contigs that do not contain UniGene ESTs, contigs including UniGene ESTs, and finally, UniGene ESTs not assembled into a contig.

Mapping the 90e assembly onto the genome

The 90e assembly was aligned to scaffolds from the S. mediterranea WUSL genome assembly, version 3.1. Figure 3 shows all possible high-scoring segment pair (HSP) relationships between those two sequence sets.
From almost 30 million initial HSPs, around 7 million were selected using a combination of thresholds, as described in the Methods section. Discarding singleton sequences in a second round of filtering further reduced the number of HSPs to 5 million, and HSP coverage dropped from 25.36% and 77.24%, for scaffolds and 90e respectively, to 10.57% and 37.93%.
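The two filtering rounds and the coverage measure can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the `HSP` record, the `MIN_IDENTITY` and `MIN_LENGTH` thresholds, and the toy sequence lengths are all hypothetical (the real thresholds are those given in the Methods section), and coverage is assumed here to mean the fraction of bases covered by at least one HSP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    query_id: str       # transcript (90e) sequence identifier
    q_start: int        # 1-based inclusive start on the transcript
    q_end: int          # 1-based inclusive end on the transcript
    identity: float     # percent identity of the alignment
    is_singleton: bool  # True if the transcript is an unassembled read


# Hypothetical thresholds, NOT the values from the paper's Methods.
MIN_IDENTITY = 90.0
MIN_LENGTH = 50


def first_round(hsps):
    """Keep HSPs passing the assumed identity and length thresholds."""
    return [h for h in hsps
            if h.identity >= MIN_IDENTITY
            and h.q_end - h.q_start + 1 >= MIN_LENGTH]


def second_round(hsps):
    """Discard HSPs that involve singleton sequences."""
    return [h for h in hsps if not h.is_singleton]


def covered_bases(hsps):
    """Count transcript bases covered by at least one HSP (interval union)."""
    by_seq = {}
    for h in hsps:
        by_seq.setdefault(h.query_id, []).append((h.q_start, h.q_end))
    total = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:      # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                         # gap: close the current block
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total


# Toy data: three transcripts with a combined (made-up) length of 1000 bases.
TOTAL_LENGTH = 1000
hsps = [
    HSP("contig1", 1, 100, 98.0, False),
    HSP("contig1", 51, 150, 95.0, False),   # overlaps the previous HSP
    HSP("contig1", 300, 320, 99.0, False),  # too short, removed in round 1
    HSP("read7", 1, 200, 97.0, True),       # singleton, removed in round 2
    HSP("contig2", 10, 109, 80.0, False),   # low identity, removed in round 1
]

round1 = first_round(hsps)
round2 = second_round(round1)
for label, kept in (("round 1", round1), ("round 2", round2)):
    pct = 100.0 * covered_bases(kept) / TOTAL_LENGTH
    print(f"{label}: {len(kept)} HSPs, coverage {pct:.1f}%")
```

Overlapping HSPs are merged before counting so that a base aligned more than once contributes only once to the coverage. As in the reported results, removing singleton HSPs lowers both the HSP count and the coverage.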