The second-strand cDNA was
synthesized with DNA polymerase I. Short fragments were purified with the QIAquick PCR extraction kit (Qiagen) and then sequenced on the Illumina HiSeq™ 2000 platform at Shenzhen BGI. Full technical details of the sequencing are available from BGI (http://www.genomics.cn). This yielded approximately six million 90-bp paired-end reads for each sample (Table 1). The paired-end reads were then mapped to the Prochlorococcus MED4 genome (accession number: NC_005072) using Bowtie2 [60], allowing at most one mismatch. The coverage of each nucleotide was calculated by counting the number of reads mapped at the corresponding nucleotide position in the genome. The number of reads that mapped perfectly to a gene region was calculated using BEDTools [61] and then normalized by gene length and total mapped reads, giving RPKM (reads per kilobase per million mapped reads) as the gene expression value [26]. The gene annotations for Prochlorococcus MED4 were downloaded from MicrobesOnline [62] with modifications for non-annotated
genes that were designated “HyPMM#”. New ORFs identified in this study were annotated with “TibPMM#” (Sheet 2 of Additional file 3). Sequences generated by this study are available in the Gene Expression Omnibus (GEO) under accession number GSE49517.

Identification of operons and UTRs

Using a priori knowledge of the translation start and stop sites from Additional file 3, the coverage of the regions upstream and downstream of each ORF was scanned to identify a point of sharp coverage
decline. To define the boundary, we applied criteria modified from Vijayan et al. [24]. Briefly, the translation start or stop site was defined as i = 0, and “i + 1” is the next position upstream or downstream of position “i”. A transcript’s boundary was defined when position “i” satisfied one of the following three criteria: (1) coverage(i)/coverage(i + 1) ≥ 2, binomialcdf(coverage(i + 1), coverage(i) + coverage(i + 1), 0.5) < 0.01 and coverage(i + 1) > coverage(i:(i-89))/(90 × 7); (2) coverage(i)/coverage(i + 1) ≥ 5 or coverage(i)/coverage(i + 2) ≥ 5, and coverage(i + 1) < coverage(i:(i-89))/(90 × 7); (3) coverage(i + 1) ≤ background. Here, binomialcdf(x, n, p) is the probability of observing up to x successes in n independent trials when the success probability for each trial is p. We assumed that reads were uniformly distributed between positions “i” and “i + 1” (p = 0.5). If a sharp coverage reduction occurred, coverage(i + 1) would be much smaller than coverage(i); that is, observing so few reads at “i + 1” would be a low-probability event among all reads mapped to “i” and “i + 1” (binomialcdf < 0.01). The strictest criterion (1) was used for highly transcribed genes.
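
The three criteria above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in this study. The function names, the list layout (coverage ordered from inside the ORF outward) and the `background` argument are hypothetical. The term coverage(i:(i-89))/(90 × 7) is read here as the summed coverage of the 90 positions ending at “i”, divided by 90 × 7. The binomial CDF is written only for p = 0.5.

```python
from math import comb

WINDOW = 90  # number of positions in coverage(i:(i-89))
FOLD = 7     # divisor applied to the window mean in criteria (1) and (2)


def binom_cdf_half(x, n):
    """P(X <= x) for X ~ Binomial(n, 0.5), using exact integer arithmetic."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum(comb(n, k) for k in range(x + 1)) / 2 ** n


def matched_criterion(cov, i, background):
    """Return 1, 2 or 3 for the first criterion met at index i, else None.

    cov holds integer per-nucleotide read counts ordered from inside the ORF
    outward, so cov[i + 1] lies further from the ORF than cov[i].
    """
    here, nxt = cov[i], cov[i + 1]
    # Fall back to cov[i + 1] when position i + 2 is past the end of the list.
    nxt2 = cov[i + 2] if i + 2 < len(cov) else nxt
    # Assumed reading of coverage(i:(i-89))/(90 x 7); the window is truncated
    # if fewer than 90 positions are available before i.
    local = sum(cov[max(0, i - WINDOW + 1): i + 1]) / (WINDOW * FOLD)
    # Ratios are written as products to avoid dividing by zero coverage.
    if here >= 2 * nxt and nxt > local and binom_cdf_half(nxt, here + nxt) < 0.01:
        return 1
    if (here >= 5 * nxt or here >= 5 * nxt2) and nxt < local:
        return 2
    if nxt <= background:
        return 3
    return None


def find_boundary(cov, start, background):
    """Scan outward from index start; return (index, criterion) or None."""
    for i in range(start, len(cov) - 1):
        rule = matched_criterion(cov, i, background)
        if rule is not None:
            return i, rule
    return None


# Toy profiles (made-up numbers): a highly transcribed gene and a weakly
# transcribed gene, each with a drop after index 89.
high = [1000] * 90 + [300] * 6
low = [10] * 90 + [1, 1, 1]
print(find_boundary(high, 0, background=2))
print(find_boundary(low, 0, background=0))
```

In this sketch the criteria are tested in the order (1), (2), (3) at each position, and the first position that meets any of them is reported. The source only states that a boundary satisfies one of the three criteria, so this ordering is a design choice of the sketch.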