Next-generation sequencing (NGS) provides a unique opportunity to reconstruct the haplotypes of clonal populations, a process that is essential for discovering population structure. Although many algorithms and software tools have been developed for reconstructing viral populations from NGS reads, reconstruction of bacterial populations remains a challenge. Compared with viruses, bacterial populations have fewer polymorphisms. The distances between polymorphic sites in bacterial genomes are therefore long, which makes reconstructing bacterial populations more complicated than reconstructing viral populations. As a result, viral reconstruction methods are not capable of segregating bacterial genomes. In this study, we propose BaFulow, a novel algorithm for reconstructing the haplotypes of bacterial populations that have low mutation frequency. BaFulow reliably reconstructs haplotypes and outperforms existing methods. To assess its performance, we conducted several experiments on simulated data sets with different coverages, mutation rates, and numbers of haplotypes. Our algorithm reconstructed bacterial populations with an average F-measure of 0.82 and with low susceptibility to sequencing errors.
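For readers unfamiliar with the F-measure reported above, the following minimal Python sketch shows one common way to compute it for a set of reconstructed haplotypes. It is an illustration only: the function name `f_measure`, the toy sequences, and the exact-match criterion are assumptions made here, not the evaluation procedure used for BaFulow, which may define a correct reconstruction differently.

```python
def f_measure(true_haplotypes, reconstructed_haplotypes):
    """Harmonic mean of precision and recall.

    Assumption for this sketch: a reconstructed haplotype counts as
    correct only if it exactly matches one of the true haplotypes.
    """
    truth = set(true_haplotypes)
    predicted = set(reconstructed_haplotypes)
    if not truth or not predicted:
        return 0.0

    true_positives = len(truth & predicted)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(predicted)  # correct / all reconstructed
    recall = true_positives / len(truth)         # correct / all true
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three true haplotypes, four reconstructed ones,
# two of which are exact matches.
truth = ["ACGTAC", "ACGTTC", "ATGTAC"]
predicted = ["ACGTAC", "ACGTTC", "ACCTAC", "ATGAAC"]
score = f_measure(truth, predicted)
print(f"F-measure: {score:.2f}")
```

In this toy case precision is 2/4 and recall is 2/3, so the score is penalized both for the spurious haplotypes and for the missed one.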
The source code of BaFulow is available here.
The manual for running the program is available here.
Samaneh Saadat1, Haiyan Hu1, and Xiaoman Li2
1Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32826, USA.
2Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32826, USA.