Description of data sets used to evaluate the OptiClust algorithm and compare its performance to other algorithms^a

| Data set (reference[s]) | Read length (nt) | No. of samples | Total no. of sequences | No. of unique sequences | No. of distances | No. of OTUs |
|---|---|---|---|---|---|---|
| Soil (41) | 150 | 18 | 948,243 | 143,677 | 11,775,167 | 40,216 |
| Marine (42) | 250 | 7 | 1,384,988 | 75,923 | 12,908,857 | 25,787 |
| Mice (40) | 250 | 360 | 2,825,495 | 32,447 | 6,988,306 | 2,658 |
| Human (39) | 250 | 489 | 20,951,841 | 121,281 | 38,544,315 | 11,648 |
| Even (34, 36) | NA | NA | 1,155,800 | 11,558 | 29,694 | 7,651 |
| Staggered (34, 36) | NA | NA | 1,156,550 | 11,558 | 29,694 | 7,653 |
^a Each data set contains sequences from the V4 region of the 16S rRNA gene. The number of distances for each data set indicates those that were less than or equal to 0.03. The number of OTUs was determined using the OptiClust algorithm. The even and staggered data sets were generated by extracting the V4 region from full-length reference sequences, and the data sets from the natural communities were generated by sequencing the V4 region using an Illumina MiSeq with paired reads of either 150 or 250 nt. NA, not applicable.
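
To make the "No. of distances" column concrete, the following is a minimal sketch of how pairwise distances at or below the 0.03 threshold could be counted among unique sequences. It is an illustration only, not the pipeline used to produce the table: the function names `pairwise_distance` and `count_close_pairs` are hypothetical, and the distance is assumed to be the simple fraction of mismatched positions between equal-length aligned sequences, which may differ from the distance calculation used in the study.

```python
from itertools import combinations


def pairwise_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned sequences.

    Assumption for illustration: both sequences are already aligned
    to the same length, and every mismatch counts equally.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def count_close_pairs(seqs, cutoff=0.03):
    """Count unordered pairs of unique sequences with distance <= cutoff."""
    unique = sorted(set(seqs))  # dereplicate, as in the "unique sequences" column
    return sum(
        1
        for a, b in combinations(unique, 2)
        if pairwise_distance(a, b) <= cutoff
    )


# Toy data: 100-nt sequences with 0, 1, and 5 mismatches relative to `base`.
base = "A" * 100
near = "A" * 99 + "C"        # distance 0.01 from base
far = "A" * 95 + "C" * 5     # distance 0.05 from base, 0.04 from near
toy_seqs = [base, base, near, far]  # one duplicate, three unique sequences

print(count_close_pairs(toy_seqs))
```

In this toy input only the `base` and `near` pair falls within the threshold. The all-against-all comparison shown here scales quadratically with the number of unique sequences, so it is suitable only for small examples.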