Taxa-driven functional shifts associated with stormflow in an urban stream microbial community

Urban streams are susceptible to stormwater and sewage inputs that can impact their ecological health and water quality. Microbial communities in streams play important functional roles and their composition and metabolic potential can help assess ecological state and water quality. Although these environments are highly heterogenous, little is known about the influence of isolated perturbations, such as those resulting from rain events on urban stream microbiota. Here, we examined the microbial community composition and diversity in an urban stream during dry and wet weather conditions with both 16S rRNA gene sequencing across multiple years and shotgun metagenomics to more deeply analyze a single stormflow event. Metagenomics was used to assess population-level dynamics as well as shifts in the microbial community taxonomic profile and functional potential before and after a substantial rainfall. Results demonstrated general trends present in the stream under stormflow vs. baseflow conditions across years and seasons and also highlighted the significant influence of increased effluent flow following rain in shifting the stream microbial community from abundant freshwater taxa to those more associated with urban/anthropogenic settings. Shifts in the taxonomic composition were also linked to changes in functional gene content, particularly for transmembrane transport and organic substance biosynthesis. We also observed an increase in relative abundance of genes encoding degradation of organic pollutants and antibiotic resistance after rain. Overall, this study provided evidence of stormflow impacts on an urban stream microbiome from an environmental and public health perspective. Importance Urban streams in various parts of the world are facing increased anthropogenic pressure on their water quality, and stormflow events represent one such source of complex physical, chemical and biological perturbations. Microorganisms are important components of these streams from both ecological and public-health perspectives, and analyzing the effect of such perturbations on the stream microbial community can help improve current knowledge on the impact such chronic disturbances can have on these water resources. This study examines microbial community dynamics during rain-induced stormflow conditions in an urban stream of the Chicago Area Waterway System. Additionally, using shotgun metagenomics we identified significant shifts in the microbial community composition and functional gene content following a high rainfall event, with potential environment and public health implications. Previous work in this area has been limited to specific genes/organisms or has not assessed immediate stormflow impact.

genera that increased significantly (p <0.05, t-test, FDR corrected) in relative abundance in the 204 after rain microbiome. As was observed with 16S rRNA amplicons in all samples (above), the 205 urban signature bacteria Arcobacter increased by >50% in relative abundance following rain, 206 though the increase was not statistically significant (Fig. 2B). Legionella, Pseudomonas and 207 Arcobacter have all been previously associated with effluent contamination of urban waterways 208 including Francisella tularensis, Candidatus Nitrospira defluvii, Legionella longbeachae and 215 Enterococcus faecalis, were rare (<0.1% of the total sequences characterized by MyTaxa) in the 216 before-rain microbiome but increased in relative abundance after rain to > 0.1% (Table S3). Most 217 of these species are not common freshwater bacteria and are indicative of contamination. 218 219 Population-level changes in response to rainfall in the North Shore Channel 220 We followed population-level trends for abundant organisms that exhibited large changes in their 221 relative abundance after rain. Organisms most similar to Legionella pneumophila increased 10-222 fold in relative abundance after rain and also comprised the largest fraction of characterized 223 species (11%) in the after-rain microbiome. Reads were recruited to the longest contig assigned 224 to L. pneumophila in the rain-associated samples with roughly equal similarity (about 90-100% 225 nucleotide identity) from each sample, suggesting the presence of the same population both 226 before and after rain that increased substantially after rain (Fig. S4). This was supported by 227 similarities in the average amino acid identity (AAI) of predicted protein coding genes from L. 228 pneumophila before and after rainfall contigs (60% and 63%, respectively) to the genome 229 sequences of the environmental isolate L. pneumophila strain LPE509 and the clinical isolate L. 230 pneumophila subsp. pneumophila str. Philadelphia 1. The AAI between genes attributed to L. 231 pneumophila in the before and after rain metagenomes was 83%. Although genome pairs for the 232 same species typically exhibit higher AAIs (~90%) (29, 30), 83% still signifies close genetic 233 relatedness and not necessarily distinct populations. Overall, these results indicate that the before 234 and after rain Legionella are members of the same species, but different from any currently 235 known, sequenced members of Legionella. The discordance between our Legionella-like 236 organisms and well-characterized L. pneumophila strains also makes it unclear if the 237 corresponding populations are pathogenic, although a few predicted genes (1 and 3 for the before 238 and after rain metagenomes, respectively) had high identity matches (>90%) to known L. 239 pneumophila virulence genes in the virulence factor database (http://www.mgc.ac.cn/VFs/). 240 Organisms within Legionella have been associated with artificial aquatic environments such as 241 water distribution systems and cooling towers in buildings (31, 32) as well as WWTP effluent 242 (20), thus their dramatic post-rain surge is not surprising. 243 Another notable increase in relative abundance after rain (~16-fold) was attributed to 244 Francisella tularensis, an organism with known soil-and water-borne pathogenic subspecies 245 (26, 32). Using a similar approach as above, AAIs between genes attributed to F. tularensis in 246 before and after rain samples and a reference genome of pathogenic subspecies F. tularensis 247 subsp. tularensis SCHU S4 were 47% and 54%, respectively. Similar AAI values were observed 248 between the metagenomic sequences and genomes of low virulent subspecies of this organism.

12
The AAI between the before and after rain F. tularensis genes was 68%. Thus, sequences 250 classified as F. tularensis in our samples likely share the same taxonomic order Thiotrichales, 251 but are different species from the known F. tularensis and might represent different populations 252 within the same genus in the before and after rain samples, although the low number of 253 sequences in the before rain dataset could bias in AAI calculation. 254 We also evaluated the population dynamics for species that dramatically dropped in 255 relative abundance after the rain. Actinobacterium SCGC AAA027-L06 is a member of the 256 ubiquitous freshwater Actinobacteria lineage acI-B (33), and the relative abundance of contigs 257 affiliated with this organism decreased dramatically (43-fold) after rain. Read recruitment 258 indicated similarity between the before and after rain populations, with reads from each sample 259 sharing ~90-100% nucleotide identity to the largest contig of this organism, although fewer reads 260 mapped to the contig from the after rain samples (Fig. S5). As with the L. pneumophila 261 population, the 84% AAI between the before and after rain sequences indicates close genetic 262 relatedness between the two populations. Furthermore, the AAIs with respect to the 263 Actinobacterium SCGC AAA027-L06 draft genome were similar for the sequences from the 264 before and after rain microbial communities (81% and 83%, respectively), indicating close 265 genetic relatedness to this organism. Members of the acI-B lineage have been detected in diverse 266 freshwater habitats (20, 35, 36, 38) and tend to prefer oligotrophic environments due to their 267 small cell-size and oligotrophic life strategies (19, 36). Their decrease in relative abundance after 268 rain likely reflects the reduced influence of freshwater flow from Lake Michigan due to 269 increased wastewater flow. 270

271
Overall functional gene content in before and after rain microbial communities 272 13 Functional gene profiles revealed taxa-driven shifts in the microbial community functional 273 potential after rain. Although many abundant Gene Ontology (GO) terms related to 274 housekeeping functions, such as nucleic acid and small molecule binding, did not significantly 275 change in relative abundance after rain (data not shown), we observed an increase of >50% of 276 functions within the broad terms of transporter activity and carbohydrate metabolism after rain. 277 These were primarily related to transmembrane and substrate-specific transporter activity and 278 carbohydrate biosynthetic and metabolic processes, respectively (Fig. 3A). Genes related to 279 multi-organism processes such as pathogenesis and conjugation were >50% more abundant after 280 rain while the before-rain microbiome had >50% more functions related to catabolic process, 281 amine metabolic process and phosphate containing compound metabolic process (Fig. 3A). 282 Within the broad GOs, genes related to photosynthesis, biosynthesis of organic compounds such 283 as amines, vitamins and pigments as well as the activity of enzyme groups oxidoreductase 284 (acting on the CH-NH 2 group of donors) and ligase (forming phosphoric ester bonds) were twice 285 as abundant in the before-rain microbiome (Fig. S6). 286 Within the broad GO term of transporter activity, genes related to substrate-specific 287 transmembrane transporter activity, specifically organic acid and ion transmembrane transporter 288 activity, doubled in relative abundance after rain from an average of 0.06% to an average of 289 0.12% (Fig. S6). Genes encoding all transmembrane transporters were primarily attributed to 290 Actinobacteria (31% of the identified sequences at phylum level) and unclassified Proteobacteria 291 (22%) before rain, whereas unclassified Proteobacteria (39%) and Gammaproteobacteria (16%) 292 were the major groups encoding transporters after rain (Fig. 3B). Gammaproteobacteria 293 harboring transporter genes increased by 51% after rain while Actinobacteria encoding these 294 genes exhibited more than 9-fold decrease, mirroring the shifts observed for the overall 295 taxonomic profiles for these groups (Fig. 2, 3B). Genera contributing to the increase in 296 Gammaproteobacterial sequences included Legionella, Francisella and Pseudomonas, exhibiting 297 a pattern similar to the shifts in their relative abundance in the overall microbial community. 298 Furthermore, as with the overall microbial community, Actinobacterium SCGC AAA027-L06 299 (unclassified at genus level) contributed the largest fraction of sequences encoding 300 transmembrane transporter activity genes within Actinobacteria in the before rain community. 301 Interestingly, based on the functional gene content of organisms with dominant shifts in their 302 relative abundance, those organisms that increased after rain had a higher proportion of their 303 genes affiliated to transporter functions compared to those that dropped in abundance after rain. 304 For instance, 3.7% and 6.8% of the L. pneumophila and F. tularensis genes, respectively, were 305 associated with transmembrane transport,whereas Actinobacterium SCGC AAA027-L06 and the 306 genus Pelagibacter had ≤ 2%. Thus, the increase in transporter functions following the rain 307 appears to be directly associated with an increase in the relative proportion of a subset of the 308 organisms that harbor these functions rather than an increase in the distribution of these genes 309 across the community. Organisms with transmembrane transporter genes, especially for organic 310 substrates like organic acids, may be more suited to take advantage of the heterogeneous 311 environment resulting from stormflow conditions. 312 Further evidence that changes in community composition drove the overall changes in the 313 metabolic capacity came from genes that decreased in relative abundance after rain, such as 314 those encoding biosynthesis of organic substances, which mirrored the overall shifts in taxa ( Fig.  315 2); Actinobacteria (39% of the identified sequences at phylum level) and unclassified 316 Proteobacteria (31%) were the major taxa encoding organic substance biosynthesis before rain 317 and unclassified Proteobacteria (45%) and Gammaproteobacteria (13%) after rain. The short-318 term nature and lack of gene expression data makes it difficult to know about the viability and 319 activity of these organisms, but taxa-driven shifts in community functional potential were 320 recently observed in another river in response to sewage and terrestrial-derived organisms (15). 321 322 Biodegradation and antibiotic resistance gene abundance before and after rain 323 In addition to the GO-based functional analysis, we examined how rainfall impacted 324 biodegradation and antibiotic resistance gene content. Predicted ORFs from both the before and 325 after rain metagenomes were searched against a compiled database of protein sequences of 326 microbial enzymes involved in the degradation of 12 different compounds associated with 327 wastewater contamination, stormwater runoff, and WWTP effluent input (Fig. 4A). We detected 328 biodegradation genes (BDGs) in both the before and after rain samples for 8 out of the 12 329 contaminants tested, but observed a significant increase (p < 0.05, t-test) in the relative 330 abundance of genes involved in the degradation of nicotine, phenol, 1,4-dichlorobenzene and 331 pentachlorophenol and a decrease (p < 0.05) in cholesterol degrading genes after rain (Fig. 4A). 332 Additionally, the total relative abundance of all BDGs was significantly higher in the after rain 333 sample (p < 0.05, t-test). BDGs before rain were primarily affiliated with unclassified 334 Proteobacteria and Actinobacteria (35% and 30% of the identified sequences at phylum level, 335 respectively), with the profile shifting to unclassified Proteobacteria and Betaproteobacteria 336 (49% and 19%, respectively) as the dominant members of the community after rain, similar to 337 the overall taxonomic shifts described above. These results reflect the increase in effluent flow 338 from the WWTP as well as the suspected presence of these compounds in untreated wastewater 339 and CSOs (3,38,40,42) (Fig. 4A). 340 evaluated using the Comprehensive Antibiotic Resistance gene Database (CARD). As only a few 342 ORFs (~10 per library) could be classified as ARGs from both the time points, we queried the 343 unassembled paired-end reads against CARD. This resulted in several hits for various ARG 344 categories in both time points (0.04% and 0.07% of the total number of reads for before and after 345 rain samples, respectively) and revealed notable increases in the relative abundance of several 346 ARG classes after rain (Fig. 4B), including significant increases in aminocoumarin and 347 polymyxin resistance genes (p < 0.05, t-test). As with the BDGs, the total relative abundance for 348 all ARGs pooled together for each time point was significantly higher in the after rain sample (p 349 < 0.05, t-test). Increases in ARGs with urban-impacted stormflow was recently observed 350 elsewhere as well (14), indicating that this could be a significant and underexplored effect of 351 stormflow. Reads with high matches to ARGs were queried against metagenomic contigs, 352 revealing that unclassified Proteobacteria and Firmicutes were the abundant ARG-carrying phyla 353 (40% and 23% of the identified sequences at phylum level, respectively) in the before rain 354 microbiome whereas unclassified Proteobacteria (50%) and Gammaproteobacteria (24%) were 355 the dominant groups after the rain. This further supports the importance of taxa driven changes 356 on gene content. 357 The results for both community composition and functional gene analysis provide 358 evidence for the significant influence of WWTP effluent input on the microbial community, 359 particularly from increased effluent flow-rates associated with heavy rain. Overall, this study 360 revealed a shift in microbial community composition following rain from organisms frequently 361 associated with freshwater systems towards organisms associated with urban impacted waters (9, 362 20, 21) as well as a shift in functional gene content. The increased relative abundance (and 363 possibly actual abundance) of BDGs and ARGs along with the increase in genes associated with 364 conjugation and pathogenesis in the after rain microbiome highlight the environmental and 365 public health implications of stormflow in urban waterways. The extent to which these changes 366 in gene content are expressed metabolically and persist is unknown. Although the WGS 367 metagenomic analysis of a single rainfall event limits the scope of interpretations that can be 368 drawn, our results provide substantial insights into microbial community dynamics in an urban 369 stream during stormflow conditions, highlighting the need to investigate the urban stream 370 microbiome with longer temporal scales and systematic sampling design to better predict the 371 impact of rain associated stormflow events. 372 373

Materials and Methods 374
Site description and sample collection 375 The North Shore Channel (NSC) is a 12.3 km long man-made stream of the Chicago Area 376 Waterway System that receives input from the O'Brien Water Reclamation Plant, a WWTP that 377 serves over 1.3 million people residing in a 365 km 2 area and releases effluent into the NSC 378 (http://www.mwrd.org/irj/portal/anonymous/waterreclamation). Our study site is approximately 379 1 km downstream of the WWTP outfall (Fig. S1). The NSC also has 48 CSOs along its course, 380 six of which are located within about 1 km upstream of WWTP, and two are located within 1 km 381 downstream of the WWTP. These release excess stormwater mixed with untreated sewage into 382 the river when the transport and storage capacity of the city's sewage network is exceeded 383 following high rainfall (http://www.mwrd.org/irj/portal/anonymous/overview) (Fig. S1). Water 384 from the selected NSC site was sampled five times between 2013-2015 (0-1 m depth): three 385 represent stream water during base flow (dry weather) conditions, and the other two represent 386 stormflow (<24 hours after rainfall) conditions. We also sampled the WWTP effluent in October 387 2013 during baseflow conditions. Additional sample metadata and water chemistry are in Table  388 S1. 389 Water was collected using a horizontal sampler (Wildco, Yulee, FL, USA) and passed 390 on-site in succession through ~1.6 μm pore size glass fiber filters to remove larger particles 391 (Whatman, Pittsburgh, PA, USA) and collected on a 0.22 μm pore size polycarbonate membrane 392 filters (EMD Millipore, Billerica, MA, USA). WWTP effluent was collected from the WWTP 393 outlet where the released effluent mixes with stream water. About 10L of water was filtered in 394 duplicate for each sample and ~20 ml of the filtrate was transported back to lab for chemical 395 analysis. Water Temperature, pH, conductivity and total dissolved solids were measured on-site 396 using a portable water quality meter (Hanna Instruments, Woonsocket, RI, USA). Additional 397 water chemistry analysis is described in Table S1. trimming. Trimmed, paired-end reads were merged using Pear (50), but owing to low yield of the 422 merged reads, likely due to issues related to the MiSeq V2 kit chemistry, further analysis was 423 only performed on the trimmed forward reads. Reads were analyzed using QIIME version 1.8.0 424 (51). Library statistics are summarized in Table S2. Chimeric sequences were removed using 425 identify_chimeric_seqs.py with usearch61 denovo method and filter_fasta.py. Filtered sequences 426 were clustered into operational taxonomic units (OTUs) at a 97% identity level using scripts 427 pick_otus.py and pick_rep_set.py based on usearch61 denovo OTU picking. Representative 428 OTUs were assigned taxonomy based on the Greengenes reference database (May 2013 version) 429 using assign_taxonomy.py with uclust. OTUs occurring as singletons or with sequences from just 430 one library were excluded from analyses. Community taxonomic composition and alpha 431 diversity was performed using summarize_taxa.py and alpha_diversity.py, respectively, with a 432 random subsample of 17,384 sequences per sample to avoid bias arising from variation in 433 sequencing depth. Good's coverage for each library was estimated using alpha_diversity.py and 434 OTUs that included singletons, subsampled to an even depth of 18,289 sequences per library, the 435 smallest library size. 436

Metagenomic sequence assembly and phylogenetic classification 437
Raw metagenomic sequences were quality filtered using a Phred average per sliding window 438 with quality threshold Q ≥ 20 and not allowing any N's. Quality filtered coupled reads for each 439 metagenomic library were assembled as described in (46). Coupled reads were first assembled 440 into contigs with Velvet (52) and SOAPdenovo2 (53) separately, and input to Newbler 2.0 to 441 obtain longer contigs with better N50 values (54). Additional metagenomic library statistics are 442 provided in Table S3. Gene calling was done with MetaGeneMark (55). Due to uneven data 443 yields from sequencing, we used assemblies from the first sequencing run for each sample as the 444 representative sequences for annotations, and mapped the coupled reads from both the replicate 445 libraries to these contigs for each sample to calculate the contig coverage in each library. The 446 predicted protein coding genes for each dataset were used for phylogenetic classification of the 447 corresponding contigs using MyTaxa (28) with a database of all sequenced bacterial and archaeal 448 genomes (http://enve-omics.ce.gatech.edu/data/mytaxa) using DIAMOND blastp in the sensitive 449 mode (56). Reads were mapped to contigs using blastn with cutoffs ≥ 50% alignment length, 450 identity ≥ 97% and e-value ≤ 10 -10 . Contig coverage (sum of lengths of reads mapping to 451 contig/contig length) was used as a proxy for in situ abundance in each library and calculated 452 using the BlastTab.seqdepth_nomedian.pl script from the Enveomics bioinformatics toolbox 453 (57). The script aai.rb from the same toolbox was used to calculate average amino acid identity 454 (AAI) between any two sets of protein coding genes. 455 Predicted metagenomic genes were searched against the SwissProt database (58) using blastp 457 and cutoffs of at least 40% sequence identity, 70% coverage of the query sequence and e-value ≤ 458 10 -10 . The SwissProt match for the best hit for each query sequence was mapped to its 459 corresponding Gene Ontology (GO) term (59), followed by binning the characterized genes at 460 various depths (distance of a GO term from the parent node) of the GO database using in-house 461 scripts. To evaluate the functional profile at a specific depth, in situ abundance for these GO 462 terms was calculated using gene coverage (described above), and relative abundance for each 463 GO term was obtained as a fraction of the total abundance of genes with identified functions in 464 that library. The taxonomic affiliation of genes classified within a specific GO term was 465 evaluated using MyTaxa, as described above. 466 To specifically evaluate the presence and abundance of genes involved in biodegradation 467 of select wastewater contaminants in the rain-associated metagenomes, we created a database of 468 protein sequences of enzymes related to degradation of select contaminants that are commonly 469 found in WWTP effluent and sewage: testosterone; ibuprofen; caffeine; nicotine; cholesterol; 470 1,4-dichlorobenzene; methyl-naphthalene; pentachlorophenol; phenol; N,N, diethyl-3-toluamide; 471 tetrachloroethylene and phthalate (3, 38-43). The enzymes were selected based on their role in 472 the degradation pathways for these compounds (60), as well as the sequence availability in 473 NCBI. This database is available from the corresponding author upon request. The predicted 474 ORFs were searched against this database using blastp and the best hits were filtered at same 475 thresholds used for SwissProt (above). Coverage estimates were used for calculating the in situ 476 abundance for each BDG class, and normalized for each library by dividing the abundance of    Table S3: Sequencing statistics for the metagenomes used in the study. 712 Table S4: Rare species in before rain microbiome that were in the abundant fraction after rain. 713 corresponds to data around the rain event (10/5/2013), which is the focus of this study. No data 723 was available for 9/17/2013 as the rain gauge was out of service.  Phyla and genera highlighted with star symbol represent taxa with significant difference in relative abundance between the before and after rain microbiota (p < 0.05, t-test, false discovery rate corrected). 'Innominate organism' comprises contigs classified as organisms that either belonged to no known phylum/genus or a candidate phylum/genus.