Virioplankton Assemblage Structure in the Lower River and Ocean Continuum of the Amazon

The Amazon River forms a vast plume in the Atlantic Ocean that can extend for more than 1,000 km. Microbial communities promote a globally relevant carbon sink system in the plume. Despite the importance of viruses for the global carbon cycle, the diversity and the possible roles of viruses in Amazonia are poorly understood. The present work assesses, for the first time, the abundance and diversity of viruses simultaneously in the river and ocean in order to elucidate their possible roles. DNA sequence assembly yielded 29,358 scaffolds, encoding 82,546 viral proteins, with 15 new complete viral genomes from the 12 river and ocean locations. Viral diversity clearly differed between river and ocean. Bacteriophages were the most abundant and occurred throughout the continuum. Viruses that infect eukaryotes were more abundant in the river, whereas phages appeared to have strong control over the host prokaryotic populations in the plume.

specific databases through virome cross-assembly can circumvent this issue (20,33) and also can improve understanding of the influence of environmental parameters on viral communities (20,33,34). The diversity and structure of viral assemblages in river systems worldwide, and particularly in the Amazon River, are poorly understood (35). Previous studies have addressed specific virus taxonomic groups by the use of PCR (36) and cultivation (37); all of those studies were restricted to a very limited geographical range in the Amazon region. Despite advances in our knowledge regarding the microbial diversity in the Amazon plume (7,14,38), only a few studies have addressed the role of microbes along the continuum, especially in the lower Amazon River (5,12,39), and the roles of viruses have been mostly overlooked. Thus, a more comprehensive understanding of the complete virus diversity along the Amazon continuum is lacking, especially in the river's lower reaches and plume and also in the ecological context of environmental parameters (40).
The aim of this study was to elucidate the diversity and assemblage of planktonic viruses in the Amazon River-plume continuum. We performed the first broad viromics analysis of this system using a shotgun approach to define the major taxonomic and functional groups along the continuum and to characterize how environmental parameters, possible viral hosts, and geographical locations shape the composition of the viral assemblage in this vast and relevant geographic area.

RESULTS
Water physical-chemical and biological analyses. The water physical-chemical parameters (Table 1) and biological parameters (cell and viral particle counts and chlorophyll values) (Table 2) that were investigated revealed distinct environmental conditions along the continuum. Principal-component analysis (PCA) of the physical-chemical data (see Fig. S1A in the supplemental material) revealed three major groups of samples: river samples, plume samples, and samples from a transition region between them, which is formed by locations near the river mouth (station 10 [St10] and St11). The lower river locations (Tapajós, Óbidos, north Macapá, south Macapá, Belém) were warmer, with higher concentrations of inorganic nutrients and dissolved organic carbon (DOC) and with lower pH and lower surface dissolved inorganic carbon (SurfDIC) and oxygen concentrations. The plume locations (St6, St4, St3, St1, and St15) exhibited strong temperature and salinity (Sal) gradients between the mouth and the outer region, with lower nutrient and organic matter concentrations overall and higher pH and higher SurfDIC and oxygen concentrations. Finally, the transition locations (St10 [...] S1A). Viral particle abundance was higher in the plume, but bacterial abundance was higher in the river. Thus, virus-to-microbe ratios (VMR) were higher in the plume (with the exception of St6) than in the river (Table 2). The levels of chlorophyll, cyanobacteria, picoeukaryotes, and nanoeukaryotes corresponded to the different river origins, exhibiting higher values in the samples from the rivers from Brazil's central region (Tapajós and Belém) and lower values in the samples from the main Amazon River course (Óbidos, north Macapá, and south Macapá); however, large variations were observed across the plume. The brownish waters of the main river course also had more fine suspended sediment (FSS) and particulate lignin than Tapajós and Belém (Table 2).
During the sampling period, the Óbidos River showed a water level that was normal with respect to historical measured levels (5, 41) (Fig. S2).
Virome yield and dinucleotide frequency analysis. Virome sequencing yielded 146,022 (St11) to 2,964,975 (St4) reads, with a mean read size of 230 bp (±50 bp) and mean GC content level of 44% (±8.3%) (see Table S1 in the supplemental material). The PCA of dinucleotide frequency revealed two groups separated by PC1 (Fig. S1B). One was dominated by the river samples and consisted of Belém, north Macapá, south Macapá, Óbidos, and Tapajós and also of transition locations St10 and St11; the other was dominated by the plume samples and included St1, St3, St4, St6, and St15. The GC content of the river-dominated group was higher (49.4% ± 6.2%) than that of the plume-dominated group (36.4% ± 2.5%) (P = 0.0009 [t test]).
New viral genomes and proteins discovered in the Amazon viral community. Virome cross-assembly resulted in 29,358 scaffolds longer than 1 kbp, amounting to 71.8 Gbp of data (N50 = 2,709). Among these, 15 were circular and longer than 10 kbp, likely representing new complete viral genomes. Together, the scaffolds encoded 82,546 proteins, but only 35,381 (43%) exhibited similarity to entries in the NCBI nr database (viruses, 13,158; bacteria, 21,103; archaea, 357; eukaryotes, 702; unclassified, 61), often with low identity levels (mean identity of 60% ± 22.5%), highlighting the novelty of this data set (Table S2). The proteins included both typical viral structural and information processing proteins (e.g., capsid proteins and DNA polymerases) and transduction and auxiliary metabolic proteins encoded by genes carried by viruses that are involved in diverse pathways that are important for host physiology (e.g., photosynthesis and nutrient transporters) (Table S2). The classification of Amazon scaffolds with VirSorter provisionally confirmed 3,266 to be viral sequences (for complete phage contigs, highly certain confirmation [pretty sure], 623; moderately certain confirmation [quite sure], 2,634; for prophages, highly certain confirmation, 1; moderately certain confirmation, 8), representing 11.12% of the total number of scaffolds obtained. Together with 3,692 genomes from the literature, these scaffolds formed a custom database of 6,958 sequences in total. The VirFinder analysis of the Amazon viral scaffolds validated by VirSorter returned a mean score of 0.79 ± 0.23 (mean P = 0.05 ± 0.09), while the VirFinder analysis of all Amazon scaffolds returned a mean score of 0.62 ± 0.31 (mean P = 0.15 ± 0.21). The functional profile obtained with HUMAnN2 showed that genes related to biosynthesis of nucleosides and nucleotides were abundant in all locations (Fig. S3).
The transition and the plume had a more diverse functional profile, including genes related to fatty acid and lipid biosynthesis and to respiration and other groups of genes not related to common viral functions. The river's north Macapá and Belém locations did not return any results (Fig. S3).
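The assembly and classification figures reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the pipeline used in the study: the `n50` helper and the toy scaffold lengths are hypothetical illustrations, and only the tallies (623, 2,634, 1, 8, 3,692, 29,358, and the nr hit counts) are taken from the text.

```python
def n50(lengths):
    """Return the N50: the scaffold length at which the running sum of
    lengths, taken from longest to shortest, first reaches half of the
    total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length

# Hypothetical toy lengths (bp), NOT the Amazon scaffolds.
toy_lengths = [1000, 1500, 2000, 3000, 5000]
toy_n50 = n50(toy_lengths)

# VirSorter tallies as reported in the text (category names are labels
# chosen here for illustration).
virsorter_counts = {
    "phage_highly_certain": 623,
    "phage_moderately_certain": 2634,
    "prophage_highly_certain": 1,
    "prophage_moderately_certain": 8,
}
viral_total = sum(virsorter_counts.values())
total_scaffolds = 29358
viral_percent = 100 * viral_total / total_scaffolds

# Custom database: VirSorter-validated scaffolds plus literature genomes.
literature_genomes = 3692
custom_db_size = viral_total + literature_genomes

# Proteins with similarity to NCBI nr entries, by domain of the best hit.
nr_hits = 13158 + 21103 + 357 + 702 + 61
nr_percent = 100 * nr_hits / 82546

print(toy_n50, viral_total, round(viral_percent, 2), custom_db_size,
      nr_hits, round(nr_percent))
```

The category sums reproduce the reported totals (3,266 validated scaffolds, 6,958 database sequences, 35,381 proteins with nr hits), which is a quick way to confirm that the per-category counts and the percentages in the text are mutually consistent.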
Viral community abundance profiles. The nonmetric multidimensional scaling (NMDS) of the abundance profile of the custom database in the Amazon viromes showed a separation of the rivers from the plume stations, according to NMDS axis 1 (Fig. 1A), indicating a separation of freshwater and saline waters, with the exception of St10 (a river mouth station), whose samples grouped with the saline samples. This pattern of separation by salinity was also observed in the PCA of the dinucleotide frequency profiles (Fig. S1B), with the exception of the samples from the brackish-water station (St11), which grouped with the riverine samples. The salinity influence was not detected in the dendrogram analysis of the same custom database, as the St15 plume grouped with the rivers and the other plume stations and the transition region formed another group (Fig. 1B). In addition, the heat map showed that the contribution of reference viral genomes in the Amazon viromes was lower than that seen with the Amazon scaffolds (Fig. 1B).
Succession of possible viral hosts along the continuum. The reference levels of viral genome abundance along the continuum showed distinctive patterns according to their hosts (Fig. 2A): in riverine samples (Tapajós, Óbidos, north Macapá, south Macapá, and Belém), viruses of eukaryotes (e.g., pandoraviruses, megaviruses, and mimiviruses) were more abundant; in the transition plume (St10 and St11), phages of heterotrophic bacteria increased in abundance accompanied by a decrease in the abundance of eukaryotic viruses; and in the plume (St6, St4, St3, St1, and St15), a trend of more cyanophages, prochlorophages, and synechophages than phages of heterotrophic bacteria was observed, with the exception of St6 and St1, where pelagiphages had higher relative abundance. Analysis of the individual abundance patterns of the reference viral genomes corroborated this pattern (Fig. 2B).
The majority of cyanophages, pelagiphages, prochlorophages, and synechophages were more abundant in the plume than in the river, whereas eukaryotic viruses and most phages infecting heterotrophic bacteria (other than Pelagibacter sp.) were more abundant in the river. In these analyses, pelagiphages, prochlorophages, and synechophages were separated from the common groups because of the abundance and importance of their respective hosts in marine waters: "Candidatus Pelagibacter ubique" (42), Prochlorococcus, and Synechococcus (43).
FIG 2 (A) The bar graph shows the relative abundances of the reference viral genomes according to their respective hosts. A succession of patterns from river to ocean is observed, where the river locations (brown), including Belém (Bel), north Macapá (NMac), south Macapá (SMac), Óbidos (Obi), and Tapajós (Tap), are dominated by viruses of eukaryotic organisms; the transitions (black), including transitions St10 and St11, show an increase in the levels of heterotrophic bacterial viruses; and the plumes (blue), including plumes St1, St3, St4, St6, and St15, possess more viruses that infect autotrophic organisms. (B) Scatterplot displaying the median abundances of sequences in samples from Amazon River (x axis) and plume (y axis). Each point represents a reference viral genome (color coded as described for panel A). The sizes of the points are inversely proportional to the false-discovery-rate (q) values, meaning that larger points display more-significant changes in abundance between the two sets of samples. Data corresponding to both axes are shown in log10 scale; the black line represents a 1:1 ratio.
Viruses most important for river and plume separation. The random forest analysis identified 21 viral sequences (corresponding to 16 VirSorter-validated Amazon scaffolds and 5 reference genomes) whose abundance was most important for river and plume separation (Fig. 3). Four Amazon scaffolds were more abundant in the river (riverine), while the 17 other sequences were more abundant in the plume (oceanic) (Fig. 3). Amazon scaffold Seq_3963 (riverine) was the scaffold most indicative of river-plume separation. This scaffold corresponds to a replication-associated protein from a sewage-associated circular DNA virus, representing a protein family that is associated with single-stranded DNA (ssDNA) viruses of animals (Circoviridae) and plants (Nanoviridae, Geminiviridae) (Table S2). None of the other genes from the riverine scaffolds had similarity to genes encoding proteins listed in the GenBank nr protein database. Overall, the majority (77.7%) of the genes had no identifiable function, and the identifiable genes (22.3%) encoded proteins for cellular metabolism (DNA, proteins), especially from Bacteria. In addition, possible viral auxiliary metabolic genes (AMGs) encoding proteins related to nitrogen fixation (one in Seq_71) and oxidoreductases [six in total, including two Fe(II)-dependent oxygenases and two tryptophan halogenases in AP013490 (uncultured Mediterranean phage), one thioredoxin in Seq_642, and one thioredoxin in AP013379 (uncultured Mediterranean phage)] were detected (Fig. 3 and Table S2).
Viral richness and diversity in the continuum. The Shannon diversity index data indicated higher diversity for some Amazon River samples (Óbidos, north Macapá, and south Macapá) and lower diversity for the remaining rivers, as well as for the plume locations (Table S3). This trend was corroborated by rarefaction curves inferred from the abundance profiles of these samples (Fig. S4), which revealed that Óbidos, north Macapá, and south Macapá were much further from reaching saturation than the remaining river and plume samples. The Simpson index data showed that the plumes corresponding to St6, St15, and St4 were the most dominant locations; the richness values indicated that plumes St4 and St3 and also transition St10 were richer (Table S3). The Shannon index of viral functions, annotated by the Metagenomic RAST server (MG-RAST), indicated lower values for the turbid rivers plus St10 (which also has turbid waters) (mean, 1.4 ± 0.2) and higher values for the plume plus St11 and the Tapajós River (clear water river) (mean, 2.8 ± 0.3) (Table S3).
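For readers less familiar with the indices named above, the following minimal sketch shows how richness, the Shannon index, and the Simpson index can be computed from an abundance profile. It rests on stated assumptions: the text does not specify the logarithm base or which form of the Simpson index was used, so the natural logarithm and the dominance form (sum of squared proportions, higher meaning more dominated) are assumed here, and the two toy profiles are hypothetical.

```python
import math

def richness(counts):
    """Number of taxa (or sequences) with nonzero abundance."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i); natural log assumed."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("abundance profile is empty")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_dominance(counts):
    """Simpson dominance D = sum(p_i^2); assumed form, higher values
    indicate a community dominated by fewer members."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("abundance profile is empty")
    return sum((c / total) ** 2 for c in counts)

# Hypothetical toy profiles with equal richness but different evenness.
even_profile = [10, 10, 10, 10]
skewed_profile = [37, 1, 1, 1]

for name, profile in [("even", even_profile), ("skewed", skewed_profile)]:
    print(name, richness(profile),
          round(shannon_index(profile), 3),
          round(simpson_dominance(profile), 3))
```

With equal richness, the skewed profile has the lower Shannon value and the higher Simpson dominance, which is the sense in which a location can be both rich and strongly dominated.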
Automated metagenome annotation: summary, classification, and canonical analysis of principal coordinates (CAP). The number of validated sequences remaining after MG-RAST quality control (QC) was performed ranged from 133,414 (St11) to 1,519,118 (St4) (Table S1). These sequences were classified as corresponding to rRNA genes (with levels ranging from 0.48% in St11 to 7% in St4), annotated proteins (6.23% in St1 to 72.31% in St11), unknown proteins (26.6% in St11 to 86.28% in St1), and unknown sequences (0% to 4.52% in north Macapá) (Table S1). The number of sequences classified as small ribosomal subunits (SSU) ranged from zero in south Macapá to 206 (0.034% of the valid sequences) in St10, whereas the number of large ribosomal subunit (LSU) sequences ranged from zero in north Macapá, south Macapá, and Belém to 754 (0.124%) in St10 (Table S1). Of the total number of available annotated proteins, 0.3% (St11) to 61.9% (north Macapá) were classified as viruses at the domain level after MG-RAST annotation (Table S1).
The taxonomical classification of viral sequences at the family level indicated that a large fraction (11% to 20.6%) represented unclassified sequences of the viral domain (Fig. S5A). The most abundant identifiable families were Microviridae (9.58% to 18.18%) and Myoviridae (6.08% to 17.18%). Other abundant families were Circoviridae, Podoviridae, Phycodnaviridae, and Siphoviridae (Fig. S5A). The functional classification of subsystems at level 1 (collections of functionally related protein families) (44) showed that sequences of phage, prophage, transposable element, and plasmid (PPTP) subsystems (which included viral gene sequences of capsid, neck, tail, packaging machinery, phage replication, and phage lysins, among others) were most abundant in all locations but were even more abundant in the turbid rivers and St11 (transition) (mean abundance of 72.4% ± 20.8%) than in St10 (transition), plume, and Tapajós (16.9% ± 5.5%) (Fig. S5B). Other less-abundant subsystems (such as cofactors, vitamins, prosthetic groups, and pigments; regulation and cell signaling; cell wall and capsule; photosynthesis; and others) were detected only in the transition, plume, and Tapajós.

DISCUSSION
The structures of the virioplankton assemblages are distinct between the river and plume of the Amazon. Despite the continuum formed by the Amazon River extending from land to ocean, the river and plume represent different ecosystems, characterized by distinct patterns of viral assemblages and water parameters and separated by a transition plume formed by locations St10 and St11. This trend was clearly observed with the PCA of the physical-chemical parameters, for which the river ecosystem was characterized by higher levels of respiration and organic matter (DOC). In contrast, the plume ecosystem demonstrated more photosynthetic processes and the production/release of organic nitrogen forms (DON), indicating waters that are more autotrophic and oligotrophic, a pattern that corroborates the results of a previous study (14). This pattern is reinforced by the increased presence of genes of photosynthesis in the plume, as observed in the functional profiles of HUMAnN2 and MG-RAST. The transition displayed features that were intermediate between those of the river and plume (e.g., low salinity and SurfDIC, like the rivers, and lower temperature and PCO2, like the plume).
The virioplankton data corroborate the main separation of the river and plume ecosystems, as observed with the %GC content and the dinucleotide frequency of the viromes as well as with the annotation-dependent approaches: the mapped profile of viral abundance (NMDS and dendrogram analysis), the abundance and distribution of possible hosts inferred from the reference viral genomes, the most abundant and important viral scaffolds and genomes in the river and plume, and the CAP ordination of the viral families. A few exceptions were observed, as in the case of the grouping of St15 with the rivers in the dendrogram, probably caused by the lack of some Amazon viral scaffolds discarded by VirSorter. This program can be less sensitive in analyzing small viral genomes with few predicted genes (45), an important limitation considering the great abundance of small ssDNA viruses observed in the Amazon continuum. However, the good VirFinder scores obtained from all Amazon scaffolds and from the VirSorter-validated ones indicate that our viral database is reliable. The grouping of Tapajós (clear water river) with the plumes in the CAP might represent selection of groups from similar hosts (e.g., cyanophages, prochlorophages, and synechophages) and their related viruses, as these locations possess similar environmental conditions, such as higher light penetration, which favors photosynthetic organisms. The occurrence of genes related to photosynthesis subsystems in Tapajós and in the plume reinforces this hypothesis. Considering the Óbidos River water level to be a proxy that is representative of the whole continuum and since conditions were normal during the sampling period, it is expected that the pattern presented here can be reproduced during the periods of falling water levels in the Amazon River.
Environmental parameters may regulate the virioplankton community structure when these viral particles are free in the water (46). Factors such as temperature, salinity, pH, UV light, and nutrients (nitrogen, phosphorus) can interact directly, enhancing or reducing virion viability in marine environments (22). In the Amazon continuum, the most important parameters for structuring virioplankton assemblages were Sal, pH, PCO2, and SurfDIC, according to the viral CAP results. Although PCO2 has been found to be related to viral and bacterial abundances in an Amazon tributary (40), the possible direct effects of gaseous and dissolved forms of CO2 on virions remain unknown. The influence of geographical location and environmental conditions, such as salinity, on marine virioplankton has been well documented (46-48). The patterns presented here suggest that intrinsic physical-chemical and biological parameters of the water bodies along the Amazon continuum may have a major impact on the viral community composition, leading to patterns of separation of viral groups and possible hosts, thus shaping the similarities among geographical locations.
A clear shift in viral assemblage composition occurred along the river-plume continuum, where the viruses were grouped according to the reference phage host types. Riverine samples (Tapajós, Óbidos, north Macapá, south Macapá, and Belém) were dominated by eukaryotic viruses, likely as a consequence of the elevated concentrations of autotrophic nanoeukaryotes and picoeukaryotes measured at those sites as well as of the larger heterotrophic protists and of the land contribution of plant and animal cells. At the transition zone, phages that infect heterotrophic bacteria became increasingly abundant, while the abundance of eukaryotic viruses declined. The widespread changes in environmental conditions in this zone may lead to the selection of more-tolerant organisms such as heterotrophic bacteria and their viruses. Toward the ocean, the abundances of bacteria and microalgae decreased, but the abundances of cyanobacterial and viral particles drastically increased, leading to enrichment of the waters in phages of cyanobacteria and also of Pelagibacter. This pattern of possible viral hosts is reinforced by the eukaryotic sequences, where rivers contained 9% of the reads and 60% of the transcripts (39), in contrast to an overall lower contribution of eukaryotic reads in the plume (38).
The majority of the members of cosmopolitan viral families are bacteriophages. The widespread occurrence of Microviridae (small ssDNA phages) indicates that their hosts may have a similarly broad distribution, surviving throughout the continuum, as reinforced by the occurrence of phages of heterotrophic bacteria (Microviridae, Myoviridae, Podoviridae) along the continuum. Although some genetically similar viruses are widespread, most viruses are constrained to specific environmental conditions where their hosts can survive and reproduce (26). The high abundance of bacteriophages in Amazon freshwaters is consistent with a previous report (49).
Tailed viruses have been reported to be more resistant to changes in ionic strength (22,50). In addition, bacteriophages and archaeoviruses isolated from environments with a wide range of ionic strengths have been found to be more resistant to variations in ionic strength than their hosts (22). As the cosmopolitan viral families in the Amazon infect bacteria or archaea and as two of them are members of Caudovirales (Myoviridae and Podoviridae) (51), it is plausible that these tailed viruses can move between river and plume. Recent viral metagenomic studies of a rural river in Australia (52) and of the estuary of the Jiulong River in China (53) indicate that Caudovirales (e.g., Myoviridae, Siphoviridae, and Podoviridae) were the most abundant viruses. In the Amazon River, members of Caudovirales were also abundant, but the higher abundance of Microviridae, and also of Circoviridae, suggests that the ssDNA families, which are significant pathogens of the phytoplankton and microzooplankton in marine food webs (54), are also very important along the continuum.
The higher viral diversity of the samples from Óbidos, south Macapá, and north Macapá, which have higher Shannon values and rarefaction curves further from saturation, revealed the effect of different water origins and forest influences, as the main course of the Amazon River receives higher inputs from the forest, upstream waters, and many river tributaries. The enormous export of terrestrial plant and animal material from the Amazon forest into the river may allow certain viral families to proliferate. This phenomenon can be observed on the basis of the abundance of animal- and plant-associated viral families such as the Circoviridae that infect animals and of plant viruses such as Nanoviridae and Geminiviridae, and the data are strengthened by the occurrence of an ssDNA viral genome (Seq_3963), probably related to these viral families, that was abundant and characteristic of riverine waters. Similarly, the virioplankton of Arctic lakes was dominated by ssDNA viruses such as Circoviridae (55), thus reinforcing the idea of the importance of the ssDNA viruses in aquatic environments.
Dynamics of viral particles and organic matter in the continuum. Viral structural genes (encoding virion proteins and nucleic acids) and life cycle genes (associated with packaging machinery, phage replication, and phage lysins) were the most abundant in the continuum. However, atypical viral genes (associated with, e.g., cofactors, vitamins, prosthetic groups, and pigments; regulation and cell signaling; cell wall and capsule; fatty acid and lipid biosynthesis; and photosynthesis) were more common in plume and transition localities and also in the Tapajós River, according to the MG-RAST functional profile. The genes that encode functions other than viral structure and nucleic acid replication may be carried by the virions as an effect of viral horizontal gene transfer. The higher viral particle counts seen in plume locations may enhance the rate of encounters with possible hosts, thus increasing the possibility of transduction processes and subsequently promoting viral diversification. Indeed, the plume, St10 (transition), and Tapajós had more possible viral hosts, according to the reference genomes, which may have led to their more diverse functional profiles.
The more-diverse functional profile in the plume was also observed with the most abundant and important viral scaffolds and genomes. The genomes of the viruses from the plume were larger and contained more genes; thus, they can carry more viral enzymatic genes than the compact riverine viral genomes, which might pertain to small ssDNA viruses that have more balanced numbers of viral structural and enzymatic genes. A similar pattern of higher occurrence of large viruses (with respect to capsid and genome size) in estuarine and coastal waters than in freshwater was observed, although small viral particles were dominant along this salinity gradient (56). This trend also explains the identified cellular genes (the majority from Bacteria) in these sequences, which were related to basic cellular functions and could represent products of viral transduction in the plume. In addition, one genome carried a gene for a protein related to nitrogen fixation (the rnf gene), and three others had oxidoreductases [Fe(II)-dependent oxygenase, tryptophan halogenase, and thioredoxin, which are enzymes that promote oxidative reactions of proteins, forming signaling cascades], which could represent possible AMGs that help the plume's viruses during infection, especially in the presence of nitrogen-fixing phytoplankton in the plume (14).
Some viral groups may affect the carbon balance in the continuum by infecting photoautotrophic organisms. In rivers, the presence of Geminiviridae can facilitate the release of plant organic matter (e.g., lignin and cellulose), which may be degraded by lignocellulolytic bacteria and eukaryotes, the possible drivers of the lignin degradation observed along the river (5,12,57). In the plume and in Tapajós, the presence of Phycodnaviridae and Mimiviridae could decrease the total amount of primary production by their photosynthetic hosts, resulting in less carbon uptake from the atmosphere. The high concentration of humic substances (DOM) in water captures viral particles by adsorption, which reduces viral infectivity in copiotrophic waters, favoring lysogeny (58). The river and the transition plume had DOC values that were 3-fold higher than those seen with the outer plume, likely due to the presence of allochthonous organic matter from the forest. This organic matter and sediment in suspension can adsorb more viral particles, removing them from the water column (30), a process that may be enhanced by the release of extracellular polysaccharides by bacteria and phytoplankton (22). Additionally, the grazing of viral particles is more significant in eutrophic than in oligotrophic waters (30,59), which may further increase the removal of viruses from copiotrophic river waters. Previous reports showed that freshwater ecosystems tend to have higher VMR (60,61) or can have similar VMR, as observed in the Charente River, where viral particle counts decreased while salinity increased (62). The possible relation of virioplankton to the presence of organic matter and suspended sediments reported here explains the lower viral counts, especially in the turbid rivers, indicating that the Amazon River has particular virus-to-microbe ratio dynamics, with a lower VMR in the rivers and a higher VMR in the plume.
Additionally, the pattern showing a lower VMR and lower viral functional Shannon diversity with higher microbial host densities indicates a more lysogenic lifestyle in the river and transition; an opposite scenario was detected in the plume, making the lytic lifestyle more common (19). We thus hypothesize that the widespread changes in water parameters between river and plume may trigger the lytic cycle toward the ocean. Considering this hypothesis, the lack of microbial lysis in the river leaves microbial cells intact for grazing; thus, the organic matter enters the classical food web to nourish higher organisms. In contrast, the lytic lifestyle of the viruses in the plume promotes the viral shunt such that the organic matter is redirected to the microbial communities. However, the suppression of lysis at high microbial cell densities may not be explained by an increase in the prevalence of lysogeny (63). Further studies are needed to elucidate this hypothesis of a lysogenic river and a lytic plume observed here and to perform measurements of the viral contribution to the destiny of organic matter in the Amazon continuum.
Conclusions. This is the first study of viromics in the Amazon River continuum to have provided knowledge concerning the diversity and possible ecological roles of viral assemblages in this region. Clear discontinuities were observed throughout the vast Amazon River and plume continuum. Despite the spatial connectivity mediated by the river, the viromes form distinct groups (in rivers, transitions, and plumes), which, together with environmental parameters, indicate that river and plume are different ecosystems. Despite this separation, some bacteriophages are widely distributed throughout the continuum, which indicates that the river-to-ocean transition is a barrier to the distribution of some, but not all, viral families. The viral families are distributed according to a combination of host occurrence and the physical-chemical characteristics of the waters, especially salinity. Knowledge of the current state of the virioplankton of the largest river in the world provides a foundation for understanding how future global warming, or other forms of anthropogenic impact, can influence the microbiota of riverine ecosystems. These changes in microbiota can modify, for example, the river and plume biodiversity and the carbon cycle and sequestration system of the Amazon River continuum, with local (South Atlantic Ocean) and global consequences.

MATERIALS AND METHODS
Physical-chemical analyses and chlorophyll measurements. All environmental parameters were determined using standard riverine or oceanographic methods (5,64). At least three replicates were analyzed for each parameter. Samples were analyzed for inorganic nutrients (SiO3, NO3 + NO2, and PO4) as described previously (65). The values for the fine suspended sediment (FSS) and particulate lignin were obtained from Ward et al. (5).
Cytometry counts of viral particles and microbial cells. Triplicate water samples were collected in 2-ml cryogenic vials for each station from the river and the plume. Each of the triplicate samples was fixed with one of the three different preservatives: glutaraldehyde (25% [wt/vol]) for viruses, paraformaldehyde (10% [wt/vol]) for microalgae, or glutaraldehyde plus paraformaldehyde (0.5% [wt/vol] plus 10% [wt/vol]) for bacteria. Cryovials were homogenized, fixed at room temperature for 10 min, and stored in liquid nitrogen. The cytometry counts were performed as described previously (66). The virus-to-microbe ratios (VMR) were also calculated.
Principal-component analysis (PCA) of the dinucleotide frequency in the metagenomes and the physical-chemical data. PCA (67) was used to identify the separation patterns between locations prior to the metagenome annotation. Dinucleotide frequencies of the quality-controlled sequences were calculated based on the method described by Willner et al. (68) using in-house Perl scripts (available upon request), as described previously (69). The covariance PCAs of the dinucleotide frequencies and the physical-chemical parameters were performed using the R program, version 3.0.2 (70).
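The dinucleotide counting step can be illustrated with a minimal Python sketch. It is not the Perl implementation used in this study, and it makes assumptions that the text does not specify: dinucleotides are counted in overlapping windows on the given strand only, pairs containing ambiguous bases are skipped, and the function name dinucleotide_frequencies is hypothetical. The method of Willner et al. (68) may differ in these details.

```python
from collections import Counter
from itertools import product

# The 16 possible dinucleotides over the unambiguous DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
VALID = set(DINUCLEOTIDES)


def dinucleotide_frequencies(sequences):
    """Return the relative frequency of each of the 16 dinucleotides.

    Assumption: overlapping windows, forward strand only; pairs that
    contain an ambiguous base (for example N) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair in VALID:
                counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: counts[d] / total for d in DINUCLEOTIDES}
```

Each virome would yield one 16-value vector of this kind, and the resulting sample-by-dinucleotide matrix is the sort of input on which a covariance PCA can be run.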
Virome sampling and field processing. A 40-liter volume of water was sampled in the central channel of each river, while a 100-liter volume was sampled in the plume stations. The total amount of sampled water was prefiltered using a 100-μm-pore-size mesh and was then concentrated to a volume of approximately 0.5 liters using a Tangential Filter Flow (TFF) cassette (GE Healthcare) with a pore size of 100 kDa. The concentrated water was filtered using 3-μm-pore-size mixed cellulose ester membranes (Millipore) to separate larger particulate material and eukaryotic cells and then with 0.22-μm-pore-size polyethersulfone cartridge filters (Sterivex; Millipore) to remove picoplankton cells (bacteria and archaea). The final filtrate (~200 ml) from each site, containing a concentrated fraction of virioplankton, was stored at 4°C in Falcon tubes protected from light until processing in the laboratory, which occurred in less than a month.
Virome DNA extraction. Viral filtrate samples were concentrated by ultracentrifugation, and DNA extraction was performed according to the method described by Gregoracci et al. (71), with the addition of β-mercaptoethanol during lysis and of two washing steps using 10% cetyltrimethylammonium bromide (CTAB) plus 0.7 M NaCl (72). DNA from river samples was additionally cleaned to remove PCR inhibitors, such as residual humic acids, using a OneStep PCR inhibitor removal kit (Zymo Research). Genomiphi reactions were performed using Illustra Genomiphi DNA kit v2 (GE Healthcare) following a modified protocol (73). The DNA concentration was quantified using a NanoDrop ND 1000 spectrophotometer (Thermo Scientific, DE, USA) and a Qubit Fluorometer with a Qubit double-stranded DNA (dsDNA) high-sensitivity (HS) assay kit (Life Technologies, Inc.).
Illumina library construction and sequencing. The DNA libraries were prepared using a Nextera XT sample preparation kit (Illumina). The library size distribution was assessed using a model 2100 Bioanalyzer (Agilent) and a High Sensitivity DNA kit (Agilent) and was quantified using an Applied Biosystems 7500 real-time PCR system and a Kapa library quantification kit (Kapa Biosystems). PhiX sequencing control v3 (Illumina) was added at 1%, and paired-end sequencing (2 × 250 bp) was performed on a MiSeq system (Illumina).
Quality control of the metagenomes and merging of sequences. Virome reads, in FASTQ files, were analyzed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) to obtain summary statistics for quality control (QC) of the data sets. Based on these results, the reads were quality filtered (Phred Q > 20), and artificial duplicated sequences were removed. Twenty bases from the 3′ ends of the reads were then trimmed using the stand-alone version of PRINSEQ (74) to remove low-quality bases. Forward and reverse paired-end sequences of good quality were merged using the SHERA algorithm (75) to extend the length of the obtained reads.
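The filtering and trimming logic can be sketched in a few lines of Python. This is an illustration only and not PRINSEQ itself. It assumes that the quality threshold applies to the mean Phred score of each read and that quality strings use the Phred+33 encoding; neither detail is stated in the text, and the function names are hypothetical.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_quality(qual, min_mean=20, offset=33):
    """True when the mean Phred score is strictly above min_mean.

    Assumption: the Q > 20 criterion is applied to the read mean.
    """
    return mean_phred(qual, offset) > min_mean


def trim_three_prime(seq, qual, n_bases=20):
    """Remove n_bases from the 3' end of a read and its quality string."""
    if n_bases <= 0:
        return seq, qual
    return seq[:-n_bases], qual[:-n_bases]
```

In a pipeline of this kind, a read would first be tested with passes_quality and, if retained, shortened with trim_three_prime before the paired-end merging step.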
Virome assembly. The 12 quality-controlled Amazon viromes were combined, and sequences were cross-assembled using SPAdes (76) with default parameters. Scaffolds larger than 1 kbp were then processed with Prokka (77) for prediction of coding sequences and for identification and initial annotation of proteins, tRNAs, and rRNAs. Predicted proteins were queried against the NCBI nonredundant protein database with DIAMOND (78) and were annotated taxonomically and functionally according to the best-hit classification (E value, ≤10^-5).
Custom viral database and mapping of the Amazon viromes. The Amazon scaffolds were analyzed with VirSorter (online server, with the metagenome option) (79) to remove scaffolds of possible nonviral origin. The assembled scaffolds were also analyzed with VirFinder (45), using default parameters, to compute the likelihood of the assembled sequences being of viral origin through a homology-independent approach. A custom viral database was built by combining the VirSorter-validated Amazon viral scaffolds with viral genomes from NCBI viral RefSeq, marine Mediterranean phages obtained from fosmid libraries (80), and prophages mined from bacterial genomes through VirSorter (79). To ensure that this database was nonredundant, the sequences were clustered through BLASTn (81), using cutoffs of 95% identity and 40% coverage. A profile of viral abundance was produced by mapping the raw reads of the 12 Amazon viromes against this custom viral database using Bowtie2 (82) with the --very-sensitive-local and -a options. Ambiguous read counts were corrected as described previously (83). The abundance of each sequence was normalized by the total number of mapped reads to obtain the relative abundances of viral sequences. Additionally, the reads were mapped against the complete database (all Amazon scaffolds plus the reference genomes), which generated a complete profile of viral abundance.
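The final normalization step amounts to dividing each sequence's mapped read count by the total number of mapped reads in the sample. A minimal Python sketch follows. It assumes that ambiguous read counts have already been corrected and that no further normalization (for example, by sequence length) is applied, since none is described here. The function name and the example sequence names are hypothetical.

```python
def relative_abundances(mapped_counts):
    """Convert per-sequence mapped read counts to relative abundances.

    mapped_counts maps a sequence name to its (corrected) read count.
    The returned values sum to 1 for a sample with mapped reads.
    """
    total = sum(mapped_counts.values())
    if total == 0:
        raise ValueError("no mapped reads in this sample")
    return {name: count / total for name, count in mapped_counts.items()}


# Hypothetical counts for one virome.
profile = relative_abundances({"scaffold_A": 30, "phage_ref_B": 70})
```

Repeating this for each of the 12 viromes gives a sample-by-sequence matrix of the kind used in the ordination and random forest analyses.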
A functional analysis of the Amazon viral scaffolds was also performed. First, only the virome reads that matched reference viral genomes or Amazonian scaffolds identified as viral by VirSorter (categories 1 and 2) were selected. Next, these reads were analyzed through the HUMAnN2 (84) analysis pipeline for functional annotation using the uniref90 EC filtered database as the reference. Parameters used to run HUMAnN2 were as follows: humann2 --remove-stratified-output --bypass-nucleotide-search --threads 12 --evalue 0.001 --memory-use maximum --translated-subject-coverage-threshold 0 --translated-query-coverage-threshold 20 --identity-threshold 30. The HUMAnN2 results with respect to relative abundance levels were categorized according to the corresponding superclasses of MetaCyc (85).
Virioplankton in the Continuum of the Amazon. September/October 2017 Volume 2 Issue 5 e00366-17 msphere.asm.org

Statistical analyses and diversity indexes of the Amazon viral scaffolds. To infer similarities between locations, the mapped profile, based on the custom viral database, was used to build a nonmetric multidimensional scaling (NMDS) ordination from a Manhattan distance matrix between samples and a dendrogram built with the "hclust" function and the "complete linkage" method, both in R (70). Based on the mapping to the reference viral genomes only, data corresponding to the host groups were obtained and used to infer the occurrence of possible viral hosts.
The abundance profile of the custom viral database was also used in a random forest analysis (86) to determine the most important scaffolds and reference genomes for the separation of river (Tapajós, Óbidos, north Macapá, south Macapá, Belém) and plume (St1, St3, St4, St6, St10, St11, St15) regions. The scaffold and genome architectures were drawn using EasyFig (87), ".gbk" data files from Prokka, and InkScape (http://www.inkscape.org). The possible taxonomical domains of the proteins of the viral genomes were determined by BLASTp (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins). All the annotated proteins were classified in terms of their general functions according to the UniProt (http://www.uniprot.org) and KEGG (88) databases.
The mapped profile of the complete database was used to calculate the rarefaction curves, Shannon (89) and Simpson (90) diversity indexes, and richness values, using the R program (70) with the "vegan" package (91).
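For reference, the three summary statistics can be written out in a short Python sketch. It follows the default conventions of the "vegan" diversity function (natural logarithm for Shannon; Simpson reported as 1 minus the sum of squared proportions), but it is only an illustration and not the code used in this study; the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index expressed as 1 - sum(p^2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def richness(counts):
    """Number of taxa (or sequences) with at least one mapped read."""
    return sum(1 for c in counts if c > 0)
```

For a perfectly even sample with four sequences, these functions return ln 4 for the Shannon index, 0.75 for the Simpson index, and a richness of 4.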
Automated taxonomical and functional virome annotation and ordination analysis. The viromes were submitted to the Metagenomic RAST server (MG-RAST) (92) to obtain summaries of sequence data (metagenome yield, mean sequence size, mean %GC content) and to perform automated taxonomic binning and functional assignment. The 12 metagenomes were classified in MG-RAST against a GenBank database (E value, ≤1e-5) which includes Viral RefSeq. Only the sequences that were assigned to the viral domain were used for functional analysis against the Subsystems database of MG-RAST, through the "Workbench" tool. The Shannon diversity index of viral functions was calculated using the R program (70) with the "vegan" package (91). The viromes were also analyzed using the SILVA database (https://www.arb-silva.de/), through MG-RAST, to assess the number of SSU (small ribosomal subunit) and LSU (large ribosomal subunit) sequences and thus to evaluate possible cellular contamination.
The ordination of the results of canonical analysis of principal coordinates (CAP) (93) was performed in the R program (70) with the "vegan" package (91). The MG-RAST viral taxonomic matrix was log transformed [log10(x + 1)], the Bray-Curtis distance was calculated, and the data were compared to a constraint matrix of 10 chosen physical-chemical parameters (Table 1). These constraints were not correlated and were most important for the viral family distribution, based on a "bioenv" analysis, also performed in R. A permutational multivariate analysis of variance (PERMANOVA) (94), based on the Bray-Curtis distance and performed in R, was used to calculate the statistical significance of the CAP ordination data. Considerations regarding the methodological approaches adopted here are reported (see "Caveats" [Text S1 in the supplemental material]).
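The transformation and distance calculation that precede the ordination can be illustrated with a minimal Python sketch. It is not the R code used in this study, and the function names are hypothetical. It applies log10(x + 1) to each abundance vector and computes the Bray-Curtis dissimilarity as the sum of absolute differences divided by the sum of all values.

```python
import math


def log_transform(values):
    """Apply log10(x + 1) to every abundance in a sample."""
    return [math.log10(v + 1) for v in values]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of taxa")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    # Two empty samples are treated as identical (an assumption).
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples):
    """Pairwise Bray-Curtis matrix on log10(x + 1) transformed samples."""
    transformed = [log_transform(s) for s in samples]
    return [[bray_curtis(p, q) for q in transformed] for p in transformed]
```

The dissimilarity is 0 for identical samples and 1 for samples that share no taxa, which is why it is well suited to abundance tables containing many zeros.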
Data availability. The viromes are available in the MG-RAST server (project "AmazPluma," number mgp8766) under the following accession numbers: for Belém, mgm4559916.3

ACKNOWLEDGMENTS
We are very grateful to Cecília Pereira for her work during the river samplings. We thank CNPq, CAPES, and FAPERJ for support. We also thank CNPq for a PVE grant