Gut Virome Analysis of Cameroonians Reveals High Diversity of Enteric Viruses, Including Potential Interspecies Transmitted Viruses

Despite the availability of diagnostic tools for different enteric viral pathogens, a large fraction of human cases of gastroenteritis remains unexplained. This could be due to pathogens that are not tested for or to novel divergent viruses of potential animal origin. Fecal virome analyses of Cameroonians showed a very diverse group of viruses, some of which are genetically related to those identified in animals. This is the first attempt to describe the gut virome of humans from Cameroon. Therefore, the data represent a baseline for future studies on enteric viral pathogens in this area and contribute to our knowledge of the world’s virome. The study also highlights the fact that more viruses may be associated with diarrhea than the typically known ones. Hence, it provides meaningful epidemiological information on diarrhea-related viruses in this area.

Diarrhea is the second most common cause of death worldwide and accounts for about 8 to 9% of the 5.9 million yearly deaths in children under the age of 5 (1, 2). Most of these deaths occur in Southeast Asia and sub-Saharan Africa (3,4). The chances of infection with enteric viruses are higher in developing countries than in developed countries, probably due to suboptimal sanitation and hygienic conditions and low quality of drinking water, especially in rural areas (5). In Cameroon, a limited number of studies have investigated the prevalence of enteric pathogens as the cause of gastroenteritis in humans. These studies mainly focused on the epidemiology of a limited number of pathogens such as rotavirus, norovirus, and enteroviruses, revealing significant differences in the prevalence of these viruses in different settings and time periods (4,6,7). In parts of Cameroon, a high prevalence of several enteric viruses such as enterovirus, norovirus, rotavirus, and adenovirus was found in children and adults (8). Generally in Africa, many episodes of gastroenteritis remain unexplained as no etiological agent is determined (9,10). A proportion of the unexplained gastroenteritis cases are likely due to other known viruses, for which no tests were performed. However, a part of these gastroenteritis cases could also be caused by novel viral agents.
Transmission of these enteric viruses is predominantly fecal-oral, and humans are constantly exposed to these viruses through various routes (11). One of these routes is zoonosis from reservoirs in wild or domestic animals, either by insect vectors or by exposure to animal droppings or tissues. One rich but, until recently, underappreciated reservoir of emergent viruses is bats. Of the ~5,500 known terrestrial species of mammals, about 20% are bats (12). Several viruses pathogenic to humans are believed to have originated in bats over the last several years, including severe acute respiratory syndrome (SARS)- and Middle East respiratory syndrome (MERS)-related coronaviruses, as well as filoviruses, such as Ebola and Marburg viruses, or henipaviruses, such as Nipah and Hendra viruses (13–18).
In the Southwest region of Cameroon, bats are hunted and eaten. Such close interactions provide ample opportunity for zoonotic events to occur (19).
Previously, we identified a plethora of known and novel eukaryotic viruses in Cameroonian fruit bats using a viral metagenomics approach, including viruses known to cause gastroenteritis in humans (sapovirus, sapelovirus, and rotaviruses A and H) and those not yet associated with gastroenteritis (bastrovirus and picobirna-like viruses) (20–23). In the current study, we metagenomically screened 221 human fecal samples collected in the same region (where bats are hunted and eaten), to assess (i) if any viruses of animal origin could be identified and (ii) which known human gastrointestinal viruses were present. These fecal samples were collected from children less than a year old to adults of more than 60 years who had gastroenteritis and/or were in contact with bats. Additionally, since the gut virome typically contains both eukaryotic and prokaryotic viruses (phages), of which the latter usually represents the largest fraction of the gut virome, we also analyzed the phageome of these samples.

RESULTS
Sample characterization. A total of 221 human fecal samples (131 from Kumba and 90 from Lysoka) were collected from two hospitals in the Southwest region of Cameroon, for viral metagenomics screening. From these fecal samples, a total of 63 pools were constituted in categories based on age, bat contact status, and location (see Table S1 in the supplemental material). Illumina sequencing of all the 63 human pools generated in total approximately 708 million raw paired-end (PE) reads (between 4.3 and 53.4 million reads per pool). After trimming, 67% of the reads (471 million) were retained and 86% of these retained trimmed reads (405 million) were annotated using Diamond. Of these, 18% (74 million) could be attributed as viral.
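The read-retention percentages quoted above can be checked from the reported counts. The short sketch below only recomputes those ratios from the rounded figures given in the text (in millions of reads); the variable and function names are illustrative and are not taken from the study's pipeline.

```python
# Read counts in millions, as reported in the text (rounded values).
raw_reads = 708        # raw paired-end reads over all 63 pools
trimmed_reads = 471    # reads retained after trimming
annotated_reads = 405  # trimmed reads annotated with Diamond
viral_reads = 74       # annotated reads attributed to viruses


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


retained_pct = percent(trimmed_reads, raw_reads)         # share of raw reads kept
annotated_pct = percent(annotated_reads, trimmed_reads)  # share of kept reads annotated
viral_pct = percent(viral_reads, annotated_reads)        # share of annotated reads that are viral
viral_of_raw_pct = percent(viral_reads, raw_reads)       # viral reads relative to all raw reads

print(retained_pct, annotated_pct, viral_pct, viral_of_raw_pct)
```

Because each percentage is taken relative to the preceding step, the viral fraction of the raw data is much smaller than the 18% quoted for annotated reads.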
NGS viral read distribution/abundances. In each of the categories of pools, phages make up at least 84% of the total number of viral reads while the maximum proportion of eukaryotic viral reads is 16%. A similar annotation profile was observed for pools of patients in different age groups, different locations, and different bat contact statuses (Fig. S1).
Further analysis of eukaryotic viral reads revealed that at least 70% of the reads mapped to viruses of the families Astroviridae, Reoviridae, and Anelloviridae (Fig. 1A). Other viruses were also present, particularly those that are known to cause gastroenteritis belonging to the families Adenoviridae, Caliciviridae (Sapovirus and Norovirus), and Picornaviridae (of which about 60% were enteroviruses [Fig. 1B and Fig. S2]). Also, reads from viruses known to cause other human diseases (Parvoviridae) or other animal diseases (Circoviridae) or not associated with any diseases at all (Picobirnaviridae) were present in variable numbers in the different groups (Fig. 1B to E). The rest of the viral families were either plant- or insect-associated viruses. Notably, in age groups A to D, the percentage of pools in which Picobirnaviridae viruses were present increased with age, with low percentages in age groups A and B (Fig. 1C). Also, the percentages of pools positive for anelloviruses differed with respect to age, with higher percentages in young children and the elderly. Further, there were no observable trends in the percentage of eukaryotic viral presence with respect to bat contact status or location (Fig. 1D and E). Figure 1F shows a heat map of the percentage of pools in which eukaryotic viral families were present in human and bat pools, while Fig. S3 shows their presence in humans and bats at the genus level (23). Astroviridae (Mamastrovirus), Caliciviridae (Sapovirus), Picornaviridae (Parechovirus), and Reoviridae (Rotavirus), viral families known to cause gastroenteritis in humans, were identified in both bat and human pools from the same region. Mammalian viruses not yet established to cause gastroenteritis (Picobirnaviridae, Circoviridae, and Parvoviridae [Bocaparvovirus]) were also common in both bats and humans from the same regions (Fig. 1F and Fig. S3).
Phylogeny of eukaryotic viruses. In this study, we focused on viruses from which near-complete genomes were obtained, particularly those that are known to cause viral gastroenteritis (belonging to the Astroviridae, Caliciviridae [norovirus and sapovirus], Picornaviridae [enterovirus, parechovirus, cosavirus], Parvoviridae, Reoviridae, and Adenoviridae [human mastadenovirus]). Furthermore, we also looked at other viruses not fully proven to cause gastroenteritis in humans but which have only sporadically been associated with gastroenteritis, like Picobirnaviridae and small circular single-stranded DNA viruses.
Phylogenetic analysis was done for each of the selected viruses using the protein or nucleotide sequences of suitable conserved regions and representative members of their viral family, genus, or species.
Reoviridae. Reoviridae is a large viral family of segmented dsRNA viruses with a wide host range. They are further divided into two subfamilies and 15 genera. Genomes of viruses belonging to the Reoviridae contain 9 to 12 segments (24). In total, Reoviridae reads were found in 6 pools, and (nearly) complete genomes of 2 viruses of the family Reoviridae were obtained from pool HP55. Samples in this pool were from two diarrheic children (less than 5 years), originating from Kumba and without contact with bats.
Mammalian orthoreovirus. Mammalian orthoreoviruses (MORVs) contain 10 segments, L1 to L3, M1 to M3, and S1 to S4, coding for 12 to 13 proteins (24, 25). A MORV strain was identified, represented by 16,913 reads (0.4% of all viral reads of the pool). Phylogenetic analysis based on the nucleotide sequences of each of the 10 segments of this MORV (Fig. 2 and Fig. S4) showed topological incongruence with four distinctive patterns. Based on segments L2 and S1, this strain clustered with bat strains WIV3 and WIV5 from China with 86% and 70% nucleotide (nt) identity, respectively (Fig. 2A and B). For the L1 and S2 segments, the human strain clustered with the Ndelle murine strain, also from Cameroon, with 95% and 92% nt identity, respectively (Fig. 2C and D). On the other hand, segment S3 of the Cameroonian MORV strain clustered with a human strain and a civet MORV strain from China (88% and 89% nt identity, respectively [Fig. 2E]). The rest of the segments (L3, M1 to M3, and S4) did not cluster together clearly with any of the abovementioned strains (Fig. S4).
Rotavirus A. Rotavirus A (RVA) contains 11 segments coding for 11 or 12 proteins: VP1 to VP4, VP6, VP7, and NSP1 to NSP6 (26,27). We identified a near-complete RVA sequence which made up 99% (4.3 million) of the eukaryotic viral reads of that pool. The NSP3 segment was not identified in the sample. The VP7 gene of this strain was genetically most related to RVA/Human-tc/USA/Wa/1974/G1P1A[8] and RVA/Human-TC/USA/Rotarix/2009/G1P[8] (nt identity of 92 and 97%, respectively) while the VP4 gene was 90% identical to the same strains. The phylogenetic trees of the remaining segments shared the same clustering pattern (Fig. 3A and B and Fig. S5). According to the rotavirus classification scheme, this strain is a typical Wa-like G1P[8] named RVA/Human-wt/CMR/CMRHP55/2014/G1P[8]. CMRHP55 was distantly related to bat RVA strains identified from the same regions (only 69 to 71% nt identity).
Picornaviridae. The Picornaviridae represent a large family of small, cytoplasmic, nonenveloped icosahedral ssRNA viruses consisting of 80 species, grouped into 35 genera. They have a genome of 7.1 to 8.9 kb in size and are most often composed of a single ORF encoding a polyprotein flanked by a 5′ and 3′ UTR (28). The members of the family Picornaviridae can cause gastroenteritis, meningitis, encephalitis, paralysis (nonpolio and polio-type), myocarditis, hepatitis, upper respiratory tract infections, and diabetes (29,30). Out of the 63 pools, 41 contained Picornaviridae reads, making the Picornaviridae the eukaryotic viral family of which reads could be identified in the highest number of pools.
Enterovirus. The genus Enterovirus (EV) consists of 15 species: Enterovirus A to L and Rhinovirus A to C. EV A, B, C, and D are found in humans; E and F in cattle; G in pigs; H, J, and L in monkeys; K in rodents; and species I in dromedary camels (http://www.picornaviridae.com). In this study, eighteen (nearly) complete genomes of EVs were obtained. The strains were named EV/Human/CMRHPxx/CMR/2014, here referred to as EV-CMRHPxx. All eighteen genomes were found in pools of age groups A and B (<3 and 3 to 20 years, respectively). Eight of these were identified in age group A, three (EV-CMRHP1, 5A, and 5B) of which were in pools consisting of samples of infants who had indirect contact with bats, while the rest (EV-CMRHP14, 45, 52A, 52B, and 55) were in pools of those that had no contact with bats. The ten other strains were identified in pools belonging to age group B: three had direct contact with bats (EV-CMRHP8A, 8B, and 9), five had indirect contact (including EV-CMRHP4, 35A, 35B, and 39), and two had no contact. Based on the phylogenetic analysis of the VP1 nucleotide sequences, the EVs found in this study were quite divergent from each other, belonging to three different species of Enterovirus, A, B, and C (Fig. 4A). Most of the strains belonged to the Enterovirus C clade (EV-CMRHP1, 3, 4, 8A, 8B, 9, 14, 18, 35A, 52A, and 55), while EV-CMRHP35B, 39, and 45 clustered within Enterovirus B, and EV-CMRHP5A, 5B, 52B, and 58 within Enterovirus A. Some pools had multiple strains of EV present, and some of these clustered together (CMRHP8A and 8B: vaccine type PV-3), whereas other pools contained distinct EV species (EV-CMRHP35A and 35B; 52A and 52B). The presence of vaccine strains in pool HP8 probably indicates recent vaccination events of the infants in this pool. Apart from EV-CMRHP39 (which clustered with 11C52_CMR), all the EV strains identified here were distantly related to those previously identified in the Far North region of Cameroon (31).
Furthermore, none of the human strains from Cameroon were related to any of the animal EV strains (from chimp or gorilla). A summary of the detailed classification of these EVs using an online typing tool (32) is shown in Table 1.
Parechovirus. The genus Parechovirus is comprised of two species, Parechovirus A (human parechovirus [HPeV]) and Parechovirus B (Ljungan virus, isolated from bank voles) (33). HPeV is subdivided into 19 types (HPeV1 to -19). HPeV is associated with mild gastrointestinal or respiratory illness; however, severe disease conditions, such as meningitis/encephalitis, acute flaccid paralysis, and neonatal sepsis, may occur (34–36). Here, three (nearly) complete HPeVs were identified in pools HP2, HP46, and HP48 with sequence lengths of 7,142 bp, 7,202 bp, and 7,219 bp, respectively, collected from children less than 3 years old (age group A). In terms of bat contact status, they were in pools of those either in indirect contact with bats (HP2 and HP48) or without contact (HP46). They were all distantly related to each other, with HPeV-CMRHP46 and HPeV-CMRHP48 having the highest identity (76% and 86% nt and aa identity, respectively). Phylogenetically, HPeVs in HP46 and in HP48 fell into a clade of type 1 HPeVs (Fig. 4B). The HPeV in HP46 clustered together with HPeV1/Harris strain with 76% nt identity, while CMRHP48 clustered closely with Japanese and Norwegian strains A1086-99 and NO-3694 (84 to 90% nt identity). Furthermore, HPeV-CMRHP2 clustered distantly with type 16 HPeVs from China and Bangladesh with only 70 to 71% nt identity. Considering the 75% identity demarcation for HPeV types (37,38), this strain potentially represents a novel type.
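The type assignment above rests on comparing pairwise nucleotide identity against the 75% demarcation. A minimal sketch of that reasoning follows; it assumes the sequences are already aligned to equal length, and the function names and the toy alignment are hypothetical rather than part of the authors' workflow or of the typing tool they cite.

```python
TYPE_DEMARCATION = 75.0  # percent nt identity, the HPeV type threshold quoted in the text


def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_candidate_novel_type(identities_to_known_types, threshold=TYPE_DEMARCATION):
    """True when even the best identity to any known type falls below the threshold."""
    return max(identities_to_known_types) < threshold


# Toy alignment (made up): 4 of 5 ungapped columns match.
toy_identity = percent_identity("ACGT-A", "ACGA-A")

# Identities reported in the text: 70 to 71% for HPeV-CMRHP2, 76% for HP46 vs HPeV1/Harris.
hp2_is_candidate = is_candidate_novel_type([70.0, 71.0])
hp46_is_candidate = is_candidate_novel_type([76.0])
```

With the reported values, only HPeV-CMRHP2 falls below the threshold, which matches the interpretation given in the text.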
Cosavirus. The genus Cosavirus consists of five species (Cosavirus A, B, and D to F), which have been associated with gastroenteritis in children (39). Six near-complete human cosavirus (HCoSV) genomes were identified: 1 from children less than 3 years old (HP49), 3 from those between 3 and <20 years old (HP6A and HP6B, HP57), and 2 from pools of individuals between 20 and <60 years old (HP44, HP24). Some of these pools had direct or indirect contact with bats (HP6, HP24, and HP44), while others had no contact with bats (HP49 and HP57). Phylogenetic analysis (Fig. 4C) showed that cosaviruses from HP6B, HP49, and HP57 formed a clade with two other strains from Australia and Nigeria (HCoSV/E1/AUS and HCoSV/NG385/NGA) in species HCoSV E. Meanwhile the strains in HP6A, HP24, and HP44 clustered with HCoSV in species A, D, and B, respectively. Therefore, it seems that humans in Cameroon host a diverse range of cosaviruses.
Cardiovirus. The genus Cardiovirus consists of three species, Cardiovirus A to C. Species B includes Saffold virus (SafV) infecting humans. It has been found in cases with acute flaccid paralysis, respiratory tract infections, and diarrhea in China (40–42). Here, we found a near-complete genome of a SafV in one pool (HP35) belonging to the age group between 3 and <20 years old who had indirect contact with bats. The VP1 segment of the identified SafV was 72 to 74% and 78 to 80% identical (on nt level) to SafV strains in types 5 and 6, respectively. Phylogenetic analysis based on the VP1 region confirmed the clustering of the novel strain between types 5 and 6 with more phylogenetic relatedness to type 6 (Fig. 4D). Hence, this novel SafV strain may be a distant member of type 6 or represent a new type.

FIG 4
Phylogenetic trees were based on the nucleotide sequences of the VP1-P2A region for the species Hepatovirus A and the VP1 region for the rest of the genera. All the trees were constructed using the GTR+G+I nucleotide substitution model using RAxML, with the autoMRE flag, which enables a posteriori bootstrapping analysis. Only bootstrap values greater than 70% are shown. Bars indicate nucleotide substitutions per site. Red, novel strains from this study; blue, human Cameroonian enterovirus strains from other studies; green, animal enterovirus strains from Cameroon.
Hepatovirus A. Hepatitis A virus (HAV), now Hepatovirus A, belongs to the genus Hepatovirus, which consists of nine species (Hepatovirus A to I). The Hepatovirus A species is comprised of a single serotype, HAV, subdivided into human and simian viruses (43). It causes acute hepatitis throughout the world (44). There were three (nearly) complete HAV genomes in pools HP2, HP4, and HP6, all of which were pools from those in direct (HP6) or indirect (HP2 and HP4) contact with bats. These strains were either from infants less than 3 years old (HP2) or from children between 3 and <20 years old (HP4 and HP6). Based on the VP1-P2A region, the nt identity between these strains was 98 to 99%. Strains in HP4 and HP6 were 99% identical to BRAB13, isolated from a patient from the Netherlands in 2001, who was staying in a hippie community with visitors from all over the world and under primitive living conditions (45). On the other hand, the HAV strain in HP2 was closely related to strain G2B1-VP from France (98% nt identity). Therefore, all strains identified here are genotype IIA (Fig. 4E), increasing the number of completely sequenced genotype II strains to five (the other two strains are BA/ITA/2012 and CF53/Berne).

FIG 5
Phylogenetic trees based on the nucleotide sequences of the RdRp (A) and capsid (B) genes of the AstVs identified in this study and representative strains from GenBank. Trees were constructed using the GTR+G+I nucleotide substitution model using RAxML, with the autoMRE flag, which enables a posteriori bootstrapping analysis.

[…] formed a clade (in the RdRp tree) with other bat astroviruses from Guangxi but was distantly related to the human AstVs from the same region.
Caliciviridae. Caliciviridae are a family of nonenveloped viruses with a linear ssRNA(+) genome of 7.3 to 8.3 kb, containing two or three ORFs. The family contains five genera (47,48). In total, Caliciviridae reads were found in 16 pools belonging to either the Norovirus or Sapovirus genus.
Norovirus. This genus consists of a single species, Norwalk virus (NV), divided into 5 genogroups. Genogroups I, II, and IV infect humans, whereas genogroup III infects bovine species and genogroup V has been isolated from mice (49). Three near-complete NVs were present in the 16 pools that contained Caliciviridae reads (HP1, HP18, and HP59), from people who had indirect (HP1 and HP59) or no (HP18) contact with bats, and from age group A (HP1), B (HP18), or C (HP59). The phylogenetic tree (Fig. 6A) showed that the three NVs belonged to two genogroups: I (NV_CMRHP18, genotype I.3) and II (NV_CMRHP1 and NV_CMRHP59, genotypes II.12 and II.13, respectively). The novel strain NV_CMRHP18 was more than 98% similar to strain C13/2009CMR_GI.3 (a partial sequence [JF802509]) isolated from the Littoral Region of Cameroon in 2009, whereas strains of genogroup II from the same study (II.4, II.8, II.17) were distantly related to those identified here (II.12 and II.13) (7). Strains from this previous study were not included in the phylogenetic analysis because only 200 to 300 nt of the capsid region was available in databases.
Sapovirus. The genus Sapovirus (SaV) consists of a single species, Sapporo virus. It has been detected in humans, pigs, minks, dogs, sea lions, bats, chimpanzees, rodents, and carnivores (50,51). Three near-complete SaV genomes were present in pools HP4 (age group B), HP15 (age group A), and HP22 (age group D) from people who were in indirect contact, were not in contact, and were in direct contact with bats, respectively. Phylogenetic analysis (Fig. 6B) showed that SaV from HP22 could be classified as a GIV genotype, and the SaVs HP4, HP53, HP46, HP56, and HP15 belonged to genotype GII. The phylogenetic tree showed that the bat SaVs found in Cameroon (in blue) (22) clustered together and formed a clade with other bat SaVs from China and Hong Kong but were divergent from these human SaVs, indicating no evidence of interspecies transmission of SaVs in this region.
Picobirnaviridae. Picobirnaviruses (PBVs) belong to the family Picobirnaviridae, genus Picobirnavirus, and are small bisegmented dsRNA viruses with a total genome size of about 4 kb. Segment one encodes a polyprotein, containing the capsid protein, and segment two encodes the RdRp. Based on the RdRp gene, PBVs are classified into two genogroups. Although PBV is genetically highly diverse and has been found in stool samples of a broad range of mammals, its true host(s) remain(s) enigmatic. The disease association is unclear, but PBV infection has been associated with gastroenteritis in both animals and humans (52,53). Up to 28 out of the 63 pools contained reads annotated as Picobirnaviridae with most of the positive pools from individuals in age groups above 20. We could obtain 37 (near-complete) RdRp sequences of PBVs from these 28 pools. Phylogenetic analysis based on RdRp (Fig. 7) revealed the clustering of the novel strains in four different clades: in genogroup I (26 strains), in genogroup II (9 strains), and in 2 clades (3 strains) of uncharacterized picobirna-like viruses that use an alternative mitochondrial invertebrate genetic code (Lysoka picobirna-like virus CMRHP9 and CMRHP10B and Kumba picobirna-like virus CMRHP21A). Interestingly, a wolf PBV strain from Portugal (ANS53886) from genogroup I clustered together with human strains from Cameroon with an aa identity of 76% with strains CMRHP26A and CMRHP35. Likewise, in genogroup II, strains CMRHP34A, CMRHP63B, and CMRHP26C clustered closely (75 to 76% aa identity) with a Portuguese feline strain (AGZ93689). Intriguingly, the Cameroonian human picobirna-like viruses CMRHP9 and CMRHP10B were 99% identical to a Cameroonian bat strain picobirna-like virus, P11-300, suggesting a possible interspecies transmission. However, their true host has not yet been determined. It could be that the true hosts of PBVs are found in both humans and bats and that this therefore explains their presence in both.
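The relevance of an alternative genetic code can be illustrated by counting in-frame stop codons under two translation tables. The sketch below assumes that the invertebrate mitochondrial code corresponds to NCBI translation table 5, in which TGA encodes tryptophan rather than a stop; the function name and the toy ORF are hypothetical and are not sequences from this study.

```python
# Stop codons under NCBI translation table 1 (standard code).
STANDARD_STOPS = {"TAA", "TAG", "TGA"}
# Stop codons under NCBI translation table 5 (invertebrate mitochondrial code),
# where TGA is read as tryptophan instead of a stop.
INVERT_MITO_STOPS = {"TAA", "TAG"}


def internal_stops(orf, stop_codons):
    """Count in-frame stop codons in reading frame 0, ignoring the final codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # drop any trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return sum(codon in stop_codons for codon in codons[:-1])


# Toy ORF (made up): ATG TGA GCT TGA CCA TAA
toy_orf = "ATGTGAGCTTGACCATAA"
stops_standard = internal_stops(toy_orf, STANDARD_STOPS)
stops_invert_mito = internal_stops(toy_orf, INVERT_MITO_STOPS)
```

An ORF that is interrupted by TGA codons under the standard code but reads through under table 5 is the kind of signal that would point to an alternative code, although this check alone does not identify the host.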
Small circular, Rep-encoding, ssDNA (CRESS-DNA) genomes. (i) Smacovirus. Smacovirus (SCV) is a relatively recently described virus with a small circular DNA genome with a size of about 2,529 bp. It belongs to the smacovirus group and is an unclassified eukaryotic virus of unknown origin (54). In this study, we identified two SCV sequences, one complete genome (HuSCV-CMRHP10) and a near-complete genome (HuSCV-CMRHP03). They were identified in pools of patients belonging to age group B, coming from Lysoka and having direct (HP3) or indirect (HP10) contact with bats. These strains shared 99% amino acid identity. Their replicase genes were 94% and 95% identical to chimpanzee (KP233190) and human (HuSCV3, KT600069) strains from the United States, respectively. Based on the capsid region, these novel Cameroonian strains were 98 to 99% identical to the chimp strain and only 85% identical to the human strain HuSCV3. The close genetic relatedness of human strains to a strain found in a chimpanzee sample suggests that these viruses infect a host shared between chimps and humans, and if indeed smacovirus infects mammals, this could be a case of interspecies transmission (55). Phylogenies of the replicase (Fig. 8A) and the capsid genes (Fig. 8B) indeed showed a cluster of these Cameroonian strains with a human and a chimpanzee strain from the United States. However, the topological inconsistency in the replicase and capsid trees may suggest a recombination event between these strains in the (distant) past.

FIG 8
[…] and pecovirus (C and D) of the replicase and capsid genes, respectively. The trees were constructed using the LG+G+I substitution model using RAxML, with the autoMRE flag, which enables a posteriori bootstrapping analysis. Only bootstrap values greater than 70% are shown. Bars indicate amino acid substitutions per site. Red, Cameroonian human strains; blue, previously known human smacoviruses or pecovirus.
(ii) Pecovirus. Pecoviruses (Peruvian stool-associated circo-like viruses [PeCVs]) are CRESS-DNA genomes that were first identified in the feces of a patient during an outbreak of acute gastroenteritis in the Netherlands and later in samples of Peruvian children (55,56). Subsequently, they were identified in other humans, pigs, a dromedary camel, and a seal (55–58). Here we identified a genome sequence (HuPeCV-CMRHP60) of 2,937 bases made up of two ORFs that code for a capsid (372 aa) and a replicase protein (336 aa). Unlike other human PeCVs, the Cameroonian strain shared the same canonical nonamer (NANTATTAC) atop the predicted stem-loop structure with seal, dromedary, and porcine PeCV strains. The Rep showed 31 to 42% aa sequence identity to all other Rep genes, and a Rep-based phylogenetic analysis (Fig. 8C) showed that HuPeCV-CMRHP60 clustered together with pecovirus genomes from a seal and 3 human strains. Based on the cap protein (Fig. 8D), the Cameroonian strain was only 22 to 42% identical to all other pecoviruses and clustered only distantly from the seal strain, a porcine strain, and the human strains. This demonstrates the existence of a high level of genetic diversity within this group of circular DNA genomes, pointing to the possible existence of multiple species in this clade. Furthermore, we identified 2 incomplete sequences related to sewage-associated circular DNA molecules recovered from a sewage treatment oxidation pond in New Zealand (59), with only 38% aa identity on the Rep protein, further expanding the great diversity of CRESS-DNA genomes in the Cameroonian population.
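The canonical nonamer mentioned above (NANTATTAC, with N standing for any nucleotide) can be located with a simple pattern search. The following is an illustrative sketch only: it scans the given strand of a linear sequence, does not handle the wrap-around of a circular genome or the reverse strand, and the function name and toy sequence are hypothetical.

```python
import re

# Motif from the text; the IUPAC symbol N is expanded to any of the four bases.
NONAMER = "NANTATTAC"
_MOTIF = NONAMER.replace("N", "[ACGT]")
# A lookahead is used so that overlapping occurrences are also reported.
_NONAMER_RE = re.compile("(?=(" + _MOTIF + "))")


def find_nonamers(seq):
    """Return (0-based start, matched 9-mer) for every occurrence on the given strand."""
    return [(m.start(), m.group(1)) for m in _NONAMER_RE.finditer(seq.upper())]


# Toy sequence (made up) containing one nonamer starting at position 3.
toy_hits = find_nonamers("GGCTAGTATTACCC")
print(toy_hits)
```

In practice the hit would still need to be checked against the predicted stem-loop, since the motif is only meaningful at the top of that structure.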
Bacteriophages. Bacteriophages are viruses that infect and replicate within bacteria. Their presence therefore reflects the gut microbiota of the patients. Because most of the obtained viral reads were classified as bacteriophages, we further investigated the bacteriophage composition of the human samples. With VirSorter (60), a tool developed to identify highly divergent dsDNA phages from metagenomics data, 5,905 of the contigs in our data set were identified as phages. From these, the tool Diamond (61) annotated 2,647 as bacterial, 21 as metazoan, and only 606 (~10%) as viral, while 1,309 contigs remained unannotated. From the contigs annotated as viral by Diamond, most were phages belonging to the Myoviridae (236 contigs), Podoviridae (95 contigs), Siphoviridae (145 contigs), and Microviridae (36 contigs) families. To get insight into the differences in the bacteriophage communities, we compared the VirSorter-identified bacteriophage richness between the different age groups (Fig. S6A), locations (Fig. S6B), and bat contact statuses (Fig. S6C), all of which showed no significant differences. To identify the potential bacterial hosts of these phages, we searched for bacterial CRISPR spacer sequences in the phage contigs. The search revealed that the most likely hosts of these phages are bacteria of the families Bacteroidaceae, Bifidobacteriaceae, Enterobacteriaceae, Enterococcaceae, Erysipelotrichaceae, Eubacteriaceae, Lactobacillaceae, Odoribacteraceae, Streptococcaceae, and Veillonellaceae (Table S2).
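The idea behind CRISPR-based host prediction is that a bacterial spacer matching a phage contig records a past encounter between that bacterium and a related phage. The sketch below illustrates the principle with exact substring matching on both strands. This is an assumption made for simplicity: the text does not state which search tool or mismatch tolerance was used, and the function names, contig identifiers, and toy sequences are all hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_hosts(contigs, spacers):
    """Map contig id -> set of host taxa whose spacer occurs exactly in the contig.

    contigs: dict of contig id -> nucleotide sequence
    spacers: iterable of (host_taxon, spacer_sequence) pairs
    Contigs without any spacer hit are omitted from the result.
    """
    hits = defaultdict(set)
    for host, spacer in spacers:
        spacer = spacer.upper()
        rc = reverse_complement(spacer)
        for contig_id, seq in contigs.items():
            seq = seq.upper()
            if spacer in seq or rc in seq:  # check both strands
                hits[contig_id].add(host)
    return dict(hits)


# Toy data (made up); the family names are taken from the list in the text.
toy_contigs = {"contig_1": "TTTACGTTGCAAAGG", "contig_2": "GGGGGGGG"}
toy_spacers = [("Bacteroidaceae", "ACGTTGCA"), ("Lactobacillaceae", "CCTTTGCA")]
toy_hosts = assign_hosts(toy_contigs, toy_spacers)
```

Real spacers are typically around 30 nt long and are usually searched with an alignment tool that tolerates a small number of mismatches, so exact matching as shown here would be a conservative stand-in.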
Network analysis of human and bat phageomes. In order to visualize the genetic relatedness between the human and bat gut phageome, a recently developed bioinformatics tool (vConTACT) was used. It groups phages based on their genome sequences into viral clusters which correlate rather well with viral genera as defined by the International Committee on Taxonomy of Viruses (ICTV) (62). A total of 30,875 protein clusters were predicted using the prokaryotic and archaeal RefSeq combined with the proteins predicted from the phage contigs identified from the human and bat pools using VirSorter. Using a network analysis approach (Fig. 9), 792 viral genome clusters were predicted, of which 173 contained reference phages together with bat or human phage contigs, whereas the rest contained only bat, only human, or bat-human clusters. Figure 9 shows that both Cameroonian human and bat phage contigs identified in our studies are spread across the known phage sequence space. However, several of the phage contigs constituted completely new clusters (indicated by filled gray ovals), completely unconnected to phages in the reference database. Also, the genetic diversity of several previously known phage subclusters was significantly expanded (as indicated by open ovals) while some clusters (in brown ovals) were made up of only bat and human phages identified in this study.
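The general idea of grouping genomes by shared protein clusters can be sketched as follows. This is not vConTACT's algorithm: as a simplified stand-in, two genomes are linked when they share at least a minimum number of protein clusters, and each connected component of the resulting graph is reported as one group. The function name, the `min_shared` parameter, and the toy data are hypothetical.

```python
from itertools import combinations


def cluster_genomes(genome_pcs, min_shared=2):
    """Group genomes into connected components of a shared-protein-cluster graph.

    genome_pcs: dict of genome id -> set of protein-cluster ids
    min_shared: minimum number of shared protein clusters needed to link two genomes
    """
    parent = {g: g for g in genome_pcs}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genome_pcs, 2):
        if len(genome_pcs[a] & genome_pcs[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for g in genome_pcs:
        groups.setdefault(find(g), set()).add(g)
    # Sort for a deterministic order of the returned groups.
    return sorted(groups.values(), key=lambda members: sorted(members))


# Toy data (made up): a reference phage linked to a human contig, which is
# linked to a bat contig, plus one contig sharing nothing with the others.
toy_genomes = {
    "ref_phage": {1, 2, 3},
    "human_contig": {2, 3, 4},
    "bat_contig": {3, 4, 5},
    "orphan_contig": {9},
}
toy_clusters = cluster_genomes(toy_genomes)
```

In this toy case the reference, human, and bat genomes fall into one group even though the reference and bat genomes share only a single protein cluster directly, while the orphan contig forms its own group, analogous to the clusters unconnected to reference phages described above.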

DISCUSSION
Recently, we thoroughly investigated the gut virome of fruit bats from Cameroon (20–23, 63) and showed the presence of many novel and divergent eukaryotic viral families, including viruses known to cause gastroenteritis in humans. The aim of the current study was to investigate the gut virome of humans (n = 221) from Cameroon and to further determine if bat viruses are possible causative agents of gastrointestinal infections in humans.
Rotavirus A (RVA), a common viral gastroenteritis-causing agent, was identified only in a limited number of pools. This was previously observed in Cameroon, and possible reasons for the low prevalence could include the acute nature of rotavirus infections or seasonal changes in rotavirus infections (6). Of note, rotavirus vaccination was introduced in Cameroon in April 2014, coinciding with the period of sample collection of this study (February to September 2014); however, the vaccination campaign had not started in the sampling locations within this period, and therefore, the result represents a prevaccination rotavirus prevalence status. The identified rotavirus strain showed 3% nt differences with the vaccine strain, further suggesting that this was a wild-type RVA strain, rather than a vaccine-derived strain.
Uncommon human gastroenteric virus: mammalian orthoreovirus (MORV). This first MORV strain from Cameroon showed topological incongruence in its phylogeny, thereby pointing to possible reassortment events in the past. The phylogenetic clustering of some segments to strains from animals (rodents and bats) could be an indication of a zoonotic event or could also be due to the absence of related strains in databases from an unknown host. Given that this strain was from a pool of samples from two children suffering from severe diarrhea, it is not unlikely that this strain might have contributed to the disease. Therefore, MORV might be playing a greater role in diarrheal diseases in this region than was previously known. Hence, extensive epidemiological studies in different regions and in different hosts are required to fully delineate the prevalence, genomics, and interspecies transmissibility of MORV.
Viruses not (yet) associated with gastroenteritis: Picobirnaviridae, smacovirus, and Anelloviridae. Apart from the above-mentioned gastroenteritis-related viruses, several other viruses with unelucidated gastroenteric roles were also identified in this study. First, we observed fewer reads of picobirnaviruses (PBVs) in pools from children than in adults. Previous studies also detected a relatively low percentage of children with PBVs (67,68). This adds to the notion that PBVs are likely to be absent in infants and young children and only start to increase with age and potentially a changing diet, though this needs to be further proven (69,70). Interestingly, the genetic relatedness of a human picobirna-like virus to one found in a bat pool from the same region suggests an interspecies transmission. However, these picobirna-like sequences are translated using an alternative mitochondrial genetic code, indicating that their hosts may not be mammals. A principal component analysis of the codon usage bias of different known mitochondrial genome sequences, mitoviruses, and PBVs suggests that they may have the same lifestyle as mitoviruses, which are known to infect fungal mitochondria (71). However, the recent identification of a bacterial ribosomal binding site in PBV genomes suggests prokaryotes as a potential host (72). Given that mitochondria descended from ancient eubacterial endosymbionts (73), this may explain the clustering of these PBVs with mitoviruses. Therefore, the true host of PBVs remains controversial.
Second, for the first time, two strains of African smacovirus (SCV) were identified in Cameroonian samples. Their genetic relatedness to a chimpanzee strain (isolated from a captive chimpanzee in a zoo in San Francisco) and to a strain from a child from the United States (54,55) indicates either an interspecies transmission event or the presence of a shared viral host in both humans and chimpanzees. Although the role of smacovirus in gastroenteritis has not been elucidated, their presence in cases of unexplained diarrhea in French patients suggests a potential role in gastroenteritis (54); hence, these could be instances of interspecies transmissions.
The percentages of pools positive for anelloviruses were higher in the age categories of children and the elderly and lower in the middle age groups. Given the well-established notion that infants and the elderly have reduced immunity (74,75), this could be in line with previous studies that suggest a link between the burden of anelloviruses and host immune competence (76-78). Despite their ubiquity, the implications of Anelloviridae for host health are undefined, and they are thought to be asymptomatic (harmless) or even beneficial. However, they have been associated with hepatitis, pulmonary diseases, hematologic disorders, myopathy, and lupus, although it is not clear if their presence is the cause or the result of disease progression (79-82).
Human viruses and interspecies transmission from bats. In bats from the same area, we identified both gastroenteritis-related and unrelated viruses. Here, the corresponding human viruses identified from the families Astroviridae (astrovirus), Caliciviridae (sapovirus), and Reoviridae (RVA) are genetically distinct from those identified in bats from the same region, providing no evidence of recent interspecies transmissions between bats and humans (63). However, the genetic relatedness of human MORV to animal strains points to possible zoonotic transmission involving not only bats but animals in general. Additionally, the presence of some Cameroonian strains of SCV and PBV in bats or other animals would indicate interspecies transmissions if their infectivity in these animals were established.
Human and bat phageome. In this study, we detected a huge phage community with a great diversity beyond the range of known bacteriophages in reference databases, potentially representing the gut microbiome diversity in the patients (83,84). Overall, this further supports the idea that the full phageome richness is still to be completely elucidated (85). Furthermore, network analysis indicates the presence of completely novel phage groups and that phage genera in the gut microbiota might be shared between humans and bats.
Conclusion. Several diverse viruses were discovered in the gut virome of Cameroonians. Some of these were already known to be causative agents of gastroenteritis, whereas others are likely to be the cause of gastroenteric problems in the patients. Further screening of patients for these viruses will be needed to establish their prevalence in the population, allowing for more appropriate measures for the treatment and prevention of viral gastroenteritis. Also, more studies are required to fully elucidate the role of novel viruses such as pecovirus and smacovirus. Further attention should also be given to newly identified viruses (for example, MORV) and their potential as emerging pathogens in the human population.

MATERIALS AND METHODS
Ethical authorization. Ethical authorization for the use of human samples was obtained from the Cameroon National Ethics Committee, Yaoundé. All human experiments were performed in accordance with the Ministry's National Ethics Committee guidelines.
Sample collection and preparation. Human fecal samples were collected between February and September 2014, after informed consent was obtained from patients in two different hospitals (Lysoka Health District and Kumba District Hospital of the Southwest region of Cameroon). This region was chosen because here bats are hunted, sold, and eaten. Diarrheic patients and/or people who came into contact with bats directly (by eating, hunting, or handling) or indirectly (if a family member was directly exposed to bats) were eligible for sampling. A total of 221 samples were collected from subjects aged 0 to <3 years (age group A, 80 samples), 3 to <20 years (age group B, 63 samples), 20 to <60 years (age group C, 65 samples), and 60 years and older (age group D, 13 samples). All the samples were from people who had symptoms of gastroenteritis, except 2 from age group C who had contact with bats. Samples were then placed into labeled tubes containing universal transport medium (UTM), placed on dry ice, and stored at −20°C, until being shipped to the Laboratory of Viral Metagenomics, Leuven, Belgium. The samples were stored at −80°C until used (63).
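The age stratification described above can be expressed as a small lookup. The following is a minimal sketch, assuming that each lower bound is inclusive and each upper bound is exclusive, as the "<" signs in the text suggest; the names `AGE_GROUPS`, `REPORTED_COUNTS`, and `age_group` are illustrative and do not come from the study.

```python
# Age bins from the text: lower bound inclusive, upper bound exclusive (years).
AGE_GROUPS = [
    ("A", 0, 3),
    ("B", 3, 20),
    ("C", 20, 60),
    ("D", 60, float("inf")),
]

# Sample counts per age group as reported in the text.
REPORTED_COUNTS = {"A": 80, "B": 63, "C": 65, "D": 13}


def age_group(age_years):
    """Return the age-group label (A to D) for an age given in years."""
    for label, low, high in AGE_GROUPS:
        if low <= age_years < high:
            return label
    # Reached only for values outside every bin (for example, negative ages).
    raise ValueError(f"cannot assign an age group to {age_years!r}")


# Boundary ages fall into the older group; the reported counts sum to the cohort size.
print(age_group(2.5), age_group(3), age_group(59.9), age_group(60))
print(sum(REPORTED_COUNTS.values()))
```

The four reported group sizes add up to the 221 samples stated in the text.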
Fecal samples were first diluted using UTM, and equal volumes of the dilutions were pooled based on the location, age, and bat contact status (direct, indirect, or none). Each pool contained two to five samples, and for the different age groups (A to D) we had 22, 17, 20, and 4 pools, respectively. The pools were then treated according to the NetoVIR protocol (86). Briefly, the pools (10% [wt/vol] fecal suspensions) were homogenized for 1 min at 3,000 rpm with a Minilys homogenizer (Bertin Technologies) and filtered using an 0.8-µm PES filter (Sartorius). The filtrate was then treated with a cocktail of Benzonase (Novagen) and micrococcal nuclease (New England Biolabs) at 37°C for 2 h to digest free-floating nucleic acids. Total nucleic acids (both RNA and DNA) were extracted using the QIAamp viral RNA minikit (Qiagen) according to the manufacturer's instructions but without addition of carrier RNA to the lysis buffer. First- and second-strand synthesis and random PCR amplification for 17 cycles were performed using a slightly modified whole-transcriptome amplification (WTA2) kit procedure (Sigma-Aldrich). WTA2 products were purified with MSB Spin PCRapace spin columns (Stratec), and the libraries were prepared for Illumina sequencing using a slightly modified version of the Nextera XT library preparation kit (Illumina), which is described in detail in reference 86. Samples were pooled in an attempt to obtain an average of approximately 10 million paired-end reads per pool. Sequencing was performed on a NextSeq 500 high-output platform (Illumina) for 300 cycles (2 × 150-bp paired ends).
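The pooling step can be illustrated with a short sketch. The text states only the grouping criteria (location, age group, and bat contact status) and the pool size (two to five samples); how larger groups were split and how single leftover samples were handled is not described, so the even-split rule, the `leftovers` list, and the function name `make_pools` below are assumptions made purely for illustration.

```python
from collections import defaultdict
from math import ceil


def make_pools(samples, min_size=2, max_size=5):
    """Group samples by (location, age_group, contact) and split each group
    into pools of roughly equal size.

    With the defaults (2 and 5), every pool holds two to five samples.
    Groups smaller than min_size are returned separately as leftovers
    (an assumption; the text does not say how such samples were handled).
    """
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["location"], sample["age_group"], sample["contact"])
        groups[key].append(sample["id"])

    pools, leftovers = [], []
    for key in sorted(groups):
        ids = groups[key]
        if len(ids) < min_size:
            leftovers.extend(ids)
            continue
        n_pools = ceil(len(ids) / max_size)
        base, extra = divmod(len(ids), n_pools)
        start = 0
        for i in range(n_pools):
            # The first `extra` pools receive one additional sample.
            size = base + (1 if i < extra else 0)
            pools.append((key, ids[start:start + size]))
            start += size
    return pools, leftovers


# Hypothetical records, not study data.
demo = [{"id": f"S{i}", "location": "Kumba", "age_group": "A", "contact": "none"}
        for i in range(7)]
demo_pools, demo_leftovers = make_pools(demo)
print([len(ids) for _, ids in demo_pools], demo_leftovers)
```

Splitting evenly rather than filling pools to the maximum avoids ending with a pool of one sample, which would fall outside the stated range.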
Genomic and phylogenetic analysis. NGS reads were analyzed as described in the work of Yinda et al. (20,63). Briefly, raw reads were trimmed using Trimmomatic (parameters: HEADCROP:  and FastUniq to remove identical reads. The de novo assembly of reads and their annotation were performed using SPAdes (with the meta flag) and DIAMOND (with the sensitive option using the GenBank nonredundant database), respectively (61,87,88). Open reading frames (ORFs) of contigs of interest were identified and further analyzed for conserved motifs in the amino acid sequences using NCBI's conserved domain database (CDD) (89). Nucleotide and amino acid alignments of viral sequences were done with MUSCLE implemented in MEGA7 (90) or MAFFT (91). Substitution models were determined using ModelGenerator (92), and phylogenetic trees were constructed using RAxML (93), with the autoMRE flag, which enables a posteriori bootstrapping analysis. All trees were visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and midpoint rooted for purposes of clarity.
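The idea behind removing identical reads can be shown with a few lines of code. This is a conceptual sketch only and is not FastUniq's implementation: it assumes that a read pair counts as a duplicate when both its forward and reverse sequences match an earlier pair exactly, and the function name `deduplicate_pairs` is hypothetical.

```python
def deduplicate_pairs(pairs):
    """Keep the first occurrence of each (forward, reverse) sequence pair.

    `pairs` is an iterable of (forward_sequence, reverse_sequence) tuples.
    Input order is preserved for the pairs that are kept.
    """
    seen = set()
    unique = []
    for forward, reverse in pairs:
        key = (forward, reverse)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy read pairs, not study data: the second pair repeats the first.
reads = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "CCGA")]
print(len(deduplicate_pairs(reads)))
```

Pairs that share only one mate are retained, because a duplicate fragment is expected to reproduce both ends.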
Phageome analysis. Contig annotation with DIAMOND is dependent on the accuracy of the database used, and in most databases, phages are poorly annotated. However, VirSorter uses a manually curated database of virus reference genomes augmented with metagenomic viral sequences sampled from freshwater, seawater, and human gut, lung, and saliva. Hence, for further identification of bacteriophages, scaffolds >1 kb were classified using VirSorter (decontamination mode [60]). Only scaffolds assigned to categories 1 and 2 were considered bacteriophage contigs and were filtered for redundancy at 95% nucleotide identity over 70% of the length using Cluster Genomes (94). Then, trimmed reads from each pool were mapped using Bowtie 2 (95) to the bacteriophage contigs, and the generated BAM files were filtered to remove reads that aligned at <95% identity using BamM (http://ecogenomics.github.io/BamM/). Abundance tables were obtained and normalized for the total number of reads of each sample. For the richness comparison, Mann-Whitney tests were used, and for the clustering, an Adonis test was performed. All downstream analyses were done in R (96) using the vegan package (97). Furthermore, to identify the potential corresponding bacterial host, a database of these contigs was made, against which a nucleotide BLASTN search (100% identity without gaps) was performed using a fasta file of CRISPR sequences (98) as the query. These sequences correspond to different bacterial hosts, and their presence in a phage genome highlights the potential host of the phage.
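Three of the steps above (normalization of the abundance table, richness per pool, and host assignment from CRISPR sequence matches) can be sketched in a few functions. The authors performed these analyses in R with vegan and with BLASTN; the Python below is only an illustrative stand-in. It assumes that normalization means dividing mapped reads by the pool's total read count, that richness is the number of contigs with at least one mapped read, and that a "100% identity without gaps" hit is a full-length exact match on either strand. All function names and data are hypothetical.

```python
def normalize_counts(counts, total_reads):
    """Divide each contig's mapped-read count by the pool's total read count.

    counts: {pool: {contig: mapped_reads}}; total_reads: {pool: total_reads}.
    """
    return {
        pool: {contig: n / total_reads[pool] for contig, n in contigs.items()}
        for pool, contigs in counts.items()
    }


def richness(counts):
    """Number of contigs with at least one mapped read in each pool."""
    return {
        pool: sum(1 for n in contigs.values() if n > 0)
        for pool, contigs in counts.items()
    }


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_hits(contigs, spacers):
    """Map each contig to the hosts whose CRISPR sequence it contains exactly.

    contigs: {contig_name: sequence}; spacers: iterable of (host, sequence).
    Both strands of the contig are considered.
    """
    hits = {}
    for name, sequence in contigs.items():
        sequence = sequence.upper()
        hosts = set()
        for host, spacer in spacers:
            spacer = spacer.upper()
            if spacer in sequence or reverse_complement(spacer) in sequence:
                hosts.add(host)
        if hosts:
            hits[name] = sorted(hosts)
    return hits


# Toy inputs, not study data.
toy_counts = {"pool1": {"phage_a": 10, "phage_b": 0}, "pool2": {"phage_a": 5, "phage_b": 5}}
toy_totals = {"pool1": 100, "pool2": 50}
print(normalize_counts(toy_counts, toy_totals))
print(richness(toy_counts))
```

Exact substring matching is stricter than a local alignment search, since it requires the whole query sequence to match; this is one reasonable reading of the criterion in the text rather than a reproduction of the BLASTN settings.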
To see if the phage community of these humans is related to that of the bats from the same locality, the network of both human and bat phageomes was visualized using vConTACT (62). Initially, proteins were predicted using Prodigal (99) and combined with the Viral RefSeq of archaeal and prokaryotic predicted proteins. A database was generated from the contigs of bat pools, human pools, and viral RefSeq proteins, and BLASTp was performed against the combined proteins. The BLASTp output was used to run vConTACT, and the output network was visualized in Cytoscape (100).
Data availability. All sequences were deposited in GenBank under the following accession numbers: MH608285 to MH608287 and MH933752 to MH933860 (details in Table S3). Raw reads were submitted to the NCBI's Short Read Archive (SRA) under the project ID PRJNA491626.