High Genomic Diversity and Heterogenous Origins of Pathogenic and Antibiotic-Resistant Escherichia coli in Household Settings Represent a Challenge to Reducing Transmission in Low-Income Settings.

Escherichia coli is present in multiple hosts and environmental compartments as a normal inhabitant, temporary or persistent colonizer, and as a pathogen. Transmission of E. coli between hosts and with the environment is considered to occur more often in areas with poor sanitation. We performed whole-genome comparative analyses on 60 E. coli isolates from soils and fecal sources (cattle, chickens, and humans) in households in rural Bangladesh. Isolates from household soils were in multiple branches of the reconstructed phylogeny, intermixed with isolates from fecal sources. Pairwise differences between all strain pairs were large (minimum, 189 single nucleotide polymorphisms [SNPs]), suggesting high diversity and heterogeneous origins of the isolates. The presence of multiple virulence and antibiotic resistance genes is indicative of the risk that E. coli from soil and feces represent for the transmission of variants that pose potential harm to people. Analysis of the accessory genomes of the Bangladeshi E. coli relative to E. coli genomes available in NCBI identified a common pool of accessory genes shared among E. coli isolates in this geographic area. Together, these findings indicate that in rural Bangladesh, a high level of E. coli in soil is likely driven by contributions from multiple and diverse E. coli sources (human and animal) that share an accessory gene pool relatively unique to previously published E. coli genomes. Thus, interventions to reduce environmental pathogen or antimicrobial resistance transmission should adopt integrated One Health approaches that consider heterogeneous origins and high diversity to improve effectiveness and reduce prevalence and transmission.

IMPORTANCE Escherichia coli is reported in high levels in household soil in low-income settings. When E. coli reaches a soil environment, different mechanisms, including survival, clonal expansion, and genetic exchange, have the potential to either maintain or generate E. coli variants with capabilities of causing harm to people. In this study, we used whole-genome sequencing to identify that E. coli isolates collected from rural Bangladeshi household soils, including pathogenic and antibiotic-resistant variants, are diverse and likely originated from multiple diverse sources. In addition, we observed specialization of the accessory genome of this Bangladeshi E. coli compared to E. coli genomes available in current sequence databases. Thus, to address the high level of pathogenic and antibiotic-resistant E. coli transmission in low-income settings, interventions should focus on addressing the heterogeneous origins and high diversity.

KEYWORDS Escherichia coli, genomic diversity, accessory genes, soils, household settings

Escherichia coli is a commensal bacterium but also a versatile pathogen capable of causing intestinal and extraintestinal infections (1,2). For instance, multiple E. coli pathotypes are among the most important etiological agents of different human infections, such as enteropathogenic E. coli (EPEC) and Shiga toxin-producing E. coli (STEC) of diarrheal disease and extraintestinal pathogenic E. coli (ExPEC) of urinary tract infections (3,4). However, E. coli is not restricted to human or animal hosts (5), as evidenced by studies demonstrating that E. coli can transit, survive for long periods, and even grow in diverse environmental compartments, such as soil and water (6,7).
The diversity of E. coli lifestyles is associated with the plasticity of its genome, which is considered open (8). E. coli survival and transit through multiple hosts and environmental compartments likely shaped the evolution and population structure of the species (8). Currently, only 16% of the genes of an E. coli strain belong to the core genome, while the remainder are considered part of the accessory genome (9). Despite E. coli's genome diversity, the core genetic structure of the species is clonal, with clear distinction of different phylogenetic groups (phylogroups): seven are part of E. coli sensu stricto (A, B1, B2, C, D, E, and F) and the eighth is known as clade I (10,11). The prevalence and relative abundance of the phylogroups vary among different hosts, ecological niches, and geographic locations (8,12,13,70,71). However, little is known about the genomic composition of E. coli isolated from open environments (such as soils) and whether specific genetic determinants contribute to survival, adaptation outside the host, or subsequent transmission (6). For instance, some authors have found unique E. coli fingerprints from soils compared to those from animal fecal sources (14), and others have suggested the naturalization of specific E. coli genotypes to soils (7). Luo et al. reported that the genome sequences of nine strains recovered primarily from environmental sources were phylogenetically distinct from commensal or pathogenic host-associated E. coli (15). In contrast, many settings in low- and middle-income countries (LMICs) are characterized by poor or nonexistent sanitary barriers for both people and animals that lead to fecal, and thus E. coli, contamination of environmental compartments (16-18).
Direct contact and close space sharing among multiple hosts (humans, domestic animals, and livestock) in these settings contribute to increased transmission of strains between hosts and with the environment (19). For example, one study in Bangladesh showed that animal feces contribute to higher loads of E. coli in soil, water, and food (18). Contributions of animals to E. coli in household soils in rural Bangladesh were further supported by evidence of ruminant- and avian-associated microbial source tracking markers (BacR and avian-GFD, respectively) in soils (20), and an adjunct study to the water, sanitation, and hygiene (WASH) Benefits Trial in rural Bangladesh stressed the importance of animal feces containment (domestic animals were found to be the key contributors to enteric pathogens in household environments) to reduce transmission of pathogens (21). Moreover, increased prevalence and transmission of resistant E. coli variants have also been linked to the use of antimicrobials, which are often unregulated in LMICs (22,23). Understanding the dynamics of pathogen transmission is important for the design of effective WASH and One Health interventions.
The present study used comparative genomics, including phylogenetic reconstruction and pairwise differences analysis, to investigate genetic and population-level relationships between E. coli isolates from feces (cattle, chickens, and humans) and soil in households in rural Bangladesh, an area characterized by high disease transmission. E. coli isolates were further characterized by genes associated with virulence, antibiotic resistance, and plasmid replicons. The accessory genome of Bangladeshi E. coli was further analyzed in a broader context by comparison with representative E. coli genomes available in NCBI.
detected in all isolates. Genes related to the type 1 fimbria operon and flagella were also very common (Table S4). Identified virulence factor-related genes included multiple genes used as diagnostic targets for intestinal E. coli pathotypes (Fig. 2). The astA gene, which encodes a heat-stable enterotoxin and is linked to diarrheal disease caused by enteroaggregative E. coli (EAEC), EPEC, and noncategorized diarrheagenic E. coli (DEC) (24), was detected in 17 isolates (4 cattle, 3 chicken, 3 human, and 7 soil). The eae gene indicating EPEC was detected in five isolates (3 chicken and 2 human). One isolate (HH13H) was a putative enterotoxigenic E. coli (ETEC), as indicated by the presence of the eltA and eltB genes, common diagnostic markers for heat-labile ETEC, while one cattle isolate (HH08C) was a putative STEC as indicated by stx1a, stx1b, stx2a, and stx2db genes. The gene aatA (plasmid-associated and used as a diagnostic target for EAEC [25]) was detected in 23 isolates, including three cattle, 10 chicken, four human, and six soil isolates (Fig. 2).
The observed distribution of virulence factor-related genes across the four isolate sources (cattle, chicken, human, and soil) appeared random based on overall prevalence rates for all except four genes (χ² test, df = 3, unadjusted α = 0.05). Specifically, the adhesin tia gene appeared in eight cattle, two chicken, and two soil isolates but in no human isolates (χ² = 15.1, P = 0.002), and the adhesin-related cah gene appeared in seven chicken and six soil isolates but only one human and no cattle isolates (χ² = 9.4, P = 0.02). Similarly, leoA, a gene linked to secretion of the heat-labile enterotoxin (26), was only present in four cattle isolates (χ² = 14.5, P = 0.002); ECP_2814, encoding a hypothetical protein, only appeared in four human isolates and two cattle isolates (χ² = 8.5, P = 0.036).
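The source-distribution statistics above can be approximated with a short calculation. The sketch below assumes a goodness-of-fit reading of the test, in which the expected number of gene-positive isolates per source equals the overall prevalence multiplied by the number of isolates sequenced from that source (13 cattle, 14 chicken, 14 human, 19 soil, as given in Materials and Methods). That reading reproduces the reported statistics for tia and leoA, but the exact procedure used by the authors is not stated in this section, and the function names are illustrative.

```python
import math

# Number of isolates sequenced per source (from Materials and Methods).
SOURCE_TOTALS = {"cattle": 13, "chicken": 14, "human": 14, "soil": 19}


def chi2_by_source(positives, totals=SOURCE_TOTALS):
    """Goodness-of-fit chi-squared statistic for gene-positive counts by source.

    Expected counts assume the gene occurs at its overall prevalence in
    every source (an assumed reading of the test, not a confirmed one).
    """
    prevalence = sum(positives.values()) / sum(totals.values())
    stat = 0.0
    for source, n in totals.items():
        expected = prevalence * n
        stat += (positives.get(source, 0) - expected) ** 2 / expected
    return stat


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# tia counts as reported in the text: 8 cattle, 2 chicken, 0 human, 2 soil.
tia = {"cattle": 8, "chicken": 2, "human": 0, "soil": 2}
tia_stat = chi2_by_source(tia)
tia_p = chi2_sf_df3(tia_stat)
print(f"chi2 = {tia_stat:.1f}, P = {tia_p:.3f}")
```

The closed-form tail probability is specific to 3 degrees of freedom (four sources minus one), which keeps the sketch free of third-party dependencies.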
[Figure caption, core genome phylogeny: The core genome phylogenetic tree, based on SNPs and indels, was constructed by maximum likelihood using IQ-TREE and visualized using the iTOL online tool. The genome of Escherichia fergusonii was used as the outgroup. The household (HH) where the isolate was collected and the source ("S" for soil, "H" for human fecal, "CH" for chicken fecal, and "C" for cattle fecal) correspond to the isolate name. The source is additionally indicated by colored circles; E. coli phylogroups are indicated on the right.]
Antibiotic resistance gene profiles and association with phenotypic resistance. Among the 60 isolates sequenced, 23 harbored at least one antibiotic resistance gene determinant [excluding mdf(A), found in all isolates] with identity and coverage greater than 90% against the ResFinder database (Table 1; Fig. 2) (27). Two soil isolates, HH20S and HH36S, harbored the most resistance genes, with 10 and 12 different genes, respectively. Resistance to tetracycline was reportedly predominant in the sampling area (28) and within the subset of isolates selected for this study (16/60 [26.7%]) (Table 1). Not surprisingly, the most prevalent resistance mechanism encountered was the efflux-mediated resistance to tetracycline encoded by tet(A) (n = 11) and/or tet(B) genes (n = 4) (Fig. 2). Resistance to ampicillin was also present in these isolates (23.3%), while beta-lactamase-encoding genes were observed in only 10 isolates (Table 1). Resistance to the third-generation cephalosporins cefixime, cefotaxime, and ceftriaxone was observed in four isolates (HH08CH, HH20S, HH26H, and HH46S), explained by the presence of the extended-spectrum beta-lactamase-encoding gene blaCTX-M-15 (Table 1). Reduced susceptibility to ceftazidime, as reported for CTX-M-15 (29), was observed in these four isolates; however, only isolate HH20S, also carrying blaOXA-1, was classified as resistant.
Resistance to cefixime alone (also a third-generation cephalosporin) was observed in isolate HH13H, harboring blaDHA-1. The sul and dfrA genes, associated with class 1 integrons (30) and encoding a dihydropteroate synthase and a dihydrofolate reductase, respectively, were coharbored by nine of the 60 isolates, with intermediate or resistant phenotypes to trimethoprim-sulfamethoxazole (Table 1). Genes associated with resistance to aminoglycosides (aadA and aph variants) were observed in eight isolates, often of chicken origin (Fig. 2). Indeed, the genes aph(3'')-Ib and aph(6)-Id appeared not to be randomly distributed across the four sources, as they were only detected in chicken but not in any of the other sources (χ² = 9.9, P = 0.002). The plasmid-mediated quinolone resistance (PMQR) genes QnrS1 and QnrB4 were detected in eight E. coli; however, no clinical resistance to ciprofloxacin, based on CLSI breakpoints, was observed in these isolates, except for one soil isolate that coharbored both genes. QnrS1 and QnrB4 are known to provide a low level of resistance, while mutations in the genes encoding DNA gyrase and topoisomerase IV are associated with observable resistance to ciprofloxacin and/or nalidixic acid (31), as in the case of seven E. coli isolates of this study (Table 1). Resistance to azithromycin (macrolide), detected only in E. coli from human and soil origin, was observed in the five isolates where the macrolide-associated gene(s) mph(A) and/or ermB was detected (Table 1).
Prevalence of plasmid replicons among soils and fecal E. coli isolates from rural Bangladesh. By using an identity and coverage threshold greater than 90% against the PlasmidFinder database, the numbers of plasmid replicons detected ranged from 1 to 7 among 49 isolates (81.7%), while the other 11 isolates had no hits above the predefined threshold (Table 1 and Fig. 2). Thirty-one plasmid replicons associated with large and small plasmids were identified (Fig. 2).
The most prevalent replicons were IncFIB(AP001918) and IncFII(pSFO), detected from the four sources in 32 (53.3%) and 15 (25.0%) isolates, respectively. Nine other IncF replicons were detected with variable presence across the sources (Fig. 2). Among the replicons associated with small plasmids, Col(BS512) was the most prevalent, present in 12 (20.0%) isolates with a distribution across the sources that appeared not random, as it was detected in eight soil, three human, and one chicken isolate but not cattle isolates (χ² = 8.4, P = 0.038).
Phylogenetic distance and accessory genomes analyses of soil and fecal E. coli isolates from rural Bangladesh against representative and nearest E. coli genomes available in NCBI. We used Mash distance estimation (32) to study the phylogenetic distance of the 60 Bangladeshi soil and fecal E. coli against 199 representative E. coli genomes (Table S5A). The hierarchical dendrogram revealed that isolates of this Bangladeshi collection have, in general, greater sequence similarity among each other than with representatives of the E. coli phylogeny (Fig. 3A). For instance, 23 of the 36 phylogroup B1 Bangladeshi isolates clustered together in the Mash-based dendrogram with only two other genomes (isolated from feces of dogs, ASM332284 and ASM332186) forming part of this cluster. Similarly, 13 of the 17 phylogroup A Bangladeshi isolates formed a cluster, indicating greater similarity among these genomes. As expected, due to the low prevalence of other phylogroups among this Bangladeshi isolate collection, isolates from phylogroups besides A and B1 were scattered among the other genomes (Fig. 3A and S1). The network analysis using the AcCNET (Accessory Genome Constellation Network) application (33) also revealed that the accessory genomes of the Bangladeshi collection have higher similarity among each other than with the accessory genomes of the representative E. coli genomes (Fig. 3B).
[FIG 3 caption (partial): see Table S5 in the supplemental material for the list of the genomes used for comparison. Accessory-genome bipartite network generated by AcCNET with the 199 representative (B) and 265 nearest-neighbor (D) accessory genomes. Proteins with a P value of <0.001 and a frequency in the Bangladesh data set of >50% are represented.]
To identify the genomic characteristics unique to the Bangladeshi isolates, the Mash phylogenetic distance and the frequencies of the protein-coding genes observed within the respective accessory genomes were quantitatively compared to those of the 265 nearest E. coli neighbors (Table S5B). By using the nearest E. coli neighbors, which represent the 10 most closely related E. coli genomes in NCBI for each of the Bangladeshi E. coli isolates (some Bangladeshi E. coli isolates shared the same nearest neighbors), we then observed uniform distance distribution of the Bangladeshi isolates among the E. coli genomes (Fig. 3C), therefore minimizing bias in the subsequent network analyses. AcCNET identified 10,587 protein-coding genes in the accessory genomes of the 60 Bangladeshi isolates and compared the presence/absence frequency to that of 265 nearest E. coli neighbors (Fig. 3D). Of these, 1,764 (16.7%) were statistically significantly enriched in the Bangladeshi E. coli isolates relative to that in genomes of the nearest neighbors (hypergeometric test, Bonferroni adjusted P < 0.05) (Fig. 3D and 4). Notably, the accessory genome contained a large proportion of putative or hypothetical proteins with unknown function (5,014 [47.3%]). The proportion of putative or hypothetical proteins was statistically significantly higher (z = −20.9, P < 0.001) among the protein-coding genes enriched in the Bangladeshi isolates (1,235/1,764 [70.0%]) than the protein-coding genes shared between the Bangladeshi isolates and the nearest neighbors (3,779/8,823 [42.8%]) (Fig. 4).
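The comparison of proportions above can be checked from the counts given in the text. The following is a minimal sketch assuming a pooled two-proportion z-test; the text reports only the z value, so the choice of the pooled form and the function name are assumptions. With the shared genes entered first, the sign of the statistic is negative, as reported.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for x1/n1 versus x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Counts from the text: putative or hypothetical proteins among shared genes
# (3,779 of 8,823) and among enriched genes (1,235 of 1,764).
z = two_proportion_z(3779, 8823, 1235, 1764)
print(f"z = {z:.1f}")
```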
The accessory genome analysis identified 84 (0.8%) protein-coding genes that were both statistically significantly enriched and present in at least half of the 60 Bangladeshi isolates (Fig. 4; Table S6). The 84 enriched proteins included putative or hypothetical proteins with unknown function (54 [64%]), proteins coding for domains of unknown function (4 [5%]), or that were otherwise poorly defined (2 [2%]). Among the rest, nine (10%) were related to metabolism (formate dehydrogenase, 6-phospho-alpha-glucosidase, arylsulfatase, fatty acyl-CoA synthetase, peptide chain release factor 2, and carbonic anhydrase), and eight (10%) were related to environmental response, biofilm formation, and/or virulence (murein endopeptidase from DLP12 prophage, response regulators, diguanylate cyclase, fimbrial protein, adhesin-like autotransporter, and flagellar motor rotation) (Table S6). The remaining proteins enriched in the Bangladeshi isolates relative to the nearest neighbors included four (5%) related to insertion sequences IS1, IS2, and IS3; three (3.5%) related to toxin/antitoxin systems for plasmid maintenance; one related to DNA-binding transcriptional regulator; and one related to DNA base-flipping. Notably, 13 of the proteins were not found in any of the 265 nearest neighbors, including DNA base-flipping and formate dehydrogenase H proteins present in all 60 Bangladeshi isolates, and two toxin/antitoxin proteins present in 58 (97%) and 33 (55%) of the Bangladeshi isolates (Table S6).
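To illustrate how strongly a protein found only in the Bangladeshi isolates stands out, the sketch below computes a hypergeometric upper-tail probability for a protein present in 58 of the 60 Bangladeshi genomes and in none of the 265 nearest neighbors, then applies a Bonferroni correction over the 10,587 accessory genes. The parameterization (325 genomes pooled, 60 "drawn") is an assumption for illustration and may differ from how AcCNET sets up its test.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genomes are drawn from N, of which K carry the gene."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


N_GENOMES = 60 + 265      # Bangladeshi isolates plus nearest neighbors
N_TESTS = 10587           # accessory protein-coding genes compared

# Assumed example: protein carried by 58 genomes overall, all of them Bangladeshi.
p_raw = hypergeom_upper_tail(k=58, K=58, n=60, N=N_GENOMES)
p_bonferroni = min(1.0, p_raw * N_TESTS)
print(f"raw P = {p_raw:.3g}, Bonferroni-adjusted P = {p_bonferroni:.3g}")
```

Exact integer binomial coefficients avoid the rounding problems that floating-point factorials would cause at these sizes.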

DISCUSSION
We assessed the genomic diversity of E. coli from household soils in a rural Bangladeshi community using WGS and performed comparative analyses with E. coli isolated from feces of potential contributors (human and animal) to shed light on probable sources and transmission patterns. Our findings are indicative of a rich phylogenetic diversity among the E. coli isolates circulating in this rural community, with the E. coli isolates recovered from household front yard soils located in multiple branches of the phylogeny and intermixed with isolates from fecal sources (Fig. 1). The high diversity observed among these Bangladeshi E. coli isolates is in line with recent studies in other rural or semirural communities in LMICs (23,34). For instance, Richter et al. found high interindividual diversity among gastrointestinal E. coli isolates in Tanzanian children and high intraindividual temporal diversity in samples from the same child during a 6-month period (34).
The placement of soil E. coli in terminal lineages of the phylogeny with fecal E. coli suggests the fluidity and lack of phylogenetic structure based on source. Humans and animals are suggested as likely contributors to the E. coli population in soils (28,35), but clonality or an estimate of the time of diversification between E. coli in soil and E. coli from the input source has not yet been established. Mutation rates are routinely used to establish time of diversification (36,37). For example, by using estimated mutation rates reported for two different E. coli clones based on the number of differences and approximate time of divergence (2.3 × 10⁻⁷ to 6.9 × 10⁻⁷ per site per year) (36,38), one would predict that 1 to 3 SNPs would arise in 1 year for an average genome size of 4.9 Mbp (the average genome size for the E. coli analyzed in this study). This value is far below the minimum number of SNPs (189 SNPs) observed between the most closely related isolates of this study. Therefore, we found no direct evidence to suggest recent clonal transmission from humans or animals to soils or vice versa. Similarly, other studies have failed to detect recent transmission events between humans and animals (domestic and livestock) (13,23), even when analyzing strains with the same phenotypic resistance (23). In contrast, strain pairs of the 2011 E. coli O104:H4 outbreaks in Germany and France differed by a maximum of 6 and 19 SNPs, respectively (39). However, mutation rate estimates hold several uncertainties. For example, estimates obtained under laboratory conditions may not reflect generation times in nature (40,41) or may disregard factors such as differential mutation rates among strains, selection, recombination, and mutational bias (41,42). Furthermore, little is known about how environmental factors, ecological niches, or different host species affect the rates of accumulation of diversity (39,43). For instance, differences in diversity were reported even among two different but linked E. coli O104:H4 outbreaks (39). In addition, multiple genome sequences per source would be necessary to understand the origin and patterns of transmission and diversification (39,40) in a scenario like the one described in this study.
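The expected-SNP argument in the preceding paragraph is simple arithmetic and can be reproduced directly from the values given in the text. The sketch below multiplies each per-site rate by the average genome size and compares the result with the 189-SNP minimum; the variable names are illustrative, and the final ratio is a naive comparison that ignores the uncertainties in mutation rate estimates discussed above.

```python
GENOME_SIZE_BP = 4.9e6                       # average genome size in this study
RATES_PER_SITE_PER_YEAR = (2.3e-7, 6.9e-7)   # range of rates cited in the text
MIN_OBSERVED_SNPS = 189                      # smallest pairwise difference observed

# Expected SNPs per genome per year at the low and high ends of the range.
snps_per_year = [rate * GENOME_SIZE_BP for rate in RATES_PER_SITE_PER_YEAR]

# How many times larger the observed minimum is than one year of expected change.
fold_excess = [MIN_OBSERVED_SNPS / s for s in snps_per_year]

for rate, snps, fold in zip(RATES_PER_SITE_PER_YEAR, snps_per_year, fold_excess):
    print(f"rate {rate:.1e}: {snps:.2f} SNPs/year, observed minimum is {fold:.0f}x larger")
```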
At the core genome level, the Bangladeshi E. coli isolates do not represent a unique population relative to the nearest E. coli neighbors available in the database. Interestingly, when we interrogated their accessory genomes against the nearest E. coli genomes, we observed that approximately one of every six protein-coding genes in the genomes of the Bangladeshi isolates was statistically significantly enriched relative to the nearest E. coli neighbors (Fig. 3 and 4). Protein-coding genes enriched in the Bangladeshi isolates were significantly more likely to code for putative or hypothetical proteins of unknown function than genes shared between the isolates and their nearest neighbors. The clustering of Bangladeshi isolates and the high rate of putative proteins indicate a potentially large pool of unknown biological functions unique to this E. coli community. Known functions enriched in this community included those linked to DNA methylation and repair as well as metabolic processes, suggesting potential adaptive strategies unique to this environment. Together, these findings indicate the cohesiveness of the accessory genomes of this Bangladeshi E. coli population relative to E. coli sequences in the NCBI database while suggesting that the diversity of the accessory genome of even an organism as well studied as E. coli is not completely explored. These findings affirm that certain geographic regions (e.g., Asia) are underrepresented in current sequence databases describing E. coli and associated biological functions, as similarly suggested by recent studies of metagenome-assembled genomes (MAGs) from the gut microbiome (44,45). In addition, the observed specialization of the accessory genome over the core genome seems to indicate the existence of evolutionary pressure for adaptation to this environment. These results highlight the well-known but perhaps underestimated genomic plasticity of E. coli.
Furthermore, the enrichment and sharedness of certain accessory genes suggest an intensive horizontal gene transfer activity among this Bangladeshi E. coli collection.
Bangladeshi E. coli isolates carried multiple virulence factor-related genes, including diagnostic markers for intestinal E. coli pathotypes. For instance, the genes aatA and astA, associated with EAEC (a pathotype identified as a common cause of child diarrhea in developing and industrialized countries [46]), were prevalent and found in E. coli from the four sources, including soil (Fig. 2). Notably, nine phylogenetically diverse E. coli (median, 37,617 SNPs), including three soil isolates (HH25S, HH26S, and HH51S), coharbored aatA and astA (the simultaneous presence of aatA and astA has been associated with prolonged diarrhea [47]), highlighting the diversity of pathogenic E. coli circulating in these rural Bangladeshi communities. The presence of astA in the absence of additional pathogenic markers, as observed in eight E. coli isolates, lacks the discriminatory power to assign these strains within any of the intestinal pathotypes, as astA has been associated with multiple intestinal pathotypes (48-50) and is also prevalent in extraintestinal (51), commensal (50), and environmental isolates (52). However, the presence of astA, even in the absence of other markers, has been associated with important diarrhea outbreaks (53); therefore, its presence in E. coli from soils should not be overlooked. Other intestinal pathotypes (EPEC, ETEC, and EHEC) were not detected in E. coli isolated from soils but were found in isolates from human, chicken, and cattle feces. Overall, these findings are indicative of the potential that E. coli isolated from soils has to cause disease in people. Furthermore, the presence of one or more antibiotic resistance genes in soil isolates (e.g., 12 genes in isolate HH36S) is indicative of the risk that soil E. coli may represent for the transmission of resistance determinants. Indeed, at least one E. coli isolate from soil carried a gene associated with each of the antibiotic resistance gene classes encountered (Fig. 2).
Plasmid replicons were also present among this Bangladeshi E. coli collection (81.7%), with no significant difference in the numbers of replicons observed across the sources. Salinas et al. showed that human and domestic animals shared plasmid replicons; however, diversity in the sequences indicated that the plasmids compared were not identical (23). Similarly, soil, human, and animal E. coli of this study share plasmid replicons [e.g., IncFIB(AP001918) and ColpVC]; however, long-read sequencing would be necessary to establish if the same plasmid is circulating across reservoirs. In contrast, other replicons were absent from one or more of the studied sources [e.g., Col(BS512)], which suggests that ecological factors and/or the genetic makeup of the E. coli circulating within specific hosts could affect the distribution of certain plasmid replicons. However, the apparent enrichment by sample source may be random for at least some, if not all, of the four virulence genes, two antibiotic resistance genes, and one plasmid replicon as a consequence of the large data set, liberal statistical significance cutoff, and purposive sampling. Nevertheless, the genes are discussed here to inform potential further investigations of source-specific adaptation of E. coli.
The findings have important implications for interventions intending to address the high loads of E. coli contamination in low-income settings. First, the pathogenicity potential and acquired antibiotic resistance of environmental strains reaffirm the need for interventions that effectively reduce E. coli across different environmental reservoirs. This represents a major challenge, as multiple previous studies showed no significant impact of sanitation (16), household-level water, sanitation, and hygiene infrastructure (17,28) or an integrated water, sanitation, and hygiene intervention (54) on E. coli concentrations in soils in and around households. Second, the lack of core phylogenetic signal based on source and apparent fluidity of E. coli strains across human, animal, and environmental reservoirs reaffirms the need for integrated interventions that address both human and animal fecal sources (One Health approaches) (55). Infection control interventions targeting only people, such as vaccination or traditional drinking water treatment, household sanitation, and hand hygiene services, may be insufficient to meaningfully impact zoonotic reservoirs. Overall, new approaches, potentially including those described as transformative (56,57), are needed to address the high loads of E. coli contamination in low-income settings that seek to address the heterogeneous origins and high diversity in order to reduce prevalence and transmission.

MATERIALS AND METHODS
Bacterial isolates and antibiotic susceptibility testing. A subset of 60 isolates, part of a 175-isolate collection that was previously recovered in a study conducted in households in rural villages of Mirzapur, Bhatgram, Gorai, and Jamurki in Tangail district of Bangladesh (28), were selected for this study (Table 1). These isolates were phenotypically identified as E. coli using the API-20E (bioMérieux, Marcy-l'Étoile, France). The isolates selected were recovered from 22 households and up to four different sources and included E. coli isolated from front yard soils (n = 19) and fecal samples from human (n = 14), chicken (n = 14), and cattle (n = 13) (Table 1). For 14 households, the E. coli isolates included (n = 52) were isolated from three or four of the four sources studied, while the remaining isolates (n = 8) correspond to E. coli isolated from eight different household soils (Table 1). The nomenclature indicates the household (HH) from which the isolate was collected, followed by the source: "S" for soil, "H" for human fecal, "CH" for chicken fecal, and "C" for cattle fecal (e.g., HH03C is an E. coli isolate from cattle feces in household 03). Disk diffusion against 16 different antibiotic disks was previously performed (28). In addition, susceptibility against azithromycin (AZM) (Oxoid, Basingstoke, UK) was evaluated for selected isolates and interpreted using the Clinical and Laboratory Standards Institute (CLSI) guidelines and interpretation standards (58).
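The isolate nomenclature described above is regular enough to parse programmatically, which is convenient when grouping isolates by household or source. The following is a small sketch; the function name and the returned tuple layout are illustrative choices, not part of the study's pipeline.

```python
import re

# Source codes as defined in the nomenclature above.
SOURCE_CODES = {"S": "soil", "H": "human", "CH": "chicken", "C": "cattle"}

# The $ anchor ensures "CH" is read as one code rather than "C" plus a stray "H".
ISOLATE_PATTERN = re.compile(r"^HH(\d+)(CH|C|H|S)$")


def parse_isolate(name):
    """Split an isolate name such as 'HH03C' into (household, source)."""
    match = ISOLATE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized isolate name: {name!r}")
    household, code = match.groups()
    return household, SOURCE_CODES[code]


for isolate in ("HH03C", "HH08CH", "HH13H", "HH20S"):
    print(isolate, parse_isolate(isolate))
```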
DNA extraction and whole-genome sequencing. DNA was extracted from an overnight culture using the DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany) according to the instructions of the manufacturer. Purity and concentration of the DNA were evaluated with a NanoDrop 2000 spectrophotometer (Thermo Scientific) and a Qubit 2.0 fluorometer (Life Technologies), respectively. Libraries were prepared with the Nextera XT kit, and paired-end sequencing was performed using the Illumina HiSeq platform (2 × 150 bp) (Illumina, San Diego, CA, USA).
Phylogenetic distance and analysis of the accessory genomes. Phylogenetic distance was estimated using Mash (32), while the Accessory Genome Constellation Network (AcCNET) (33) was used to extract the accessory genome proteomes and generate a bipartite network that links the genomes that share a protein. Visualization of the network was performed using Gephi (https://gephi.org/). Analyses were performed using the 60 Bangladeshi soil and fecal E. coli isolates against 199 nonredundant E. coli genomes representative of each branch of the E. coli phylogeny and against 265 nonredundant nearest E. coli genomes, which represent the 10 most closely related E. coli strains for each of the Bangladeshi E. coli isolates, which may be shared among some Bangladeshi E. coli isolates.
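For readers unfamiliar with Mash, the distance it reports is derived from the Jaccard index j of the k-mer sets of two genomes as D = −(1/k) ln(2j / (1 + j)). The sketch below computes that quantity from exact k-mer sets of toy sequences. It is a simplification: Mash itself estimates j from MinHash sketches of canonical k-mers (default k = 21), and the sequences and function names here are invented for illustration.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(seq_a, seq_b, k=21):
    """Mash-style distance from the exact Jaccard index of two k-mer sets."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a or not b:
        raise ValueError("both sequences must be at least k bases long")
    j = len(a & b) / len(a | b)
    if j == 0:
        return 1.0  # no shared k-mers: maximum distance
    return -math.log(2 * j / (1 + j)) / k


# Toy sequences (hypothetical) differing by a single substitution.
seq_a = "ATGGCGTACGTTAGC"
seq_b = "ATGGCGTTCGTTAGC"
print(f"{mash_distance(seq_a, seq_b, k=5):.3f}")
```

A short k is used only because the toy sequences are short; genome-scale comparisons rely on longer k-mers so that chance matches are rare.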
Statistical analyses. Statistical analyses were performed using R, version 1.2.1335. Pairwise differences in the means of the ranks of the number of SNPs among isolates from the same household or different households and from the same source or different sources were evaluated using the Wilcoxon rank-sum test. To investigate enrichment of virulence factors, antibiotic resistance genes, and plasmid replicons by source, a chi-squared test was used and a P value of <0.05 was considered significant.

Accession number(s).
This whole-genome shotgun project has been deposited at DDBJ/ENA/GenBank under the accession numbers VNWZ00000000 to VNZG00000000 presented in Table S1 in the supplemental material.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. FIG S1, PDF file, 1.8 MB.

ACKNOWLEDGMENTS
This work was funded by the Swiss National Science Foundation (SNSF) through grant OP157065 to Timothy R. Julian. The funding agency had no role in study design, data collection or interpretation of the results, or submission of the work for publication.
We declare no competing financial interest.