Diversity and Population Overlap between Avian and Human Escherichia coli Belonging to Sequence Type 95

APEC causes a range of infections in poultry, collectively called colibacillosis, and is the leading cause of mortality and is associated with major economic significance in the poultry industry. A growing number of studies have suggested APEC as an external reservoir of human ExPEC, including UPEC, which is a reservoir. ExPEC belonging to ST95 is considered one of the most important pathogens in both poultry and humans. This study is the first in-depth whole-genome-based comparison of ST95 E. coli which investigates both the core genomes as well as the accessory genomes of avian and human ExPEC. We demonstrated that multiple lineages of ExPEC belonging to ST95 exist, of which the majority may cause infection in humans, while only part of the ST95 cluster seem to be avian pathogenic. These findings further support the idea that urinary tract infections may be a zoonotic infection.

E scherichia coli is an important pathogen of both poultry and humans and may cause both intestinal and extraintestinal infections (1,2). E. coli bacteria that are causing extraintestinal infections are known as extraintestinal pathogenic E. coli (ExPEC). ExPEC include uropathogenic E. coli (UPEC), which is the primary cause of urinary tract infections in humans,and neonatal meningitis E. coli (NMEC), which is an important cause of bacterial meningitis in human neonates (1)(2)(3). Avian-pathogenic E. coli (APEC) is an ExPEC that causes a range of infections in poultry collectively known as avian colibacillosis, which is one of the leading causes of infection in the poultry industry throughout the world (4)(5)(6). Furthermore, a growing number of infections caused by antimicrobial-resistant human ExPEC strains has complicated treatment of extraintestinal infections (2,3).
In humans, the main reservoir of ExPEC is the host's own intestinal tract, but a number of recent studies have suggested that the external reservoir of ExPEC may be of animal origin, including broilers, where the ExPEC is transmitted via the animal food chain (2,7,8). These speculations have spawned a large number of comparative studies that have demonstrated an overlap in serogroups, sequence types (STs) and virulenceassociated genes, which have previously been suggested to be used for classification or even as a diagnostic tool of APEC and human ExPEC (5,9,10). Studies have demonstrated that some human ExPEC strains cause disease in chicken infection models, and some APEC strains cause disease in mammalian models of human disease (7,(11)(12)(13). Furthermore, identical or nearly identical pulsed-field gel electrophoresis (PFGE) profiles have been identified in specific serotype-phylogroup-sequence type groups from human extraintestinal infections and avian colibacillosis (14).
ExPEC strains belonging to ST95 have been prevalent causes of human disease (15,16). However, strains of ST95 have also been found to be among the predominant STs causing avian colibacillosis (8,(17)(18)(19). Studies have demonstrated a close relationship between ST95 APEC and human ExPEC isolates belonging to the same sero-phylo groups, including O1/O2/O18/O45:H7-B2. Certain APEC and human ExPEC have also been found with highly similar PFGE profiles (11,14,20). Additionally, comparative whole-genome sequencing-based studies have demonstrated a close relationship between the ST95 APEC O1 type strain and human ExPEC strains (21)(22)(23). However, it is evident that to accurately investigate the genetic properties of APEC and human ExPEC, comparative genomic analyses of a large and diverse collection from both humans and birds are required. A better understanding of ExPEC isolates belonging to ST95 may reveal whether certain sublineages have adapted to constitute unique host-specific "subpathotypes." The present study aimed to compare the genetic composition of 32 APEC and 291 human ExPEC genome sequenced isolates belonging to ST95. Using these data, we investigated whether distinct genomic features of ST95 strains from different hosts exist and how these may provide insight into host adaptation as it pertains to ExPEC.

RESULTS
The initial pan-genomic analysis identified 16,786 genes, of which 3,789 belonged to the core genome. Diversity within the core genome of the APEC and human ExPEC strains were characterized by 17,468 single nucleotide polymorphisms (SNPs), while differences in the accessory genome were based on the presence or absence of the genes from the analysis of the pan-genome. A purged SNP phylogeny was constructed based on the identified SNPs with recombination regions removed (Fig. 1A). Meanwhile, presence or absence of the genes in the accessory genome was used to generate an accessory-genome-derived tree (Fig. 1B). Both trees demonstrated similar clustering, with APEC strains located on several different branches mixed with human ExPEC strains. Analysis of the SNPs by univariate and multivariate analysis could not identify single SNPs or a combination of SNPs which explicitly defined either APEC or human ExPEC. No single gene or combination of genes could separate and describe either the APEC or human ExPEC genome.
Thus, APEC belonging to ST95 did not constitute a unique branch, and it was not associated with any SNP(s) or accessory genome features. APEC was located on multiple branches of mixed APEC and human ExPEC strains in both the SNP phylogeny and the accessory-gene-derived tree ( Fig. 1A and B), demonstrating a great overall genetic overlap with certain human ExPEC strains, including closely related APEC and human ExPEC strains. These branches of mixed APEC and human ExPEC strains were designated A/E-PEC. Three minor A/E-PEC clades of closely related strains, which included both APEC and human ExPEC, were identified in both the SNP phylogeny and the APEC genomes are marked with red dots on the branch ends. Subclades consisting exclusively of human ExPEC were identified in the core and accessory phylogenies and indicated by colored branches. The human ExPEC/HExPEC-1 subclade (blue) was distinct in both the core and accessory genome phylogenies. The HExPEC-2 subclade (green) was identified in the accessory-genome-derived tree, while a less distinct clustering was identified in the core genome phylogeny. Branches with mixes of both APEC and human ExPEC were designated A/E-PEC branches.
Diversity of Avian and Human ST95 E. coli accessory-gene-derived tree (clades 1 to 3) (see Fig. S1A and B in the supplemental material). SNP analysis indicated that genomes within the clades differed by no more than 32 SNPs (clade 1), 57 SNPs (clade 2), or 30 SNPs (clade 3), which further underlined the close relationship of certain APEC and human ExPEC strains. The associated metadata regarding the year of sampling and country of isolation presented no indication of the clinical relationship between the isolates but rather indicated highly dispersed clones. Two clusters exclusively containing human ExPEC strains were identified in the two phylogenies. A large cluster consisting solely of closely related human ExPEC strains (n ϭ 145) was observed in both the SNP phylogeny and accessory-genome-derived tree and designated human ExPEC 1 (HExPEC-1) and represented strains over a period of 67 years from multiple countrie sbut with an overrepresentation of strains collected in the United States (USA) in 2012 (62%) (Fig. 1A and B). An additional human ExPEC cluster (HExPEC-2) was identified in the accessory-genome-derived tree (n ϭ 48) (Fig. 1A). Strains of the HExPEC-2 cluster showed a distinct separate branch in the accessory-genome-derived tree; however, the cluster did not demonstrate the same distinct separation as in the SNP-based phylogeny (Fig. 1B). Almost all strains in the HExPEC-2 cluster (94%) were collected in North America over a period of 38 years and also demonstrated an overrepresentation of strains from the USA in 2012 (46%).
Multivariate analysis was performed to distinguish the A/E-PEC, HExPEC-1, and HExPEC-2. The analysis identified two canonical but nonsynonymous SNPs both found in the nadR gene, which separated the large HExPEC-1 cluster from the remaining strains ( Fig. 1). Furthermore, two presence/absence gene combinations were identified that could characterize and separate the HExPEC-1 cluster from the remaining genomes. The first gene combination included the copresence of yjjW and an identified gene labeled gene3477 (see Table S1 in the supplemental material), but an absence of yaeP (Table 1, gene profile A/B). Gene3477 encoded a putative transposase (GenBank accession no. CAV01995.1). The second gene combination reported the absence of the six genes belonging to the fec protein operon (fecA2, fecB, fecC, fecD, fecE, and fecI) and presence of putative gene8134, which could not be further characterized by either Prokka, NCBI BLAST, or UniProt (Table 1, gene profile C) ( Table S1). The HExPEC-2 was described by the absence of both the fec operon and gene8134 (Table 1, gene profile E). Furthermore, the HExPEC-2 could be identified by the copresence of cutC and cutD ( Table 1, gene profile F). The mixed A/E-PEC clusters could be separated from the two HExPEC clusters by the presence of the fec gene operon (Table 1, gene profile D).
FimH typing. Subdivision by FimH typing revealed a correlation between the FimH variants and the individual branches in both the SNP phylogeny and accessorygenome-derived tree ( Fig. S1A and B). The mixed branches of APEC and ExPEC belonging to A/E-PEC carried genes encoding multiple FimH variants, including FimH15, FimH27, FimH30, and FimH54, and APEC strains were represented among all these FimH variants. Three genomes encoded two novel FimH variants, FimH526 and FimH525, each differing only by one nucleotide from FimH15 and FimH30, respectively,
and grouped together with these. All strains encoding the FimH41 variant belonged to the HExPEC-1 cluster. Furthermore, all but one strain encoding the FimH18 variant belonged to the HExPEC-2 cluster. ExPEC-associated genes. All strains were investigated for 55 genes previously described to be distinct for ExPEC (Table 2). Interestingly, 10 genes of these were present in all or Ͼ95% of all investigated genomes. Six additional genes were also highly frequent and were present in Ͼ75% of all genomes. Conversely, 14 genes were not identified or found in Ͻ5% of the genomes.
A correlation was observed between the virulence-associated genes and the A/E-PEC group and the two HExPEC groups (Fig. 2). Ten genes (cvaA, cvaB, cvaC, cvi, etsA, hlyF, iucC, iucD, iutA, and ompT) were significantly more prevalent (P Ͻ 0.01) in the A/E-PEC compared to the strains from either of the two HExPEC clusters. Five genes (ireA, papA, papE, papG-II, and tia) were found in almost all and significantly more HExPEC-1 and A/E-PEC, while were absent in the strains belonging to HExPEC-2 (P Ͻ 0.001). Meanwhile, five genes (ibeA, sfaS, sfa, cnf1, and papG-III) were significantly more prevalent in HExPEC-2 (P Ͻ 0.001). The iroN gene was detected in significantly fewer HExPEC-1 strains compared to A/E-PEC and HExPEC-2 (P Ͻ 0.0001).
Antibiotic resistance. The levels of antibiotic resistance were investigated in silico (Table 3). There was no statistical difference in the number of APEC and human ExPEC resistant strains for any of the classes of antibiotics. HExPEC-2 demonstrated the highest prevalence of beta-lactam, sulfonamide, trimethoprim, tetracycline, and macrolide antibiotic resistance genes. However, only the prevalence of beta-lactam and tetracycline genes were significantly higher in HExPEC-2 than in A/E-PEC (P Ͻ 0.05). Furthermore, significantly more A/E-PEC carried sulfonamide and aminoglycoside resistance genes than HExPEC-1 (P Ͻ 0.05). Strains with genes coding for Ն3 classes of antibiotic resistance genes were considered multiantibiotic resistant (Table 3). While there was no difference in the number of putative multidrug-resistant strains between the APEC and human ExPEC, there were a significantly higher number of multiantibiotic-resistant HExPEC-2 strains compared to both the HExPEC-1 and A/E-PEC (P Ͻ 0.01 and P Ͻ 0.05, respectively).

DISCUSSION
Previous studies have reported a genetic overlap in APEC and human ExPEC strains belonging to ST95 and suggested a zoonotic potential of APEC (21)(22)(23). However, these studies were solely based on comparing the APEC-O1 type strain to human ExPEC strains. Here, using whole-genome sequencing (WGS) data on a large collection of ST95 strains, APEC belonging to ST95 did not constitute a unique branch and were not associated with single SNPs or specific accessory genome features. The APEC genomes were generally diverse and located on multiple branches together with closely related human ExPEC, including nearly identical APEC and human ExPEC strains in both the SNP phylogeny and accessory-genome-derived trees. These results clearly show that APEC and human ExPEC belonging to ST95 could not be distinguished. The large overall genetic overlap and identification of dispersed clones of closely related APEC and human ExPEC support the previous findings and demonstrate that certain ST95 ExPECs do not exhibit host specificity or have host specificity against multiple hosts, including humans and birds. Furthermore, the genetic diversity between the investigated APEC genomes also reveal that no single ST95 APEC strain can be used as a ST95 APEC type strain to represent the entire ST95 APEC pathotype. Two clusters consisted entirely of human ExPEC strains, the HExPEC clusters. There was a strong overrepresentation of strains collected in the USA in 2012 in the two HExPEC clusters compared to the mixed human ExPEC and APEC cluster, A/E-PEC (62% of HExPEC-1, 46% of HExPEC-2, and 24% of A/E-PEC) (P Ͻ 0.001), Thus, the clustering could be the result of skewed overrepresentation of clonally related strains. However, both HExPEC-1 and HExPEC-2 represented strains from multiple countries over a 67-year and 38-year period, HExPEC-1 and HExPEC-2, respectively, and resulted in closely related clusters in the SNP-based phylogenies. Though human ExPEC and APEC A/E-PEC genomes demonstrated a great genetic overlap suggesting low host association, these findings suggest that certain subclades of genetically stable ExPEC belonging to ST95 could have adapted a degree of host predilection.
The HExPEC-1 cluster was distinctly separated from the remaining genomes in both the SNP phylogeny and accessory-genome-derived tree, while the HExPEC-2 cluster demonstrated a clear separation only in the accessory-genome-derived tree. The HExPEC-1 was distinguishable from other strains by two canonical SNPs as well as several accessory genome features, while the HExPEC-2 cluster constituted a distinct subclade in the accessory genome, defined by the presence of two cut genes. The two canonical SNPs that differentiated the HExPEC-1 were both in the nicotinamide ribose kinase domain of the nadR gene and were both nonsynonymous. NadR is a trifunctional protein involved in regulation of NAD and associated fermentation pathways and  anaerobic growth (25,26). Two combinations of genes were found to be unique to the HExPEC-1. The first consisted of yjjW, yaeP, and gene3477 (Table 1, gene profile A/B), and the second consisted of gene8134 and the fec operon (Table 1, gene profile C/D/E). The YjjW protein has been associated with the activation of YjjI, a protein hypothesized to be involved in nitrogen metabolism. The yaeO gene has been suggested to be expressed through translational coupling with yaeP (27). YaeO has been shown to be a Rho-specific inhibitor of transcription termination, an essential process for the regulation of bacterial gene expression (28). The gene3477 gene encodes a putative transposase. The fec genes are signaling and transport genes for the binding and uptake of diferric dicitrate (29). Evidence indicates that fec genes are acquired by horizontal gene transfer and have been reported to be located in pathogenicity islands in Shigella flexneri. Meanwhile, gene8134 could not be annotated by Prokka, NCBI BLAST, or UniProt. For HExPEC-2, the presence of cutC and cutD genes was unique. The cut gene cluster has been described and associated with disease-associated microbial metabolic activity in humans and has been suggested to be horizontally transferred between bacteria in the human gut (30). The prevalence of 55 genes, which previously have been suggested to demonstrate APEC or human ExPEC association were investigated (Table 2) (5,9,10). Interestingly, in this study, nine genes were found to be present in all investigated genomes (csgA, fyu, ompA, iss, feoB, fimH, usp, kpsMT-K1, and irp2). Additionally, six genes (malX, papC, papF, vat, bor, and sitA) were identified in Ͼ75% of all genomes and did not show any host or cluster association (Fig. 2). All investigated genomes originated from isolates from diseased birds or were clinical human isolates; thus, a degree of virulence potential in the strains was expected. However, several of the 15 frequently identified genes have previously been suggested to be host associated. The iss, papC, vat, and sitA genes have been suggested to be APEC associated, while csgA and fimH have been suggested to have stronger association with human ExPEC (5,9,10,31). Additionally, fyu and irp2 have been suggested to be associated with both host types. These results demonstrate that these frequently identified genes possibly have lower host association in ST95 ExPECs than previously predicted. Meanwhile, 14 of the 55 genes were not identified or found in very low numbers in the investigated strains, including the astA, tsh, iha, and afa genes which have been suggested to be host associated (9, 31) and generic for ExPEC; however, these genes appear not to be significant for ExPEC belonging to ST95.
The 10 A/E-PEC-associated genes (cvaA, cvaB, cvaC, cvi, etsA, hlyF, iucC, iucD, iutA, and ompT) were identified in the majority of included APEC isolates (84.4 to 96.6%) and seemed essential for the avian virulence potential ( Table 2). These 10 genes were also identified in the majority (77.7 to 86.2%) of the human ExPEC strains belonging to the A/E-PEC clusters, supporting their close relatedness to APEC. Previous studies have described the function and pathogenic significance of the 10 A/E-PEC genes (5,9,10). Furthermore, the four ColV plasmids, pAPEC-O2-ColV, pAPEC-O1, p1ColV5155, and pAPEC-O78-ColV, and closely related variants have been described to be closely related, with large strongly conserved core genes, including the 10 A/E-PEC-associated genes (32)(33)(34). ColV plasmids were identified in all APEC and 70% of the human ExPEC strains belonging to the A/E-PEC clusters, corresponding to 78% of all A/E-PEC strains containing ColV plasmids. These results further underline the significance of these 10 genes and the ColV plasmids which have previously been described to be associated with pathogenicity (5,9,10,(32)(33)(34). Meanwhile, 11.7% of the HExPEC-1 strains were sporadically identified to harbor the pAPEC-O78-ColV plasmid. It is generally accepted that transconjugation of virulence-associated plasmids can induce the recipient's virulence potential, and transmission of pAPEC-O2-ColV into commensal E. coli strains has been demonstrated to significantly increase the virulence potential in in ovo experiments (12). This suggests that the avian virulence potential can possibly be introduced into human clones by the uptake of such plasmids. The identified ireA, papE, tia, papA, and papG-II virulence genes encoded by HExPEC-1 could be host species related or associated. However, the majority of the A/E-PEC also carried these genes, which emphasizes A/E-PEC's potential to infect humans as well as birds. While few HExPEC-2 carried the A/E-PEC-and HExPEC-1associated genes, five genes (ibeA, sfaS, sfa, cnf1, and papG-III) were significantly (P Ͻ 0.001) and almost exclusively found in the strains of the HExPEC-2 cluster compared to the A/E-PEC and HExPEC-1. These findings could indicate a lower virulence potential of HExPEC-2 strains toward birds, but an increase in the clinical virulence potential in a putative human subclade (Fig. 2). Interestingly, the papG-II variant was exclusively identified in the HExPEC-1 and A/E-PEC, while papG-III was almost only identified in the HExPEC-2. One may speculate that the P-fimbria adhesin variant PapG-II might have a significant role in the pathogenicity in both birds and humans, while one might suggest that the PapG-III variant had less avian pathogenicity potential.
Antimicrobial resistance genes toward the beta-lactam, sulfonamide, and trimethoprim classes were the most prevalent. Beta-lactams are among the most frequently used antimicrobial agents in both poultry and humans, which consequently can be reflected by high levels of beta-lactam resistance (35). Trimethoprim-sulfonamide combinations are broad-spectrum antimicrobial agents frequently used in poultry production as well as for treatment against urinary tract infection (UTI) (35). The prevalence of beta-lactam-and tetracycline-resistant HExPEC-2 strains was significant compared to that of A/E-PEC (P Ͻ 0.05), while A/E-PEC demonstrated significantly more sulfonamide-and aminoglycoside-resistant strains compared to HExPEC-1 (P Ͻ 0.05). The overall prevalence of antimicrobial resistance was observed in similar or lower levels as currently observed in poultry and clinical cases and did not show any strong correlation with any of the observed clusters (35). The aminoglycoside resistance gene aac3Via was the only gene found in significantly higher numbers of APEC compared to human ExPEC (P Ͻ 0.01) (see Fig. S1 in the supplemental material). Furthermore, the prevalence of strA, strB, tetA, tetB, sul2, dfrA5, and aac3VI genes were identified in significant higher numbers in A/E-PEC than in the HExPEC clusters. Tetracyclines are commonly used antimicrobials in poultry as well as in other animal species, which could explain the high prevalence of the tetA and tetB genes (35).
Typing of E. coli using concatenated FimH and MLST sequences has been suggested to have a greater discriminatory power than MLST alone (36). FimH typing correlated highly with the clustering of the accessory-derived tree, the presence of virulenceassociated plasmids and additional virulence genes. APEC and human ExPEC strains belonging to A/E-PEC were found to have genes encoding FimH15/FimH526, FimH27, FimH30/FimH525, and FimH54 variants which correlated highly with the subclades of the A/E-PEC. While the strains encoding the FimH18 and FimH41 variants belonged to the HExPEC-2 and HExPEC-1, respectively, possibly this was due to the result of the strong overrepresentation of North American strains in the two clusters; the A/E-PECassociated FimH groups were represented by temporally and spatially diverse strains (Table S2 and Fig. S1A and B).
While limited information is available on FimH15 and FimH54, previous studies have  (37,38). One might suggest that combined MLST and FimH typing could give a greater overview of the subclades of ST95 E. coli strains capable of infecting both birds and humans. In summary, APEC belonging to ST95 did not constitute a unique cluster, and these strains exhibited high genetic diversity. A large genetic overlap was observed between APEC and certain human ExPEC, including closely related clones. These results suggest a genotype that is not host specific and may be pathogenic in both birds and humans. Ten ExPEC genes were highly associated with avian pathogenicity. The genes were encoded on transferable plasmids, which were closely related to APEC-associated pAPEC-O2-ColV, pAPEC-O1, p1ColV5155, and pAPEC-O78-ColV. Additionally, aminoglycoside and sulfonamide resistance, including specific antimicrobial resistanceassociated genes, tended to be more prevalent in A/E-PEC than in HExPEC.
Meanwhile, the large human exclusive ExPEC cluster, HExPEC-1, was identified by both SNP phylogeny and accessory-genome-derived tree, which represented a genetically stable human-specific clone. The additional HExPEC-2 cluster was identified in the accessory-genome-derived tree. Neither of the HExPEC-2 genomes harbored any of the virulence-associated plasmids related to avian hosts and carried fewer ExPECassociated virulence genes in general. However, the geotemporal relationship of the strains of the HExPEC-1 and HExPEC-2 clusters suggests that these human ExPEC clusters do not necessarily reflect specific large human host-adapted genotypes but could merely be clonally related isolates from historical outbreak investigations. Though HExPEC might denote a lower virulence potential toward birds, transmission of virulence-associated genes carried on plasmids, including A/E-PEC-associated genes, are capable of introducing avian virulence potential as well as increasing the human virulence potential. The high genomic plasticity and rapid transmission of genes in E. coli further underline the challenges when defining host associations.
These findings suggest that multiple lineages of ExPEC belonging to ST95 exists, of which the majority may cause infection in humans, while only part of the ST95 cluster seem to be avian pathogenic. These results add s to a growing body of evidence that suggests that urinary tract infection may be a foodborne disease.

MATERIALS AND METHODS
Bacterial strains and genomes. The genomes from 323 E. coli ExPEC strains belonging to the ST95 lineage were selected in accordance with the method of Johnson et al. (39) by carrying Ն2 ExPECassociated genes. In total, 291 human ExPEC genomes and 32 APEC genomes were included in this study (see Table S2 in the supplemental material). A total of 146 genomes that fulfilled the criteria were obtained from EnteroBase (http://enterobase.warwick.ac.uk/species/index/ecoli), whereas 17 genomes were from GenBank. A total of 141 genomes were obtained from a 1-year prospective study in Flagstaff, Arizona. Seven genomes were obtained from the National Veterinary Institute, Technical University of Denmark. These seven isolates were previously included in another study, and the raw reads have been made available in the NCBI SRA under the study accession number SRP092633 (isolation IDs: E25, E28, E33, E37, E41, E42, and E43) (40). An additional 13 bacterial isolates were sequenced in-house, five E. coli isolates were from the University of Georgia, and eight isolates were from the strain collection at the Department of Veterinary and Animal Sciences, University of Copenhagen. The assemblies have been made available on NCBI under the BioProject accession number PRJNA431453. The 13 bacterial isolates were sequenced by 2x250-bp paired-end sequencing on a MiSeq instrument (Illumina) and assembled using SPAdes v3.5.0 (41). Initial analysis on all isolates was performed with MLST, PlasmidFinder, SerotypeFinder, FimTyper, and ResFinder (42)(43)(44)(45).
Analysis of the core genome. Single nucleotide polymorphisms (SNPs) of the core genome were identified using NASP (46). All raw reads were aligned, and the sequence data of each genome were analyzed against the E. coli reference APEC O1 genome (GenBank accession no. CP000468). All positions with coverage less than 10ϫ or if the variant was present in Ͻ90% of the base calls were excluded. For the univariate and multivariate analyses, all bases identical to the reference were set at "0," while bases differing from the reference base were set at "1." Dense areas of polymorphisms due to recombination or misalignment were removed using CleanRecomb by identification of stretches of consecutive sites with identical SNP profiles (47). An approximately maximum likelihood phylogeny was inferred using the FastTree 2.1.5 implementation in Geneious 9.1.8 (Biomatters, Auckland, New Zealand) on the purged SNPs. The tree was rooted using E. coli ST140 IMT5155 (GenBank accession ID CP005930) strain as an outgroup. All trees were visualized using iTol (48). Univariate analysis of the identified SNPs was performed by Fisher's exact test. Significant P values (Ͻ0.05) were corrected for multiple testing using the false-discovery rate (FDR) method. Multivariate analysis of SNPs was performed by a discriminant analysis of principal components (DAPC) in R (version), using the adegenet package (49).
Analysis of the accessory genome. Open reading frames were identified and annotated with Prokka 1.12-beta with default settings, using barrnap 0.7 for rRNA prediction (50). Unannotated CDS were designated "geneXXXX." The pan-genome of annotated bacterial genomes was investigated using Roary 3.7.0 (51). The binary presence/absence data of accessory genes produced in Roary was used for the construction of an accessory binary tree as well as for univariate and multivariate analyses.
ExPEC-associated genes and plasmid analysis. The content of 55 previously described ExPECassociated genes was investigated using BLASTn as implemented in Geneious 9.1.8 (10,14,21,52). Reference genes were obtained from GenBank (Table 2). Genes were considered present with Ն85% coverage and Ն90% identity. Significant associations were analyzed using the Fisher exact test.
The genomes were investigated for large virulence-associated plasmids using PlasmidSeeker, with APEC-O1 as the reference (GenBank accession no. CP000468) (24). Identified plasmids with at least 80% Kmers found were considered a closely related variant of the respective plasmid.