In Silico Identification of Three Types of Integrative and Conjugative Elements in Elizabethkingia anophelis Strains Isolated from around the World

Elizabethkingia anophelis is an opportunistic human pathogen, and the genetic diversity between strains from around the world becomes apparent as more genomes are sequenced. Genome comparison identified three types of putative ICEs in 31 of 36 strains. The diversity of ICEs suggests that they had different origins. One of the ICEs was discovered previously from a large E. anophelis outbreak in Wisconsin in the United States; this ICE has integrated into the mutY gene of the outbreak strain, creating a mutator phenotype. Similar to ICEs found in many bacterial species, ICEs in E. anophelis carry various cargo genes that enable recipients to resist antibiotics and adapt to various ecological niches. The adaptive immune CRISPR-Cas system is present in nine of 36 strains. An ICE-derived spacer was found in the CRISPR locus in a strain that has no ICE, suggesting a past encounter and effective defense against ICE.

Elizabethkingia miricola, an isolate obtained from condensation water from Space Station Mir (3). A third species, Elizabethkingia anophelis, was proposed in 2011 (4) based on the description of the type strain R26 T that was originally isolated from the midgut of Anopheles gambiae mosquitoes maintained in Stockholm University in Sweden (5). In 2015, a new species, Elizabethkingia endophytica was proposed (6), but whole-genome sequence (WGS)-based genome comparison revealed it to be a homotypic synonym of E. anophelis (7,8). Additional species have since been added to the genus (8).
In 2011, the first human infection attributed to E. anophelis was documented in Central Africa Republic, neonatal meningitis caused by the bacterium (9). Later, an E. anophelis outbreak in an intensive-care unit in Singapore in 2012 was reported (10), followed by the worrying account of E. anophelis transmission from a mother to her infant in Hong Kong (11). In 2016, a large E. anophelis outbreak occurring in Wisconsin in the United States was unusual in that a substantial proportion of patients were not already hospitalized and were instead admitted directly from their homes. No indication of human-to-human transmission was found (12)(13)(14). Human infection cases of E. anophelis have been reported with increasing frequency around the world (13,(15)(16)(17), aided in part by improved identification methods. Several strains previously described as E. meningoseptica were determined to actually belong to the E. anophelis species (18). In addition to human-derived strains, multiple strains of the species have been isolated from its namesake, the mosquito, and their genomes have been sequenced (19)(20)(21).
Genome comparison of pathogenic bacteria in general has greatly increased our understanding of the evolution, pathogenesis, and epidemiology in many pathogen outbreak investigations (22)(23)(24)(25)(26). Genome analysis of E. anophelis strains has revealed substantial genetic diversity (11,16,(27)(28)(29)(30)(31), much of it mediated by mobile genetic elements (MGEs). Horizontal transfer of MGEs between bacterial strains occurs frequently and is a major source of genetic variation in bacteria (32). Therefore, an improved understanding of the MGEs in E. anophelis should facilitate the genomic epidemiological studies of pathogenic strains of E. anophelis in the future.
One type of modular mobile genetic elements, known as integrative and conjugative elements (ICEs), is capable of transferring between bacteria horizontally via conjugation. ICEs integrate into the host chromosome and replicate along with the genome, and they can excise, form a plasmid, and relocate to another site in the genome or transfer to another bacterial cell many generations later (33). The cargo genes that are brought in by ICEs may endow the recipient bacteria with new phenotypes (32)(33)(34)(35). For example, in Pseudomonas syringae pv. actinidiae, the resistance to copper, arsenic, and cadmium was attributed to the resistant genes that were carried in ICEs (36). In Helicobacter pylori, ICE tfs4 has been identified in certain strains mediating gastric disease. The presence of the tfs4 ICE has been associated with virulence, although mechanisms behind the virulence phenotype remain unknown (37). Several ICEs have been identified in taxa of Bacteroides, including CTnDOT (38,39), CTnERL (40,41), and CTnGERM1 (42). In E. anophelis, an ICE named ICEEa1 has been identified in the strains associated with the outbreak in Wisconsin and in unrelated strains from the outbreak in Singapore (43).
In this study, we searched for ICEs in 13 complete genomes and 23 draft genomes of E. anophelis strains around the world. Based on the architecture of conjugation modules and associated signature genes, three types of ICEs were recognized. As E. anophelis is recognized as an environmental bacterium, the high prevalence and diversity of ICEs as well as the mosaic integration pattern in different strains suggest that ICEs play a significant role in shaping its genome to adapt to different environmental niches.

RESULTS
Origin and geographic distribution of the strains. In this study, 13 complete genomes and 23 draft genomes of E. anophelis were compared. These strains were collected from mosquitoes, human patients, and environment globally across Asia, Europe, Africa, and North America between 1950s to the present (Table 1 and Fig. 1).
Core genome-based phylogeny of the strains. The pan-genome of a bacterial species is comprised of a core genome and an accessory genome. The core genome is a set of genes that are conserved and shared by all strains, while the accessory gene set is shared by only some strains (44,45). Because the core genome contains essential genes and their inheritance is necessitated, they carry more reliable evolutionary information than accessory genes do for inferring phylogenetic relationships. Hence, core genome-based single nucleotide polymorphism (SNP) typing has become a recognized method for accurate evolutionary reconstructions (46,47). The phylogenetic relationship of the core genome from these strains was reconstructed using Parsnp from the Harvest suite (47). Across the 36 genomes, Parsnp recognized 37,781 anchors and 985 maximal unique matches (MUMs) that anchored the genome align- ment. The genome alignment revealed that 68% of the genome regions were shared by all 36 genomes, so these regions were set as core genome. A phylogenetic tree was constructed using the SNPs in the core genome. The tree topology appeared similar regardless of which complete genome-R26 T , 0422, CSID3015183678, or NUHP1-was used as the reference for core genome alignment (data not shown). Based on the tree topology, the strains were sorted into three clusters ( Fig. 1 and Table 1). Cluster I contained 14 strains with the genome of strain 12012-2 PRCM at the basal position. Cluster II consisted of two clades, with six strains in each clade. The strains in this cluster were globally distributed and collected over decades starting in 1950s. Cluster III had two clades as well. One clade contained all five mosquito-derived strains, while the other clade contained strain 37-75 obtained from a human infection in France in 1979 and strain CSID3015183678, which was a representative of the strains associated with the Wisconsin outbreak in 2016 (43). There was no clear indication of geography-based clustering. Figure 1B presents the distribution of strains with their cluster/clade designation and collection date. Among the 14 strains in cluster I, nine strains were collected in Asia, two in Europe, and three in Africa. Among nine strains in cluster II, seven were collected in North America, one in Europe, and one in Asia ( Fig. 1 and Table 1). Closely related species that formed the clades a, b, c, d, and e were scattered in different geographic locations (Fig. 1). The clustering of these strains was corroborated by the higher percent identity in the protein sequences between the genomes. Overall, it appears that closely related strains can be widely separated both temporally and spatially.
Identification of ICEs in the genomes. A genome region was considered a putative ICE if it harbored genes encoding an integrase, a relaxase (which nicks the DNA strand at the origin of transfer oriT), a coupling protein VirD4 ATPase, T4CP, (which couples the relaxosome to the type IV secretion system [T4SS]), and several Tra proteins in the T4SS. These components are involved in integration, excision, and/or conjugation process, which have been used as markers for identifying putative ICEs in genomes (48,49). So far, only one plasmid has been isolated and sequenced (GenBank accession number CP016375) which is associated with strain F3201. There are no conjugation genes annotated in the complete plasmid sequence, so the plasmid is not a conjugative plasmid. Therefore, the ICE signature genes identified are a part of the genome in these ICE-containing genomes. Cargo genes are carried in the unit, many are flanking the region with the signature genes. Insertion sites of putative ICE loci were recognized and mapped in genomes based on pairwise comparison with closely related genomes The phylogenetic tree was derived from the core genome SNP comparison. The type of ICE is shown in red brackets after the strain name. The five circles labeled (a) to (e) in red demonstrate the protein identity between the genomes in the corresponding clades. The color bar represents the percent identity when a genome was compared to the reference genome. (B) Geographic distribution of the strains. Clusters were color coded, and the strain name, collection time, and clade were specified in the map. The letters correspond to the clades in panel A. Strain information was presented in Table 1. lacking the putative ICE. Among the 36 strains, 31 strains contained at least one putative ICE, and a total of 52 putative ICEs were detected. There were eight ICEs in which no integrase was found in the element. Since integrase is necessary for integration and excision, the unit lacking integrase could not conduct integration or excision. The integrase could have been lost after integration into the genome. These elements were defined as degenerated ICEs (Table 2). According to the architecture of the modular genes, these putative ICEs were classified into three types (Fig. 2). For the ease of readability, we omit "putative" before ICE in the remaining text.
Type I ICEs were featured by 13 tra genes (traABBDEGIJKMNOQ) in addition to genes coding for T4CP, relaxase, and integrase. In most cases, the T4CP and relaxase genes c There are no assemblies in the NCBI database for these four strains, so we did not submit third party annotation (TPA) for these genomes to GenBank. d Two elements, ICEEaI (8) and ICEEaIII (12) in strain NUH11 combined and integrated between tRNA-Asp-GTC and beta-lactamase. These two elements may share the same integrase.
were located tandemly upstream of the tra gene cassette, and the integrase gene was located downstream of the tra gene cassette (Fig. 2). The two traB genes in tandem were independent from each other, both with complete CDS. Such traB genes in tandem were predicted in a few genomes in E. meningoseptica and Chryseobacterium gleum as well. A total of eight type I elements were found in 11 strains (Table 2). ICEEaI(1) was described previously and named ICEEa1 in strain CSID3015183678 (Wisconsin, US, 2016). It inserted into and disrupted the gene encoding MutY, which is an adenine DNA glycosylase that is required for repairing G-A mispairs, causing the strain to be more prone to mutation (43). ICEEaI(1) was found between genes in three additional strains, NUPH1 (Singapore, 2012), 3375 (South Carolina, US, 1957), and FDAARGOS-198 (Sweden, collection time unknown). Each of the four ICEEaI(1) elements has a distinct insertion site in the respective genome (Fig. 3). Other type I ICEs inserted into intergenic regions, except for the one in strain 37-75, which inserted into the gene encoding an alkyl hydroperoxide reductase (Ahp) between codons K207 and I208. This protein is a primary scavenger of H 2 O 2 in Escherichia coli (50), so its disruption by this ICE may result in a loss of function and make the strain vulnerable to oxidative stress. Integration sites of these elements were listed in Table 2 and marked in Fig. 3. Twelve type II ICEs were identified in 13 strains (Table 2). Unlike type I ICEs, the T4CP and relaxase genes in type II ICEs were separated by one or more predicted CDS. The cassette of tra genes consisted of traABDEFGIJKMNOQ. There was a gene located between traD and traE, encoding a RadC domain-containing protein (Fig. 2). All ICEEaII ICEs integrated next to a tRNA gene. In 12 strains, the elements resided at the 3= end of the tRNA-Leu-CAA. Only ICEEaII (12) in strain NUHP1 was located after the tRNA-Ser-GGA (Table 2 and Fig. 3). Integrases were found in four of the 12 elements (see below). Integrases are required for integration and excision, so the absence of an integrase in the elements suggests that these elements may have degenerated and may not be mobilizable. These eight elements were defined as degenerated ICEs.  Table 2 for strain and ICE information.
Type III ICEs were recognized in 16 strains. The structure was quite different from that in type I and II ICEs, with the presence of only seven tra genes, traAEGJKMN. The relaxase and T4CP genes flanked the tra genes. In 11 of 16 elements, a gene was present after the traN, encoding a large protein (791 to 1,177 amino acids [aa]) with a RadC domain (Fig. 2). The type III ICEs integrated after a tRNA gene. Five tRNA genes, tRNA-Arg-ACG, tRNA-Gln-TTG, tRNA-Asp-TGA, tRNA-Ser-TGA, and tRNA-Glu-TTC, were targeted by ICEEaIIIs (Fig. 3). In strain NUHP1, two type III elements, ICEEaIII (7) and ICEEaII (8), were colocated tandemly between two tRNA-Asp-GTC and separated by a tRNA-Asp-GTC. In strain NUH11, ICEEaI (8) and ICEEaIII(12) combined as one segment, which was localized between the tRNA-Asp-GTC and the gene encoding a betalactamase. In most type III elements, at least one integrase gene was present.
In sum, eight type I ICEs, 12 type II ICEs (including eight degenerated elements), and 19 type III ICEs were identified, and 15 elements in 11 draft genomes were partially assembled ( Table 2). The size of ICEs was up to 116.3 kb in complete genomes. Annotated ICEs were listed in Table S1 in the supplemental material, which included predicted CDS and gene functions. In 36 genomes examined, 17 carried one ICE and 14 harbored more than one ICE. No ICEs were found in these five strains: PW2810, B2D, 8707, CIP11067, and GTC_10754 (Tables 1 and 2).
Phylogenetic relationship of the ICEs. To track the evolutionary history of these ICEs, the nucleotide sequences of the genes encoding relaxase, T4CP, TraG, and TraJ from each ICE were compared. As shown in Fig. 4, the type I and II elements were more closely related, forming a clade distinct from the type III clade. This pattern was consistent with the type classification based on the structure of the conjugation module (Fig. 2). The sequences of type I and II relaxase and TraJ genes had less resolution than the T4CP and TraG genes to separate type I from type II elements. The conservative nature of the relaxase and TraJ genes suggests that both may have been under functional constraint with limitation for changes during evolution. The relaxase gene of ICEEaI(2) in strain FDAARGOS-198 was an outlier in the type I elements. In Fig. 1, the ICE type of each strain was marked in red brackets on the SNP core genome tree. There was no clear association pattern between ICE types and core genome clusters, and no geographic or temporal patterns were observed in the distribution of the ICE types.
Integrases. Integrases are required for ICE integration and excision. A total of 43 tyrosine recombinases (TRs) were found in 8/8 type I ICEs, 1/12 type II ICEs, and 18/19 type III ICEs. One ICE may carry up to four tyrosine type recombinases (Table 2). A phylogenetic tree of tyrosine recombinases was constructed using protein sequences from the E. anophelis ICEs as well as several Bacteroides transposons. As shown in Fig. 5, TRs in type I ICEs form a clade with IntDOT at the basal position, and two TRs of type III ICEs are included in the clade. The clade has strong bootstrap support. However, the protein sequence identity between the TRs in type I ICEs and IntDOT is only about 41%. The ICEEaI(2)FDAARGOS-198_2 is an outlier. The TRs in type III ICEs show greater sequence divergence and split into several clades, with most clades receiving strong bootstrap support. One of these clades contained IntN1 which is associated with NBU1, a mobilizable transposon, in Bacteroides (51). DDE transposases were identified in 8/19 type III ICEs and one type I ICE. In type II ICEs, tyrosine recombinase was found only in one type II ICE, ICEEaII(12) NUHP1 ( Fig. 5 and Table 2). Serine recombinases were found in ICEEaII(10) JM-87 and ICEEaII(9) NCTC10588. In addition, transposases in the IS481 family (52) are present in ICEEaII(2) 502 and ICEEaII(10) JM-87 (Table 2). So, in 12 type II CIEs, only four carry an integrase. No integrases were identified in the remaining eight elements. These elements may have degenerated and become functionally deficient.
Cargo genes. In addition to the module genes that define an ICE, there are a diverse cargo of genes that are carried in ICEs. The cargo genes in each ICE are presented in Table S1. Restriction-modification (R-M) system components such as DNA modification methylases, type I R-M systems, and type II restriction enzymes were prevalent in the ICEs of all three types. In addition, the anti-restriction protein ArdA was detected in FIG 4 Phylogenetic relationship of the T4CP, relaxase, TraG, and TraJ genes associated with ICEs. The nucleotide sequences from different ICEs were aligned, and the evolutionary history was inferred using the neighbor-joining method. The evolutionary distances (Continued on next page) three type II ICEs and one type III ICE. A beta-lactamase gene was found in two ICEs in two strains. The component genes of tripartite nodulation-division RND complex multidrug efflux pump system (53) were carried in the type I ICEs in six strains, one of the three type III ICEs in strain NUHP1, and the type II ICE in strain F3543 (Table S1). ABC transporters for various substrates, such as manganese, potassium, and oligopeptides, were found in all ICEs. TonB-dependent receptors for siderophore import or carbohydrate uptake (SusC) were present in several ICEs. There were various transcriptional regulators in the ICEs, such as AraC family, ArsR family, MarR family, HxlR family, and TetR family. The AraC family transcriptional regulators (AFTRs) were very prevalent; a total of 36 of these were found in the ICEs in 19 strains. In addition, a two-component regulatory system is present in the type I ICEs in three strains. Some ICEs also carried DNA topoisomerases, helicases, primases, and polymerases, presumably for independent replication while in plasmid form.
CRISPR-Cas loci. A type II-C CRISPR-Cas system (54) containing a CRISPR array and the genes encoding Cas9, Cas1, and Cas2 proteins was detected in nine strains (Table 1) with a prevalence of 25% (9/36). The unit is located downstream of the gene encoding a cobalt-zinc-cadmium resistance protein CzcD. The number of spacers varies between 6 in strain CIP111046 to 47 in strain CIP111067. LDVH-AR107 was isolated from the internal organ of a common carp Cyprinus carpio collected in 2004 in Montpellier, France. It has two CRISPR loci, each with 21 spacers. Interestingly, strains LDVH-AR107, Po0527107, and V0378064 share 12 spacers, despite the difference of geographic origins and sampling time ( Table 1), suggesting that these CRISPR loci may have been derived from a common ancestor. The CRISPR locus of strain GTC_10754 carried 32 spacers, and the spacer at the far end of the CRISPR array might be derived from the gene encoding a hypothetical protein located in the region between TraG and Tra J in a type III ICE. The predicted spacer (CTTATTTAAATATTATGCTAACAGAGATAA) was 30 nt long, the 2 to 30 nucleotides of the spacer, TTATTTAAATATTATGCTAACAGAGATAA, had a perfect match with the target sequence, or corresponding protospacer, in five type III ICEs, ICEEaIII(1), -(3), -(6), - (9), and - (15), and the same spacer had one mismatch with the protospacer in four other elements, ICEEaIII(2), -(4), - (13), and - (18). These protospacer carrying ICEs are present in five strains derived from mosquitoes (R26 T , Ag1, As1, AR4-6, and AR6-8), three strains derived from human infection (0422, CIP60.85, and EM361-97), and one strain derived from corn (JM-87). It is conceivable that the spacer might have been acquired from previous encounters with the type III ICEs and inserted into the CRISPR locus. Theoretically, strain GTC_10754 can target the ICEs with the protospacer. In fact, no ICEs are present in strain GTC_10754 ( Table 1). The CRISPR loci in the other eight strains do not contain any spacers derived from the ICEs in this study.

DISCUSSION
Many genomes have been sequenced for E. anophelis strains derived from different sources, including mosquitoes, human infections, hospital environments, fish, and corn stems. This genome availability enabled a comparative genomics approach to investigate the genetic architecture and repertoire of the E. anophelis population.
Core genome-based phylogeny. The core genome is shared by all strains and consists of essential genes that are vertically transmitted, while genes in the accessory genome are present only in a subset of strains. The sum of the core and accessory genomes in all strains constitutes a pan-genome for the species (45,55,56). Due to its inheritability, the core genome is intrinsically suitable for inferring phylogenetic relationships (47). In the present study, the core genome SNP analysis separated the E. anophelis strains into three clusters (Table 1 and Fig. 1); however, no correlation patterns were found between genetic relatedness and spatial distribution. The E.

FIG 4 Legend (Continued)
were computed using different models to reconstruct the phylogenetic trees with the bootstrap test using 1,000 replicates, which generated similar tree topology. The consensus trees generated using the Kimura two-parameter model were presented. The bootstrap values are shown on the nodes. Three Types of ICEs in Various Strains of E. anophelis anophelis outbreak in Singapore in 2012 was caused by isolates derived from the hospital environment (10,27). The Wisconsin outbreak in 2016 showed a different pattern; many patients were likely infected separately before they were hospitalized (12). Recently, an epidemiological genomic survey using core genome comparison was FIG 5 Phylogenetic relationship of the tyrosine recombinase proteins associated with ICEs. The protein sequences of tyrosine recombinases were aligned, and the evolutionary history was inferred using the neighbor-joining method. The evolutionary distances were computed using different models to reconstruct the phylogenetic trees with the bootstrap test using 1,000 replicates, which generated similar tree topology. The consensus tree generated using the JTT model was presented. The bootstrap values are shown on the nodes. reported on the transmission pattern of global spread of aggressive nontuberculous Mycobacterium abscessus (57). Originally, human infections were caused by M. abscessus isolates acquired from the environment. However, the dominant circulating strains within the global M. abscessus patient community may be mediated by nosocomial fomite spread (57). Some cases may result from hospital-based cross-infection (58). The core genome data from the present study suggest that E. anophelis infections have been caused by diverse strains acquired from the environment. The mosquito-derived strains from three mosquito species were clustered together and distinct from the strains derived from human infections (Fig. 1). Association of E. anophelis has been documented in several different mosquito species, including Anopheles gambiae (59), Culex quinquefasicatus (60), and Aedes aegypti (61), and a role for the bacteria in larval development has been hypothesized. The core genome phylogeny suggests that mosquito-associated strains and strains derived from human infections were not epidemiologically linked.
Diverse types of ICEs in the strains. Genome diversification promotes bacterial adaptation and evolution. ICE, a type of mobile genetic element, contributes significantly to the pan-genome reservoir with potentially adaptive genes (33,48). Several ICEs have been found in the taxa of Bacteroides (42,49,(62)(63)(64), which belong to the family Bacteroidaceae in the phylum Bacteroidetes. Of these elements, CTnDOT in Bacteroides thetaiotaomicron has been studied extensively (39,(65)(66)(67). The ICEs identified in E. anophelis are structurally distinct from CTnDOT. Conserved features of relaxase, coupling protein, and conjugative elements have been used for identifying ICEs in a wide variety of genomes (48,49). Using this approach, we categorized the ICEs in E. anophelis strains into three types based on the architecture of the elements and the phylogeny of sequences from four genes in the elements (Fig. 2 and 4). Type I elements are closer to type II elements. Diverse integrases were associated with ICEs. Tyrosine recombinases are present in all type I and III ICEs. Serine recombinase, tyrosine recombinase, or transposase was found in 4/12 type II ICEs, and the remaining eight elements lack an integrase. In that integrase is essential for integration and excision of an ICE, the absence of integrase in the ICEs suggests that these elements had degenerated and were no longer mobile. Type II and III ICEs tended to integrate adjacent to a tRNA gene, while the type I ICEs used a non-tRNA gene region for integration (Fig. 3). Integration into a gene could result in a loss of function. For example, all strains attributed to the 2015 Wisconsin E. anophelis outbreak, including strain CSID3015183678, contained an ICE disrupting the mutY gene, which is required for excision of G-A mismatch (68,69) and consequently demonstrated a high rate of nonsynonymous to synonymous substitutions compared to strains with an intact mismatch repair system (43). The presence of various element-associated integrases leads to diverse integration loci (including tRNA gene and non-tRNA loci) which in turn increases the ability of ICEs to contribute their diverse and dynamic genetic repertoire to E. anophelis pan-genome (48,64).
Among the cargo genes carried by the ICEs for which a function could be identified, most are involved with host defense, nutrient acquisition, or transcription regulation. Innate immune mechanisms like restriction-modification systems provide host defense by protecting against invading DNA (70,71). RND type multidrug efflux pumps enable the cell to defend against environmental toxins by recognizing and expelling various structurally diverse compounds (72), including antibiotics, as shown previously in strain R26 T (29) and in the strains associated with the outbreak in Singapore (27,28). Regarding nutrient acquisition, a variety of ABC transporters and TonB-dependent receptors are present in the ICEs, potentially improving the host's ability to satisfy its nutrient needs in a nutrient-limited environment. The ICEs also equip the host with a variety of transcriptional regulators, with those in the AraC family being particularly prevalent (see Table S1 in the supplemental material). Transcription regulators of this sort can sense various chemical signals involved in carbon metabolism, quorumsensing signaling, virulence, and stress response, modifying expression of a global gene repertoire (73)(74)(75)(76). Thus, the genetic flexibility conferred by ICEs enables the host to quickly adapt to eco-niche changes, which may contribute to establishment of an infection in humans as well.
CRISPR loci. The CRISPR-Cas system is an adaptive immune mechanism against invading nucleic acids. In this study, the type II-C CRISPR-Cas system was identified in nine strains of E. anophelis (Table 1). Interestingly, three strains that were collected separately share 12 identical spacers in their CRISPR loci, suggesting a common origin of the CRISPR. Besides, one of the 32 spacers in the CRISPR array of strain GTC_10754 might have been derived from a gene present in nine of the type III ICEs. The presence of this spacer suggests that the CRISPR-Cas system may have encountered and eliminated a type III ICE in the past. The defense is target specific. In fact, no ICE is present in strain GTC_10754. It has been demonstrated that a type II-C CRISPR-Cas system from Riemerella anatipestifer was able to acquire spacers from a transformed exogenous plasmid and target the plasmid (77). Similarly, the CRISPR-Cas system in Enterococcus faecalis functions as immune barriers to the acquisition of a conjugative plasmid (78). On the other hand, five CRISPR-carrying strains harbored ICEs as well (Table 1). However, the CRISPR loci in the strains do not contain any spacers derived from the ICEs in this study, so the CRISPR-Cas systems in these strains are not able to recognize and target these ICEs. The coexistence of mobile elements and CRISPR defense system demonstrate an evolutionary balance between the stability and plasticity of the genome (79,80). A well-characterized example of genome stability mediated by the CRISPR-Cas systems can be found in Flavobacterium columnare, where they were shown to play a role in host-phage interactions, driving long-term genome coevolution through arms race-like mechanisms between the host and phages (81). Conversely, the CRISPR-Cas machinery appeared not to play a role as a resistance mechanism against phage for the fish pathogen Flavobacterium psychrophilum, which suggests we still have much to learn about the dynamics of CRISPR-Cas systems. Availability of the E. anophelis strains that possess CRISPR-Cas with and without ICEs enables further studies on coevolution of mobile elements and host defense systems in the bacteria.
Conclusion. In this study, we identified ICEs in the genomes of 31 of 36 E. anophelis strains isolated from a wide array of hosts, collected at diverse geographic locations over several decades. The ICEs can be categorized into three types based on the distinctive architecture of the conjugation module and integration sites. The identification of the ICEs warrants further studies on the genetic diversity of the pan-genome and its impact on the virulence and pathogenesis of this global opportunistic pathogen.

MATERIALS AND METHODS
Bacterial strains. In this study, the complete genomes from 13 strains and draft genomes from 23 strains were analyzed (Table 1). Mosquito-derived strains were isolated from Anopheles gambiae (R26 T and Ag1), Anopheles stephensi (As1), and Anopheles sinensis (AR4-6 and AR6-8). Strain CSID3015183678 was one of the strains associated with the 2015-2016 Wisconsin outbreak, which has been characterized previously (43), and strains NUHP1, NUH6, and NUH11 were representative strains isolated from the Singapore outbreak; the genomes of these strains have been described (27). Strain LDVH-AR107 was derived from a common carp Cyprinus carpio. Strain JM-87 was isolated from maize Zea mays and was originally described as "E. endophytica" until genome comparison identified it as belonging to the species E. anophelis (7). The other strains were all derived from human patients or hospital environments, collected from 1950 to 2016. The draft genomes of strains LDVH-AR107, 8707, NCTC10588, AmMS250, and 37-75 were reassembled using Illumina reads downloaded from the SRA database. The reads were de novo assembled by CLC genomics workbench v 10.1.1. The 36 genomes were annotated using the SEED and Rapid Annotations using Subsystems Technology (RAST) at the RAST server (82).
Phylogenetic relationship of the strains based on core genome comparison. The Parsnp module in the Harvest suite (47) was used for estimating the phylogenomic relationships of the strains. The core genome that is shared by all 36 strains was identified by multiple genome alignment, and SNPs in the core genome were typed to infer phylogenetic relationships; the process was implemented by using Parsnp. Four complete genomes from strains R26 T , 0422, CSID3015183678, and NUHP1, were used as reference for the process, respectively. Each complete genome was used iteratively as the reference for tree construction, and tree topology of all four core genome trees was unaffected by choice of reference genome. The tree with R26 T as the reference was depicted in Fig. 1.

Identification of ICEs.
To identify an ICE, the RAST annotation of each genome was searched for clusters of genes coding for an integrase, relaxase, coupling protein (T4CP), and transfer (Tra) proteins, including a VirB4 ATPase (TraG) in the conjugation module. These proteins are the key components of an ICE (48,83). In type II ICEs, eight elements lack any type of integrases; these elements were defined as degenerated elements. The boundary of an element was delimited as between the two open reading frames (ORFs) that flank the ICE. The genome of the type strain R26 T was used as a reference to mark the integration sites. Each ICE was categorized into one of the three types based on its gene structure (Fig. 2). The ICEs were named based mainly on the nomenclature proposed by Burrus et al. (34): the acronym ICE was followed by the initials of the name of the bacterium (Ea), a Roman numeral as type, a strain name and a ordinal number in brackets to identify the same sequence if it is encountered in a different strain. For example, the type I ICE found in both NUHP1 and CSID3015183678 strains would be named ICEEaI(1)_CSID3015183678 and ICEEaI(1)_NUHP1, respectively. Each type has its set of numbers; for example, the four type III ICEs in strain NUHP1 are designated ICEEaIII(1)_NUHP1 through ICEEaIII(4)_NUHP1.
Inference of phylogenetic relationships of selected genes. The phylogenetic relationships of the relaxase, T4CP, TraG, and TraJ genes were inferred using nucleotide sequences. For tyrosine type recombinases, protein sequences were used for inference. The sequences were aligned using ClustalW, and the sequence alignments were used to infer evolutionary history using the neighbor-joining method. The evolutionary distance was computed using different models, which resulted in similar tree topology. The trees using Kimura two-parameter model for nucleotide sequences were presented in Fig. 4, and the tree using the Jones-Taylor-Thornton (JTT) model for protein sequences was presented in Fig. 5 A bootstrap test with 1,000 replicates was implemented to determine the percentage of clustering of associated taxa. The phylogenetic analyses were conducted in MEGA7 (84). IntDOT and IntN1 were included in the analysis of tyrosine recombinases. The GenBank accession numbers of IntDOT and IntN1 follow: InDOT-Bt, CAC47921; IntN1_NBU1, AF238307.1.
Identification of CRISPR sequences. Genome sequences were searched for CRISPR using CRISP-Rfinder (85). The spacers located were then used as query to search against ICEs to identify protospacer sequences.
Accession number(s). The genome assemblies of 31 strains were used for ICE identification and annotation. The GenBank accession numbers are listed in Table 1. The draft genomes of strains LDVH-AR107, 8707, NCTC10588, AmMS250, and 37-75 were reassembled using Illumina reads downloaded from the SRA database. The SRA accession numbers are listed in Table 1. ICEs were reannotated in the respective genomes. The nucleotide sequence data reported are available in the Third Party Annotation Section of the DDBJ/ENA/GenBank databases under the ICE TPA accession numbers BK010586 to BK010626, which are listed in Table 2.
Ethics statement. The work was conducted following the policy of New Mexico State University Institutional Biosafety Committee.

ACKNOWLEDGMENTS
This work was supported by the National Institutes of Health (SC1AI112786 to J.X.), National Science Foundation (1633330 to J.X.), and CDC program funds designated for the study of emerging infectious agents. Funding for the open-access charge was provided by the National Institutes of Health (SC1AI112786).
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, National Science Foundation, and Centers for Disease Control and Prevention. Mention of company names or products does not constitute endorsement.
J.X. conceived of and designed the study. J.X., D.P., Y.L., and Q.X. collected genome data and performed data analysis. J.X. and A.N. wrote the manuscript.
We declare that we have no conflicts of interest.