Complete assembly of Escherichia coli ST131 genomes using long reads demonstrates antibiotic resistance gene variation within diverse plasmid and chromosomal contexts

The incidence of infections caused by extraintestinal pathogenic Escherichia coli (ExPEC) is rising globally, which is a major public health concern. ExPEC strains that are resistant to antimicrobials have been associated with excess mortality, prolonged hospital stays and higher healthcare costs. E. coli ST131 is a major ExPEC clonal group worldwide with variable plasmid composition, and has an array of genes enabling antimicrobial resistance (AMR). ST131 isolates frequently encode the AMR genes blaCTX-M-14/15/27, which are often rearranged, amplified and translocated by mobile genetic elements (MGEs). Short DNA reads do not fully resolve the architecture of repetitive elements on plasmids, so the MGE structures encoding blaCTX-M genes cannot be fully determined from them. Here, we performed long read sequencing to decipher the genome structures of six E. coli ST131 isolates from six patients. Most long read assemblies generated entire chromosomes and plasmids as single contigs, contrasting with more fragmented assemblies created with short reads alone. The long read assemblies highlighted diverse accessory genomes with blaCTX-M-15, blaCTX-M-14 and blaCTX-M-27 genes identified in three, one and one isolates, respectively. One sample had no blaCTX-M gene. Two samples had chromosomal blaCTX-M-14 and blaCTX-M-15 genes, and the latter was at three distinct locations, likely transposed by the adjacent MGEs: ISEcp1, IS903B and Tn2. This study showed that AMR genes exist in multiple different chromosomal and plasmid contexts even between closely related isolates within a clonal group such as E. coli ST131.

Importance
Drug-resistant bacteria are a major cause of illness worldwide, and a specific subtype called Escherichia coli ST131 causes a significant number of these infections. ST131 bacteria become resistant to treatment by modifying their DNA and by transferring genes among one another via large packages of genes called plasmids, like a game of pass-the-parcel. Tackling infections more effectively requires a better understanding of what plasmids are being exchanged and their exact contents. To achieve this, we applied new high-resolution DNA sequencing technology to six ST131 samples from infected patients and compared the output to an existing approach. A combination of methods shows that drug-resistance genes on plasmids are highly mobile because they can jump into ST131's chromosomes. We found that the plasmids are very elastic and undergo extensive rearrangements even in closely related samples. This application of DNA sequencing technologies illustrates at a new level the highly dynamic nature of ST131 genomes.


Introduction
Sample collection
Six ESBL-producing E. coli ST131 clinical strains were isolated between June and October 2015 from patients at Addenbrooke's Hospital, Cambridge, as part of a study on longitudinal surveillance of antibiotic resistance in the hospital (Supplementary Table 1). Five samples were from faeces, and one was from blood. These were short-read sequenced in a multiplex run on an Illumina HiSeq 2500 platform and processed as previously outlined [27].

High molecular weight DNA extraction
Frozen stocks of the six isolates were streaked onto LB agar plates and grown overnight at 37 °C.

Oxford Nanopore library preparation and sequencing
DNA was quantified using a Quant-iT™ HS (High Sensitivity) kit (Invitrogen). DNA purity was checked using a Nanodrop (ThermoFisher) and fragment size was confirmed by FEMTO Pulse (Nano [...]

The short reads used in this study were created as follows: bacterial genomic DNA was extracted using the QIAxtractor (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. Library preparation was conducted according to the Illumina protocol and sequenced (96-plex) on an Illumina [...]

[...] applied SPAdes v3.12 to incorporate short reads and bridge gaps. Pilon was run several times to achieve the most contiguous and complete genome assemblies.

Genome assembly assessment and error rate quantification
The quality of resulting assemblies was assessed using Quast 3.0 [32] according to the total assembly length, number of contigs, N50, GC content and degree of replicon circularization. Assembly graphs were visualized with Bandage [33]. The resulting contigs in each assembly were classified as chromosomal or plasmid using machine learning algorithms implemented in mlplasmids [22].
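Two of the assembly metrics named above, N50 and GC content, have simple definitions that can be sketched in a few lines. The snippet below is only an illustration of those definitions, not the Quast implementation; the function names and the example contig lengths are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def gc_content(sequences: Iterable[str]) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T) across all contigs."""
    gc = at = 0
    for seq in sequences:
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        at += upper.count("A") + upper.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


# Hypothetical contig lengths (bp): one chromosome-sized contig and two smaller ones.
example_lengths = [5_000_000, 120_000, 5_000]
example_n50 = n50(example_lengths)
```

For an assembly in which the chromosome is a single contig, the N50 equals the chromosome length, which is why N50 is a convenient summary of contiguity when comparing long-read and short-read assemblies.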

The read depth of each replicon was estimated by aligning the short Illumina and long Oxford Nanopore reads to the completed genomes using Smalt v0.7.6 and BWA-MEM v0.7.17 (with the flag -x ont2d for ONT reads), respectively. SAMtools v1.7 was used to process the SAM files to BAM format, remove duplicates, and identify the coverage at each base of each assembly. The median value for each replicon was noted and was normalized using the median chromosomal depth of the same assembly.

To provide a phylogenetic context for these six isolates, the short [...]
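The normalization step described above (median depth per replicon divided by the median chromosomal depth of the same assembly) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: it assumes per-base coverage is available as tab-separated lines of replicon name, position and depth (the layout produced by `samtools depth`), and the function names, replicon names and depth values are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List


def parse_depth_lines(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group per-base depths by replicon from 'name<TAB>position<TAB>depth' lines."""
    depths: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, _position, depth = line.rstrip("\n").split("\t")
        depths[name].append(int(depth))
    return dict(depths)


def normalised_median_depth(
    depths: Dict[str, List[int]], chromosome: str
) -> Dict[str, float]:
    """Median depth of each replicon divided by the chromosome's median depth."""
    chrom_median = median(depths[chromosome])
    if chrom_median == 0:
        raise ValueError("chromosome has zero median depth")
    return {name: median(values) / chrom_median for name, values in depths.items()}


# Hypothetical per-base depths for one chromosome and one plasmid.
example_lines = [
    "chromosome\t1\t40", "chromosome\t2\t42",
    "chromosome\t3\t38", "chromosome\t4\t41",
    "plasmid_A\t1\t80", "plasmid_A\t2\t84",
    "plasmid_A\t3\t79", "plasmid_A\t4\t82",
]
example_ratios = normalised_median_depth(
    parse_depth_lines(example_lines), "chromosome"
)
```

In this toy input the plasmid's normalized depth is 2.0, which would be read as roughly two plasmid copies per chromosome. Using the median rather than the mean keeps the estimate robust to short regions of unusually high coverage, such as repeated MGEs.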

Oxford Nanopore long read quality control and filtering
High molecular weight DNA from six E. coli ST131 isolates was sequenced using long Oxford Nanopore reads and short Illumina reads to assemble their genomes, allowing for plasmid reconstruction and resolution of AMR genes, MGEs and associated rearrangements. The ONT GridION X5 sequencing generated 8.9 Gbases in total across 1,406,087 reads (mean length of 6.3 Kb, Table 1). The number of reads generated per hour, total yield of bases over time, read length [...] leaving 1,142,067 reads with 8.2 Gbases with a mean Q score of 10.2 and a mean length of 7.2 Kb ( [...]

The initial number of reads per library ranged from 127,118 to 510,253 and these were filtered using a series of steps to ensure that the reads used for each of the six assemblies had high quality. Bases were successfully called at an average of 97.9% of reads ( [...]

Table 3. Total size of assemblies, chromosomes and plasmids found in each strain based on their optimal whole genome assemblies using the GridION X5 long reads. Each assembly had seven or fewer [...]

[...] flanked by ISEcp1 to its 5' and Tn2 followed by IS26 at its 3' end, with another Tn2 5' of ISEcp1.
[...] its IncFII plasmid and was flanked by a truncated ISEcp1 at its 5' end and Tn2 at its 3' end, with IS26 copies 5' and 3' of these segments.
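The read-level summary figures reported in the quality-control section (total yield, read count, mean length and mean Q score) are related by simple arithmetic, which the sketch below makes explicit. It uses only the numbers quoted in the text. The function names are hypothetical, and the conversion of the mean Q score to an error probability assumes the standard Phred definition, Q = -10 log10(p); how the reported mean Q score was averaged is not stated in the text.

```python
def mean_read_length_kb(total_bases: float, n_reads: int) -> float:
    """Mean read length in kilobases: total yield divided by number of reads."""
    return total_bases / n_reads / 1_000


def phred_to_error_probability(q: float) -> float:
    """Per-base error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)


# Figures quoted in the text: before and after filtering.
raw_mean_kb = mean_read_length_kb(8.9e9, 1_406_087)
filtered_mean_kb = mean_read_length_kb(8.2e9, 1_142_067)

# Fraction of reads retained by the filtering steps.
retained_fraction = 1_142_067 / 1_406_087

# Error probability implied by a mean Q score of 10.2 (Phred assumption).
filtered_error = phred_to_error_probability(10.2)
```

Rounded to one decimal place, the two mean lengths reproduce the 6.3 Kb and 7.2 Kb stated in the text, and about 81% of reads are retained. Under the Phred assumption a Q score of 10.2 corresponds to an error probability a little under 10% per base, which is consistent with the use of short reads and repeated Pilon polishing to correct the long-read assemblies.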