Influenza A Virus Field Surveillance at a Swine-Human Interface.

Swine are influenza virus reservoirs that have caused outbreaks and pandemics. Genomic characterization of these viruses enables pandemic risk assessment and vaccine comparisons, though this typically occurs after a novel swine virus jumps into humans. The greatest risk occurs where large groups of swine and humans commingle. At a large swine exhibition, we used Nanopore sequencing and on-site analytics to interpret 13 swine influenza virus genomes and identified an influenza virus cluster that was genetically highly divergent from currently available vaccines. As part of the National Strategy for Pandemic Preparedness exercises, the sequences were emailed to colleagues at the CDC who initiated the development of a synthetically derived vaccine designed to match the viruses at the exhibition. Subsequently, this virus caused 14 infections in humans and was the dominant U.S. variant virus in 2018.

Mobile sequencing technology, specifically the MinION sequencing platform from Oxford Nanopore Technologies (Oxford, UK), has successfully generated rapid genomic surveillance data on-site during outbreaks of Ebola (1), Zika (2), and Lassa (3) viruses. These studies relied on the transfer of raw read data to centralized institutions harboring high-performance computational resources to perform extensive phylogenetic analyses. These analyses resulted in a deep understanding of the viral evolution during the outbreak, as well as an estimation of transmission chains. Their reliance on vast databases and computationally intensive algorithms limited their ability to influence real-time prevention strategies and potentially stop further transmission events. Furthermore, the lack of vaccines to these viruses meant that the real-time-derived sequence data could not be leveraged to identify a suitable vaccine. On the other hand, influenza viruses have multiple enzootic reservoirs, cause annual epidemics in humans, and rapidly evolve, which necessitates constant genomic surveillance and iterative vaccine development. Given this need, we have an opportunity to mitigate the risk of an influenza outbreak at its source in real time by analyzing the viral genomes present and, subsequently, developing the best-matched candidate vaccine virus (CVV).
Influenza A viruses (IAVs) circulating in swine have the potential to infect humans and are capable of causing pandemics (4), as evidenced by the 2009 H1N1 virus pandemic (Fig. 1). Since that pandemic, there have been 465 human cases of swine-origin influenza viruses (termed variant influenza viruses) in the United States resulting from exposure to swine, most commonly during attendance at agricultural fairs, and sporadic cases have been reported periodically in other countries (5). Exhibitors travel with their swine across geographical regions in the United States to attend national, state, and local exhibitions. At these exhibitions, swine of various ages and from many geographical regions intermingle. Throughout the exhibition, swine are walked around the barns, shuttled into staging pens, and exhibited in a corral simultaneously with a dozen or more swine. The swine are typically shepherded by children, as many exhibition rules require that individuals presenting a swine within the exhibition ring be under the age of 20. The exhibition itself can last up to a week, during which time swine are living in immediate proximity to others and have direct contact with humans. Children in particular may be immunologically naive to swine influenza viruses that are genetically and antigenically distinct from recent seasonal influenza viruses or vaccines to which they have been exposed (https://www.cdc.gov/flu/swineflu/variant-cases-us.htm).
Due to the sporadic nature of these infections, the immunologically naive population affected, and the persistent circulation of evolutionarily diverse IAVs in the swine host, the Centers for Disease Control and Prevention and other World Health Organization (WHO) Collaborating Centers for Influenza and Essential Regulatory Laboratories have developed prepandemic CVVs that target specific swine influenza virus subtypes and HA gene lineages that have caused variant virus infections (6). To date, 11 CVVs representing antigenically diverse groups of swine influenza viruses have been proposed or already developed for prepandemic preparedness purposes (https://www.who.int/influenza/vaccines/virus/en/).
Collecting samples in the field, shipping them to centralized laboratories, sequencing their genomes, and analyzing them often takes several weeks or months. Thus, the current surveillance process is less able to contribute real-time data during the emergence of pandemic threats (7). The increased risk at this unique interface warrants concerted proactive surveillance and simultaneously serves as a proving ground for new real-time in-field genomic approaches to help prevent zoonotic infections and potential outbreaks. Here, we describe the development and deployment of a rapid and portable IAV sequencing pipeline we call Mia (Mobile Influenza Analysis [miə, MEE-uh]) and the genomic results obtained from this surveillance.
Sample processing. Our surveillance target was a large agricultural event featuring exhibition swine. The swine began arriving on Sunday (day 0) and received an initial veterinary screening upon entry. We began scouting swine on day 3 for influenza-like illness (ILI) and noted the locations in the exhibition barn of animals with clinical signs of respiratory disease. During the day, the barns were very crowded with swine, presenters and their families, farm hands, visitors, and event staff. To avoid interfering with the event's proceedings, swabbing and sequencing were conducted during the event's off-hours. On day 4, we swabbed ILI-identified swine and their pen neighbors (n = 94) for influenza A virus (IAV) with the Flu Detect swine kits. We detected seven IAV-positive samples but suspected that additional samples collected from pen neighbors might be positive by sequence analysis. At 11 p.m. on day 4, we erected the Mia platform inside the exhibition barn and began the workflow on 24 samples collected from the rapid-test-positive swine and their immediate neighbors. Overnight, we extracted RNA, amplified the full genome, barcoded amplicons, and quantified the barcoded amplicons. Sample concentrations ranged from 12.7 to 41.9 ng/µl (Table S2 [https://figshare.com/s/b4cc885050283a40dfcd]). We normalized and pooled the barcoded samples to 1 µg total. The final library amounted to 279 ng. The library was ready for sequencing at 6 a.m., and at this time, people were arriving at the barn to begin exhibition activities. To avoid disrupting the event, once the sequencing was initiated, we transported the sequencer attached to the laptop powered by the laptop's battery to a hotel room.
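The normalization step above can be illustrated with a short calculation. The following is a minimal sketch, assuming that every barcode contributes an equal mass to the 1 µg pool; the function names are hypothetical, and the two concentrations used in the example are simply the extremes of the reported range rather than values tied to specific samples.

```python
def per_sample_mass_ng(total_mass_ng, n_samples):
    """Mass each barcoded sample contributes under equal-mass pooling (assumption)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_mass_ng / n_samples


def volume_ul(mass_ng, concentration_ng_per_ul):
    """Volume to pipette so that a sample contributes the requested mass."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ng / concentration_ng_per_ul


# 24 barcodes pooled to 1 ug (1,000 ng) total, as described in the text.
mass_each = per_sample_mass_ng(1000.0, 24)

# Extremes of the reported concentration range (ng/ul).
for conc in (12.7, 41.9):
    print(f"{conc:5.1f} ng/ul -> {volume_ul(mass_each, conc):.2f} ul")
```

Under these assumptions each barcode contributes about 41.7 ng, so the most dilute sample needs roughly three times the volume of the most concentrated one.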
Base calling and genome assembly. Sequencing produced 99,988 reads that were base called and demultiplexed with Albacore v2.2.7. A total of 67,105 reads passed a basic quality control (QC) threshold Q value of ≥7, and 22,552 of these reads were assigned to one of 24 barcodes (Fig. S1 [https://figshare.com/s/b4cc885050283a40dfcd]). The distribution of read lengths followed an expected pattern, with peaks corresponding to IAV segment lengths (Fig. S1 at the URL mentioned above). We assembled demultiplexed reads with IRMA (10) and obtained full IAV genomes for 13 of the 24 samples, including all samples that had threshold cycle (CT) values of <30 (CT 30 ≈ 100 50% egg infective doses per ml [EID50/ml]; data not shown) (Table 1; see also    3A).
Real-time comparison of HA proteins to candidate vaccine viruses. Mia also provided amino acid difference tables of the mature HA protein sequence versus a set of WHO candidate vaccine viruses (CVVs). A/Ohio/9/2015 and A/Ohio/13/2017 were the nearest CVVs to the H1N1 and H3N2 viruses and had 2 and 0 antigenic changes, respectively. These two viruses also had relatively low sequence coverage of the HA segments (Fig. S4 and S19 [https://figshare.com/s/b4cc885050283a40dfcd]). A/Ohio/35/2017 was the most similar CVV in our database to the 11 H1N2 viruses and had 32 amino acid differences (Table S4 at the URL mentioned above). Seven differences were identified in predicted H1 antigenic sites, including 4 differences in antigenic site A or B (16). More generally, the H1N2 virus HA genes were only ~93% identical to the nucleotide sequence of A/Ohio/35/2017 and, as discussed above, had NA genes belonging to different ancestral lineages. Based on the genetic differences and markers of antigenic drift detected by the analysis, the HA and NA consensus sequences, including untranslated regions (UTRs), were electronically transmitted to the CDC's Vaccine Preparedness Team in the Influenza Division to initiate the development of a synthetic candidate vaccine virus. We were able to start this intervention from the field roughly 18 h after the initial setup of Mia (Table S5 [https://figshare.com/s/b4cc885050283a40dfcd]).
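An amino acid difference table of the kind described above can be derived from a pairwise protein alignment. The sketch below is illustrative only and is not the Mia script: the function name, the choice to number positions along the reference (CVV) sequence, and the choice to skip ambiguous residues (X) are all assumptions, and the sequences in the example are invented.

```python
def amino_acid_differences(aligned_query, aligned_reference):
    """List (reference_position, reference_residue, query_residue) differences.

    Both inputs must come from the same alignment and therefore have equal
    length. Positions are numbered along the ungapped reference; a query
    insertion (reference gap) is reported at the preceding reference position.
    Columns containing the ambiguous residue 'X' are skipped (assumption).
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    ref_pos = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if r != "-":
            ref_pos += 1
        if q == r or "X" in (q, r):
            continue
        differences.append((ref_pos, r, q))
    return differences


# Invented toy alignment: one substitution at reference position 3.
print(amino_acid_differences("MKTAIL", "MKSAIL"))
```

The length of the returned list gives the total number of differences from a CVV, and the positions can then be intersected with a list of known antigenic sites.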
Postfield analyses. To assess the MinION-derived sequence accuracy, we transported RNA and DNA amplicons created on-site to the CDC in Atlanta, GA, for confirmatory sequencing via the Illumina MiSeq platform using the standard Influenza Genomic Team (IGT) surveillance pipeline (10). On the MiSeq platform, 13 samples had completed genomes with average coverage of ≥100× for each segment (IGT's standard QC threshold). Of those 13 samples, 12 had a MinION-derived mean coverage of ≥5× across segments, with 8 segments having ≥20× coverage across HA (Table S3 [https://figshare.com/s/b4cc885050283a40dfcd]), the primary target of protective immune responses and the most important protein for vaccine development. There was high concordance (R² = 0.87; Fig. S31 at the URL mentioned above) of the mean segment coverage produced by MinION and MiSeq for field-derived amplicons. Consensus identities averaged 99.3% across the 13 genomes (Table S3 and Fig. S32 at the URL mentioned above).
As our real-time Mia database contained human cases of swine IAV, we performed additional postfield phylogenetic analyses using the outbreak data, the Mia database, and additional swine IAV sequences that represent the genetic diversity and evolutionary relationships of contemporary swine IAV (11,12,17). Compared to 126 swine H1 sequences, our Mia-identified swine H1N2 viruses were in a monophyletic clade with contemporary 1B.2.1 (H1-δ2) HA genes from the human seasonal lineage (Fig. 3B and C). These viruses predominantly pair with N2 sequences that have been circulating in swine since at least 1998 as part of a triple-reassortant internal gene (TRIG) group of viruses (12, 13) (Fig. 1) (see also Fig. S36 [https://figshare.com/s/b4cc885050283a40dfcd]). Across all segments, MiSeq-derived data mirror the tree topology created in real time in the field from MinION-derived sequences (Fig. 3) (see also Fig. S22 to S29 and S33 to S42 in the URL mentioned above).
To compare the viral diversity we found in our MinION-derived viral sequences to what fully existed at the exhibition, we performed our standard IAV surveillance procedure in which we sampled 425 additional swine regardless of ILI. From these samples, we detected 136 (32%) IAV-positive specimens by real-time reverse transcription-PCR (rRT-PCR) and successfully sequenced 65 full genomes on a MiSeq platform. We found that the diversity of viruses collected and sequenced by more traditional surveillance methods was also detected in similar proportions by our Mia field sequencing (

DISCUSSION
We successfully developed a mobile next-generation sequencing (NGS) suite called Mia and deployed it to a high-priority swine-human interface that resulted in our identification of an influenza A virus outbreak. We created Mia by selecting and optimizing laboratory equipment that fits into a cooler and two standard-sized suitcases (Fig. S43 [https://figshare.com/s/b4cc885050283a40dfcd]), which can be set up and operated in the field by two people (Fig. S44 at the URL mentioned above) to produce a high-quality multiplexed NGS amplicon library in 7 h. Our automated analysis pipeline performs base calling, quality control, genome assembly, phylogenetic analysis, BLAST searches, and amino acid comparisons to current CVVs without an Internet connection. We are able to monitor all of these results in real time on the laptop performing the analyses. Mia will be continuously improved and adapted for use in harsher field environments (see Text S1 at the URL mentioned above).
In the field, the data generated indicated that while multiple lineages were cocirculating, the numerically dominant subtype, H1N2, was genetically different from the most similar WHO prepandemic CVV, including variation in known antigenic sites. Moreover, we determined that these viruses posed a risk of causing disease in a young population, as similar H1 viruses disappeared from seasonal circulation in humans in 2010 and were replaced by the 2009 H1N1 pandemic virus. Therefore, the sequence data were used to initiate the synthesis of an optimal CVV approximately 18 h after unpacking Mia. This proactive strategy of identifying the predominant swine influenza viruses in exhibition pigs prior to the start of annual agricultural fairs in the United States proved to be very useful, as zoonotic infections caused by the predominant A(H1N2) virus found in exhibition pigs were detected in humans exposed at fairs in July and August 2018 (https://gis.cdc.gov/grasp/fluview/Novel_Influenza.html). Had this virus caused a severe outbreak or pandemic, our proactive surveillance efforts and vaccine derivation would have provided an approximate 8-week time advantage for vaccine manufacturing.
By returning samples from the field, we were able to confirm the sequencing results with our Illumina sequencing pipeline (10). Without considering a coverage threshold for the Nanopore data, we sequenced 13 full genomes in the field. These same 13 samples were also successfully sequenced in the lab on an Illumina MiSeq platform. By comparing MiSeq- and MinION-produced sequences, we saw that the maximum MinION-derived sequence accuracy is achieved at 10× to 20× mean coverage (Fig. S32 [https://figshare.com/s/b4cc885050283a40dfcd]). If we apply a 20× coverage threshold to the Nanopore data, only 4 genomes were successful in the field but with 8 HA segment sequences passing. Importantly, the HA sequence is the most critical for making vaccine strain determinations. Also, we have higher confidence in consensus sequences from low-coverage viruses that are near identical to other viruses with higher coverage. As we are likely sampling the same transmission chain in real time, we can use the global sequence alignment to manually estimate the true sequence, particularly in fixing the frameshift mutations that are MinION's most common error. In future experiments, we can ensure that coverage thresholds are met by monitoring coverage in real time and more deliberately deciding when to stop sequencing, rather than simply sequencing for a defined 6 h as we did here. Illumina sequencing also allowed us to confirm the accuracy of our in-field-generated consensus sequences (99.3% overall). This level of accuracy confirms the validity of the suite of analyses that we performed in the field with the Nanopore data.
Influenza A viruses (IAVs) are a perpetual threat to global health security, both as human-endemic seasonal viruses and the more insidious pandemic viruses. IAV pandemics result from zoonotic transmission of IAVs to humans followed by onward human-to-human spread in populations lacking sufficient immunity. The swine-human and avian-human interfaces are of keen interest for future in-field surveillance efforts due to the pandemic risk of avian and swine influenza viruses that circulate in these settings (18,19). Mia will be a useful tool to enhance the current centralized surveillance framework and can be deployed into resource-poor settings and operated by two technicians. This technology can serve to bolster IAV surveillance by monitoring important transmission interfaces for emerging viruses that have pandemic potential. Moreover, it might not always be possible to return samples from the field. In such cases, portable Nanopore sequencing and data transmission can take advantage of a distributed sequencing network while maintaining the advantages of a centralized database. This has the potential to greatly expand the reach of the current global influenza virus surveillance network.

Logistics.
Our inventory was finalized after three full practice runs and consisted of three main pieces of luggage to transport Mia (Table S1 and Fig. S43 [https://figshare.com/s/b4cc885050283a40dfcd]). We used a Pelican Air 1615 case (internal dimensions, 75.16 cm length by 39.37 cm width by 23.82 cm depth; Torrance, CA, USA) to transport plastic consumables, ambient temperature liquids, personal protective equipment (PPE), racks, pipettes, and power strips. This is a larger suitcase that was checked during air travel. We used a smaller Pelican 1510 case (internal dimensions, 50.3 cm length by 27.94 cm width by 19.3 cm depth) to transport the electronic equipment, including the MinION systems, thermocyclers, vortex, microcentrifuge, and Qubit fluorometer. While this case is extremely rugged and would likely fare well against the checked baggage handling, we were concerned about damage during luggage inspection and opted to carry this onto the plane. We also carried on the laptop inside a backpack. Our cold chain was maintained during transportation in an Ozark Trail 26-quart certified bear-resistant cooler (internal dimensions, 40.64 cm length by 22.86 cm width by 27.3 cm depth; Potosi, MO, USA). To simultaneously transport materials at -20°C and 4°C, the cooler was partitioned with a 7.6-cm piece of Styrofoam cut to a snug fit. All enzymes and reagents requiring storage at -20°C were stored in a cold block and surrounded with frozen cold packs in the bottom of the cooler. Primers and flow cells were stored at 4°C on top of the Styrofoam and were surrounded by refrigerated cold packs. We maintained the cold chain on-site by storing reagents in a hotel refrigerator/freezer before repacking the cooler to transport the reagents to the exhibition barn. The SuperScript IV reverse transcriptase was shipped to the barn on dry ice at the insistence of the manufacturer. 
Since ethanol at a concentration over 70% cannot be transported on a commercial airliner, it was shipped via ground transportation. For a workbench, we used a small folding table found at the event and purchased plastic tablecloths, Lysol, and extra garbage bags locally. To avoid disrupting the event, we set up Mia inside a horse stall near the pigs and worked overnight (Fig. S44 [https://figshare.com/s/b4cc885050283a40dfcd]).
Diagnostic screening and sampling. To confirm the presence of IAV and to identify the location of outbreak clusters, we screened 94 swine displaying ILI using the Flu Detect influenza A virus swine rapid tests (Bridgewater, NJ, USA) via nasal swabs. We performed the screening early in the morning to locate IAV-positive swine and returned to the same location for sampling and sequencing later that evening.
We selected 24 swine for respiratory sample collection and sequencing, which included the seven positives from the screen and their immediate neighbors. We sampled these swine via sterile gauze nasal wipes and submersion in 5 ml brain heart infusion (BHI) medium (20,21).
RNA extraction. Immediately after sampling, we extracted RNA from the BHI samples using the Akonni TruTip rapid RNA kit (Frederick, MD, USA), according to the manufacturer's instructions, with minor modifications (8). We extracted the samples in a 96-well deep block using a manual 12-channel 1,000-µl pipette. Before lysis, we diluted the 70-µl samples with 180 µl of water and spiked them with 0.5 µg of Qiagen carrier RNA (Venlo, the Netherlands). We eluted the samples in 70 µl of water that had been warmed to 75°C.
Amplification, barcoding, library preparation, and sequencing. We amplified the full influenza A virus genome using a fast multisegment reverse transcription PCR (fMRT-PCR). For cDNA synthesis, we used a 5 µM forward primer mix that is MBTuni-12 and MBTuni-12.4 combined in a 1:4 ratio to increase the affinity for the polymerase genes. The primer sequences are available in Table S07 (https://figshare.com/s/b4cc885050283a40dfcd). For primer annealing, we combined 10 µl of RNA, 1 µl forward primer mix, 1 µl of 10 mM deoxynucleoside triphosphates (dNTPs), and 6 µl of nuclease-free water. We heated the mixture to 65°C for 5 min and then cooled the mixture on an ice block. For cDNA synthesis, we added 4 µl of 5× SuperScript IV (SSIV) buffer, 1 µl 100 mM dithiothreitol (DTT), 1 µl RNaseOUT RNase inhibitor (40 U/µl), and 1 µl of SuperScript IV reverse transcriptase to the annealed RNA. We incubated the reaction mixture at 42°C for 10 min, 53°C for 5 min, and 80°C for 10 min.
For full-genome amplification, we used a 10 µM mixture of MBTuni-12, MBTuni-12.4, and MBTuni-13 primers in a 2:3:5 ratio. We combined 5 µl cDNA, 5 µl primer mix, 25 µl 2× Q5 polymerase mix (New England Biolabs), and 15 µl nuclease-free water. We amplified the full genome by cycling the mixture as follows: 98°C for 30 s; 5 cycles of 98°C for 10 s, 45°C for 15 s, and 72°C for 1.5 min; 30 cycles of 98°C for 10 s, 64°C for 15 s, and 72°C for 1.5 min; and a final extension of 72°C for 5 min. We cleaned the amplicons with 0.5× AMPure beads (Beckman Coulter) and washed them with 80% ethanol before resuspending them in 25 µl nuclease-free water.
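Because the overnight timeline was tight, it can help to write the amplification reaction and cycling program down as data and check the totals. The following is a minimal sketch using only the volumes, temperatures, and times stated above; the variable and function names are hypothetical, and the time estimate counts programmed hold times only, ignoring ramping between temperatures (which depends on the thermocycler).

```python
# Per-reaction volumes (ul) for full-genome amplification, from the text.
PCR_MIX_UL = {
    "cDNA": 5,
    "primer mix (10 uM)": 5,
    "2x Q5 polymerase mix": 25,
    "nuclease-free water": 15,
}

# Cycling program as (number_of_cycles, [(temperature_C, hold_seconds), ...]).
PCR_PROGRAM = [
    (1, [(98, 30)]),
    (5, [(98, 10), (45, 15), (72, 90)]),
    (30, [(98, 10), (64, 15), (72, 90)]),
    (1, [(72, 300)]),
]


def total_hold_seconds(program):
    """Sum of programmed hold times; excludes temperature ramping."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


print("reaction volume (ul):", sum(PCR_MIX_UL.values()))
print("hold time (min): %.1f" % (total_hold_seconds(PCR_PROGRAM) / 60))
```

The reaction volume sums to 50 µl, and the programmed holds add up to a little under 73 min, so the true run time on a given instrument will be somewhat longer.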
We used three miniPCR mini8 thermocyclers (Cambridge, MA, USA) under laptop control for elution buffer heating, fMRT-PCR, barcoding, and library preparation. Following barcoding, we quantified the products using a Qubit DNA broad-range/high-sensitivity (BR/HS) assay (Thermo Fisher Scientific, Waltham, MA, USA). The quantification data allowed us to normalize and pool the barcodes and served as our only quality control check during the molecular workflow. Once pooled, we prepared the amplicons for Nanopore sequencing using the SQK-LSK 108 library kit (Oxford Nanopore Technologies, Oxford, UK). This protocol required cooling to 20°C; however, the temperature in the barn was roughly 30°C, and the fan-cooled mini8 thermocyclers could not get below ambient temperature. To solve this issue, we placed the thermocycler on a bed of ice, which effectively lowered the ambient temperature and allowed the mini8 to cool to 20°C. We sequenced the pooled samples on a MinION Mk 1B Nanopore sequencer using the flow cell FAH58363 equipped with R9.5.1 chemistry.
Real-time analyses. For real-time analyses, we used a high-performance Dell Precision 7720 laptop (Round Rock, TX, USA) with 64 GB of RAM, a four-core Intel Xeon E3-1505M v6 3.00 GHz central processing unit (CPU) (2 threads/core), and two 1-TB solid-state drives, partitioned separately into a Windows 10 OS and Ubuntu 16.04 OS. Mia's custom analytical application runs on the Ubuntu partition and initiates automatically via crontab upon creation of MinION read files (fast5) during active sequencing. We base called and demultiplexed fast5 read files with Albacore v2.2.7 (Oxford Nanopore Technologies, Oxford, UK) and assembled subsequent fastq files into consensus sequences with IRMA v0.6.7 (10). We used a custom MinION configuration module for IRMA that changes the default "FLU" module parameters as follows: dropping the median read Q score filter from 30 to 7, raising the minimum read length from 125 to 150 bases, raising the frequency threshold for insertion and deletion refinement from 0.25 to 0.75 and 0.6 to 0.75, respectively, and lowering the Smith-Waterman mismatch penalty from 5 to 3 and the gap open penalty from 10 to 6. We checked consensus sequences against a local database containing clade-annotated IAV sequences using BLASTn v2.7.1+ (22) and used top BLAST matches to interpret a sample's IAV genome constellation. We aligned consensus sequences against reference sequences with MUSCLE v3.8.31 (23). We built phylogenetic trees with FastTree double precision version 2.1.8 (24) with a generalized time-reversible model, branch lengths optimized under a gamma distribution, and local support values produced from bootstrap sampling 10,000 times and visualized the trees in R v3.4.4 with ggtree (25). We determined amino acid differences to each candidate vaccine virus sequence in our database with a custom Python script that translates DNA consensus sequences and aligns the mature HA1 protein sequences with MUSCLE.
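The two read-level filters described above (median read Q score of at least 7 and minimum read length of 150 bases) can be expressed compactly. The sketch below illustrates the idea only; it is not IRMA's implementation. It assumes FASTQ quality strings in the standard Phred+33 encoding, and the function names are hypothetical.

```python
from statistics import median


def median_phred(quality_string, offset=33):
    """Median Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return median(ord(ch) - offset for ch in quality_string)


def passes_read_filter(sequence, quality_string, min_median_q=7, min_length=150):
    """Apply the length and median-quality thresholds stated in the text."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    return len(sequence) >= min_length and median_phred(quality_string) >= min_median_q


# '(' encodes Q7 and '#' encodes Q2 under Phred+33.
print(passes_read_filter("A" * 150, "(" * 150))  # long enough, median Q7
print(passes_read_filter("A" * 150, "#" * 150))  # long enough, median Q2
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q7 admits reads with a per-base error rate of roughly 20%, which is why the consensus step depends on coverage to reach high accuracy.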
Analytical and processing status results were fed into a SQLite v3.11.0 database (Hwaci, Inc., Charlotte, NC) and visualized in a Web browser with an interactive R Shiny (26) application.
Mia processed reads in 4,000-read batches for the first 5 iterations, followed by 20,000-read batches for another 5 iterations and then 40,000-read batches until reads were no longer produced by the sequencer. This stepwise batching gives the user an understanding of the run's quality within minutes of sequencing, allowing the user to decide whether it is worth continuing the run while maximizing computational resources. Results seen within the first hour of sequencing track across a 6- or even 48-h run, with the extended run time increasing coverage and resolving a scant number of consensus bases.
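The stepwise batching schedule can be sketched as a generator. This is an illustration of the schedule as described, not the Mia code; the function names are hypothetical, the read total in the example is invented, and how a final partial batch is handled is an assumption.

```python
from itertools import repeat


def batch_sizes():
    """Yield batch sizes: 5 x 4,000, then 5 x 20,000, then 40,000 indefinitely."""
    yield from repeat(4_000, 5)
    yield from repeat(20_000, 5)
    yield from repeat(40_000)


def plan_batches(total_reads):
    """Split a read total into successive batches following the schedule.

    The last batch holds whatever remains (assumption).
    """
    plan = []
    remaining = total_reads
    for size in batch_sizes():
        if remaining <= 0:
            break
        take = min(size, remaining)
        plan.append(take)
        remaining -= take
    return plan


# Invented total of 130,000 reads.
print(plan_batches(130_000))
```

Small early batches return the first assemblies quickly, while larger later batches reduce the per-read overhead of repeatedly relaunching the analysis.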
Postfield analyses. We transported field-derived RNA and amplicons on dry ice back to the CDC in Atlanta, GA, and processed them with the Influenza Genomics Team's standard Illumina MiSeq influenza surveillance pipeline (San Diego, CA, USA). We processed and assembled reads with IRMA's Flu-sensitive module (10) and calculated the identity of MinION-derived consensus sequences based on alignment to the sample's MiSeq-derived consensus, which is considered the gold standard. Out of 425 swine sampled through our standard surveillance efforts at the exhibition, 136 swine tested positive for IAV by RT-PCR. Complete genome sequencing via our standard Illumina pipeline was attempted on 79 of those and produced 65 full genomes.
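Consensus identity against the MiSeq-derived sequence can be computed directly from a pairwise alignment. The following is a minimal sketch; the text does not specify how gaps were treated, so counting a gap opposite a base as a mismatch and ignoring columns that are gaps in both sequences are assumptions, as is the function name.

```python
def percent_identity(aligned_query, aligned_reference):
    """Percent identity between two sequences taken from the same alignment.

    A gap opposite a base counts as a mismatch; columns that are gaps in
    both sequences are ignored (assumptions).
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if q == "-" and r == "-":
            continue
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Invented toy alignment: a single-base deletion in the query.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```

Counting indels as mismatches is the stricter choice for Nanopore data, since single-base insertions and deletions are the platform's most common consensus error.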
We performed phylogenetic analyses with the MiSeq-produced sequences and an annotated set of swine IAV sequences. The annotated data set was constructed by downloading all publicly available swine IAV data collected in North America from the Influenza Research Database (27) on 1 June 2018. Each gene was aligned with MAFFT v7.294b (28), and the best-known maximum likelihood phylogeny for each alignment was inferred using IQ-TREE v1.6.2 (29), with the model of molecular evolution automatically selected during the tree search. Each gene was classified to an evolutionary lineage and/or phylogenetic clade (12,14,17). Subsequently, we selected 3 to 10 viruses from each named clade that captured the evolutionary relationships among clades and that represented the major HA, NA, and whole-genome constellations circulating in U.S. swine (12,17). We then aligned the MiSeq and annotated sequences with MUSCLE v3.8.31 (23) and inferred maximum likelihood trees using IQ-TREE v1.6.8 (29), implementing a generalized time-reversible model of nucleotide substitution with gamma-distributed rate heterogeneity across sites while allowing for a proportion of invariable sites and 1,000 ultrafast bootstrap approximations (30) (iqtree -m GTR+G+I -bb 1000).
Ethics approval. This study was approved under IACUC animal use protocol number 2009A0134-R3.
Data availability. The raw MinION fast5 read data and MiSeq-derived consensus sequences are deposited at the NCBI under BioProject number PRJNA528211. Reference sequences reside in various public databases under the accession numbers listed in Table S6 (https://figshare.com/s/b4cc885050283a40dfcd). The pipeline code is available at https://github.com/CDCgov/Mia_publication.