Cooperating H3N2 Influenza Virus Variants Are Not Detectable in Primary Clinical Samples

Viruses mutate rapidly, and recent studies of RNA viruses have shown that related viral variants can sometimes cooperate to improve each other’s growth. We previously described two variants of H3N2 influenza virus that cooperate in cell culture. The mutation responsible for cooperation is often observed when human samples of influenza virus are grown in the lab before sequencing, but it is unclear whether the mutation also exists in human infections or is exclusively the result of lab passage. We identified nine human isolates of influenza virus that had developed the cooperating mutation after being grown in the lab and performed highly sensitive deep sequencing of the unpassaged clinical samples to determine whether the mutation existed in the original human infections. We found no evidence of the cooperating mutation in the unpassaged samples, suggesting that the cooperation arises primarily under laboratory conditions.

generated through spontaneous mutation are important for neurotropism, innate immune suppression, and overall pathogenesis in mouse models (1)(2)(3). Other groups have identified cooperative interactions in measles virus (4), West Nile virus (5), hepatitis B virus (6), and coxsackievirus (7). These cooperative interactions have been observed primarily in cell culture or animal models rather than clinical infections.
We previously described two distinct variants of H3N2 influenza virus that cooperate in cell culture (8). The two variants differ by a single mutation at amino acid 151 of neuraminidase (NA), the protein that releases new virions from host cells. The D151 viral variant, typically encoded as GAT, predominates among clinical influenza virus samples, and it grows robustly in cell culture. The G151 viral variant, typically encoded as GGT, binds sialic acid receptors rather than cleaving them (9, 10) and grows extremely poorly in isolation. However, a mixed population of D151 and G151 viral variants outgrows either single variant in cell culture.
An important question is whether cooperation between these two viral variants is purely a cell culture phenomenon or whether the D151 and G151 variants coexist in natural infections. The D151G mutation is frequently observed when influenza virus is passaged through cell culture (9,(11)(12)(13)(14)(15)(16), but it remains unclear whether the G151 variant exists within natural human infections or is primarily a cell culture artifact. Prior groups that have performed matched clinical sequencing of unpassaged and passaged clinical samples have failed to detect the G151 variant before passaging (13,15), but those studies have used methods like Sanger sequencing and pyrosequencing, which are relatively insensitive to rare variation. More sensitive characterization of clinical samples that give rise to D151G upon lab passage can determine whether this mutation reaches high frequencies in cell culture because it is amplified from low-to modestfrequency standing diversity or whether it arises spontaneously in the lab.
We sought to determine whether the D151G mutation is present in viral populations isolated from natural human infections. We identified nine clinical samples that, on the basis of prior Sanger sequencing, consisted of a mixture of D151 and G151 viruses after passage in cell culture. We deep sequenced the original unpassaged nasal swab samples to survey the variation present prior to laboratory growth. The D151G mutation did not exceed the frequency of library preparation and sequencing errors in any of these samples. These results suggest that most of the variation observed at site 151 results from passage in cell culture rather than standing variation in human infections.

RESULTS
Most influenza virus sequences in public databases are determined by Sanger sequencing of clinical isolates that have been passaged one or more times in cell culture (17). A substantial number of recent human H3N2 influenza virus sequences in these databases contain an ambiguous nucleotide at NA site 151 because the labpassaged samples often converge to a mixture of the D151 and G151 variants (8). We compared passaged samples that contain this ambiguous nucleotide at site 151 with unpassaged samples from the same viral infections. We first identified strains from western Washington State in the Global Initiative on Sharing All Influenza Data (GISAID) EpiFlu database (18) for which Sanger sequencing had reported an ambiguous nucleotide at NA site 151 corresponding to a mixture of the D151 and G151 variants (8). On the basis of the annotations available in the GISAID EpiFlu database, most of these strains had been passaged in cell culture prior to Sanger sequencing.
We obtained original, unpassaged nasal swab samples of the nine strains in Table 1 that contained a mixture of D151 and G151 variants after passage in cell culture. These samples had been collected between 2013 and 2015 and had undergone one to three passages in cell culture prior to sequencing. We performed whole-genome sequencing of the influenza virus genome from the unpassaged clinical samples by influenza virus-specific reverse transcription and PCR (19). For each sample, we prepared sequencing libraries in duplicate, beginning from separate reverse transcription reaction mixtures (20). We sequenced each viral sample to an average depth of 100ϫ to 10,000ϫ ( Fig. 1), which allowed us to observe viral variants at frequencies below the limit of detection by Sanger sequencing or pyrosequencing.
We identified all of the minor viral variants present at a frequency of at least 3% in the viral genome in both library replicates ( Table 2). We did not observe the D151G variant in any of the nine clinical samples under these variant-calling criteria. To ensure that we were not missing extremely low-frequency variation, we calculated the frequency of D151G in each clinical sample on the basis of the frequency of G-to-A mutations at the second nucleotide position of NA site 151. We compared this frequency to the frequency of G-to-A mutations at other sites across the genome (Fig. 2). Minor-variant frequencies at NA site 151 fell well within the range of error expected through library preparation and sequencing errors. Therefore, we conclude that the D151G variant was not present at appreciable frequencies in the original clinical infections. Instead, the mutation must have arisen de novo or been enriched from an extremely low frequency during passage in cell culture.

DISCUSSION
The results of our deep-sequencing study support prior studies that failed to detect the D151G mutation in unpassaged clinical samples by Sanger sequencing or pyrosequencing (13,15). In the GISAID EpiFlu database, mixed populations of D151 and G151 viral variants are common in clinical samples that have been passaged in cell culture, but these mixed populations are rare among unpassaged and egg-passaged populations (8). It is impossible to rule out the possibility that the D151G mutation reaches appreciable frequencies in some natural human infections, but strong and repeated selection for cooperation in cell culture seems to account for its prevalence among sequences in public databases.
It is interesting to speculate about what biological factors might cause a variant that is rare in natural human infections to be strongly selected in cell culture. Influenza virus strains often acquire stereotypical mutations when they are grown in eggs (21,22), but these passage adaptations appear to be less common in cell culture, particularly in MDCK-SIAT1 cells (17,23). Nevertheless, differences in the types and distributions of cell surface receptors between MDCK-SIAT1 cells and human airways could account for some of the differences in genotypes we observe at NA site 151.
We also previously observed that cooperation is stronger at high multiplicities of infection (MOIs) (8). Viral loads can be large during natural infections (Table 1), but recent studies of natural human infections have found that the effective reassortment rate is limited, suggesting that spatial heterogeneity within the host may limit viral circulation and coinfection (24). Moreover, human influenza virus infections, as well as those in animal models (25), experience a severe transmission bottleneck that greatly limits the genetic diversity initially present in an infection (26)(27)(28). In contrast, viral populations can rapidly reach high MOIs in cell culture (29). These different growth conditions may also promote the emergence of D151G within cell culture, but natural infections may not. Our study also underscores the importance of sequencing directly from unpassaged clinical samples. Mutations like D151G accumulate in cell culture within just a few passages and affect downstream analyses like inferences of positive selection (17). Careful records of passage histories combined with deep sequencing of unpassaged clinical samples can help distinguish natural variation from that generated in the lab.

Viral samples.
We downloaded the set of 66 sequences in the GISAID EpiFlu database (18) corresponding to all of the full-length NA coding regions from human H3N2 influenza A virus isolates collected from 1 January 2000 to 26 August 2015 and submitted from Seattle, WA, or Shoreline, WA (see Table S1 in the supplemental material). We aligned each sequence pairwise with the A/Hanoi/Q118/2007 (H3N2) coding sequence (GenBank accession number CY104446) by using the program needle from EMBOSS version 6.6.0 (30). For each sequence, we determined the genotype at site 151 and designated the genotype X if there was an ambiguous nucleotide at that site. We identified sequences with ambiguous identities at site 151, suggesting the presence of mixed viral populations, and we extracted Typically, the D151 viral variant is encoded by the nucleotides GAT, and the G151 variant is encoded by GGT, meaning that D151G arises as the result of an A-to-G mutation. The red vertical line shows the proportion of A-to-G mutations at codon position 2 of amino acid site 151 of NA, which corresponds to the frequency of D151G. In cases where no A-to-G mutations were identified at this site, this red line is not shown. At each nucleotide site in the genome with consensus identity A, we calculated the total proportion of reads reporting an identity of G at that site and averaged this proportion between the two replicate libraries. As expected, A-to-G mutations make up Ͻ0.1% of the total sequencing reads at most sites in the genome and are probably errors introduced through library preparation and sequencing.
passage histories based on the metadata available in the GISAID EpiFlu database. For the nine strains described in Table 1, we were able to obtain aliquots of the original, unpassaged nasal swab samples in viral transport medium. Viral deep sequencing. We performed viral deep sequencing as previously described (19). In brief, we extracted viral RNA from unpassaged clinical samples with the QIAamp Viral RNA Minikit (Qiagen) in accordance with the manufacturer's instructions. We reverse transcribed the viral RNA with Superscript III First-Strand Reaction Mix (Thermo Fisher) and an equimolar mixture of influenza virus-specific primers 5= TATTGGTCTCAGGGAGCAAAAGCAGG 3= and 5= TATTGGTCTCAGGGAGCGAAAGCAGG 3=, which both bind to the conserved U12 region at one end of each influenza virus gene. The two primers differ by a single nucleotide to account for a known polymorphism in the region. We incubated the reverse transcription reaction mixtures at 25°C for 10 min (to help the short primer anneal), 50°C for 50 min, and 85°C for 5 min. We amplified the influenza virus genome with a mixture of 24 primers that bind to the ends of each influenza virus gene (31). For each gene, one primer binds to the conserved U13 region at one end of the gene and two primers bind to the conserved U12 region at the other end of the gene, allowing for the known polymorphism in the U12 region. We performed 35 cycles of PCR with an annealing temperature of 55°C and an extension time of 3 min. We purified the PCR product with 1ϫ AMPure beads (Beckman Coulter, Inc.) and prepared libraries for Illumina sequencing by Nextera XT (Illumina) tagmentation. We sequenced the libraries on a NextSeq 500 platform (Illumina) with 150-bp paired-end reads. We performed all library preparation and sequencing in duplicate, starting from independent reverse transcription reaction mixtures (20).
Analysis of deep-sequencing data. We first used Bowtie2 (32) to filter out reads that mapped to the human genome. Remaining reads are available in the SRA as BioProject PRJNA412675. We trimmed adapters from the raw reads with cutadapt version 1.8.3 (33). We first aligned the reads with the A/Victoria/361/2011 genome by using Bowtie2 and the --very-sensitive setting, then we used custom scripts to generate a new consensus genome sequence for each viral sample. We then realigned the reads with the corresponding consensus sequence and removed PCR duplicates with Picard version 1.43. We used custom scripts to filter out base calls with a quality score below 20, tally the total number of high-quality bases at each genome position, and annotate each variant's codon position. We performed these initial analyses separately for each replicate library. We reported only variants that were located in a protein-coding sequence.
A note on codon numbering and gene annotation. We numbered HA codons in accordance with the H3 numbering system. This HA numbering scheme assigns position 1 to codon 17 of the full HA-encoding gene, which is the beginning of the mature HA protein. The codons for all other genes are numbered sequentially, beginning with 1 at the N-terminal methionine. The M1 and M2 genes have 27 bp of in-frame overlap and 44 bp of out-of-frame overlap, and the NS1 and NEP genes have 30 bp of in-frame overlap and 251 bp of out-of-frame overlap. We annotated variants separately for each gene if they occurred in these overlap regions.
Availability of data. Sequencing reads are available in the Sequence Read Archive as BioProject PRJNA412675. The computer code that performs the analyses is available at GitHub (https://github.com/ ksxue/D151G-clinical-public).

ACKNOWLEDGMENTS
This study was funded by R01GM102198 from the NIGMS, R01AI127893 from the NIAID, a Faculty Scholar award from the Howard Hughes Medical Institute and the Simons Foundation to J.D.B. K.S.X. was funded by an NSF Graduate Research Fellowship under grant DGE-1256082 and a graduate fellowship from the Fannie and John Hertz Foundation. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.