Deep Sequencing of H7N9 Influenza A Viruses from 16 Infected Patients from 2013 to 2015 in Shanghai Reveals Genetic Diversity and Antigenic Drift

H7N9 subtype avian influenza viruses caused infections in over 1,400 humans from 2013 to 2017 and resulted in almost 600 deaths. It is important to understand how avian influenza viruses infect and cause disease in humans and to assess their potential for efficient person-to-person transmission. In this study, we used deep sequencing of primary clinical material to assess the evolution and potential for human adaptation of H7N9 influenza viruses.

I nfluenza A virus (IAV) infections are a major public health concern, including annual epidemics, epizootic outbreaks, and occasional pandemics (1). Historically, there have been at least 14 influenza pandemics in the last 500 years (2) and 4 in the last 100 years, including the "Spanish" influenza A/H1N1 pandemic in 1918 that resulted in at least 50 million deaths worldwide (3) and the most recent one, in 2009, caused by a novel swine-origin influenza A/H1N1 strain which may have resulted in up to 284,000 deaths globally (4).
Influenza epizootic outbreaks, in which animal-derived IAV infections cause localized outbreaks in the absence of sustained human-to-human transmission, can cause high levels of morbidity and mortality. Such epizootic outbreaks may also presage the emergence of a stably human-adapted IAV strain capable of causing a pandemic. One of the most significant IAV epizootic outbreaks of the past decade is the H7N9 avian China, examination of the viral quasispecies revealed several hemagglutinin nonsynonymous mutations shared with the antigenically variant H7N9 clades identified in the most recent waves of H7N9 infections in 2016 to 2017 (6,8). In addition, the deepsequencing technology revealed a number of single-nucleotide polymorphisms (SNPs) in our samples that have not been previously reported in humans.

RESULTS
Patient information. In this study, throat swabs from 20 reverse transcription-PCR (RT-PCR)-confirmed H7N9-infected patients who were hospitalized in Shanghai, China, from 2013 to 2015 were processed. Available clinical information from the patients in the study, including symptom onset, hospitalization and sampling dates, antiviral/ antibiotics usage, and patient outcome, is shown in Table 1. Among the cases, 7 were infected during wave 1 (6)(7)(8), with hospitalization admissions from 3 April to 16 April 2013. There were 2 deaths in this cohort, with a case fatality rate (CFR) of 28.6%. There were 9 cases during wave 2, with hospitalization admissions from 30 December 2013 to 20 January 2014. There were 3 cases for which outcome information was not available, while the remaining 6 patients all died; thus, for this cohort, the CFR was Ն66.7%. There were 4 cases in wave 3, with hospitalization admissions from 17 January to 3 April 2015. There were 3 deaths in this cohort, with a CFR of 75.0%. Overall, the CFR in these hospitalized H7N9-infected patients was Ն55%.
Sequence analysis of positive-and negative-control experiments. Prior to experimental sequence analysis of the patient samples, positive-and negative-control experiments were designed and performed to evaluate the WTA/enrichment method. The sequencing library for the positive control was made from RNA that had been reverse transcribed from the hemagglutinin (HA) open reading frame (KC853766.1) from A/Hangzhou/1/2013 (H7N9) cloned into pBluescript II KS(-). The transcribed HA RNA was processed by the use of the same pipeline, including linear amplification, H7N9 probe enrichment, sequencing, and analysis. Because of the high sensitivity of high-throughput sequencing technology, we also performed a negative-control experiment to determine a baseline of influenza virus contamination in the construction of the libraries. For the negative control, a sequencing library was made from HeLa cell total RNA following the positive-control experiment using the same process pipeline.
From the positive-control HA RNA library, we obtained 355,521,326 reads; 342,112,470 (96.23%) of them aligned to the H7N9 HA reference segment. Similarly,  Table 2). This provided a baseline of the noise in the library construction and sequencing pipeline. It has been reported that the error rate of the Illumina platform generally increases toward the end of the read, reaching as high as 5% (24). Therefore, to reduce the potential false-positive SNP calling rate, we decided to use 10% as the SNP cutoff level for this study. We obtained 292,901,554 total reads from the negative-control experiment, and there were 3,815 (0.0013%) reads aligned to H7N9 consensus sequences. Therefore, we used this number of sequence reads as an approximate baseline for the sequence contamination level for this study (see below). H7N9 viral cDNA library sequence analysis. Illumina deep sequencing of enriched cDNA libraries was performed on samples from 20 patients that were H7N9 RT-PCR positive. The summary of the sequencing results is shown in Table 3, and data corresponding to the average sequencing depth of each segment of the genome from all the samples are shown in Table 4. Between 42 million and 324 million reads (reflecting the sum of reads produced in both directions) were generated from the 20 patient sample libraries. After mapping to the reference human H7N9 consensus sequence from an isolate collected in China in 2013, the total number of mapped reads ranged from 3,245 reads to over 178 million reads among the patient sample libraries. Because there were 3,815 mapped reads in our negative-control experiment, samples 10 (number of mapped reads, 10,693), 11 (number of mapped reads, 3,534), 12 (number of mapped reads, 3,830), and 18 (number of mapped reads, 3,245) were eliminated from further SNP analysis to avoid potential false-positive results. Therefore, compared to the human 2013 consensus H7N9 sequence, a total of 509 SNPs were detected from these 16 remaining patient sample libraries, with 454 unique SNP positions identified (Fig. 1A). Synonymous and nonsynonymous SNPs that passed the filter criteria utilized (described in Materials and Methods) from each patient are shown in Table 3 (see also  Table S1 in the supplemental material).
Comparison of SNPs detected in this study and SNPs detected from previously sequenced H7N9 human isolates. The precomputed SNP data from human H7N9 IAVs were downloaded on 6 March 2017 from the Influenza Research Database (IRD) (25) with different SNP scores, which were generated from the modified formula as described by Crooks et al. (26). Briefly, the scores represent the normalized entropy of the observed allele distribution at each position. The least polymorphic site would have a score of 0 (no polymorphism), and the most polymorphic site would have a score of 200 (highest polymorphism). Compared to the SNPs from the IRD with a score of Ͼ1, among the 454 SNPs identified in the current study, there were 276 previously reported in the IRD. At the level of an SNP score of Ͼ10, only 195 SNPs were reported by IRD (Table S1). In addition, at the level of an IRD SNP score of Ͼ1, among the same 276 SNP positions, 255 SNP calls were the same as those observed in the current study. There were 178 SNP positions with 106 nonsynonymous changes in human H7N9 that have not been reported at the IRD.
Comparison of H7N9 viral genome SNPs between different patients. Among the 16 remaining sequenced and analyzed patient samples, a total of 509 SNPs that passed   filter criteria were detected. The number of SNP locations observed in each sample was quite variable, ranging from 3 SNPs in sample 9 to 121 in sample 17 (Table 3). The correlation (R) between the number of mapped reads and the called SNP number was low (R ϭ 0.13) ( Fig. 1A and B). For example, sample 3 had more than 45 million mapped reads but only 11 SNPs. On the other hand, the actual numbers of reads containing the SNPs at the SNP positions were better correlated to the depth at those positions (average R ϭ 0.91) (Table S2). When the coverage of the H7N9 genome was examined, the correlation (R) of numbers of identified SNP locations in samples and average genome coverages was 0.70 ( Fig. 1C). Table 3 suggests that the differences between samples in the numbers of called SNPs were not primarily due to differences in read depth or coverage between samples. Comparing H7N9 sequences from the 16 patients to the reference genome of the consensus sequence of all human H7N9 strains in China in 2013, 509 SNPs, representing 454 unique nucleotide positions, were found. Among them, there were 120 SNP positions shared in two or more samples. Selected amino acid substitutions with known or suspected functional significance with respect to the IAV H7N9 strains observed during the epizootic waves in China (7,8,11) from the 16 patient samples analyzed here are listed in Table 5.
Hemagglutinin gene sequences and SNPs from the H7N9 patient samples. As shown in Table 5, a number of nonsynonymous SNPs in the hemagglutinin (HA) gene were observed in the patient samples compared to consensus sequences created from all available epizootic H7N9 virus sequences isolated from birds in China in 2013, 2014, and 2015 or compared to all available H7N9 sequences isolated from spillover infections in humans in China in 2013, 2014, and 2015. The 2013 avian H7N9 and human H7N9 consensus HA sequences differed only at codon 235 (aligning to H3 HA codon 226), and this codon forms a part of the HA receptor binding domain (27,28), with the human consensus sequences from 2013, 2014, and 2015 containing a Q235L mutation. In addition, the avian consensus sequences from 2014 and 2015 also contain a Q235L mutation, which indicates the evolution of this site in avian hosts as well. Among the patient samples, 7 cases (cases 1, 2, 4, 13, 15, 16, and 17) had a Q235L mutation (Table 5;  see also Table S1) and 4 cases (cases 5, 6, 19, and 20) had coverage of fewer than 100 reads at this codon, but all of them included Q235L, and the remaining samples had no coverage at HA codon 235. Structural and glycan array analyses have demonstrated that 235L enhances H7 HA binding to human-type ␣2-6 sialic acids (27,28), while in silico modeling has suggested enhanced binding with Q235I (29). A recent study reported that 3 mutations, H7 HA V195K (equivalent to H3 HA V186K), K202T (H3 K193T), and G237S (H3 G228S), in the receptor binding domain of H7N9 viruses switched specificity from avian-type ␣2-3 to human-type ␣2-6 sialic acids (30). In the patient samples, all matched the consensus, V195, K202, and G237.
Many additional HA SNPs were also seen in this patient cohort (Table 5). To evaluate the relationship between our samples and other H7N9 sequences in China, we constructed approximately maximum-likelihood phylogenetic trees based on 417 unique human HA sequences from isolates collected in China from 2013 to 2017 together with the consensus sequences (with SNP calls at bases over 50%) from the current study (Fig. 2). Previous studies demonstrated evolutionary diversification of H7N9 HA sequences (7), and with the emergence of the fourth and fifth waves (6,8), two major lineages have been observed and defined as Yangtze River Delta and Pearl River Delta lineages. Yangtze River Delta lineage isolates have demonstrated reduced crossreactivity with previously produced candidate vaccine strain H7N9 viruses (6). From the HA trees, it is noted that the sample sequences diversified into different clades, including the Yangtze River Delta lineage (sample 17), despite the fact that the patients were all local residents, treated at only one hospital in Shanghai, with the latest sample from April 2015.
Notable here is the HA sequence from sample 17, representing a wave 3 case from January 2015, which is phylogenetically very similar to the antigenically variant Yangtze River Delta H7N9 sequences isolated in 2016 to 2017 in waves 4 and 5 (8). SNPs identified in sample 17 were notable for the following amino acid changes on the HA head (Table 5): S136N, A143V, R148K, and L186I. These amino acids are highlighted in the head region of the crystal structure of A/Anhui/1/2013 (H7N9) (4R8W) (Fig. 3). While antigenic epitopes on H7 subtype HAs have not been mapped, a computational  analysis of H7 HA identified several putative epitopes on the HA head. The first three mutations, S136N, A143V, R148K, are near the T130A (H7 numbering) identified in epitope A (31), while the fourth site, L186I, is between the D183S and I188V changes identified in epitopes C and D, respectively. Computational modeling also identified HA receptor binding mutation Q235L/I as an antigenic epitope (epitope D). Interestingly, these amino acid changes are seen in 2016 to 2017 Yangtze River Delta lineage sequences and are likely to contribute to the antigenic evolution observed in these H7N9 strains compared to earlier human 2013 isolates (6). Changes at H7 HA codons 394 and 396 have been detected in 8 patient samples (samples 8,13,14,15,16,17,19,20), which were located in the first alpha helix of the HA2 domain. The E396A mutation, in particular, was also observed in more recent Yangtze River Delta lineage isolates, and modeling done here suggests that this change may affect binding of broadly neutralizing stalk antibodies (Fig. 4).
The remaining viral gene sequences and SNPs from the H7N9 patient samples. A number of nonsynonymous SNPs were seen in the H7N9 sample neuraminidase (NA) sequences (Table 5), including a R148K change in the catalytic site in sample 1 (equivalent to R152K in N2 subtype NA) (32). At least 15 of the 20 patients were treated with antiviral drugs, primarily oseltamivir (Table 1); however, only one sample (sample 16) showed a change associated with neuraminidase inhibitor (NAI) resistance in these samples (33), which was the most commonly observed resistance mutation, R289K (R292K in N2 subtype NA), with this SNP present at 41% ( Table 5). The conserved N9 amino acids associated with the NA sialic acid-binding site (hemabsorption domain) did not differ in the samples studied (34).
SNPs were also observed in the H7N9 polymerase proteins. In PB2, 21 nonsynonymous SNPs were observed in the samples that differed from the reference sequence, including sample 1, which had a K627E change. It has been reported that 627K is associated with human adaptation of avian-origin PB2 (35) and that it was observed in the 1918 pandemic (36) and in some highly pathogenic H5N1 human isolates (37). The human H7N9 PB2 consensus sequence differs from the avian H7N9 consensus sequence at this site as well ( Table 5). None of the samples differed at sites 701 and 702, which have been reported to be associated with mammalian adaptation (38). One change was observed in PB1 in sample 9, and 3 changes were observed in samples 6, 7, 8, and 13 in polymerase acidic protein (PA). PA codon 100 is typically a valine in avian IAV sequences, and a V100A change is associated with human adaptation (36). Of the 2013 avian and human H7N9 consensus sequences constructed here, both correspond to V100, and no nonsynonymous SNPs were observed at this codon in all the samples sequenced here. SNPs associated with nonstructural protein PA-X (39) were observed in the samples, but the biological significance of these changes remains unclear. Several other samples had nonsynonymous SNPs in M1, M2, and NS1 (Table 5).
To determine which of the nonsynonymous SNPs could be associated with human adaptation and which were observed during two or more H7N9 epidemic waves, Venn analysis was performed on the translated proteins from nonsynonymous SNPs identified in Table 5 (see also Table S1) compared to the 2013 avian H7N9 protein consensus sequences. As shown in Fig. 5, 15% of H7N9 amino acid changes were identified only in wave 1, 36% were identified only in wave 2, and 28% were unique to wave 3. The populations of amino acid changes identified in only a single wave were dominated by sequences identified at a frequency of Ͻ10% in previous human H7N9 cases (black text in the figure), novel human mutations (green text), and one sequence, PB2 K627E (blue text), that had been previously identified in Ͼ10% of sequences in the Influenza Research Database as of 19 June 2017. Interestingly, only a single SNP (M2 P10L) was identified in both waves 1 and 2, while 3 SNPs (8% of the total) were present in waves 2 and 3, two of which represented changes in the C-terminal domain of host response regulatory protein PA-X. Finally, 4 SNPs (10% of the total) were identified in all three waves and included changes in HA receptor binding (Q235L), the HA cytoplasmic tail (N551S and G552R), and viral polymerase subunit PB2 (M535L). Of these changes, HA N551S and HA G552R were previously identified in H7N9 human sequences at low (Ͻ10%) frequencies, while PB2 M535L was previously identified in Ն10% of human H7N9 sequences. Significantly, the mutation Q235L in the HA receptor binding domain was present in all three waves and represented a change toward the human consensus sequence (denoted with red text in Fig. 5).
Evidence for reassortment in the H7N9 patient samples. Phylogenetic analyses were performed by constructing gene segment trees using the approximately maximum-likelihood method (40). Trees were inferred using the compiled H7N9 viral gene segment consensus sequences (proportion of observed SNPs, more than 50%) from the 16 patient samples along with 417 H7N9 viral genomic sequences downloaded from GISAID (http://platform.gisaid.org/epi3/frontend#5277b0). The eight trees are shown in Fig. S1 in the supplemental material. Most trees showed two major clades, often with two subclades in each major clade. Taxa were colored red in the figure for viruses possessing a Yangtze River Delta clade hemagglutinin and blue for viruses possessing a Pearl River Delta clade hemagglutinin. Evidence of gene segment reassortment is apparent among the 8 trees; for example, several viruses with Pearl River Delta clade HA genes (colored in blue) are contained in the Yangtze River Delta clade (colored in red) of the neuraminidase tree. Following the taxa within the trees, it is possible to observe frequent reassortment within each gene segment, as previously reported (41).

DISCUSSION
Human infections with avian IAV H7 subtype viruses are rare but have been reported before and are usually associated with exposure to poultry (42)(43)(44). Most of the previously reported cases of H7 IAV infection in humans have caused only mild or moderate illness (43)(44)(45), except for one fatal case resulting from a highly pathogenic avian H7N7 IAV infection in the Netherlands (46). In contrast, the current avian IAV H7N9 epizootic outbreak in China has been much more severe, with more than 90% of the confirmed H7N9 patients being hospitalized with pneumonia or respiratory failure and with a CFR of 39%. This suggests that the 2013 lineage avian H7N9 virus may be more pathogenic in humans than the H7 strains responsible for past IAV epizootic outbreaks (47). Human H7N9 virus infections have also exhibited age-specific and sex-specific morbidity and mortality patterns. The median age of the confirmed H7N9 patients was 58 years. A very few H7N9 cases have occurred in children, teenagers, and young adults. Further, most of the severe and fatal cases occurred in middle-aged and older men rather than in women (48). However, females in their reproductive years have displayed an increased tendency to die of H7N9 influenza in comparison to males (49). In this patient cohort, the observed CFR was high, with an overall fatality rate of at least 55% (some patient outcomes were not available); however, no correlation could be made between outcome and the SNPs identified.
In this study, all 20 patient samples were sequenced and mapped to the H7N9 reference genome. On the basis of the negative-control results with respect to reads mapped to H7N9 genome, 4 samples were eliminated from further analysis. Among the remaining 16 samples, the total numbers of mapped reads ranged from 105,000 to over 178 million and the range of H7N9 genome coverage was 57.7% to 99.5% (Table 3). The differences in the mapped read numbers and coverage percentages depended on the viral RNA concentration and the quality of viral RNA in the different samples. Viral RNA degradation during sample preservation and/or library preparation would also likely decrease coverage.
Previous studies have demonstrated that human host factors are likely important in determining outcome during influenza virus infections. For example, more than 200 host factors are required for influenza virus replication as demonstrated by genomewide RNA interference screening (50,51). Other studies showed that interferoninducible transmembrane 3 (IFITM3) may be critical for defending the host against IAV in vivo (52,53). Even administration of influenza vaccinations at different times of day has been shown to affect the development of host antibody responses (54). A recent study of whole-genome exon capture and sequencing of 18 H7N9 patients identified 21 genes that were highly associated with H7N9 influenza virus infection, some of which may be associated with H7N9 influenza susceptibility (55). Thus, different hosts, different health conditions, and even different times of infection could affect the evolution of IAV.
In addition, when the SNPs identified in the current study were compared to the precomputed SNPs called from human H7N9 sequences in the Influenza Research Database (25), we found that 39.2% (178/454) of the SNPs from the samples had not been previously reported. One reason for this could be that deep-sequencing technol-ogy was applied to direct sequencing of virus from clinical samples here, which allowed investigation of infected viral populations, including identification of quasispecies. These methods are now becoming more common. Historically, the majority of viral sequences in IRD or GenBank were likely initially cultured and then subjected to PCR amplification and Sanger sequencing to generate consensus genome sequences, some of which may have also acquired culture adaptations (56).
As seen in other human H7N9 isolates, the HA sequences determined in this study showed the change in the receptor binding domain, Q235L/I (H3 Q226L/I) (28,29,57,58), that has been reported to enhance binding to human-type ␣2-6 sialic acids (59). Among all the 16 samples examined here, 11 samples with read coverages possessed Q235L, whereas, among the 417 unique HA sequences downloaded from GISAID, only 9 strains possessed Q235I. It will be interesting to see if this mutation is more commonly observed in future human H7N9 isolates. Of the 3 mutations recently modeled in vitro to switch HA receptor specificity to ␣2-6 sialic acids (30), none were observed in this study.
Sample 17 possessed several nonsynonymous SNPs on the HA head domain that are shared with antigenically variant Yangtze River Delta lineage viruses isolated in waves 4 and 5 in 2016 to 2017 (Fig. 3) and likely reflect some changes related to the antigenic evolution in these H7N9 viruses compared to earlier (2013) isolates (6). Thus, compared to conventional sequencing methods, deep-sequencing strategies may aid in the identification of variant viruses with potentially significant changes associated with human adaptation or with antigenicity acquired more readily and earlier. Since there is no evidence of efficient human-to-human transmission or significant population immunity against H7N9 in humans, antigenic drift, as observed in sample 17 and related Yangtze River Delta lineage viruses, is likely occurring in the avian reservoir and may reflect widespread use of IAV vaccines in poultry in China (60).
Another nonsynonymous HA mutation identified in the samples is E396A in the HA stalk region. This mutation was observed at a rate of over 98% in samples 17, 19, and 20. Structural modeling based on the cocrystal structure of the H7 HA with the stalk neutralizing antibody CT149 (61) supports the idea that the E396A mutation could affect HA stalk antibody binding in this region (Fig. 4). Among the 417 unique HA sequences downloaded from GISAID, 171 possessed an A396; 60 of those strains were from the year 2017, 43 from 2016, 50 from 2015, and 18 from 2014. This might indicate an antigenic drift trend, but additional experiments are needed to evaluate whether this change would affect broadly neutralizing stalk antibodies.
In this study, 15 of our 20 patients were treated with oseltamivir (Table 1). It has been reported that neuraminidase mutations E119V, I222V, D197N/E, and R292K (N2 numbering) have been commonly detected in H3N2 and type B viruses (62-66) after neuraminidase inhibitor (NAI) treatment. Moreover, corresponding mutations E119V, I222K, and R292K in H7N9 reduced the inhibitory activity of oseltamivir in ferrets (67). In our samples, only R289K (N2 R292K) was detected in patient 16 (59% R and 41% K) ( Table 5), but the antiviral treatment status of this patient is unknown. The samples from 3 patients (patients 14, 19, and 20) did not have adequate read coverage at this position, but the remaining samples were all R289 despite most of the patients (except patients 10, 11, 13, and 14, who had incomplete medical records) having received oseltamivir treatment (Table 1). Even lowering the SNP call stringency to a 5% difference compared to the reference, no additional oseltamivir resistance mutations were found in the patient samples. The explanation could be the sampling took place in each case on or right after the dates of the antiviral treatments in all our patients, except patient 12, whose sampling date was 8 days after the antiviral treatment date.
It has been shown that some amino acid changes in the viral polymerase complex of IAV are very important for mammalian adaptation. For example, PB2 E627K and D701N changes are critical for mammalian adaptation (35-37, 68, 69) and the changes could increase polymerase activity and viral replication in mammalian cells and pathogenicity of H7N9 viruses in mice (70)(71)(72). It was also found that E627K and D701N mutations in H7N9 could readily occur during transmission of the virus among ferrets via direct physical contact (73). Among our samples, only sample 1 displayed a mixture of E627K mutations (68% E, 32% K). For the PB2 701 site, the samples from all patients with adequate coverage at this site possessed D701.
Interestingly, among the samples, 4 of the key SNPs were present in patients from all 3 waves (Fig. 5), including changes in the viral polymerase subunit PB2 (M535L). Other than being mapped to T cell epitopes, it is not known whether these observed PB2 changes or the change in NP would enhance replication or transmissibility in mammals. Further studies are needed to understand the significance of these changes.
Critical for prepandemic surveillance efforts is timely recognition of the emergence of mutations that indicate the appearance of a viral genotype with an increased capacity for forward transmission. In addition to the polymerase mutations that facilitate mammalian viral replication, such as PB2 E627K (35,74), as well as the changes in the HA receptor binding domain that increase the affinity for ␣2-6 sialic acids of the human upper respiratory tract (75,76), our analysis showed that the PB2 M535L change was identified in multiple patients and was present during all three H7N9 waves between 2013 and 2015. Additionally, changes in the HA cytoplasmic tail (N551S and G552R) were also identified in these three successive waves. The cytoplasmic tail of HA is a site for cysteine acylation (77,78) and plays an important role during infection by modulating membrane fusion activity during viral assembly and budding (79)(80)(81). While it is unclear what role N551S and G552R play in infection, it is tempting to speculate that these changes could be associated with increased viral budding in humans either through acylation-dependent or -independent processes.
To our knowledge, this is the first report of deep-sequencing analysis of H7N9 IAV directly from clinical samples of patients from three consecutive years without initial viral culture or influenza virus-specific RT-PCR analyses. Applying whole-transcriptome amplification methods, specific enrichment for H7N9 influenza reads, and deepsequencing methods, we were able to both investigate the consensus sequences of the viral genome and identify low-abundance SNPs. A possible advantage of the methods described in this report, including enrichment of viral RNA by specific probe hybridization, is the ability to obtain sequences from samples with low amounts of or degraded viral RNA. Although the samples analyzed in the present study were all from patients treated at a single hospital in Shanghai, SNP analyses showed that the viruses contained signatures of different evolving H7N9 clusters (6-8), including a virus isolate from a patient infected in 2015 with likely antigenic changes to the HA that are shared with more recently isolated Yangtze River Delta lineage isolates. In addition, we identified 178 SNPs with 106 nonsynonymous changes in H7N9 that have not been reported in the Influenza Research Database. However, further investigation is still needed to determine the function of these newly identified SNPs with respect to virus replication, host adaptation, transmission, or pathogenesis.

Patient information and sample collection.
The samples from 20 patients infected by avian influenza virus H7N9 studied here were collected in Shanghai, China, between 2013 and 2015. All 20 patients were diagnosed with severe community-acquired pneumonia (CAP). Throat swabs were collected from each patient during the inpatient stay and stored in viral transport medium (VTM) (Yocon, China). RNA was extracted from the swab samples using a nucleic acid extraction kit (QIAamp viral RNA minikit; Qiagen, USA). The presence of influenza A/H7N9 virus RNA was tested using the RT-PCR method recommended by the World Health Organization (82). The samples were collected and processed for viral sequence analysis following elimination of personally identifiable information. The sequence analyses of H7N9 virus genomes performed at the National Institutes of Health (NIH) were determined to represent an excluded study, i.e., one not requiring Institutional Review Board (IRB) approval by the NIH Office of Human Subjects Protections (exemption number 13268).
Target enrichment probe design. Enrichment probes were designed by Agilent Technologies (Santa Clara, CA) using A/Hangzhou/1/2013 (H7N9) as the reference sequence. The biotin-labeled RNA probes were 120 bases in length with 5-base spacing density, resulting in a total of 2,034 overlapping probes spanning the H7N9 genome.
Library construction and sequencing. For negative-control experiments, HeLa cell total RNA was obtained from TaKaRa (TaKaRa Bio USA, Inc., Mountain View, CA). For positive-control experiments, the hemagglutinin (HA) gene segment (KC853766.1) from A/Hangzhou/1/2013 (H7N9) was synthesized and cloned into the pBluescript II KS(-) SacI site by GenScript (Piscataway, NJ). Plasmid was linearized by NotI (NEB, Ipswich, MA) digestion, and RNA was transcribed using a HiScribe T7 high-yield RNA synthesis kit (NEB, Ipswich, MA) following the manufacturer's instructions. RNA from the HeLa cells, from the T7 promoter-transcribed HA plasmid, and from the isolated total RNA from patient sample throat swabs was amplified using Ovation RNA-Seq system V2 from NuGEN (NuGEN, San Carlos, CA) following the kit specifications. Each sample, containing 5 l of total RNA, was used as input for Ovation RNA-Seq system V2. The amplified total cDNAs were analyzed using an Agilent 2100 Bioanalyzer with an Agilent high-sensitivity DNA kit (Agilent Technologies, Santa Clara, CA) and sheared to 150 bp using a Covaris S2 instrument (Covaris, Woburn, MA). Following this, approximately 300 ng amplified cDNA was used to make each Illumina sequencing library using an Agilent SureSelect XT target enrichment kit (Agilent, Santa Clara, CA) for Illumina multiplex sequencing and the protocol for 200-ng DNA samples. Enriched Illumina sequencing libraries were analyzed on an Agilent 2100 Bioanalyzer using an Agilent high-sensitivity DNA kit. Libraries were then clustered on an Illumina cBot instrument and sequenced on Illumina GAIIx or HiSeq machines according to the instructions of the manufacturer (Illumina, San Diego, CA). In this study, more than 2.8 billion reads and a total of more than 323 Gb of sequences were generated. Data analysis. A consensus sequence derived from all available H7N9 human strains obtained in China in 2013 was used as the reference genome and was generated at the Influenza Virus Resource site (https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?goϭdatabase) (83) on 16 November 2017. This reference sequence is called the 2013 human H7N9 consensus in this study (the sequences are listed in Text S1 in the supplemental material). Amino acid consensus sequences from avian and human H7N9 isolates from 2013, 2014, and 2015 were generated similarly (the sequences are listed in Text S1). Reads were mapped to Bowtie2 (version 2.2.5; http://bowtie-bio.sourceforge.net/bowtie2/index .shtml) indexed to the consensus sequence of all H7N9 human strains in China in the 2013 genome using Tophat2 (release 2.0.13; http://ccb.jhu.edu/software/tophat/index.shtml) downloaded from the Center for Computational Biology, Johns Hopkins University (http://ccb.jhu.edu/) (84). SAMtools mpileup (version 2.1.0) was used to make SNP calls, and the minimum base Phred quality score was 25 (85). The reported SNP calls needed to satisfy the indicated criteria at the SNP position, similarly to those described in a previous study (86), were as follows: (i) more than 100 reads at that position; (ii) reads with SNP calls from both read directions; (iii) a SNP call not at the first or last base of all the SNP-containing reads; and (iv) different bases representing more than 10% of the aligned reads.
Multiple-sequence alignment was performed using MUSCLE (87) version 3.8.31, and midpoint-rooted phylogenetic trees using the approximately maximum-likelihood method were constructed with Fast-Tree version 2.1.8 with the Jukes-Cantor ϩ CAT model (40) and displayed in FigTree version 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/) and included Shimodaira-Hasegawa test values. Other human H7N9 sequences from the outbreak samples collected in China in 2013 to 2017 and used in this study were downloaded from the Global Initiative on Sharing All Influenza Data (GISAID; http://platform.gisaid .org/epi3/frontend#355fbd) database on 6 April 2017. Identical sequences from the HA segments were eliminated before alignment. The strain names used in the HA tree were used to identify the sequences to make phylogenetic trees for other IAV H7N9 gene segments. In the sample sequences, at nucleotide positions with read coverage but with fewer than 100 reads, bases from the precomputed human H7N9 consensus sequence from each segment were used to assemble contigs.
Protein structure. The crystal structure of the H7 hemagglutinin from A/Anhui/1/2013 in complex with neutralizing antibody CT149 (4R8W) (61) was downloaded from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do) (88), and HA amino acid changes were made and viewed with Swiss-PdbViewer (89) or with the protein structure viewer in the Influenza Research Database (90).
Accession number(s). All sequences have been deposited as a series with accession no. SRR7444529 to SRR7444548 at the GenBank SRA database.