Identification of Head and Neck Cancer Subtypes Based on Human Papillomavirus Presence and E2F-Regulated Gene Expression

Cancer is a complex disease that can be caused by a multitude of factors. HNSCC is complicated because some of these cancers are clearly associated with HPV, while others have no viral involvement. Determining the pathways that are commonly altered in both types of HNSCC, as well as those that are unique to viral and nonviral tumors, is important for a basic understanding of how these cancers arise and progress and critical to the development of targeted therapies. In this work, we show that all HPV-associated tumors have increased expression of E2F target genes, indicating that the tumor suppressor function of Rb is blocked. Importantly, Rb is also inhibited in a subset of nonviral tumors, suggesting that mutations present in these cancers mimic the action of the HPV E6 and E7 oncogenes.

H ead and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide and is associated with exposure to various risk factors, such as tobacco or human papillomavirus (HPV) infection (1). In recent years, an increase in tumors located in the tonsil and the base of the tongue has been seen due to infection with HPV (1). Multiple subtypes of HNSCCs have been characterized by various genotypic traits, such as alterations in cell growth pathways, and clinical traits, such as the diverse locations of tumors (2,3). This emphasizes the need for the development of targeted therapies, particularly as treatment for HNSCC can be toxic and extremely difficult, with 50% relapse within 2 years of tumor removal (1). Recently, The Cancer Genome Atlas (TCGA) published a comprehensive genomewide characterization of 279 HNSCC pa-tients, identifying genetic/clinical subtypes of the cancer based on the functionality of numerous pathways (3). Specifically, they defined subtypes of HPV-positive (HPV ϩ ) and HPV-negative (HPV Ϫ ) tumors, primarily based on unique mutations and copy number variants, and analyzed how these correlated with various pathways. These studies concluded that the presence of HPV represents a distinct subgroup of the cancer with its own unique genetic signatures, specifically through its modification of the cell cycle regulatory pathway.
The Rb-E2F pathway, as a critical regulatory system for cell cycle progression, is a major target of HPV-mediated tumorigenesis (4). In normal cells, Rb proteins inhibit the transcription factor E2F, preventing E2F from upregulating a collection of genes, i.e., the E2F-responsive genes (ERGs), that are needed for cell proliferation (5). The proteins encoded by many ERGs are involved in nucleotide synthesis, DNA replication, and cell cycle progression. In some cancers, this pathway is altered so that E2F-dependent transcription occurs in an unregulated fashion (6). The HPV E7 protein specifically binds to and degrades Rb, allowing E2F-dependent transcription to occur and the cell cycle to proceed unchecked (7)(8)(9). However, other factors may regulate the Rb-E2F pathway. For instance, growth signals induce the formation of cyclin/cyclin-dependent kinase (CDK) complexes, which inactivate Rb through phosphorylation, and growth-inhibiting signals promote the expression of cyclin kinase inhibitors (CKIs), which interact with cyclin/CDK complexes and prevent them from phosphorylating Rb, thus inhibiting ERG expression (6,10). In normal cells, unregulated cell growth frequently results in apoptosis, often as a result of the stabilization and activation of the tumor suppressor p53 (11). HPV blocks this response through the E6 protein that binds p53 and stimulates its degradation (12)(13)(14).
Much of our knowledge of how HPV affects the Rb-E2F and p53 (Rb-E2F/p53) pathways is based on cell culture and animal models. The TCGA data provide an opportunity to explore the role of these pathways in the context of human tumors. Thus far, most research on the TCGA data has focused on genomewide analyses. In this study, we focus specifically on the Rb-E2F/p53 pathways to identify HNSCC subtypes.

RESULTS
Rb-E2F/p53 pathway expression patterns distinguish HPV ؉ and HPV ؊ tumors. Twenty-five genes (see Table S1 in Data Set S1 in the supplemental material) representing members of key interacting groups in the Rb-E2F/p53 pathways (cyclins/CDKS, CKIs, E2Fs, RBs, and TP53), as well as genes connecting the Rb-E2F and p53 pathways (CDKN2A/p14ARF and MDM2), were selected for analysis. We captured TCGA transcriptome sequencing (RNA-seq) expression values for 499 tumor and 43 normal tissue samples and visualized the cellular mRNA levels across HPV ϩ and HPV Ϫ tumor samples with boxplots (Fig. 1A). Using a P value cutoff of 1eϪ4, we found that the mRNA levels of 17/25 genes were significantly different between HPV ϩ and virus-negative samples (Table S1 in Data Set S1). Genes exhibiting significantly higher mRNA levels in HPV ϩ tumors included many known targets of Rb-mediated transcriptional repression, such as E2F1, E2F2, and RBL2 (p107). Interestingly, while E2F1 to -3 are all known as activators of cell cycle progression (5), the levels of E2F3 were not significantly different between virus-associated and virus-free tumors.
These results suggest that HPV-associated tumors can be distinguished from nonviral tumors based on changes in the expression of genes in the Rb-E2F/p53 pathways. To test this, we performed consensus clustering based on categorized RNA levels of genes (see Materials and Methods) in these pathways (Fig. S1). This analysis showed that HPV ϩ tumors cluster as a single group.
HPV ؉ tumors and a subset of HPV ؊ tumors have altered E2F-regulated gene expression. The combination of increased levels of E2F1 and E2F2 with HPV-mediated degradation of Rb should lead to increased expression of an array of genes whose expression is dependent on E2F. Therefore, we analyzed the RNA levels of ERGs in HNSCC tumors. We used a specific list of 325 ERGs (Table S2 in Data Set S1) (15) and categorized their mRNA levels in each tumor sample as upregulated, downregulated, or unchanged (see Materials and Methods). Then, for each tumor sample, we calculated the total number of upregulated ERGs.
In HPV ϩ HNSCC samples, there was a range of 175 to 230 (median, 202) ERGs upregulated in each tumor, with three low outliers ( Fig. 2A). The high number of ERGs upregulated in HPV ϩ tumors supports the results presented in Fig. 1 and suggests that HPV affects the Rb-E2F pathway. In HPV Ϫ HNSCC samples, there was a much larger range of 40 to 230 (median, 163) ERGs upregulated in each tumor. The two tumor groups were significantly different from each other (t test, P Ͻ 2.2eϪ16). The range in ERG upregulation suggests that three groups of tumors can be defined based on ERG upregulation: HPV ϩ , high-ERG tumors (more than 175 ERGs upregulated), HPV Ϫ , high-ERG tumors (more than 175 ERGs upregulated), and HPV Ϫ , low-ERG tumors (less than 175 ERGs upregulated).
Analysis of tumor subgroups. Since HPV Ϫ , high-ERG tumors and HPV ϩ tumors seem to be affecting the Rb-E2F pathway in similar ways, we wanted to learn which pathways the HPV Ϫ , low-ERG group may be uniquely affecting. We searched for genes regulated exclusively in the HPV Ϫ , low-ERG subset of tumors. We used an expression cutoff of 1 to remove genes that had negligible RNA levels in both normal tissue and tumors (see Materials and Methods). We then performed analysis of variance (ANOVA) on the categorized RNA levels of the remaining genes that had measurable expression levels, searching for significant differences in RNA levels between the three ERG tumor classes: HPV Ϫ , low ERG; HPV Ϫ , high ERG; and HPV ϩ , high ERG. ANOVA identified 2,513 genes with significantly different mRNA levels (P Ͻ 1eϪ10) between the three groups. The Tukey test was applied to the ANOVA results to calculate the differences between pairs of groups for each gene. Out of these 2,513 genes, 586 genes (Table S3 in Data Set S1) were identified as having significantly different RNA levels in the HPV Ϫ , low-ERG tumors than in the HPV ϩ and HPV Ϫ , high-ERG tumors (P Ͻ 0.01) but similar RNA levels in HPV ϩ and HPV Ϫ , high-ERG tumors (P Ͼ 0.01).
These 586 genes were grouped by clustering, and the results are displayed as a heatmap in Fig. 2B. The HPV ϩ and HPV Ϫ , high-ERG tumors clustered together and apart from the HPV Ϫ , low-ERG tumors. For the HPV ϩ tumors, the different HPV species that we detected did not cluster together (data not shown). In the HPV Ϫ , low-ERG tumors, a group of 45 genes with unique upregulation was identified (Table S4 in Data Set S1).  Table S1 in Data Set S1 in the supplemental material) are represented by boxplots. Data for HPV ϩ samples are colored in teal, and data for HPV Ϫ samples are in red. (B) The Rb-E2F/p53 pathways in HPV ϩ tumors. The CDKN2A gene produces differentially spliced transcripts for the p16 and ARF proteins. Red, upregulated; green, downregulated; black, no significant difference.

Identification of HNSCC Subtypes Based on HPV and ERGs
A subsequent analysis further supported the identification of this group, where these genes were in the top 58 genes with the highest percentages of upregulation in the low-ERG group and smallest percentages of upregulation in the high-ERG group. Subsequent annotations of these genes using DAVID (16) revealed a weak association with cell processes, such as ectoderm development (P ϭ 9.0eϪ6) and keratin (P ϭ 4.3eϪ2), cytoskeleton (P ϭ 5.4eϪ2), and protein complex (P ϭ 1.3eϪ1) assembly. Genes uniquely upregulated in the HPV Ϫ , low-ERG tumors were also analyzed with  (Table S2 in Data Set S1) was quantified for each patient. HPV Ϫ patient data are marked by red circles, HPV ϩ patient data by blue circles, and outliers by black circles. Three groups are defined based on the number of ERGs upregulated and the presence of HPV: HPV ϩ , high-ERG tumors; HPV Ϫ , high-ERG tumors; and HPV Ϫ , low-ERG tumors (the latter group is bounded by the rectangle). The t test was used to compare the difference between the results for HPV ϩ and HPV Ϫ samples. (B) Analysis of ERG-defined HNSCC groups shows a subset of genes uniquely upregulated in the HPV Ϫ , low-ERG group. The global mRNA levels in the three HNSCC groups were analyzed. Gene expression was categorized with a cutoff of 1 (see Materials and Methods), and an ANOVA/Tukey test was run to find genes differentially regulated between the HPV Ϫ , low-ERG group and HPV Ϫ or HPV ϩ , high-ERG tumors. The resulting 586 genes were clustered, and the results displayed as a heatmap. Genes that are uniquely upregulated in HPV Ϫ , low-ERG tumors are highlighted with a rectangle.
Ingenuity Pathway Analysis software (see Materials and Methods), which revealed two major networks: dermatological diseases/inflammatory response and cell death/ survival/development (Table S5 in Data Set S1). We were unable to identify transcription factor signatures in these genes using ENRICHR.
The ERG-defined groups were further analyzed by clinical phenotypes (Table S6 in Data Set S1). Both HPV Ϫ , high-ERG and HPV ϩ tumors were more associated with higher tumor grades and clinical N grades (lymph nodes) than the HPV Ϫ , low-ERG group. Previously identified HPV ϩ HNSCC clinical associations were also verified, such as a high presence in the base of the tongue and the tonsil, a better vital status, and a higher occurrence in males (1). We did not find clinical phenotypes significantly correlated with the HPV Ϫ , low-ERG group. The ERG-defined groups were also analyzed for mutational signatures, but no significant patterns were found due to the small mutational overlap across patients (Text S1 and Tables S7 and S8 in Data Set S1).
HPV ؉ tumors have lower levels of pRb and higher levels of p16 protein than HPV ؊ tumors. Next, we examined the levels of the phosphorylated pRb and p16 proteins in HNSCC tumors using normalized reverse-phase protein array (RPPA) data from TCGA. Due to HPV E7's ability to cause degradation of Rb, we expected to see low Rb levels in HPV ϩ tumors, and we found that phosphorylated pRb is present at very low levels in HPV ϩ tumors compared to the levels in HPV Ϫ tumors (t test, P ϭ 8.0eϪ8) (Fig. 3A). We were unable to determine the levels of total or hypophosphorylated pRb, Identification of HNSCC Subtypes Based on HPV and ERGs since RPPA data were unavailable. The protein expression of the cyclin-dependent kinase inhibitor p16 (CDKN2A gene) was also analyzed to determine whether the frequent upregulation seen in HPV ϩ tumors at the RNA level (P ϭ 2.8eϪ18) ( Table 1) translates into increased protein levels. We found that p16 protein expression is higher in HPV ϩ tumors than in HPV Ϫ tumors (t test, P ϭ 5.9eϪ7) (Fig. 3B). Previous studies have suggested an epigenetic explanation for the p16 addiction seen in HPV ϩ tumors, specifically by the demethylase KDM6 (17)(18)(19). We analyzed CDKN2A upregulation in relation to the RNA levels of KDM6-regulated genes, as well as the levels of KDM6 demethylase itself (Fig. S2). No significant associations were found between HPV ϩ samples and the presence of KDM6, suggesting that a different mechanism may be responsible for the HPV ϩ tumors' unique CDKN2A upregulation. We also analyzed p53 protein levels to determine whether p53 is degraded in HPV ϩ tumors. However, we observed that the p53 protein levels in HPV ϩ tumors were similar to the median protein levels of p53 in HPV Ϫ tumors (data not shown).
Mutational profiles are consistent with active HPV E7 and E6 gene functions. E7 inactivates Rb function by binding it and stimulating its degradation. Accordingly, the presence of the HPV E7 protein and upregulated ERGs in HPV ϩ tumors indicates that Rb-mediated transcriptional repression is impaired in these samples. Furthermore, the HPV E6 protein present in these tumors should block p53 function. This is consistent with the gene expression profiles described above.
Therefore, we hypothesized that there would be no selective pressure to mutate RB1, CDKN2A, or TP53 in HPV ϩ HNSCC tumors. To test this, mutation data from TCGA were analyzed for each tumor. If a gene had one or more nonsynonymous mutations, it was scored as mutated. Otherwise, it was considered wild type. All genes in the Rb-E2F pathway were analyzed (Table S9 in Data Set S1), but only CDKN2A, TP53, and RB1 had significantly different mutational profiles (P Ͻ 0.05). No mutations were detected in CDKN2A in the presence of HPV (0/68), but CDKN2A mutations were found frequently in HPV Ϫ HNSCC tumors (113/440, P ϭ 1.8eϪ8) ( Table 2). While these mutations are based on the p16INK4a transcript, we also found that 70 of the 113 patients with CDKN2A mutations also have an altered p14ARF transcript. Similarly, only one HPV ϩ tumor (1/68) had a TP53 mutation, whereas the majority of HPV Ϫ tumors (359/440) had a TP53 mutation (P ϭ 1.2eϪ40) ( Table 2).
Interestingly, RB1 appears to be mutated in almost 10% of the HPV ϩ tumors (6/68) and in a few of the HPV Ϫ tumors (12/440, P ϭ 0.02) ( Table 2). The higher RB1 mutation frequency seen in HPV ϩ tumors compared to the frequency in HPV Ϫ tumors is  unanticipated, due to HPV's ability to degrade pRb. We hypothesized that the mutations may be nonfunctional, occurring in the presence of HPV solely due to random chance, and thus, we investigated whether the mutations in RB1 had a functional effect. However, many of the mutations were frameshift deletions or missense mutations, significantly changing the amino acid characteristics in important regions (Fig. 4A).
Thus, it appears that all of the mutations present in HPV ϩ tumors would inactivate pRb function. Next, we analyzed the LXCXE motif, the HPV E7 region that binds to pRb, in these tumor samples. We found that the E7 gene was wild type in all samples (data not shown). Finally, we explored the idea that E7 may not be highly expressed in tumor samples harboring RB1 mutations. We compared E7 RNA levels between HPV ϩ , RB1 mutant samples and HPV ϩ , RB1 wild-type samples. The results show similar levels in both types of HPV ϩ tumors: the RB1 mutant, HPV ϩ tumors have an E7 mean value of 244.2, while the RB1 wild-type, HPV ϩ tumors have an E7 mean value of 240.7 (t test, P ϭ 0.98) ( Fig. 4B and C). Similarly, there were no significant differences in the RNA levels of the other HPV proteins in the two types of HPV ϩ tumors (Fig. 4C).

DISCUSSION
We have examined the effects of HPV on the Rb-E2F/p53 axis in human squamous cell carcinomas of the head and neck. We captured DNA sequencing, RNA-Seq, mutation, and protein data representing 499 patients from TCGA. HPV ϩ tumors were identified by aligning DNA and/or RNA-seq reads to sequences in HPV databases. Consistent with previous reports, we found that 14.2% of HNSCC tumors contained HPV (B) RNA levels of HPV E7 in HPV ϩ RB1 mutant (red) and RB1 wild-type (blue) HNSCC tumors were analyzed using a boxplot. Data for outliers are in black. (C) The t test was used to compare the average RNA levels for other HPV proteins (E4, E5, E6, and E7) between the two groups (HPV ϩ RB1 mutant and RB1 wild-type HNSCC tumors), to check for any significant differences between these two groups. (3,20). The RNA-Seq data indicate that all HPV-positive tumors express the HPV E7 and E6 oncogenes. Thus, the Rb and p53 tumor suppressors should be inactive in these tumors. To test this, we compared the mRNA levels and, in some cases, protein levels of components of the Rb-E2F/p53 pathways and their downstream targets in HPVassociated tumors with the levels in tumors that did not harbor HPV, as well as the levels in normal tissue.
We examined several key components of the Rb-E2F/p53 pathways in HPV ϩ and HPV Ϫ tumors. Tumors containing HPV consistently had lower levels of phosphorylated pRb protein than virus-negative tumors. This is consistent with the action of the HPV-encoded E7 protein, which is known to stimulate pRb degradation.
Surprisingly, there was no significant difference in p53 protein levels between virus-associated and virus-negative tumors, although p53 mRNA levels were elevated in the virus-positive group. It is unclear if this indicates that the HPV E6 protein does not degrade p53 in these tumors or if p53 protein levels are universally low in both HPV ϩ and HPV Ϫ HNSCCs. The latter interpretation is consistent with data reviewed below that strongly indicate that p53 is inactive in nearly all HNSCCs (3). Since the p53 antibody used came from Cell Signaling Technology, Inc. (catalog number 9282), and was used for all TCGA RPPA assays (direct communication with the RPPA core), we believe these results are not due to the failure of the antibody to detect both mutant and wild-type forms of p53. Finally, consistent with a number of reports, the protein levels of p16 were consistently elevated in HPV-associated tumors (19,21,22).
Gene expression profiling analysis also indicated that the pRb and p53 tumor suppressors were inactive in HPV-associated HNSCCs. First, the mRNAs encoding several cyclin-dependent kinase inhibitors, including members of both the INK4a (p16, p18, and p19) and CIP1/WAF1 (p27 and p57) families, were elevated in HPV ϩ tumors. These proteins share the ability to antagonize the activity of the cyclin-dependent kinases responsible for pRb inactivation (6). Furthermore, the levels of cdk6 and cyclin D1 mRNAs were lower in HPV ϩ tumors. These observations suggest that pRb should be active in HPV-associated HNSCCs. However, the presence of the HPV E7 protein leads to pRb degradation, thereby bypassing these brakes to cell proliferation. Consistent with this interpretation, HPV ϩ tumors also contained elevated levels of mRNAs encoding members of the E2F transcription family and of a number of genes whose expression is known to be E2F dependent ( Fig. 1 and 2A).
If the pRb and p53 tumor suppressors are inactivated by HPV, there should be little or no selective advantage for these genes to be mutated during the course of tumorigenesis. Similarly, there would be no selective advantage to mutate p16, since HPV E7 has eliminated its downstream target, pRb. In fact, an analysis by the TCGA consortium reported that while the vast majority of nonviral HNSCCs have mutations in p53, HPV-associated tumors do not (3). We confirmed this observation and extended it to include p16. Similar to p53, we found that p16 was frequently mutated in nonviral HNSCCs, while none of the HPV-associated tumors carried mutations in this gene. Surprisingly, we found that RB1 was mutated in about 10% of HPV ϩ tumors and in 2.7% of HPV Ϫ tumors. In these cases, the nature of the mutations suggests that they would inactivate RB1 function. Examination of the HPV E7 proteins in these cases indicated that they were wild type with an intact LXCXE motif, indicating they should be fully capable of inactivating pRb. One interesting possibility is that the mutations of RB1 occurred before HPV infection and integration. In such cases, the RB family members p130 and p107 might compensate for the loss of pRb. The subsequent appearance of E7 would then add an additional selective advantage by eliminating p130 and p107. Alternatively, the E7-mediated inactivation of pRb in these tumors may be inefficient for some unknown reason, thus providing a selective pressure for mutation. While early studies showed no evidence of mutations in these pathways, interestingly, pRb mutations were found in HPV ϩ cervical carcinomas in a recent study (23)(24)(25).
Rather than defining tumors through a global analysis of gene expression, this study took a targeted approach by comparing HNSCC samples based on the presence or absence of HPV and focused specifically on gene expression and mutational patterns in the Rb-E2F/p53 pathways, which are disrupted by the E7 and E6 oncoproteins from HPV. We demonstrated that HPV ϩ HNSCCs form a distinct group when clustered by Rb-E2F/p53 component analysis and that the properties of this group of tumors largely followed the current paradigms for HPV action on this pathway. Interestingly, this study revealed two distinct types of HPV Ϫ HNSCCs based on the expression of E2F-regulated genes (Table 3). One group showed elevated levels of ERGs similar to those seen in HPV ϩ tumors. The second group showed little or no alteration of ERG expression. This is consistent with previous reports that indicate normal cell proliferation and tumorigenesis can occur in the absence of activator E2Fs under some circumstances (26)(27)(28). An understanding of the different patterns of expression in these subsets will move us closer to learning the mechanisms by which cancer progresses in each HNSCC subtype.
Categorizing the expression of genes. Genes were categorized as being as upregulated, downregulated, or unchanged in tumors relative to their expression in normal tissue. For each gene, both the median expression and the median absolute deviation (MAD) of the 43 normal samples were calculated. Median was chosen instead of mean because outlier values could skew the mean significantly. Then, the tumor expression of each gene in 499 HNSCC patients was compared to the median of its expression in normal tissue and categorized accordingly: upregulated if the tumor expression value for that gene is greater than the gene's normal median plus 1 MAD, downregulated if the tumor expression value for that gene is less than the gene's normal median minus 1 MAD, and normal if the tumor expression value of the gene falls anywhere within 1 MAD of the gene's normal median. We found this categorization method to be more effective than the TCGA expression values in predicting HNSCC subtypes (see Text S1 and Table S10 in Data Set S1 in the supplemental material). In certain situations, a cutoff for this categorization was applied, in which case genes with RNA levels below the cutoff in both normal tissue and tumors were not considered in the analysis (Text S1).
Clustering gene expression. Gene expression patterns were clustered using the R pheatmap package with the default clustering distances. The default Euclidian distance clustered HPV ϩ tumors together better than other clustering distance methods. The categorized data (described above) were clustered using specific genes of interest. Data for genes with similar patterns of expression across tumor samples were clustered on the y axis, and data for patient tumor samples with similar patterns of expression across the genes of interest were clustered on the x axis. HPV presence was detected in TCGA BAM files as described previously (29). A virus was considered to have been detected in a sample if the number of alignments to the virus was Ն1,000. Protein and mutation data. Protein expression in tumors was measured as a z score compared against the expression in other HNSCC tumors. The Rb protein measured was phosphorylated (S807/ S811) Rb. A gene was considered mutated if one or more nonsynonymous mutations were found in that gene. No mutation was defined by zero nonsilent mutations or only silent mutations.
Correlations. To verify the association of clusters seen in the heatmap representation, correlations were done using the chi-square test or the Fisher test in R. Specifically, this tested for significant association of the genes' mRNA levels (downregulated, upregulated, or normal) with the presence of HPV or mutations (P Ͻ 0.01). The Fisher exact test was also used for the mutation contingency analysis.
Functional analysis. To gain a better understanding of the biological function and significance of groups of genes, the Database for Annotation, Visualization, and Integrated Discovery (DAVID) was utilized (16). Genes were entered into the Functional Annotation Tool and analyzed with Functional Annotation Clustering, which gives enrichment scores and P values for the associated functions of the genes. Similarly, Ingenuity Pathway Analysis software (Qiagen) was used to identify pathways and diseases that are associated with a group of genes.