Combined Transcriptome and Proteome Analysis of Immortalized Human Keratinocytes Expressing Human Papillomavirus 16 (HPV16) Oncogenes Reveals Novel Key Factors and Networks in HPV-Induced Carcinogenesis.

Human papillomavirus (HPV)-associated cancers still remain a big health problem, especially in developing countries, despite the availability of prophylactic vaccines. Although HPV oncogenes have been intensively investigated for decades, a study applying recent advances in RNA-Seq and quantitative proteomic approaches to a precancerous model system with well-defined HPV oncogene expression alongside HPV-negative parental cells has been missing until now. Here, combined omics analyses reveal global changes caused by the viral oncogenes in a less biased way and allow the identification of novel factors and key cellular networks potentially promoting malignant transformation. In addition, this system also provides a basis for mechanistic research on novel key factors regulated by HPV oncogenes, especially those that are confirmed in vivo in cervical cancer as well as in head and neck cancer patient samples from TCGA data sets.

KEYWORDS HPV, RNA-Seq, SILAC, TCGA, cervical cancer, head and neck cancer, integrated analysis H igh-risk human papillomaviruses (hrHPVs) are not only the etiological agents for cervical cancer (1,2), but also associated with other malignancies, including head and neck (60%) (3), anal (93%) (4), vulvar (69%), vaginal (75%), and penile cancers (47%) (5). Among all hrHPV types, HPV16 is the most prevalent one, posing a heavy health burden on both males and females (5). Since 2006, three prophylactic HPV vaccines have been widely used worldwide to provide protection against hrHPV-related cancers (6,7). However, these cancers still remain a challenge in countries where screening and vaccination are unavailable (8). Recent global cancer statistics from GLOBOCAN show that cervical cancer is still a leading cause of death in 42 countries in Sub᎑Saharan Africa and Southeast Asia (9). Meanwhile, therapeutic vaccines targeting existing persistent infections or hrHPV-positive tumors are still in their infancy (10). Further studies to identify novel factors and pathways regulated by hrHPV are mandatory to provide new insights into virus-host interactions to allow more-effective diagnoses and treatments of hrHPV-associated cancers.
Taking an evolutionary viewpoint into account (11), HPVs are well-selected entities whose oncoproteins always attack central hubs within key regulatory pathways to gain a selective advantage (12). The hrHPV E6 and E7 oncoproteins target p53 and pRB, two of the key cellular tumor suppressors, for proteasomal degradation (13,14), and their ectopic expression is sufficient to immortalize primary keratinocytes (15,16). Besides sustaining proliferative signaling, promoting genome instability, and resisting cell death, hrHPV E6 and E7 also account for immune escape, angiogenesis, and the formation of a proproliferative microenvironment, making them key players in tumor development by affecting the hallmarks of cancer (17,18).
Interactions between HPV oncogenes and host cells have been intensively investigated during the past 15 years, during which time methods such as microarray analysis, transcriptome sequencing (RNA-Seq), or proteomics revealed global changes caused by E6 and E7. In principle, there are different ways to investigate the role of HPV oncogenes in host cell transformation. The most common approaches were either to express or to knock down E6 and/or E7 to study the changes in host transcriptome profiling (19)(20)(21)(22)(23). In addition, available cell lines or clinical samples with different progression states were used to examine the role of either distinct HPV oncogenes or the whole viral genome in an episomal or integrated state (22,24,25). Moreover, there were also attempts to understand the host transcriptome of cells where a productive cycle of HPV was taking place (26). The generation of new HPV progenies can occur only when the host cell is able to differentiate (27). In contrast, HPV-induced cancer is considered to be an evolutionary accident with a dead end for the viral life cycle (28).
In the present study, we used immortalized human keratinocytes (normal oral keratinocytes [NOKs]) (29) with a well-defined expression of HPV16 oncogenes and the corresponding parental negative-control cells. We intentionally chose immortalized cells over primary cells, allowing a certain number of cell doublings to guarantee sufficient amino acid incorporation for stable isotope labeling by amino acids in cell culture (SILAC). Furthermore, the use of immortalized cells also permits an unlimited number of follow-up experiments in the HPV-negative parental cell line without bias being created by differentiation of primary cells or by variations among different donors (30,31). Using this in vitro model, transcriptome and quantitative proteome data were combined for the first time to identify novel regulators that putatively contribute to host cell transformation. Moreover, to enhance the validity of RNA-Seq analysis, results were processed by different bioinformatic methods to get a more complete and less biased overview of global changes caused by HPV16 oncogenes. Key findings were supported by data from various cancer entities in The Cancer Genome Atlas (TCGA).
Furthermore, initial analysis of RNA-Seq and SILAC data identified factors (Table S3) that were already shown to be affected by HPV in multiple previous studies, including the upregulated genes IL1A (39), UCHL1 (40), TYMS (41), EGFR (42), TP63 (43), and PCNA (44) and the downregulated genes STAT1 (45), FN1 (46), and ISG15 (47). In other words, the validation of aforementioned findings served as an internal reference for the reliability of this cell system to identify new factors with important functions in HPV-related cancers. Genes that are highly modulated in all lists were validated by qPCR FIG 1 Establishment of a keratinocyte cell line expressing HPV16 E6 and E7. (a) Schematic representation of the vector construct used to stably express E6 and E7 in NOKs. Strep-tagged E7 and Flag-tagged E6, separated by a P2A sequence, were cloned into the pWPI vector for lentiviral packaging. NOK cells were subsequently transduced and selected with puromycin. Pools of oncogene-transduced or empty vector control cells were used for all further studies. 3=-LTR, 3= long terminal repeat; IRES, internal ribosome entry site. (b) Semiquantitative PCR of transduced NOKs. RNA of untransduced (NOKs), NOKs with an empty vector control (NOKs pWPI), and NOKs with oncogenetransduced cells (NOKs HPV16 E6E7) was reverse transcribed, and PCR was performed with primers for HPV16 E6/E6* and E7. ACTB served as an internal control. (c) Quantitative PCR comparing relative transcript levels of E6 and E7 from NOKs HPV16 E6E7 with the HPV16-positive cancer cell line CaSki. Transcript levels were normalized to those of TOP1 as a housekeeping gene. (d) Western blot analyses of untransduced and transduced NOKs. Protein lysates (30 g) were analyzed for the presence of HPV16 E6 and E7 using oncoprotein-specific antibodies as well as anti-Flag (E6) antibody or streptavidin-HRP (E7). Levels of p53 and pRb were analyzed to confirm the functionality of the viral proteins. Actin served as a loading control. (e) Light microscopic image of transduced cells prior to RNA extraction. Scale bar ϭ 200 M. (Fig. S1a). Here, the expression of CPPED1, OAS2, OAS3, FN1, SAMHD1, and ISG15 was significantly downregulated, while that of KYNU, LCP1, UCHL1, and GAGE12H was upregulated, comparable to the results from RNA-Seq. Significantly modulated proteins from SILAC results were validated by Western blotting (Fig. S1b).
Next, we analyzed whether general transcriptome changes observed here are in accordance with those of other studies. For this purpose, we took the 155 factors FIG 2 Schematic overview of the experimental procedures used in this study. Protein and RNA were extracted from NOKs pWPI and NOKs HPV16 E6E7 to perform SILAC and RNA-Seq experiments in parallel. Proteins from the two cell lines were quantified and mixed at a ratio of 1:1, followed by trypsin digestion, two-dimensional LC separation of peptides, and MS/MS analysis. Protein identification, abundance calculation, and annotation were performed with MaxQuant. Perseus was used for comparison and statistical analysis. mRNA was enriched for library preparation and single-read sequencing. Raw reads were put through the FASTX Toolkit and Homertools to remove adaptor sequences and reads with bad quality. Four different sets of tools, including Bowtie2 plus DESeq2, HISAT2 plus StringTie plus DESeq2, Salmon plus DESeq2, and Salmon plus Sleuth were used for read alignment and gene expression quantification. The outputs of each set of tools were abbreviated BtDe, HsStDe, SmDe, and SmSu, respectively. Data from SILAC and RNA-Seq were combined for pathway analysis and interpretation of biological functions. Illustrations for the dish, cell, and protein were obtained from Library of Science and Medical Illustrations (http://www.somersault1824.com) and are used here with permission. identified above (Fig. 3b) and compared them to the results of three other studies performed with primary foreskin keratinocytes (48), primary lung fibroblasts (49), and primary cervical keratinocytes (50) (Table S4). Although certain discrepancies can be seen among all four studies, we found a large number of genes, such as TAGLN, TGM2, FN1, AURKB, and SAMD9, showing changes with the same tendencies, especially when this study was compared to the RNA-Seq-based study of primary foreskin keratinocytes (48).
IPA analyses identified new upstream regulators putatively involved in HPVdriven carcinogenesis. In order to determine whether changes in the transcriptome and proteome can be attributed to a small number of regulatory proteins, we subjected all five lists of significantly changed genes/proteins to comparative pathway analyses using Qiagen's Ingenuity Pathway Analysis (IPA) platform. For this purpose, the results of individual core analyses and the data sets obtained from the four RNA-Seq lists as well as SILAC results were compared to each other. Figure 4 shows the 44 upstream regulators with the highest activity scores (activation Z-score), including regulatory FIG 4 Activity profile of upstream regulators predicted by pathway analysis. Significantly deregulated factors obtained by different bioinformatic processing methods and SILAC data were individually analyzed using the core analysis function of Ingenuity Pathway Analysis (IPA) and subsequently compared with each other. Depicted are predicted upstream regulators displaying activation factors higher than 3 (orange) or lower than Ϫ3 (green). Pubmed citations with HPV or cancer in previous studies are indicated as ϩϩϩ (mentioned Ͼ100 times), ϩϩ (mentioned Ͼ10 times), ϩ (mentioned at least 5 times),°( mentioned Ͻ5 times), and -(not reported).
Omics Analysis of HPV16-Positive Keratinocytes factors that were previously reported to be affected by HPV, like interferons (51), STAT1 (45), TGFB1 (52), and the estrogen receptor (ESR1) (53). Some of the predicted upstream regulators were also supported by data from other studies (Table S6), such as RABL6, CDKN1A, TNF, and EHF. Of the 3,837 factors whose levels were significantly altered in at least one of the five data sets, 15.27% (586 factors) are predicted to be regulated by at least one of the 44 identified upstream regulators (Table S5). The respective values of this ratio in the individual data sets ranged from 14.04% in SmDe (430 out of 3,063 genes) to up to 32.26% in BtDe (160 out of 496 genes). This highlights that oncogene-induced changes of a few upstream regulating factors are sufficient to cause major alterations both in the transcriptome and in the proteome.
In order to assess the influence of these upstream regulators, we conducted further analyses, which revealed clusters that can be partially organized in a hierarchical structure (Fig. 5). The largest identified cluster consists of 21 regulators that are all directly connected to each other (Fig. 5a). Moreover, some of the upstream regulators are situated downstream of others. STAT1 (a well-documented factor in gamma interferon signaling that is decreased upon HPV oncogene expression) not only displays a self-regulatory loop but additionally is regulated by five upstream factors, including a transcription factor (CNOT7) shown to negatively affect STAT1 expression (54). Another central node is the hypoxia-inducible transcription factor (HIF1␣). Apart from its regulation by STAT1, it is shown to be downstream of two other transcription factors, namely, SPDEF and MITF. Notably, knockdown of SPDEF was previously shown to increase the expression of HIF1␣ (55).
A second cluster links 12 upstream regulators by indirect associations (Fig. 5b). This cluster is comprised mainly of innate immunity-associated factors, like interferons and Toll-like receptors, whose activity is predicted to be predominantly downregulated based on the expression of their respective target genes. Apart from an upstream regulation of TGM2 by PLA2R1 and miR-122 (Fig. 5c), the remaining eight upstream regulators do not have any reported association with any of the other factors analyzed here (Fig. 5d). Their respective functions have previously been mentioned in different cancer backgrounds but not yet in the context of HPV (Fig. 4).
Since the observed HPV-induced proteome changes occurred mainly at the transcriptional level (Fig. 3), we focused on transcriptional regulators because their activity changes can affect multiple downstream targets. In addition to STAT1, HIF1␣, and TP63, which have been reported to be affected by HPV before (43,45), pathway analysis predicted CNOT7, SPDEF, MITF, PAX5, and HIC1 as major novel regulators ( Fig. 4 and 5). In general, these transcription factors have been shown to regulate immunity, proliferation, differentiation, and cancer in other tissues. Since so far no association to HPV-related cancer has been reported, these novel upstream regulators may play a central role in HPV-driven cancer development. Remarkably, among the downregulated genes assessed by RNA-Seq (Fig. 5, blue boxes), the five predicted upstream regulators, STAT1, ISG15, USP18, TGM2, and EIF2AK2, themselves are shown to be affected by multiple other upstream regulators (Fig. 5, blue boxes with yellow frames). This further highlights a strong link between the identified regulatory factors and their activity in response to HPV16 oncogene expression. Since the predictions made by IPA are based on RNA-Seq, selected genes from Fig. 5 were validated by qPCR. The chosen genes again show changes similar to those seen in RNA-Seq (Fig. S2).
Genes regulated by identified regulators are involved in multiple biofunctions and cellular pathways. Based on the data sets analyzed by IPA, putatively altered biological functions that can uncover the possible impact of the HPV-induced changes on host cells were revealed. We chose predicted cellular processes that appeared in at least two of the analyzed lists and grouped them according to their functional context (Fig. 6). As expected, in cells expressing HPV oncogenes, processes favoring viral replication and propagation as well as host cell cycle were activated, whereas functions associated with immune responses were inhibited. Note that also cell movement, cell adhesion, and cell death were predicted to be repressed throughout all five lists. The Omics Analysis of HPV16-Positive Keratinocytes overall tendencies of increased or decreased activity were highly comparable throughout all different data sets. However, some processes, such as colony formation and cell invasion, showed opposing activity changes. This indicates that a slightly different composition of the altered genes can easily tip the balance in opposing directions with respect to such predictive analyses.
We next assessed how many of the DE genes underlying each depicted cellular process from Fig. 6 were reported to be regulated by one or more of the 44 predicted upstream regulators (Fig. 4). The ratios listed for each biological function in Fig. 6 are derived from the number of factors regulated by any of the identified upstream regulators versus the total number of DE genes/proteins assigned to the respective genes for which an association with the regulator was not yet fully confirmed. The average predicted activity changes of regulators are represented by the intensity of their respective color (orange, increased activity; green, decreased activity). Black-framed boxes depict a regulation of the respective gene by more than one of the identified upstream regulators. Yellow-framed genes are themselves predicted to be upstream regulators. Upstream regulators were grouped into clusters according to their previously reported associations. (a) Cluster of factors directly involved with each other (solid lines); (b and c) clusters of predicted factors indirectly associated with each other (dashed lines); (d) upstream regulators without reported links to any of the other predicted factors from panels a to c.

FIG 6
Predicted diseases and biological functions based on genes affected by upstream regulators in response to HPV oncogene expression. Genes that are regulated by predicted upstream regulators in the individual data sets were subjected to comparative pathway analysis using IPA. Predicted effects on the highlighted diseases and biological functions in each group are represented by color. Orange, increased activity; green, decreased activity; N/A, not represented in the respective data set. The ratios (also shown as percentages) depict how many of the DE genes/proteins that were assigned to the respective pathway by IPA were regulated by one or more of the upstream regulators in Fig. 4. pathway. These ratios, ranging from 40% (migration of cells, epithelial-mesenchymal transition of tumor cell lines) to up to 83% (viral infection), suggest that these regulators play a paramount role in HPV16 E6E7-expressing cells.
To further evaluate the significance of the 586 factors (Table S4) that are controlled by the 44 identified upstream regulators, we subjected these genes to pathway enrichment analysis by REACTOME (56). The top 25 predicted pathways are shown in Table 1. Here, seven of the pathways are connected to innate immune signaling, highlighting a major role of HPV oncogene expression to circumvent immunological detection. Apart from mitosis and cell cycle progression, a senescence-associated secretory phenotype was also identified. Moreover, 11 pathways associated with cellular organization, cell contact, and motility, such as extracellular matrix (ECM) composition, collagen synthesis, and integrin and syndecan interactions, were identified. Hence, expression of E6 and E7 might affect not only cellular shape and organization but also interactions with surrounding tissues by alterations in the ECM.
Identification of novel factors and gene families potentially involved in carcinogenesis. Besides identifying the newly identified upstream regulators and the corresponding cellular networks, this study identified several genes and gene families that could not be assigned to any previously reported network or pathway but might be of potential importance in cancer development (Table 2). For example, CPPED1, GPRC5A, and TAGLN, recently reported to have tumor-suppressive function (57)(58)(59), were significantly downregulated at both the mRNA and protein level. Conversely, AURKB, KYNU, and LCP1 were found to be upregulated by HPV16 E6 and E7. Of note, Aurora kinase B, one of the essential kinases for cell division via regulating mitosis, is also upregulated in multiple cancer types, and its overexpression leads to unequal distributions of genetic information and, subsequently, aneuploidy (60). a Five hundred eighty-six factors regulated by identified upstream regulators were subjected to pathway analysis using the REACTOME database. Found entities are numbers of molecules of the type selected with result types that are common between the submitted data set and the pathway named. The entity ratio is the ratio of entities from this pathway that are molecules of the type selected with "results type" versus all entities in REACTOME of the type selected with "results type." The entity P value is the result of the statistical test for overrepresentation for molecules of the result types selected. The entity FDR is the overrepresentation probability corrected for the false-discovery rate. The number of found reactions is the number of reactions in the pathway that are represented by at least one molecule in the submitted data set for the molecule type selected with "results type." The reaction ratio is the ratio of reactions from this pathway that contain molecules of the type selected with "results type" versus all REACTOME reactions that contain molecules of the type selected with "results type." We also found a deregulation of several gene families, including HOX genes, which are master transcriptional regulators with diverse roles in carcinogenesis (61), frizzled genes, a family of G protein-coupled receptor proteins that serve as receptors in the Wnt signaling pathway (62), and FK506 binding proteins, which have been shown to affect a wide range of cellular processes, including protein folding, receptor signaling, and apoptosis (63). Interestingly, the GAGE family, including GAGE1, GAGE10, and GAGE12, are the most strongly upregulated genes found in our study. The GAGE gene family codes for a group of cancer-testis antigens (CT antigens) that is expressed only in testis under normal conditions but widely expressed in multiple types of cancer tissue (64).
TCGA analysis of newly identified upstream regulators and single factors: clinical relevance. The focus of our study was ultimately the identification of novel factors whose alterations can be seen both at early stages of transformation (represented by our model) and in actual patient-derived tumors, indicating a protumorigenic association. For this purpose, we chose several of the newly identified upstream regulators, their downstream targets, and some markedly changed single factors to examine their expression levels in RNA-Seq data from The Cancer Genome Atlas (TCGA). Besides cervical cancer (CESC) data sets, data sets for head and neck squamous cell carcinoma (HNSC), esophageal cancer (ESCA), different urogenital cancers (uterine, ovarian, and rectal carcinoma), and lung squamous cell carcinoma (LUSC) were included in order to evaluate our findings with respect to pan-cancer or HPV-specific aberrations. Figure 7a shows the RNA-Seq profiling of six identified upstream regulators from Fig. 5 in the above-mentioned cancer entities. The mean levels of CNOT7, PAX5, and SPDEF are slightly elevated in CESC samples compared with those in healthy tissue, though not significantly, while the level of TGM2 is significantly downregulated in cervical cancer. One possible explanation for this is that the prediction of the activation status of the upstream regulators may not necessarily be associated with the mRNA level. In the case of TGM2, it not only was predicted to be downregulated as an upstream regulator but also showed decreased mRNA levels, which is consistent with data from cancer patients. Remarkably, STAT1, which has been intensively reported by numerous studies to be downregulated by hrHPV oncogenes, is strongly upregulated not only in CESC but also in the majority of cancer types listed here. Similar results were observed for ISG15.
Further, we selected several genes that were predicted to be targets of the identified upstream regulators (Fig. 7b). VIM, MMP2, and COL5A are significantly downregulated in CESC, while CLDN7 is upregulated, which is consistent with our study. The levels of CDC25B and CCND1, however, are barely changed in CESC compared to levels in normal tissue. Interestingly, we noticed that the expression of MMP2, COL5A, and CLDN7 showed completely different tendencies in CESC and HNSC. While more than 90% of cervical squamous cell carcinomas are found to be HPV positive, the prevalence of HPV substantially varies for HNSC (65). We therefore analyzed the levels of MMP2, COL5A1, and CLDN7 in HPV-positive and HPV-negative HNSC samples to determine whether HPV status plays a role in the expression of these genes (Fig. 7c). For both MMP2 and COL5A, the mRNA levels in HPV-positive tumor samples were significantly reduced, whereas the levels of CLDN7 were increased. This finding indicates that depending on the HPV status of HNSC tumors, expression of these genes is either comparable or contrary to their expression in cervical cancers, suggesting that at least for MMP2, COL5A1, and CLDN7, the transcriptional changes might be dependent on HPV in vivo.
Finally, TAGLN (Fig. 6), which was identified as an important downregulated factor with a potential tumor suppressor function, is significantly downregulated in almost all cancer types listed here (Fig. 7d). AURKB, HOXB7, KYNU, and LCP1 are strongly upregulated in CESC, with the first two also upregulated in multiple other cancer types. The expression of TAGLN and AURKB was also analyzed in HPV-positive and HPV-negative HNSC samples (Fig. 7c). In HPV-positive HNSC, the expression of TAGLN is significantly decreased, while AURKB is strongly upregulated, suggesting that HPV might be respon-sible for gene alterations that are important in cancer development in general. GAGE12H was found to be strongly upregulated in our system; however, TCGA data show that throughout the tumor samples, there are only a few which display a strong upregulation of GAGE12H across all cancer types, indicating random or patientdependent alterations.

DISCUSSION
Although the impact of HPV infection on host cells has been studied intensively, the mechanisms underlying virus-induced transformation are still not completely understood. The vast number of host factors that are deregulated in response to viral gene expression makes reliable model systems indispensable for a bottom-up analysis to further investigate in vitro findings with clinical relevance at a mechanistic level. Depending on the differentiation status of the host cell and the viral life cycle, tremendous differences in transcriptome and proteome profiles can be expected and have to be taken into account when choosing a model system. Several omics studies were performed with either freshly infected keratinocytes in a raft system (26) or Cancer type abbreviations shown in red and green suggest significant (q Ͻ 0.01) up-or downregulation of the mentioned genes in tumor samples. Four-way ANOVA, using sex, age, ethnicity, and disease state (tumor or normal tissue) as variables, was used for calculating differential expression. TPM, transcripts per million; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; HNSC, head and neck squamous cell carcinoma; ESCA, esophageal carcinoma; UCS, uterine carcinosarcoma; UCEC, uterine corpus endometrial carcinoma; OV, ovarian serous cystadenocarcinoma; READ, rectum adenocarcinoma; LUSC, lung squamous cell carcinoma. (a) Profiling plots of upstream regulators identified in this study. (b) Profiling plots of selected target genes of predicted upstream regulators. (c) Box plots show genes that are differentially expressed in HPV-positive HNSCs from those in HPV-negative HNSCs. Analysis was done with the UCSC Xena browser using GDC TCGA head and neck cancer data sets. HPV status was determined by p16 testing. The log 2 (fkpm-uq plus 1) values, where fkpm-uq is the number of fragments per kilobase of transcript per million mapped reads upper quartile, of all samples were used for differential analysis. HPV-positive HNSCs, n ϭ 31; HPV-negative HNSCs, n ϭ 74. Statistics was done by unpaired t tests with Welch's correction. **, P Ͻ 0.001; ****, P Ͻ 0.00001. (d) Profiling plots of isolated genes whose expression showed significant changes in our RNA-Seq analysis. long-term adapted cervical carcinoma lines obtained from patients (22,66). In general, viral integration, which precedes malignant transformation, always abolishes the productive virus life cycle. Subsequently, the predominant expression of the oncogenes E6 and E7 becomes the main driving force behind host cell transformation. Therefore, we used freshly established HPV16 E6-and E7-expressing immortalized human oral keratinocytes and combined quantitative proteome and transcriptome analyses ( Fig. 1 and  2) to provide a more complete view on the impact of HPV oncogene expression on host cells preceding and putatively contributing to tumorigenesis.
We followed the workflows outlined in Fig. 2, and 290 proteins were found to be significantly altered in HPV-positive cells; of these, 155 were also differentially expressed in RNA-Seq analyses (Fig. 3). This suggests a substantial role of transcriptional regulation in the cellular changes caused by HPV16 E6E7. Alterations of protein levels that are not reflected in RNA-Seq analyses might be the result of either low mRNA steady-state levels, with fluctuations diminishing significance, or posttranslational modifications.
One major challenge in such global approaches is to find common central upstream regulatory networks that ultimately control clusters of downstream genes. Pathway analysis platforms like IPA provide a solution to this aim by putting data into the context of previous and current research. Based on DE genes identified here, IPA predicted marked activity changes in 44 putative upstream regulators (Fig. 4) that were associated with 14 to 32% of all deregulated factors (see Table S3 in the supplemental material). For instance, while MITF, SPDEF, CNOT7, BTK, PAF1, and PAX5 seem to be novel key regulators in the context of HPV-induced carcinogenesis, they are apparently already known to play essential roles in other cancer entities (Fig. 4). This is in agreement with the concept of multistep carcinogenesis stating that tumor viruses mostly target regulatory proteins that are also functionally affected in tumors without any viral etiology (12). Tumorviruses as evolutionary sophisticated entities-independently of their origin and tissue tropism-always attack central hubs of a cellular network, a concept referred to as "local-impact hypothesis" (49).
Many of the identified upstream regulators are directly or indirectly controlled by each other, revealing a network with a hierarchical structure (Fig. 5). Within this network, the transcription factors MITF, SPDEF, and CNOT7 stand out because they are predicted to directly regulate other cellular key factors, like STAT1 and HIF1␣ (55). Specifically, CNOT7 binds and inhibits the antiproliferative protein BTG1, downregulates innate immune responses (91), and serves as an upstream regulator of STAT1 (54). SPDEF is usually expressed in prostate epithelium, and higher expression levels are associated with prostate cancer and cancers of the brain, lung, breast, and ovaries (67). MITF regulates melanocyte differentiation and is known to activate genes with essential roles in cell differentiation, proliferation, and survival, which is also the case in our analysis (Fig. 5a) (68). The last two findings are interesting because these genes are usually expressed in cell types other than keratinocytes and may therefore represent promising candidates for further studies in the context of HPV-induced carcinogenesis. Of important note is that several of the connections identified by IPA are based on single reports in only one cell line. It is therefore indispensable to view these predicted associations and regulations with caution when considering them for future studies.
Concerning the biological status of cells expressing HPV16 oncogenes, pathway analysis ( Fig. 6 and Table 1) showed that, apart from known physiological changes, like a decrease in innate immune responses and increased cell cycle progression and differentiation, pathways associated with cell-cell contact and movement were especially enriched, a finding that is consistent with a recent study using raft cultures with cervical keratinocytes (50). Cell adhesion and cell migration were predicted to be decreased to similar extents, which is possibly rooted in the fact that genes involved in these pathways show a strong overlap. Furthermore, oncogene expression is predicted to affect cell shape and possibly favor epithelial mesenchymal transition (EMT) at later stages of virus-induced transformation. Of special interest is the additional prediction of a senescence-associated secretory phenotype (SASP). The SASP has been shown to play a paradoxical role in cancer formation. Recent studies showed that it contributes to tumor-promoting inflammation, an immunosuppressive microenvironment, and EMT, thus acting as a driving force of malignancy in premalignant cells (69)(70)(71)(72).
We identified genes and gene families that might inhibit or drive virus-induced carcinogenesis that are distinct from highly interactive networks of factors (Table 2; Fig. S1b). CPPED1, also known as CSTP1, has been reported to be involved in blocking the cell cycle, promoting apoptosis, and suppressing tumor growth as a potential tumor suppressor (57). Its downregulation, furthermore, leads to improved glucose metabolism (73). Here, CPPED1 was dramatically downregulated (up to a log 2 fold change of less than Ϫ5) at both the mRNA and protein level. Similarly, GPRC5A and TAGLN, two reported potential tumor suppressors in lung cancer (58,59), were also observed to be downregulated after expression of HPV16 oncogenes. While it is known that HPV16 can modulate host tryptophan metabolism, a key metabolic event contributing to immune escape by upregulating the immunoregulatory molecule indoleamine 2,3-dioxygenase (IDO 1) (74), much less is known about the role of kynureninase (KYNU), another key enzyme in the kynurenine pathway. LCP1, a member of the actin-binding protein family of plastins is important for the activation of human peripheral blood T lymphocytes (75) and was recently reported to be upregulated and to serve a critical role in oral squamous cell carcinomas (76). The upregulation of factors like LCP1 (usually expressed in lymphocytes) and members of the GAGE cancer/testis antigen family are intriguing. Aberrant occurrences of such factors in keratinocytes can putatively serve as diagnostic markers or therapeutic targets when they are expressed in tumor tissues, which has recently been tried in GAGE-expressing cancers (64).
To assess the clinical relevance of our observed findings, we chose several validated factors from Fig. 5 and Table 2 and compared them to transcriptome data from TCGA (Fig. 7). We tried to provide a comparison of our results with patient data in order to identify factors that have not previously been described in the context of HPV oncogene expression but were also frequently altered in tumors of different origins, defining them as likely candidates promoting tumorigenesis. Here, the same tendency in expression changes of TGM2, MMP2, CLDN7, HOXB7, and LCP1 can be found in different cancer types, including cervical and head and neck squamous cell carcinoma. However, not all selected DE genes were significantly altered in patients. Immunity-associated genes, like STAT1, whose downregulation by HPV has been reported (77,78), were found to be upregulated in almost all analyzed cancer tissues. The virus-induced inhibition of the immune system is a known prerequisite for viral persistence, and immunological surveillance also strongly depends on the anatomic site (30). After the productive life cycle is aborted upon viral integration, cells that undergo malignant transformation can subsequently display a chronic inflammatory phenotype, which can be seen as a general hallmark of cancer (17). Furthermore, network interactions can also be affected by the amount of E6/E7 oncoproteins within the respective host cell and may explain why transcriptomes of infected cells with high copy numbers differ from host cells, harboring only a few copies of integrated HPV (25,26). Another possible explanation for genes with inconsistent changes within TCGA data might be due to variations among individuals. This is prominently visible for GAGE12H, which was upregulated by several hundredfold in our study but is not significantly altered in data from TCGA (Fig. 7d). Its expression is strongly increased in some tumors, whereas in others, the mRNA levels are nearly undetectable. Factors like these might therefore be interesting candidates in personalized medicine.
Notably, several genes whose expression changes in our in vitro model also appear to be linked to HPV positivity in patient samples (Fig. 7c). Upon comparison of CESC and HNSC, an opposing change in the expression levels of certain genes was apparent. In contrast to HPV-negative HNSC samples, HPV-positive tumors showed the same tendency as cervical cancers with respect to the expression of COL5A, MMP2, CLDN7, TAGLN, and AURKB, supporting their possible contribution to virus-induced carcinogenesis. This is consistent with a previous study where the transcriptomes of more than 4,400 tumors and 19 different types of cancer were analyzed (79). For instance, TAGLN is a tumor suppressor that is downregulated in multiple cancers, and its loss seems to be an early event in cell transformation, coinciding with alterations in cellular plasticity (80). The latter fits the numerous changes that we observed in factors involved in cell shape and organization, as well as cell-cell contact, including COL5A, MMP2, and CLDN7. Upregulation of CLDN7 has been found in various types of cancer, suggesting that its deregulation by HPV might be part of an early event during cell transformation (81,82).
In conclusion, using a highly integrative approach allowed the identification of novel HPV-induced cellular changes that are also reflected in cancer patients, providing a basis for future studies in both basic and translational research.
Lentivirus production and transduction of NOK cells. HEK293T cells were cotransfected by lentiviral expression plasmids, packaging plasmid psPAX2 (Addgene catalog number 12260), and enveloping plasmid pMD2.G (Addgene catalog number 12259; gift from Didier Trono) using 25-kDa linear polyethylenimine (Polysciences). Forty-eight hours after transfection, the supernatant was collected, filtered, and applied to NOKs in the presence of 10 g/ml Polybrene (Santa Cruz). Virus-containing medium was removed after 24 h, and cells were kept in medium supplemented with 1 g/ml puromycin (Sigma-Aldrich) for selection until all untransduced control cells were dead. Only cells within 10 passages after selection were used for experiments.
Protein extraction and Western blotting. Monolayer cells were washed with 1ϫ phosphatebuffered saline (PBS), harvested in 1.25ϫ Laemmli buffer (78 mM Tris, pH 6.8; 2.5% SDS; 6.25% glycerol; 0.125% bromophenol blue; 2.5% ␤-mercaptoethanol), and immediately incubated at 99°C for 5 min and then placed on ice. Samples were treated with 100 U/ml Benzonase (Millipore) for 5 min at room temperature, and protein concentration was determined with a NanoDrop spectrophotometer (Thermo RNA extraction and reverse transcription. RNA was extracted using the RNeasy minikit (Qiagen) according to the manufacturer's instructions. RNase-free DNase Set (Qiagen) was used to remove residual DNA for downstream applications. RNA concentrations were determined by NanoDrop spectrophotometry, and 1 g of RNA was reverse transcribed with a RevertAid real-time reverse transcription kit (Thermo Fisher Scientific) and oligo(dT) 22 primers (Thermo Fisher Scientific) according to the manufacturer's instructions.
PCR. Semiquantitative RT-PCRs were performed using DreamTaq Green DNA polymerase (Thermo Fisher Scientific) according to the manufacturer's instructions. Quantitative real-time PCR (qPCR) was performed on a CFX96 Touch real-time PCR detection system (Bio-Rad) using iTaq Universal SYBR Green supermix (Bio-Rad) according to the manufacturer's instructions. Samples used for qPCR were generated from cells seeded on three different days. The sequences of all primers used in this study are listed in Table S1.
Omics Analysis of HPV16-Positive Keratinocytes 215 nm. The spraying voltage was 1.75 kV, and the heated capillary temperature was 275°C. The mass spectrometer operated in the positive-ion mode by performing survey MS (at 350 to 1,650 m/z) and data-dependent tandem MS (MS/MS) scans on the 10 most intense precursors, with a dynamic exclusion window of 30 s. MS scans were acquired with a resolution of 70,000 from 10 6 accumulated charges; the maximum fill time was 100 ms. The intensity threshold for triggering MS/MS was set at 5 ϫ 10 4 for ions with a z of Ն2 and an isolation window of 1.6 Da. Normalized collision energy for high-cell-density (HCD) fragmentation was 27 units. MS/MS spectra were acquired with a resolution of 17,500 from 10 5 accumulated charges; the maximum fill time was 100 ms.
Protein identification and quantification. Raw data sets were processed by MaxQuant (v1.6.0.1) coupled with the Andromeda search engine (89). Data were searched against a database constructed by merging the reference proteome sets of Homo sapiens (downloaded from UniProt, accession number UP000005640, November 2017), human papillomavirus types 16 and 18 (accession numbers UP000106302 and UP000009109, respectively, November 2017), and Chlamydia trachomatis (accession number UP000050023, March 2017). The MaxQuant-implemented database was used for the identification of common contaminants. The identification and quantification parameters of MaxQuant were set as follows: the mass tolerance for the first search was 20 ppm, that for the second search from recalibrated spectra was 4.5 ppm (with individual mass error filtering enabled), the maximal charge per peptide (z) was 7, the minimal peptide length was 7 amino acids, the maximal mass of the peptide was 4,600 Da, the fixed modification was carbamidomethylation of cysteine, variable modifications were oxidation of methionine and acetylation of the protein N terminus, the maximum number of variable modifications was set to 5 per peptide, and digestion with trypsin/P was with a maximum of 2 missed cleavages. Mass tolerance for fragments in MS/MS was 20 ppm, with the 12 most intensive peaks per 100 Da taken for the search. The minimal Andromeda score for modified peptides was 40, and the minimal delta score for modified peptides was 6. The false-discovery rate (FDR) estimation of peptide identification was based on a target decoy approach using a reverted-search database. An FDR filter of 0.01 was applied to the peptide spectrum match level. For peptide quantitation, Arg plus 6[ 13 C 6 ] and Lys plus 4[D 4 ] were set for the medium channel and Arg plus 10[ 13 C 6 15 N 4 ] and Lys plus 8[ 13 C 6 15 N 2 ] were set for the heavy channel. The requantify function was enabled. Normalized ratios were used for the calculation of labeling site occupancies. For protein quantification, only protein groups with at least 2 unique peptides with a SILAC ratio were considered. The light, medium, and heavy SILAC labels were swapped to make different combinations in the three replicates. For the purpose of MaxQuant, the fractions from basic-pH LC separation were set as one experiment.
Proteomics data analysis was performed with Perseus (v1.5.3.0) after the protein group files generated by MaxQuant (90) were uploaded. Available annotations were uploaded from the Homo sapiens database (release date, 20 June 2015). Proteins identified only by site and reverse database hits, as well as potential contaminants, were removed. The replicate samples were grouped, and 1/x transformation was applied when necessary. The normalized ratios were subsequently log 2 transformed. Proteins with fewer than two values in at least one group were removed, and missing values were imputed from a normal distribution around the detection limit. A one-sample test was performed. A P value smaller than 0.05 was used as a threshold to identify proteins whose levels were significantly altered.
Network and functional analysis. The regulatory networks and functional analyses were generated through the use of IPA (Qiagen Inc.).
Analysis of selected data sets in GEO. Data from Gene Expression Omnibus (GEO) (accession numbers GSE92496, GSE38467, and GSE109039) were used to generate Table S4 and Table S6 for comparison of studies. For Table S4, the average log 2 fold change of genes from accession number GSE92496 (duplicates) were calculated and compared to the significantly deregulated gene lists from the present study, accession number GSE38467, and accession number GSE109039. For the analysis of upstream regulators, the activation scores for GSE92496 were taken directly from its publication (48), while it was calculated by the upstream regulator prediction function in IPA for GSE38467 and GSE109039.
TCGA data analysis. Differential analysis for selected genes in eight different cancer types was performed by gene expression profiling interactive analysis (GEPIA; http://gepia.cancer-pku.cn/) (99) using log 2 (TPM plus 1) values of TCGA tumor samples with paired adjacent TCGA normal samples and GTEx normal samples. Four-way analysis of variance (ANOVA), using sex, age, ethnicity, and disease state (tumor or normal) as variables, was used for calculating differential expression. Differential-expression analysis for HPV-positive HNSC samples compared to HPV-negative HNSC samples was performed with the UCSC Xena browser (https://xenabrowser.net/) with the GDC TCGA Head and Neck Cancer data set (100). HPV status was determined by p16 testing. The log 2 (fkpm-uq plus 1) values of all samples were used for differential analysis. Statistics was performed by unpaired t tests with Welch's correction.
Statistics. Statistics for qPCR results was performed using GraphPad Prism 6 (v6.02) with the unpaired t test with Welch's correction. Perseus software (v1.5.3.0) was used for statistical analysis of proteomic data as indicated above. RNA-Seq statistical analysis was carried out using either DESeq2 or Sleuth as mentioned earlier. For TCGA data, statistics was performed using either GEPIA or the UCSC Xena browser by unpaired t tests with Welch's correction.
Data availability. The RNA-Seq data provided in this study have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO series accession number GSE124357. The mass spectrometry proteomics data can be accessed at the ProteomeXchange Consortium via the PRIDE (101) partner repository at https://www.ebi.ac.uk/pride/archive/. The data set identifier is PXD012186.

ACKNOWLEDGMENTS
We acknowledge the high-throughput sequencing unit in the Genomics and Proteomics Core Facility (GPCF), German Cancer Research Center (DKFZ), for providing the Illumina sequencing. We thank Daria Fijałkowska, Shuai Ni, Lin Wang, and Yi Pan for their support in data analysis. We thank http://www.somersault1824.com for use of illustrations. E