Metagenomic Discovery of 83 New Human Papillomavirus Types in Patients with Immunodeficiency

Although some members of the viral family Papillomaviridae cause benign skin warts (papillomas), many human papillomavirus (HPV) infections are not associated with visible symptoms. For example, most healthy adults chronically shed Gammapapillomavirus (Gamma) virions from apparently healthy skin surfaces. To further explore the diversity of papillomaviruses, we performed viromic surveys on immunodeficient individuals suffering from florid skin warts. Our results nearly double the number of known Gamma HPV types and suggest that WHIM syndrome patients are uniquely susceptible to Gamma HPV-associated skin warts. Preliminary results suggest that treatment with the drug plerixafor may promote resolution of the unusual Gamma HPV skin warts observed in WHIM patients.

Preliminary evidence based on three WHIM patients treated with plerixafor, a leukocyte mobilizing agent, suggest that longer-term therapy may correlate with decreased HPV diversity and increased predominance of HPV types associated with childhood skin warts. IMPORTANCE Although some members of the viral family Papillomaviridae cause benign skin warts (papillomas), many human papillomavirus (HPV) infections are not associated with visible symptoms. For example, most healthy adults chronically shed Gammapapillomavirus (Gamma) virions from apparently healthy skin surfaces. To further explore the diversity of papillomaviruses, we performed viromic surveys on immunodeficient individuals suffering from florid skin warts. Our results nearly double the number of known Gamma HPV types and suggest that WHIM syndrome patients are uniquely susceptible to Gamma HPV-associated skin warts. Preliminary results suggest that treatment with the drug plerixafor may promote resolution of the unusual Gamma HPV skin warts observed in WHIM patients.
KEYWORDS epidermodysplasia verruciformis, gammapapillomaviruses, metagenomic, next-generation sequencing, plerixafor, skin swabs, WHIM syndrome T he Papillomaviridae family is a large and diverse family of nonenveloped, 8-kb double-stranded circular DNA viruses named for their ability to cause skin warts (papillomas) in specific vertebrate hosts (1). Human papillomaviruses (HPVs) are divided into five genera designated Alphapapillomavirus, Betapapillomavirus, Gammapapillomavirus, Mupapillomavirus, and Nupapillomavirus (for brevity, the name of each genus has been abbreviated to Alpha, Beta, Gamma, Mu, and Nu, respectively, in this article). A group of nearly two dozen genus Alpha HPV types, including HPV16 and HPV18, preferentially infect mucosal epithelia. Although these "high-risk" types generally do not cause visible mucosal lesions, they can cause carcinomas of the cervix, anus, oropharynx, and other sites in a minority of chronically infected individuals (2). Other Alpha types, such as HPV6, cause benign genital warts, while Alpha HPVs 2, 27, and 57 cause skin warts in areas such as the palms of the hands. In most individuals, detectable Alpha HPV infections are transient, typically clearing within a period of months.
Individuals with a rare group of genetic immunodeficiencies collectively known as epidermodysplasia verruciformis (EV), including but not exclusively those with a deficiency in EVER1 or EVER2 proteins (3,4) typically present with flat warts across various skin surfaces as the sole clinical manifestation. The lesions are caused by genus Beta HPV types, such as HPV5 and HPV8, that normally infect the hair follicles and skin of most healthy adults without causing warts. In EV, dysplastic wart-like lesions can progress to squamous cell carcinoma (SCC). Although it is well established that Beta HPVs, together with ultraviolet (UV) light, are carcinogenic in EV patients, their proposed causal role in SCC in the general population remains inconclusive (5,6). Animal model studies support the concept that Beta HPVs might play a so-called "hit-and-run" role in SCC development (7)(8)(9).
Patients with warts, hypogammaglobulinemia, infections, and myelokathexis (WHIM) syndrome suffer from an extremely rare genetic immunodeficiency caused by various gain-of-function mutations in the C-X-C chemokine receptor gene CXCR4 (4,10). Recently, plerixafor (AMD3100, Mozobil), a CXCR4 antagonist, has been used to treat WHIM patients, resulting in partial responses in the outward appearance of skin warts (11,12,48). The distribution of HPV types has been previously described for a few WHIM syndrome patients and includes Alpha (HPV2, -6, and -11) and Beta (HPV5 and -23) types (13,14).
Although some HPV types in the genus Gamma transiently cause skin warts, others are associated only with subclinical infection. Healthy adults typically shed low levels of one or a few Gamma HPV types from apparently healthy skin and mucosal surfaces. Recent deep sequencing studies have revealed highly divergent new Gamma types, some of which appear to be associated with SCC (15).
It is known that the E6 and E7 proteins of high-risk Alpha papillomaviruses differ in their affinity to degrade p53 or bind Rb by an order of magnitude compared to low-risk Alpha papillomaviruses or other genera (reviewed in reference 16). While these differences help to elucidate why some viruses have a more aggressive profile than others, they do not fully elucidate the underlying mechanisms that lead to different clinical manifestations for the other genera. A recent study, however, highlights how some differences in HPV biology lead to drastically different outcomes. By studying EV-like patients who lacked EVER1/2 mutations and instead had CIB1 (calcium and integrin binding protein 1) deficiencies, de Jong and colleagues uncovered a role for the CIB1-EVER1-EVER2 complex on HPV-related disease (17). In healthy individuals, the CIB1-EVER1/2 complex represses Beta HPVs, but Alpha, Gamma, and Mu papillomaviruses utilize their E5 and E8 proteins to counteract this repression. Additional unidentified repression mechanisms in keratinocytes probably account for the lack of clinical manifestations of non-Beta HPVs in most healthy individuals. In patients with CIB1, EVER1, or EVER2 mutations, the Beta HPVs, which lack E5, are able to replicate very efficiently in the absence of a functional CIB1-EVER1-EVER2 complex. Although the biology of the Gamma genus is poorly understood, some differences have been noted such as the loss of E6 in the Gamma-6 species of HPVs (18), and the ability of the E6 of some Gamma types, such as HPV197, to interact with tumor suppressor proteins, such as TP53.
In the present study, we used recently developed virion enrichment and deep sequencing methods to survey HPV sequences in human skin. We sampled patients with several immunodeficiencies associated with persistent warts (WHIM syndrome, EV, Gata2, etc.), as well as healthy individuals with or without warts or other types of skin lesions.

RESULTS
Identification of known taxa in samples. A total of 125 samples, including skin swabs, biopsy specimens, and whole-blood samples, were obtained from immunocompromised patients, patients with skin conditions not known to be associated with warts (lichen sclerosus [LS] and acne inversa [AI]), and healthy volunteers (see Table S1 in the supplemental material). Samples were treated with detergent and nuclease to reduce the amount of nonviral nucleic acids, and virions were separated from the lysate by ultracentrifugation through Optiprep gradients (19). Viral DNA was extracted from gradient fractions, amplified by random-primed rolling circle amplification (RCA), and sequenced on the Illumina MiSeq platform.
In swabs from immunologically normal individuals (healthy and LS subjects), bacterial and human reads comprised the majority (range, 81% to 99%) of the sample, and no more than 12% of the reads were attributable to HPVs ( Fig. 1 and Table S1). In contrast, HPV sequences comprised the majority of the reads in all EV patients (97% to 99%). In a majority of WHIM patient samples, HPVs were also highly abundant, accounting for up to 94% of the sample. Merkel cell carcinoma (MCC) patient samples (n ϭ 7) varied greatly in their HPV content, with some patients having no HPV reads, and one patient having 87% HPV reads. Unexpectedly, only one of the swabs from MCC patients (M292) had a detectable amount of Merkel cell polyomavirus (MCPyV), but the 172 reads with identity to MCPyV were well below the 2,000-read confidence threshold.
Identification of known taxa in wart biopsy samples or scrapings. We compared the HPV profile for all available wart biopsy samples from six immunologically normal individuals (children) versus wart biopsy samples or lesions from WHIM patients, a WILD (warts, immunodeficiency, lymphedema, anogenital dysplasia) patient, and an idiopathic CD4 lymphopenia patient (ICL) (Fig. 2 and Table S1). Five out of six warts from immunologically healthy individuals showed a single HPV type, and the remaining sample showed two HPV types. HPVs in warts from healthy children were exclusively types known to be common in transient skin warts (Mu HPV1; Alpha HPV2, and HPV57) (20). Five of the 10 wart and tissue biopsy specimens from WHIM patients had more than one HPV type. In addition to the expected common wart-causing types found in healthy individuals, WHIM-associated tissues also contained abundant Beta and Gamma HPVs, as well as various human polyomaviruses (HPyV6, HPyV10, STLPyV, and TSPyV). Polyomaviruses were usually present in relatively low abundance, except for one case where HPyV6 was dominant. We had only one WILD patient and one ICL patient for which biopsy specimens were available (only two biopsy specimens for the same ICL patient had detectable HPV reads), so it is difficult to draw a conclusion. However, it is notable that the WILD patient sample possessed the typical childhood wart type (HPV57), while the ICL patient sample had more unusual Alpha and Gamma types.
To compare the proportion of HPV and HPyV reads obtained by the two methods of sampling, we performed a Mann-Whitney U test on 52 swabs and 14 biopsy specimens/scrapings from WHIM patients. There were statistically significantly more HPV and HPyV reads obtained from biopsy samples/scrapings (P ϭ 0.043 and P ϭ 0.022, respectively) than from swabs. The proportions of human, bacterial, plasmid, and other viral reads were not significantly different between the sampling methods. The methodological difference was even more pronounced in healthy individuals, where HPV accounted for 99.9% of the reads in biopsy samples (n ϭ 6), whereas in swabs (n ϭ 8), HPV sequences constituted only 2.4% of the reads (P Ͻ 0.0001).
Identification of known taxa in blood. A comprehensive recent report on the DNA virome of 8,240 healthy blood donors found that HPV and HPyV viremia is rare (21). We observed similar results for blood samples from eight WHIM patients. No HPV sequences were identified, and only two samples had low levels of polyomavirus sequences (HPyV7 and JCPyV). Bacterial and bacteriophage sequences accounted for different fractions of reads in the blood samples. In three blood samples, the predominant observed sequences were torque teno viruses (TTVs) (family Anelloviridae). Since the MiSeq technology used for this work does not appear to be compatible with sequencing the GC-rich hairpin at the TTV origin of replication, the complete genomic sequences of these novel TTV isolates will be the subject of a future study.
Identification of novel viral types and species. HPVs are typed according to nucleotide alignments of their L1 coding sequences. New HPV types are Ͻ90% identical to their nearest neighbor within a species, whereas members of new species diverge from those of their nearest neighbor species by Ͼ30% (1). We were able to identify the complete genomic sequences of 83 previously unknown HPV types (GenBank submission performed on 2 August 2017), 69 of which occupy the highly diverse genus Gamma (Fig. 3). Four of the new types represent new species. Incomplete genomic sequences of an additional 35 potentially novel Gamma types (including some possible new species) were also observed. No representatives of new papillomavirus genera (L1 with Ͻ60% identity to the nearest neighbor) were found. Since this discovery method patients) (B) were assessed for the presence of distinct HPV types. Genera are depicted in different colors, with slices of the same color representing different types in the same genus. Blue numbers inside the donut represent the total number of viral types for the sample. In some cases, such as W03, some of the types represented a very small fraction of the whole, and wedges are not discernible. Black numbers on the donut correspond to the HPV type or isolate designation (letters are used to denote not-yet-assigned new HPV types), or human polyomavirus (HPyV) species number. For example, Wart I contained a single type, HPV1 from the Mupapillomavirus genus (Mu). For clarity, only the larger predominant graph segments are annotated with HPV type designations. Some patients had biopsy specimens of multiple warts, and each is shown individually. has previously been used to detect extremely divergent papillomaviruses and polyomaviruses associated with fish (22) (unpublished results), it seems unlikely that any previously undiscovered HPV genera would have been missed at the bioinformatic level.
Of the 83 novel genomes that are complete, 49% were from WHIM patients, 19% were from EV patients, 12% were from healthy individuals, 10% were from MCC patients, and the remainder were from HIV, DOCK8, and GATA2 patients. Thirty-seven of the novel HPVs (33 complete genomes and 4 incomplete genomes) occurred in more than one sample. Nine were present in different samples from the same individual, while the remaining 28 were associated with samples from more than one patient or volunteer. For example, the Gamma species 5 isolate w20c04 was initially observed in sample 47 (a WHIM patient) as well as in different anatomical sites and dates from the same person, once each in two other WHIM patients hailing from different continents (samples 75 and 86), and once in the DOCK8-deficient patient.
A potential pitfall of massive parallel sequencing is that contig assembly can result in the artifactual assembly of chimeric genomes, particularly when closely related viral strains are present in the same sample. To address this issue, we created a second phylogenetic tree based on E1 protein sequences (Fig. 3B). The E1 and L1 protein phylogenetic trees show essentially identical topology for our novel sequences. The results indicate that artifactual chimerization between early and late genome segments does not appear to have occurred in the current study.
Using the approach described above, we attempted to detect additional examples of new species within known vertebrate DNA virus families of interest. Contigs from each sample were probed using a low-stringency Blastx alignment against a custommade library consisting of highly conserved proteins from each selected viral family. This approach detected gemycircularvirus-like sequences and a known species of human poxvirus, molluscum contagiosum virus (Table S1).
Predominant HPV genera in swabs from immunodeficient or diseased patients. In addition to assessing the abundance of HPV reads that we observed for five disease conditions examined (MCC, EV, GATA2, DOCK8, and WHIM patients), we assessed HPV genus diversity. Swabs from various anatomical sites from diseased or immunodeficient patients taken at a single time point often showed multiple HPV types (24 out of 31 samples where HPV was detected), with up to 17 types in both MCC and EV patients and up to 26 types in WHIM patients ( Fig. 4 and Table S1). In contrast, swabs from only two healthy individuals (TVMBSF and TVMBSH) had enough HPV reads to reach our threshold of 2,000 reads, and they had one or two HPV types, respectively (Table S1). In EV and MCC patients, members of the genus Beta generally predominated. One exception occurred in an MCC sample that yielded an Alpha type originally discovered in a flat skin wart (23). In most WHIM patients (except in WG3, a WHIM patient from Italy), Beta HPVs were not dominant, although they were readily identified in several samples. In addition, WHIM patients tended to have a greater diversity of HPV types, with several samples containing members of three HPV genera. In WHIM patient samples, Gamma HPVs were often predominant, in some cases by the number of reads (W01, W07, W14, W20, W22, W24, W27, WG1, and WG2) and in other cases, by the number of different Gamma types (W11, W18, W20, W23, W34, and W35). Swabs from genital samples of WHIM patients (designated W#-Gen) were dominated by known mucosal Alpha types (Fig. S1).
For the single GATA2 and DOCK8 patients, the pattern of HPV types was closer to what was seen in WHIM patients than in EV patients, with few Beta types and more Alpha and Gamma types. The single sample swab from an ICL patient contained a Beta type.
Effects of plerixafor therapy on WHIM patients. Treatment of WHIM patients with plerixafor increases the absolute leukocyte count in peripheral blood by effectively mobilizing myeloid and lymphoid cells out of the bone marrow and possibly other storage sites into circulation. Preliminary evidence suggests that plerixafor may reduce the frequency of bacterial and viral infections in WHIM patients (12,24,25). HPV type analysis for three patients (W06, W20, and W27) who were treated with prolonged open label plerixafor in phase 1 clinical trial NCT00967785 (clinicaltrials.gov) is shown in Fig. 5 and Table S1. A detailed account of the clinical effects of the treatment is described in a separate article (reference 48, patient 1). The diversity of HPVs dramatically declined during the trial and corresponded to the reduction in the outward appearance of warts. The Gamma HPV predominance typical of WHIM patients became less evident and posttreatment profiles were more dominated by Alpha HPV types (such as HPV3, -6, -27, and -57). Prior to treatment, patient W20 had 13 to 18 different detected viral types in pooled specimens (Fig. 4, Fig. 5, and Table S1). After 7 to 18 months of treatment, the patient had either very low levels of HPVs (Ͻ2,000 [2K] reads) or only one to three Gamma HPV types (and, in one sample, BKPyV) depending on the swabbed site (Fig. 5). For patient W27, the first available sample was from 1 year posttreatment and 22 different viruses could be detected (Fig. 4), with the vast majority (95%) belonging to the Gamma genus. After an additional 4 months on treatment (Fig. 5), the samples showed a predominance of Alpha HPV types (59% to 100%, depending on the sample).

DISCUSSION
Discovery of novel HPVs has traditionally been accomplished through PCR using partially degenerate oligonucleotides designed to target conserved regions of the genome (26,27). Even as more HPV types have been added to primer design strategies, it remains difficult to make oligonucleotides that have adequate sensitivity across genera or within the highly diverse genus Gamma HPV (28). The advent of deep sequencing has provided a less biased approach for metagenomic discovery and has accelerated the discovery rate of new HPVs, including types that are not detected with "consensus" PCR (15,29,30). However, even studies utilizing deep sequencing have yielded only a few new HPV types with complete genomes (e.g., four types discovered by Arroyo Mühr and colleagues [15] and two types by Bzhalava [29]). Our current approach to papillomavirus discovery (19) provides another layer of sensitivity by physically enriching for encapsidated viral DNA (ultracentrifugation) followed by random-primed amplification of circular DNA through RCA. For skin samples from immunosuppressed individuals, the method often achieves deep sequencing runs that are dominated by HPV sequences. In some of our samples, both from wart biopsy samples and from skin swabs, more than 90% of reads mapped to HPVs. The methods seem to be able to discriminate between low and high virus burdens. The skin samples from patients with skin conditions not known to be associated with HPVs such as acne inversa and lichen sclerosus were not dominated by viral reads (Ͻ13% HPV reads) and did not have more than two HPV types present.
This study began as an exploratory exercise into the papillomavirus and polyoma- virus diversity in the skin of patients with immunodeficient conditions in order to detect novel members of each family. The rarity of some of these genetic conditions, as well as the laboriousness of the method resulted in limitations in this study. For example, it was difficult to obtain multiple samples from some of the conditions, such that Job syndrome and GATA2 and DOCK8 deficiency are each represented by a single sample. The fact that some samples (i.e., some healthy individuals) were pooled for convenience, while others were processed individually also complicates the analysis. Several diseases, including HIV infection, MCPyV-induced skin cancer, and inflammatory skin disease, all of which can result in susceptibility to papilloma-or polyomaviruses, were included simply based on availability of samples. The variability in sample number and body sites sampled added to the complexity in evaluation and interpretation of the data. Quantitative PCR can readily detect MCPyV DNA in skin swabs of healthy individuals (31) or the unaffected skin of a subset of MCC patients (32). While we detected a large amount of MCPyV in pooled samples from healthy individuals, MCPyV was surprisingly not detected in six of seven MCC patients and was only marginally detectable in the seventh MCC patient. The discrepancy between our findings and prior results from MCC patients could be due to differences in methodology or sample selection (not all the tumors from our MCC patients tested positive for MCPyV in the original diagnosis). In particular, it is unclear whether the virion enrichment/RCA method used in the present study is as sensitive as qPCR. In MCC, productive virions are not usually found, as the virus is integrated into the human genome, so our method would miss their detection. Our method is particularly suited to disease states where the HPV is not integrated such as is the case for Beta HPVs in EV (33); however, little is known about the integration status of Gamma HPVs even in cases where it is associated with cancer (15).
Our findings better define the cutaneous DNA virome associated with specific immunodeficiency states. WHIM patients are susceptible to a broad range of bacterial infections, whereas their susceptibility to viruses seems to be markedly skewed toward Gamma HPVs. In contrast, EV patients appear to be vulnerable to uncontrolled Beta HPV infection. Both types of patients may also be less likely to control some types of human polyomavirus infections. Although seropositivity against TSPyV is high in healthy adults, it is rare to detect the virus in the skin of healthy individuals (34, 35) (unpublished results). Several of our skin swabs/samples from WHIM patients were positive for TSPyV DNA, as well as STLPyV, HPyV6, or HPyV10 DNA.
It has been speculated that both SDF-1 and/or CXCR4, which are dysregulated in WHIM syndrome, affect the growth of keratinocytes (36). This might theoretically account for the apparent failure of WHIM patients to control Gamma HPV infections. Another hypothesis is that myeloid and lymphoid cells, including plasmacytoid dendritic cells, which express CXCR4 and are deficient in the blood in untreated WHIM patients, may be involved in the control of HPV infections (37). The reduction of overall HPV burden by the use of plerixafor in WHIM patients requires further investigation, but it is certainly intriguing to speculate about the mechanisms through which infections with this HPV genus are controlled. WHIM patients seem an ideal population in which to study this phenomenon.

MATERIALS AND METHODS
Patient population. Patients signed informed consent forms, and protocols were approved by Institutional Review Boards at the National Cancer Institute, National Institute of Allergy and Infectious Diseases, University of Eastern Piedmont in Italy, and Northwestern University.
A combination of archived and prospectively collected samples were compiled for this study (Table 1). Sixty patients with immunodeficiencies were included, 26 with WHIM syndrome, 3 with epidermodysplasia verruciformis (EV), 7 with Merkel cell carcinoma (MCC), 18 with HIV infection (pooled into one sample), 1 with GATA2 deficiency, 1 with DOCK8 deficiency, a lung aspirate from a patient with Job syndrome (autosomal dominant hyperimmunoglobulin E IgE syndrome), 2 with idiopathic CD4 lymphopenia (ICL), and 1 diagnosed with WILD (warts, immunodeficiency, lymphedema, anogenital dysplasia) syndrome. Although WILD syndrome is sometimes attributable to GATA2 deficiency, the patient in the present study was not tested for GATA2 gene deletions. Lesions from the EV patients from different time points and anatomical sites had been previously described (38)(39)(40). One of the ICL patients was originally described as EV-like and later reassigned to unclassified immunodeficiency, as described by Landini et al. (41); the Job syndrome patient was also previously described (42). Thirteen patients with other skin conditions in which skin-tropic viruses might hypothetically play a role (ten with lichen sclerosus [LS], three with acne inversa [AI]) were included for comparison. A pool of vaginal washes from six healthy volunteers was also processed and analyzed. Samples from 14 immunologically healthy adult volunteers, without visible warts, were also included for reference (samples for 6 of the 14 were analyzed individually, and samples from all 14 were analyzed as a pool) (43). In addition, six warts excised from immunologically normal children were analyzed. All samples were anonymized prior to testing.
Processing of samples. (i) Swabs. Cotton-tipped wooden swabs (Greiner Bio-One) were used to vigorously rub the skin in both affected areas and areas that appear healthy and not affected. Nuclease cocktail (Dulbecco's phosphate-buffered saline [DPBS], 10 mM MgCl 2 , 0.5% Brij-58, and 0.1% Benzonase; all from Sigma) (150 l) was added to each swab. Swab tips were placed inside an empty 2-ml fritted centrifuge column (Pierce). The samples were incubated for 15 min at room temperature and spun at 1,000 ϫ g for 5 min to recover the eluate. Columns containing the swabs were washed three times with 500 l of wash buffer (DPBS, 800 mM NaCl, 0.5% Brij-58), and the supernatants were combined and clarified by centrifugation at 5,000 ϫ g for 5 min. Samples were incubated with 1 U/ml of Clostridium perfringens neuraminidase (Sigma) for 15 min at 37°C and loaded onto a 27, 33, 39% iodixanol (Optiprep; Sigma) step gradient which was run at 234,000 ϫ g for 3.5 h at 16°C in an SW55Ti rotor (Beckman). The Optiprep portion of the gradient, including the interface with the swab eluate, was collected by puncturing the bottom of the tube and dividing the gradient into six fractions. The procedure is described in detail at our laboratory website (https://home.ccr.cancer.gov/Lco/).
(ii) Tissue. Tissue biopsy specimens and curettages weighing less than 300 mg were minced with a razor blade, transferred to siliconized 2-ml microcentrifuge tubes (BioPlas), and combined with 400 l of nuclease cocktail along with 1 U/ml neuraminidase. Material was further disrupted with a disposable plastic pestle (Fisher Scientific) and incubated at 37°C for 20 min with rocking. The homogenate was then treated with 1 mg of collagenase H (Sigma) overnight at 4°C. The sample was adjusted to 800 mM NaCl and clarified at 5,000 ϫ g. Clarified supernatant was transferred to a clean tube. The pellet was resuspended in DPBS/0.8 M NaCl and sonicated for three cycles of 30 s on a cup horn (Misonix). The sonicated homogenate was clarified as described above and combined with the previous supernatant, loaded onto an Optiprep gradient, and processed as described above for the swabs.
(iii) Blood. Whole blood (sodium citrate or heparin anticoagulated) was diluted 1:6 into nuclease cocktail and incubated for 30 min at 37°C. The sample was adjusted to 800 mM NaCl, clarified, incubated with neuraminidase, and then loaded onto an Optiprep gradient.
Rolling circle amplification. Rolling circle amplification (RCA) was performed as previously described (19). Briefly, 200 l of each fraction was treated with 50 l of digest buffer (250 mM Tris [pH 8], 125 mM EDTA, 2.5% SDS, 2.5% proteinase K [Qiagen], and 50 mM DTT) to extract DNA. The digestion products were incubated at 50°C for 15 min and heat inactivated at 72°C for 10 min. To precipitate the DNA, 125 l of 7.5 M ammonium acetate and 975 l of 95% ethanol were added, incubated at room temperature for one hour, and then incubated overnight at 4°C. After the samples were allowed to equilibrate at room temperature, they were centrifuged for 1 h at room temperature (44), washed with 70% ethanol, and air dried for 10 min. Ten l of sample buffer (TempliPhi; GE Healthcare) were used to resuspend the pelleted DNA, and the sample was denatured at 95°C for 3 min and then cooled before the addition of 10.2 l of reaction buffer/enzyme mix. The RCA reaction was performed for 24 h at 30°C. The reaction was inactivated at 65°C for 10 min. The DNA was precipitated again using 1/2 volume of 7.5 M ammonium acetate and 2.6 volumes of 95% ethanol, with the same incubation times as described above, and resuspended in 75 l of 2 mM Tris (pH 8) and 0.2 mM EDTA. Next-generation sequencing and bioinformatics. The Nextera XT DNA sample kit (Illumina) was used to prepare samples for sequencing on the MiSeq platform (Illumina) analysis as previously described (19). Reads were trimmed for adaptors and quality and were assembled into contigs using CLC Genomics Workbench 10.1.1 (Qiagen). Contigs were restricted to a minimum length of 300 nt. All contigs were analyzed via megablast (NCBI) for the presence of known sequences (Ͼ95% identity) using the most current nucleotide collection (nr/nt) database. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). Results from megablast were visualized utilizing Krona (45).
All contigs were queried against the protein sequence library utilizing tBlastn, and the resulting hits were reevaluated using blastn against GenBank to eliminate false-positive results. The bioinformatic pipeline described is available at https://github.com/BUCK-LCO-NCI/VirConTaxa. Contigs that were not eliminated with these criteria were further evaluated if they fulfilled one of the three following conditions. (i) The contig was at least 4,500 bp long. (ii) At least 2,000 reads mapped to the contig. (iii) Contigs were less than 90% identical to known HPV types. These criteria increased the possibility of finding novel full-length papillomavirus genomes with an average coverage of at least 33-fold over every nucleotide position (read length after quality trimming was at least 150 bp). The raw reads used to generate the contigs passing the filtering criteria were mapped back to their respective contigs to validate the completion of the circular genome (i.e., to identify individual reads that overlapped both ends of the candidate genome). When genomes were found to be incomplete, raw reads for that sample were mapped back to the ends of the viral contig to identify overhanging reads that could be used to extend and complete the viral genome. A valid extension of the viral contig required a minimum of 10ϫ read coverage identically matching at least 40 bp of the contig end and extending beyond that end at least 15 bp. Additionally, 100% identity across the candidate sequence extension was required for all supporting reads. The candidate sequence extension meeting these criteria was added to the end of the contig, and this process was iteratively performed with each newly extended contig until the aforementioned criteria for a complete, circular genome was met or no sequence passing these criteria was found.
Phylogenetic trees. Phylogenetic trees were constructed using at least one representative from each known HPV species. Trees were created as reported previously (22) using the Phylogeny.fr website http://www.phylogeny.fr/ without Gblocks in "One Click" mode (46,47). Trees were displayed using FigTree software 1.4.3 http://tree.bio.ed.ac.uk/software/figtree/.