Regulation of Yeast-to-Hyphae Transition in Yarrowia lipolytica

Many yeasts undergo a morphological transition from yeast-to-hyphal growth in response to environmental conditions. We used forward and reverse genetic techniques to identify genes regulating this transition in Yarrowia lipolytica. We confirmed that the transcription factor Ylmsn2 is required for the transition to hyphal growth and found that signaling by the histidine kinases Ylchk1 and Ylnik1 as well as the MAP kinases of the HOG pathway (Ylssk2, Ylpbs2, and Ylhog1) regulates the transition to hyphal growth. These results suggest that Y. lipolytica transitions to hyphal growth in response to stress through multiple kinase pathways. Intriguingly, we found that a repetitive portion of the genome containing telomere-like and rDNA repeats may be involved in the transition to hyphal growth, suggesting a link between this region and the general stress response.

IMPORTANCE Many yeasts undergo a morphological transition from yeast-to-hyphal growth in response to environmental conditions. We used forward and reverse genetic techniques to identify genes regulating this transition in Yarrowia lipolytica. We confirmed that the transcription factor Ylmsn2 is required for the transition to hyphal growth and found that signaling by the histidine kinases Ylchk1 and Ylnik1 as well as the MAP kinases of the HOG pathway (Ylssk2, Ylpbs2, and Ylhog1) regulates the transition to hyphal growth. These results suggest that Y. lipolytica transitions to hyphal growth in response to stress through multiple kinase pathways. Intriguingly, we found that a repetitive portion of the genome containing telomere-like and rDNA repeats may be involved in the transition to hyphal growth, suggesting a link between this region and the general stress response. KEYWORDS Yarrowia, dimorphic, genomics, molecular genetics, morphology, signaling M any fungi harbor the ability to grow in either a yeast, pseudohyphal, or hyphal form (1). Morphological plasticity allows fungi to adapt to and invade new environments in response to external conditions. This trait, while essential for fungi in natural environments, can be problematic for their use in industrial settings, such as cultivation in bioreactors. The morphological switch between yeast and hyphal growth can be initiated by nutritional, pH, temperature, and osmolarity cues (2)(3)(4)(5). Industrial utilization of dimorphic yeasts presents a particular challenge, as maximum economic efficiency demands that bioreactors be run at high temperature and osmolarity using low-quality nutrients, all of which may initiate the switch to hyphal growth.
Dimorphism is common in many species of ascomycete yeasts and has been most thoroughly studied in the genetic model Saccharomyces cerevisiae and the closely related opportunistic pathogen Candida albicans where the switch to hyphal growth is important for infection (6). Environmental signals controlling hyphal growth regulate specific genetic outputs through kinase cascades and calcium signaling pathways. The adenylate cyclase Cyr1p is required for hyphal growth in yeasts (7,8) and signals through protein kinase A (PKA) to the transcription factor Efg1p to promote the yeast-to-hyphae transition (9,10). Two mitogen-activated protein kinase (MAPK) cascades integrate signals from different sources to position and regulate filamentous growth in yeasts. The kinase Ste20p responds to the GTPase Cdc42p and activates the Ste11p/Ste7p/Kss1p MAPK cascade to control polarized growth and bud site selection (5,11,12), while the Ssk2p/Pbs2p/Hog1p MAPK cascade responds to osmotic and oxidative stress in S. cerevisiae and C. albicans and regulates the yeast-to-hyphae transition in both species (10,13,14).
Yarrowia lipolytica is a model industrial ascomycete yeast distantly related to S. cerevisiae and C. albicans (15). The yeast-to-hyphae transition in this species has been examined by proteomics and transcriptomics (16,17) and has given clues to the proteins involved. The transition is regulated by a number of transcription factors, including those encoded by znc1 (18), tec1 (19), hoy1 (20), and the histone deacetylase complex component gene sin3 (21). The Y. lipolytica msn2 (Ylmsn2) homolog (originally identified as mhy1 in Y. lipolytica) is critical for the yeast-to-hyphae transition and is positively regulated by the kinase Rim15p which itself is repressed by the Tor nitrogen signaling pathway (22,23). As in other yeasts, Ras GTPases (Ras1p and Ras2p) are essential for the dimorphic transition and also likely signal through the transcription factor Msn2p (24,25).
In this study, we isolated strains of Y. lipolytica that fail to undergo the yeast-tohyphae transition. These smooth colony mutants do not form hyphae in a bioreactor, making them more amenable as industrial bioproduction hosts. We characterized the mutations present in the mutants obtained and mutations that promote the transition to hyphal growth in a smooth strain to further elucidate the signaling pathways regulating dimorphic growth in Y. lipolytica.

RESULTS
Isolation of Y. lipolytica mutants lacking filamentous growth. Y. lipolytica strain FKP355 was passaged to allow accumulation of mutations and screened for lack of filamentous growth from large colonies. Small slow growing colonies often did not produce hyphae or did so only under certain conditions or after an extended period of time. Approximately 500,000 colonies were screened from which 65 mutants were isolated that did not appear to make hyphae. After isolation, these mutants were further tested for filamentous growth after 2 weeks of incubation on YNB, YNB150, and YPD agar (see Materials and Methods), as well as YPD and YNB150 liquid medium for microscopic analysis. From those mutants, five smooth mutants (smooth-17, smooth-18, smooth-19, smooth-33, and smooth-43) were identified that did not undergo transition to hyphal growth morphology under any of the conditions tested ( Fig. 1). Many of the original 65 isolates produced short invasive hyphae into the agar or grew slowly and were not considered further in this work. Approximately 100,000 colonies from each of the five mutants were screened for reversion to hyphal growth habit from the smooth phenotype. No revertants (Ͻ0.0002%) from any of the mutants were identified, confirming genetic stability.
Identification of mutations in Y. lipolytica mutants lacking filamentous growth. Each of the five mutants lacking filamentous growth and the wild-type parent (FKP355) were sequenced using Illumina paired-end 150-base-pair sequencing to an average depth of Ͼ13ϫ to identify the causative mutations. This initial search revealed few mutations limited to a single nucleotide polymorphism (SNP) affecting a tRNA in smooth-17, a deletion in gene Yali0F20592g in smooth-19, and a noncoding SNP in smooth-43. None of these candidate genes complemented the smooth phenotype when expressed from an autonomously replicating plasmid (data not shown). To better assess the mutants, genomic DNA from strain FKP355 was sequenced on the PacBio platform to a depth of 279ϫ, assembly and annotation of which are available at http://genome.jgi.doe.gov/Yarlip1/Yarlip1.info.html. Using this assembled genome allowed us to search for gaps in read coverage in the mutants and resulted in identification of deletions in smooth-17, smooth-33, and smooth-43 strains. Interestingly, the deletions are in the same general location near the end of scaffold 14 in all three of these smooth mutants ( Fig. 2A). Analysis of the mutated region of scaffold 14 revealed that it ends in an array of polymorphic 5=-TTAGTCAGGG-3= tandem DNA repeats previously described as the telomere repeat sequence in Y. lipolytica (26). Exceptionally high sequencing read depth at this locus suggests that it is highly repetitive and underrepresented in the genome assembly. We therefore sought to explore the possibility of alternative assemblies of the DNA at the end of scaffold 14 to better understand the composition of this mutated locus.
We hypothesized that the length of the repetitive DNA present at the end of scaffold 14 is much longer than the ϳ800 bp represented in the genome assembly and used long PacBio sequencing reads from strain FKP355 to test this. We identified 3,786 reads ranging from 117 to 29,910 bp in length (average of 3,548 and median of 2,331 bp) that aligned to a unique portion of the genome near the end of scaffold 14. This subset of reads was assembled using Canu (27) to assess the minimum length and content of the repetitive DNA adjacent to the end of the unique part of scaffold 14 without assembly interference from additional repetitive reads from different loci. From these reads, seven alternative contigs to scaffold 14 were assembled that mapped to a variety of scaffolds within the reference genome, confirming the repetitive nature of the locus. We aligned the 150-bp Illumina sequencing reads from strain FKP355 and the five smooth mutants to these new contigs and identified mutations. Interestingly, six out of FIG 1 Isolation of Y. lipolytica mutants that lack filamentous growth. Approximately 500,000 colonies were screened for smooth morphology with no visible hyphae. From strain FKP355, five mutant strains were isolated that exhibit growth only as yeast (FKP500 to FKP504). The leu2-270 mutation was complemented in strain FKP503 to construct FKP514 and confirm the phenotype in a prototrophic strain. Confocal microscopy confirmed yeast phase growth and lack of elongated cells or pseudohyphae in auxotrophic and prototrophic smooth strains. seven of these alternative contigs harbor mutations in at least one of the smooth strains ( Fig. 2B and C), while the seventh is a complete assembly of the ribosomal DNA (rDNA) locus (18S, 5.8S, and 28S rRNA) (28) with adjacent 5=-TTAGTCAGGG-3= tandem repeats. These results suggest that all five smooth mutants harbor mutations in a related locus with short tandem repeats and rDNA repeats.
Repeat analysis of the FKP355 genome and smooth mutants. Given that the mutations identified in the smooth strains affected tandem repetitive DNA, we decided to more thoroughly assess the repetitive DNA content of the FKP355 genome. To avoid biases from the genome assembly process, we again examined the DNA directly in 150-bp Illumina sequencing reads from strain FKP355 for the presence of tandem repeats to identify and define all the telomere-like repetitive sequences present in this strain. All possible tandem duplications of unit size 1 to 75 bp were quantified in the raw sequencing reads, and the copy number of each repeat was estimated as follows:

Copy number ϭ
Times found ϫ Genome size Total reads ϫ ͑ Read length Ϫ Repeat length ϩ 1 ͒ The most overrepresented tandem repeat sequences identified in the FKP355 genome correspond to the 5=-TTAGTCAGGG-3= 10-mer found at the end of scaffold 14 as well as derivations on a 5=-TTGACGAGGCAC-3= 12-mer on its own and in combination with Raw PacBio reads with homology to the single-copy region at the end of scaffold 14 (from 1 to 12 kb) were reassembled and analyzed for mutations not detected from the curated genome assembly. An example of an alternative assembly of the region detects a deletion in smooth-19 not seen in the reference assembly. (C) All five smooth mutants exhibit a different polymorphism rate than the wild-type rate at a transition point between a high-copy-number transposoncontaining region and a moderate-copy-number region of short, tandem repeats.
a 5=-TTGACGAGGCGCGTGC-3= 16-mer (Fig. 3A). A number of low-copy-number polymorphic variations on these repeat sequences were also identified. Long PacBio sequencing reads from strain FKP355 containing tandem duplications of the 5=-TTAG TCAGGG-3= repeat unit were identified and aligned to the FKP355 genome assembly to identify additional repetitive and/or single-copy loci adjacent to this repeat array but found the end of scaffold 14 as the only nonrepetitive assembled portion of the genome adjacent to a 5=-TTAGTCAGGG-3= repeat array. This result suggests that either a single large 5=-TTAGTCAGGG-3= repeat array is present in the genome or that additional 5=-TTAGTCAGGG-3= repeat arrays are present but bordered by alternative unassembled repetitive DNA sequences consistent with subtelomere structural arrangements in yeast (29) and humans (30). We identified individual long PacBio reads from strain FKP355 containing each of the short tandem 10-to 40-bp repeat elements as well as those mapping to the assembled rDNA locus to test how often different repeat sequences cooccurred. Roughly 11% of the long reads with a tandem 10-mer 5=-TTAGTCAGGG-3= repeat also mapped to the rDNA locus, while a higher percentage of the reads with a 5=-TTGACGAGGCAC-3= derived tandem repeat (49% of the 12-mer, 52% of the 28-mer, and 32% of the 40-mer) also mapped to the rDNA locus. Together, these results suggest that short tandem repetitive DNA is interspersed with the rDNA repeats.
We hypothesized that changes in the repetitive DNA content of the genome might underlie the smooth phenotype. Thus, the number of Illumina sequencing reads containing each of the different repeat units was assessed in the wild-type strain and each of the smooth mutants to quantify repetitive DNA content in a reference genome agnostic manner (Fig. 3B). All the smooth mutants have a decrease in short tandem repetitive DNA content with the greatest losses in the smooth-17 and smooth-33 mutants. These two mutants present a similar deletion when mapped to the FKP355 reference genome (Fig. 2). The number of reads mapping to the rDNA locus was also assessed, as there appears to be at least some rDNA that is genetically linked to the end of scaffold 14 as well as the 10-mer 5=-TTAGTCAGGG-3= tandem repeats. All the smooth mutants have relatively fewer reads that map to the rDNA locus in a ratio similar to that of the short tandem repeat sequences (Fig. 3C). This suggests that the rDNA and the short tandem repeats together make up a repetitive part of the genome that is lost in the smooth mutants. We unsuccessfully attempted to reconstruct these complex mutations by transforming the wild-type parent (FKP355) with resistance marker constructs designed to randomly replace large tracts of repetitive DNA (data not shown). Thus, while the loss of repetitive DNA in the smooth mutants is intriguing, it has not been verified to be the cause of the smooth phenotype.
Transcriptome analysis of a smooth mutant. We compared gene expression from a prototrophic smooth-33 mutant (FKP514) to a prototrophic wild-type strain of the same genetic background (FKP391) in chemostat culture to assess the effect on gene expression. Differentially expressed genes were analyzed for enrichment of Gene Ontology terms to assess specific biological processes perturbed in the smooth-33 mutant (Table 1). Genes associated with DNA replication and repair as well as transcriptional regulation are more highly expressed in the smooth-33 strain, while genes associated more generally with signaling, as well as membrane and cell wall biochemistry are downregulated. The promoter regions of differentially expressed genes were analyzed for enrichment of short DNA motifs to identify regulatory pathways acting through sequence-specific DNA-protein interactions. Genes upregulated in the smooth-33 mutant are enriched for 5=-ACGCG-3= motifs in their promoters, while genes downregulated in the smooth-33 mutant are enriched for 5=-CCCCT-3= motifs in their promoter region (E value Ͻ 0.05). We assessed the differential expression levels of genes with zero or more of these motifs near the transcription start site to confirm a specific effect on gene expression (Fig. 4). The presence of 5=-ACGCG-3= near the transcription start site has a slight positive effect on expression level in the smooth-33 mutant. This is primarily associated with the presence of no less than two 5=-ACGCG-3= sites within 200 bp 5= and 1,000 bp 3= of the transcription start site. The presence of 5=-CCCCT-3= both 5= and 3= of the transcription start site is associated with a large negative effect on the expression level in the smooth-33 mutant in a manner that increases with the number of 5=-CCCCT-3= sites. We searched the Jaspar core fungal motifs database (31) for proteins that are known to interact with either of these motifs. A number of transcription factors from S. cerevisiae have been identified that interact specifically with the 5=-CCCCT-3= DNA motif via their C2H2 zinc finger domain(s). These transcription factors include Msn4p, Rgm1p, Rei1p, Rph1p, Msn2p, Gis1p, Com2p, and Usv1p (E value Ͻ 1). Comparison of these factors with proteins encoded by the Y. lipolytica genome (32) identified four C2H2 zinc finger domain-containing homologs ( Table 2). The 5=-ACGCG-3= motif interacts with the cell cycle regulator proteins Mbp1p, Swi6p, and Swi4p in S. cerevisiae (E value Ͻ 1) via an APSES DNA interaction domain (33). Comparison of these proteins with proteins encoded by the Y. lipolytica genome (32) identified two homologs ( Table 2). We attribute the presence of fewer genes to the whole-genome duplication event in S. cerevisiae which generated many paralogs represented by a single gene in Y. lipolytica (34).
Reverse genetics screen. We hypothesized that downregulation of genes with 5=-CCCCT-3= promoter motifs in the smooth-33 strain is controlled by a C2H2 zinc finger transcription factor. Of the four transcription factors predicted to bind this motif in Y. lipolytica, one (JGI protein ID 143137; Ylmsn2) is very significantly downregulated (Table 2), which suggests that it may be an activator that is failing to regulate genes important for the yeast-to-hyphae transition in the smooth-33 mutant. To test this, we overexpressed Ylmsn2 using a constitutive promoter in a smooth-33 strain and deleted it in the wild-type parent used for the mutagenesis screen. We found that overexpression of Ylmsn2 restores hyphal growth in the smooth-33 mutant, while deletion of FIG 4 Effect of smooth-33 on expression of genes with specific DNA motifs near their transcription start site. The number of ACGCG and CCCCT motifs on each strand of DNA was determined (from 0 to 2 sites) between the transcription start site (labeled 0) and a given distance. The given distances shown are 200 to 2,000 bp in 200-bp intervals, both up-and downstream of the transcription start site. For each interval, the average difference in expression between FKP514 (smooth-33) and FKP391 (wild type) during chemostat cultivation is shown. Note that the presence of more CCCCT motifs close to the transcription start site is generally associated with decreased expression in the smooth-33 mutant, while the presence of more than one ACGCG site very near and 3= of the transcription start site is associated with increased expression in the smooth-33 mutant. Ylmsn2 results in loss of hyphal growth in wild-type Y. lipolytica, confirming its important role in regulation of this process and promotion of hyphal induction when expressed (Fig. 5). We hypothesized that upregulation of genes with 5=-ACGCG-3= promoter motifs in the smooth-33 strain is controlled by Ylswi6 (JGI protein ID 13938) and Ylmbp1 (JGI protein ID 129847), which form a complex that regulates the G 1 /S phase transition in S. cerevisiae (35). Both of these genes are significantly upregulated in the smooth-33 mutant ( Table 2), suggesting promotion of the G 1 /S transition during yeast phase growth. We hypothesized that lower expression of these important cell cycle regulators in wild-type strains is associated with the transition to hyphal growth. To test this, we overexpressed Ylswi6 or Ylmbp1 using a constitutive promoter in the wild-type parent strain and deleted them in the smooth-33 strain. We found that deletion of Ylswi6 or Ylmbp1 restores some hyphal growth in the smooth-33 mutant, while overexpression of Ylswi6 or Ylmbp1 results in reduced hyphal growth in wild-type Y. lipolytica, confirming that these genes play a role in regulation of the yeast-to-hyphae transition process (Fig. 5).
Isolation of mutants reverting to hyphal growth in the smooth-33 background. The success of our reverse genetic screen suggested that we may be able to identify additional factors regulating the yeast-to-hyphae transition via a forward genetic screen. Prototrophic Y. lipolytica smooth-33 strain FKP514 was thus mutagenized with ethyl methanesulfonate (EMS) and plated on YNB agar plates to screen for colonies reverting to hyphal growth typical of wild-type Y. lipolytica on YNB. Approximately 500,000 colonies were screened, but no mutants were found that had reverted to a colony morphology typical of the wild type. However, 100 mutants were isolated that did not make completely smooth colonies. These mutants often appeared ruffled as colonies and upon microscopic observation appeared to have elongated cells and/or hyphae around their margins.
Identification of mutations promoting the yeast-to-hyphae transition in the smooth-33 background. Twenty-eight of the hyphal mutants were sequenced using Illumina paired-end 150-bp sequencing and compared to the FKP355 reference genome to identify causative mutations. This initial search identified many genes with nonsynonymous mutations. Five genes were identified with nonsynonymous mutations in more than one mutant strain (JGI protein IDs 113409, 140296, 127631, 122144, and 109080), indicating that these genes are likely to be either the causative mutation or present at a hypermutable locus. The screen also identified four genes (JGI protein IDs 124736, 128138, 131882, and 129277) hit in only one mutant that are implicated in  Table 4. the yeast-hyphal transition in other species and present in a mutant with a low background mutation rate, indicating that they are likely to be the causative mutation (summarized in Table 3). Eight of the mutant strains had many nonsynonymous mutations, making prediction of a likely causative mutation difficult.
Five of the high-confidence gene hits appear to be homologous to genes in the high-osmolarity glycerol response (HOG) MAPK signaling pathway of S. cerevisiae. We recovered three independent alleles of the MAPK kinase kinase YLssk2, two independent mutants with the same allele of the MAPK kinase Ylpbs2, and one mutant with a premature stop mutation in the MAPK Ylhog1 ( Fig. 6 and Table 3). In addition, we identified mutations in two genes with similarity to the sln1 histidine phosphotransfer kinase, which regulates the HOG MAPK cascade in S. cerevisiae (36,37). Further investigation into the structure of the mutated genes within the context of the histidine kinase gene family in Y. lipolytica revealed that proteins 113409 and 109080 (JGI protein IDs) are not orthologous to the sln1/ssk1 two-component regulator (38) known to regulate the HOG MAPK cascade in S. cerevisiae. Rather, they represent proteins not found in S. cerevisiae that are orthologous to the nik1 and chk1 genes of C. albicans respectively (39, 40) (Fig. 7A). In C. albicans, both the histidine kinases nik1 and chk1, as well as the sln1 ortholog are involved in hyphal formation (41). Disruption of any of these genes impairs hyphal formation, while double disruption of sln1 or nik1 in combination with chk1 partially restores hyphal formation (41). We disrupted Ylchk1 in both the wild-type and smooth-33 genetic background to assess its function in Y. lipolytica and to partially validate the results of the genetic screen. While Ylchk1 is not required for hyphal formation, deletion in the smooth-33 background partially restores hyphal formation consistent with the results obtained for the Ylchk1 point mutants (Fig. 8).
The mutations isolated in Ylnik1 are nonrandomly distributed (Fig. 7B). In Ylnik1, all five mutations occur within a series of HAMP domain repeats (42). These repeats are associated with fungicide sensing (43)(44)(45)(46) and mutation of the HAMP domain in bacterial receptor histidine kinases is associated with constitutive activation (47)(48)(49). The very specific site of the five mutations present in Ylnik1 from amino acids 342 to 598 (Table 3) and the lack of any putative nonfunctional mutations (e.g., premature stop codons or kinase functional domain mutations) suggests that Ylnik1p may be constitutively activated in these mutants and that the hyphal phenotype is caused by constitutive signaling rather than loss of function.
Three genes were recovered in single mutant strains known to be involved in morphogenesis in S. cerevisiae, including hym1 (50), the GTPase-activating protein lrg1 (51), and the tyrosine phosphatase mih1 (52). Four independent mutations were found in the endochitinase cts1 in strains with weaker colony morphology phenotypes typical of hyphal growth. Close observation revealed the morphology phenotype was due to a cell separation defect, as has been found in S. cerevisiae (53), rather than a switch to hyphal growth. These mutants were not considered further in this work.

DISCUSSION
Development of yeast strains that do not switch between yeast and hyphal growth is critical for the utilization of fungi in reproducible bioprocesses. In this work, we isolated five spontaneous Y. lipolytica mutants that grow only in the yeast phase and do not form hyphae when cultivated on solid agar or in liquid medium in flasks or during bioreactor cultivation (Fig. 1). These mutants, which we named smooth mutants, were screened for rapid growth and nonreversion of the phenotype to identify strains useful for genetic engineering efforts toward the production of biofuels and chemicals with a Y. lipolytica host chassis. Genomic analysis of the smooth mutants revealed that all share mutation of a repetitive locus resulting in loss of short repetitive telomere-like DNA and rDNA repeats (Fig. 2 and 3). Transcriptome analysis of a selected mutant (smooth-33) revealed specific DNA motifs in the promoter regions of up-and downregulated genes (Fig. 4). These short DNA motifs implicated specific regulatory proteins important for maintenance of the yeast form which we confirmed by deletion and overexpression analysis (Fig. 5).
Our analysis identified the homolog of the stress response regulator Ylmsn2 as a primary regulator of the yeast-to-hyphae transition in smooth mutants. This gene, previously identified as mhy1 in Y. lipolytica (23) is essential for the yeast-to-hyphae transition (Fig. 5) and activates gene expression in response to general stresses by binding to the stress response element 5=-CCCCT-3= (23,54). Loss of signaling through Ylmsn2 in the smooth strains and frequent hyphal growth in the wild-type strain suggests that our typical laboratory growth conditions (YNB medium at 28C) are stressful to Y. lipolytica. We observe more frequent initiation of hyphal growth on medium with a higher C/N ratio and less hyphae with a rich nitrogen source like peptone, which implicates nitrogen quantity and quality in this response (55). Msn2p is regulated by the TORC1-Sch9-Rim15 signaling pathway in Y. lipolytica (22), suggesting nutrient availability may be the inducer of hyphal growth in these experiments.
All five smooth strains exhibit what appear to be similar mutations in a poorly assembled, repetitive region of the genome represented by the end of scaffold 14 in the parent strain genome assembly (http://genome.jgi.doe.gov/Yarlip1/Yarlip1.home .html) (Fig. 2). Scaffold 14 ends in tandem 5=-TTAGTCAGGG-3= repeats characteristic of Y. lipolytica telomeres (26); however, the mutation in some of the smooth strains, particularly the smooth-43 strain, extends into the unique portion of scaffold 14 and initially alerted us to examine the repetitive DNA content of the wild-type and smooth strains. An unbiased search for short tandem repeats confirmed that the 5=-TTAGTCA GGG-3= repeats are common and identified additional repetitive tandem DNA sequences present in the Y. lipolytica genome that are likely to constitute the telomere and subtelomere regions. Analysis of long PacBio reads found that many of these short repeats are bordered by rDNA repeats (28), and all repeat types are lost in similar quantity within each of the five smooth mutants (Fig. 3). Copy number variation in the rDNA repeats has been reported in filamentous fungi and yeasts and affects general physiological parameters, such as growth rate (56)(57)(58)(59) and in C. albicans is associated with morphological mutants (60).
The complete mechanism governing loss of filamentous growth in the smooth mutants remains unclear. Our results indicate that expression changes in the smooth strains are governed primarily by reduced activation of genes with stress response elements by the transcription factor Msn2p (54,61). Activation of the general stress response via Msn2p occurs through phosphorylation of the transcription factor by PKA and nuclear localization (62)(63)(64) and is dependent on cAMP signaling in response to a variety of nutritional and environmental stresses (65). We found that cell cycle progression genes are upregulated in the smooth-33 mutant and that disruption of either component of the G 1 /S transition-promoting MBF complex (Mbp1p/Swi6p) (66, 67) conferred a sporadic low-level return to filamentous growth (Fig. 5). Together, these results suggest that the loss of repetitive telomeric and ribosomal DNA repeats is reducing signaling via the general stress response and promoting cell cycle progression.
We performed a forward genetic screen for reversion to hyphal growth in a prototrophic smooth-33 strain to better understand the signaling occurring in response to the loss of repetitive telomeric and ribosomal DNA at the smooth locus (selected mutant phenotypes in Fig. 6). From this screen, 28 mutants were sequenced by high-throughput sequencing, and interestingly, we did not identify strains with mutations in Ylmbp1 or Ylswi6. This suggests that the screen was not exhaustive for recovery of mutants with a sporadic reversion phenotype, as we sequenced the subset with the strongest hyphal phenotype maintained in all colonies after passaging and replating. Examination of the mutations in these strains implicates the histidine kinases Ylnik1 and Ylchk1 as well as the core components of the HOG MAPK cascade (Ylssk2, Ylpbs2, and Ylhog1) in regulation of the yeast-to-hyphae transition in Y. lipolytica. In C. albicans, nik1 and chk1 are required for normal hyphal growth (41). Here we found that Ylchk1 is not required for hyphal growth (Fig. 8). The mutations recovered in Ylnik1 all occur only in the sensory HAMP domain (Fig. 7). Deletion of the HAMP domain in C. albicans nik1 strain results in constitutive signaling as well as phosphorylation and activation of Hog1p (68). No mutations predicted to be nonfunctional were recovered in Ylnik1, suggesting that reversion to hyphal growth is due to constitutive or altered activation of this kinase.
Nik1p and Chk1p represent the common type III and type X histidine kinases that govern morphogenesis and enable pathogenicity in many fungi (69). Localization studies in Candida guilliermondii found that unlike the membrane-localized type VI histidine kinase, Sln1p, Nik1p, and Chk1p both localize to the cytosol and nucleus (70). While both Nik1p and Chk1p have been demonstrated to respond to general stresses by signaling to downstream targets, their method of sensing stresses has not been determined. Genetic studies have found that the nonkinase domains are required for sensing stresses, but further work is needed to determine how extracellular stresses alter the structure and activity of these cytoplasmic proteins (70). These histidine kinases, particularly nik1, are implicated in activation of the HOG pathway. However, it remains to be determined whether this affect is direct (e.g., through phosphorylation of HOG MAPK components) or a result of stress caused by their constitutive signaling.
In summary, we examined Y. lipolytica mutants that do not transition to hyphal morphology under conditions relevant to industrial production of biofuels and commodity chemicals. We identified mutations in the repetitive DNA of these strains that reduce signaling through the general stress response pathway via an unknown mechanism. Reversion to hyphal growth is possible in these mutants via signaling or lack thereof by Ylmsn2, the HOG MAPK cascade components (Ylssk2, Ylpbs2, and Ylhog1), and the histidine kinases encoded by Ylnik1 and Ylchk1. This work builds upon our understanding of the dimorphic transition in Y. lipolytica and confirms that the pathways regulating this morphological switch are conserved with other ascomycete yeasts. How the loss of repetitive DNA reduces the msn2-mediated stress response remains an enigma. Eleven of the mutants recovered in the reversion to hyphal growth screen warrant further analysis and may shed light on the connection between the loss of repetitive DNA and reduction of the stress response.

MATERIALS AND METHODS
Yeast cultivation and forward genetic screens for nonhyphal mutants. All Y. lipolytica strains used in this study (Table 4) were maintained in YNB (1.7 g/liter yeast nitrogen base without amino acids and ammonium sulfate but with 20 g/liter glucose and 5 g/liter ammonium sulfate) or YPD (10 g/liter peptone, 10 g/liter yeast extract, 20 g/liter glucose) liquid medium at 28°C and 200 rpm unless otherwise noted. Auxotrophs were supplemented with 0.1 g/liter leucine when appropriate. Frozen stocks were maintained at Ϫ80°C in 15% glycerol. To isolate smooth mutants, Y. lipolytica strain FKP355 was passaged daily in YPD for 2 weeks to allow accumulation of mutations and plated at a density of 10,000 cells per plate on YNB agar plates. Plates were incubated 72 h at 28°C to allow development of colonies. Large Reference genome sequencing and assembly. Genomic DNA and RNA were isolated from Y. lipolytica strain FKP355 (55) using a yeast genomic DNA purification kit (AMRESCO, Solon, OH) and TRIzol reagent (Invitrogen, Carlsbad, CA), respectively. One microgram of DNA was sheared to 10 kb using the g-TUBE (Covaris). The sheared DNA was treated with DNA damage repair mix followed by end repair and ligation of SMRT adapters using the PacBio SMRTbell Template Prep kit (PacBio). The SMRTbell templates were then purified using exonuclease treatments and size selected using AMPure PB beads. Sequencing primer was then annealed to the SMRTbell templates, and Version P6 sequencing polymerase was bound to them. The prepared SMRTbell template libraries were then sequenced on a Pacific Biosciences RSII sequencer using Version C4 chemistry and 4-h sequencing movie run times. Filtered subread data were assembled together with Falcon version 0.4.2 (https://github.com/PacificBiosciences/FALCON) to generate an initial assembly. Mitochondria were then assembled separately using the corrected preads with Celera version 8.3 and subsequently polished with Quiver. It was then used to remove mitochodrial data from the preads. A secondary Falcon assembly was generated using the filtered preads with Falcon version 0.4.2 and polished with Quiver version smrtanalysis_2.3.0.140936.p5 (https://github.com/Pacific Biosciences/GenomicConsensus). The final genome assembly was annotated using the JGI Annotation Pipeline (73).
Stranded cDNA libraries were generated using the Illumina Truseq Stranded RNA LT kit. mRNA was purified from 1 g of total RNA using magnetic beads containing poly(T) oligonucleotides. mRNA was fragmented and reverse transcribed using random hexamers and SSII (Invitrogen) followed by second strand synthesis. The fragmented cDNA was treated with end pair, A-tailing, adapter ligation, and eight cycles of PCR. The prepared Illumina libraries were quantified using KAPA Biosystem's next-generation sequencing library qPCR kit and run on a Roche LightCycler 480 real-time PCR instrument. The quantified libraries were then multiplexed with other libraries, and the pool of libraries was then prepared for sequencing on the Illumina HiSeq 2500 sequencing platform utilizing a TruSeq paired-end cluster kit, v4, and Illumina's cBot instrument to generate a clustered flow cell for sequencing. Sequencing of the flow cell was performed on the Illumina HiSeq2500 sequencer using a TruSeq SBS sequencing kit, v4, following a 2 ϫ 100 indexed run recipe.
Transcriptome raw fastq file reads were evaluated for artifact sequence using BBDuk (https:// sourceforge.net/projects/bbmap/), raw reads by kmer matching (kmer ϭ 25), allowing 1 mismatch and detected artifact was trimmed from the 3= ends of the reads. RNA spike-in reads, PhiX reads, and reads containing any Ns were removed. Quality trimming was performed using the phred trimming method set at Q6. Finally, following trimming, reads under the length threshold were removed (minimum length 25 bases or 1/3 of the original read length, whichever is longer). Filtered fastq files were used as input for de novo assembly of RNA contigs. Reads were assembled into consensus sequences using Trinity (ver. 2.1.1) (74) with the -normalize_reads (In-silico normalization routine) and -jaccard_clip (Minimizing fusion transcripts derived from gene dense genomes) options. The assembled transcriptome was used for genome annotation and made available through the JGI fungal genome portal MycoCosm (http:// genome.jgi.doe.gov/Yarlip1/Yarlip1.home.html).
Yeast strain construction. Transformations were performed by the lithium acetate method (83), and transformants were selected on YNB agar. PCR products were amplified using Q5 DNA polymerase (New England Biolabs, Ipswich, MA) and custom primers (Table 5). DNA fragments were purified using a GeneJET purification kit (Thermo Fisher Scientific, Waltham, MA). leu2-270 was complemented in FKP503 (smooth-33) by transformation with full-length leu2 after PCR amplification using primer pair OKP443/444 to construct strain FKP514. Integration at the leu2-270 locus was confirmed by PCR. Ylmsn2, Ylmbp1, Ylswi6, and Ylchk1 were replaced with a leu2 ϩ nutritional marker. Briefly, 1-kb regions flanking each gene were amplified from FKP355 genomic DNA using Q5 DNA polymerase and primers designed with overhangs homologous to the leu2 gene (amplified with primers OEB544/545) from Y. lipolytica genomic DNA (primer pairs OEB487/548, OEB549/490, OEB493/550, OEB551/496, OEB499/552, OEB553/502, OEB846/847, and OEB848/849). The fragments were purified using a GeneJET purification kit (Thermo Fisher Scientific, Waltham, MA) and assembled into full-length deletion cassettes with leu2 using NEBuilder HiFi assembly kit or as split marker deletion cassettes with internal leu2 primers OEB4 and OEB575. Deletion cassettes were transformed into strain FKP355 or FKP503 as appropriate. Replacement of genes with leu2 ϩ was confirmed by PCR. Deletion and overexpression strains were characterized on YNB agar at 28°C. Transcriptome analysis. Samples for transcriptome analysis were collected from steady-state chemostats, frozen in liquid nitrogen, and stored at Ϫ80°C. Total RNA was extracted with TRIzol (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions with additional mechanical disruption of the cells using a FastPrep homogenizer (MP Biomedicals, Santa Ana, CA, USA) and 1-mm silica beads. Further RNA preparation and RNA sequencing were performed by SciLifeLab in Uppsala, Sweden, using their IonTorrent platform. Raw RNA-seq reads were aligned to the Y. lipolytica genome using Bowtie (76), and counts were obtained with HTSeq (84) and transformed using voom (85). The top 1,000 genes with the greatest positive and negative fold change values from the FKP514 versus FKP391 transcriptome comparison were analyzed for enrichment of Gene Ontology terms using FunRich (86). The 500-bp promoter region of the top 1,000 genes with the greatest positive and negative fold change values from the FKP514 versus FKP391 transcriptome comparison were analyzed for enrichment of specific sequence motifs using DREME (87). Identified motifs were compared to the Jaspar core fungal motifs database (31) using Tomtom (88) to identify candidate regulators.
Microscopy. For confocal microscopy, live cells were collected and immediately visualized using a Zeiss LSM710 confocal laser-scanning microscope (Carl Zeiss MicroImaging GmbH, Munchen, Germany) with a Plan-Apochromate 100ϫ/1.4 oil objective. All images were processed using ImageJ (89). For colony morphology, cells were imaged on a VWR Stereo Zoom Trinocular microscope fitted with a Canon EOS 6D DSLR camera, and images were processed with Adobe Photoshop.
Data availability. Sequence data from the whole-genome shotgun project for Y. lipolytica FKP355 have been deposited at DDBJ/ENA/GenBank under accession number PKSB00000000. The version of sequence data described in this paper has accession number PKSB01000000. Sequence data for the Y. lipolytica smooth strains (FKP355 and FKP500 to FKP504) have been deposited at NCBI SRA under