Adaptation in a Fibronectin Binding Autolysin of Staphylococcus saprophyticus

Staphylococcus saprophyticus is an important cause of urinary tract infections (UTI) in women; such UTI are common, can be severe, and are associated with significant impacts to public health. In addition to being a cause of human UTI, S. saprophyticus can be found in the environment, in food, and associated with animals. After discovering that UTI strains of S. saprophyticus are for the most part closely related to each other, we sought to determine whether these strains are specially adapted to cause disease in humans. We found evidence suggesting that a mutation in the gene aas is advantageous in the context of human infection. We hypothesize that the mutation allows S. saprophyticus to survive better in the human urinary tract. These results show how bacteria found in the environment can evolve to cause disease.

substantial, with recent estimates from the United States numbering in the billions of dollars per year (4). UTI are also associated with severe complications such as pyelonephritis, sepsis, and premature labor (4). Staphylococcus saprophyticus is second only to Escherichia coli as a cause of UTI in reproduction-aged women (5,6).
S. saprophyticus can be found in diverse niches, including the environment, foods, and livestock, and as a pathogen and commensal of humans. Several features of the epidemiology of S. saprophyticus suggest that infections leading to UTI are acquired from the environment rather than as a result of person-to-person transmission (7). This implies that adoption of the pathogenic niche by S. saprophyticus has not entailed a tradeoff in its ability to live freely in the environment. A recent PCR-based survey of virulence factors in clinical and animal-associated isolates showed that dsdA, a gene encoding D-serine deaminase that is important for survival in urine (8), and uafA and aas, genes encoding adhesins that mediate binding to uroepithelium (9,10), were present in all isolates surveyed (11), suggesting an underlying pleiotropy, with these virulence factors playing important roles in the diverse environments occupied by S. saprophyticus.
The human urinary tract could represent an evolutionary dead end for S. saprophyticus, with "virulence factors" such as DsdA, UafA, and Aas serving an essential function in the primary environmental niche and enabling invasion of the urinary tract as an accidental by-product of this unknown primary function. In such a case, we would expect urinary isolates to be interspersed throughout a phylogeny of isolates from the primary niche(s). However, our previous research (7) indicated that human urinary tract infections are associated with a specific lineage of S. saprophyticus. Invasion of the human urinary tract enables S. saprophyticus to grow to high numbers in urine, isolated from competing bacterial species, before being redeposited in the environment. This is analogous to Vibrio cholerae, which cycles through human and environmental niches and grows to high abundance in the human gut before being deposited in the environment via stool (12,13). Based on our previous observations and the example provided by other human pathogens that cycle through the environment, we hypothesized that the human urinary tract is an ecologically important niche for S. saprophyticus and sought to identify genetic signatures of adaptation to this niche.
The increased availability of sequencing data has enabled comparative genomic approaches that have led to identification of changes in gene content in association with pathogen emergence and shifts in host association. Several notable human pathogens, including Mycobacterium tuberculosis, Yersinia pestis, and Francisella tularensis, are the product of a single emergence characterized by gene loss and horizontal acquisition of virulence factors (14-16). Similarly, genomic analysis of Enterococcus faecium revealed gene gains and losses affecting metabolism and antibiotic resistance in the emergence of a hypermutable hospital-adapted clade that coincided with the profound shift in hospital ecology caused by the development of antibiotics (17). Gene gains via recombination have also allowed Staphylococcus aureus ST71 to emerge into a bovine-associated niche (18).
Using contemporary and ancient genomic data from strains of S. saprophyticus, we found previously that UTI-associated lineages of S. saprophyticus were not associated with specific gene gains or losses; the evolutionary genetic processes underlying the adoption by S. saprophyticus of the human-pathogenic niche are likely more subtle than those previously described for canonical pathogens (7). Here we have identified one of the mechanisms underlying the adaptation of S. saprophyticus to the uropathogenic niche: a selective sweep in the Aas adhesin, which is associated with an apparently large-scale expansion into the human-pathogenic niche. This is, to our knowledge, the first identification of a single nucleotide sweep in a bacterium.

RESULTS
We reconstructed the phylogeny of S. saprophyticus isolates (see Table S1 in the supplemental material) from a whole-genome alignment using maximum likelihood inference implemented in RAxML (Fig. 1). The bacterial isolates are separated into two clades, which we previously named clades P and E (7). In both clades, human-associated lineages are nested among isolates from diverse sources, including food (cheese rind, ice cream, meat), indoor and outdoor environments, and animals. Interestingly, cheese rinds harbor diverse strains of S. saprophyticus, which cluster with both human- and animal-pathogenic strains.
Thirty-three of 37 modern, human-pathogenic isolates are found within a single lineage (that we term lineage U [for "UTI associated"]) with respect to which bovine-pathogenic (mastitis), food-associated isolates, and an ancient genome are basal. Given the association between this lineage and illness in humans, we were curious about its potential adaptation to the human-pathogenic niche. The placement of the 800-year-old strain between bovine-associated and human-associated lineages suggests that it could represent a generalist intermediate between human-adapted and bovid-adapted strains.
Core genome analysis of the 58 isolates of S. saprophyticus in our sample showed substantial variability in gene content; the core genome is composed of 1,798 genes, and there are an additional 7,110 genes in the pan-genome. We found previously that uropathogenic isolates of S. saprophyticus were not associated with any unique gene content (7). Given the variability in accessory gene content among the members of this larger sample of isolates, we decided to test for relative differences in accessory gene content between human clinical isolates and other isolates using Scoary (19), which performs a genome-wide association study (GWAS) using gene presence and absence. We did not identify any genes that were significantly associated with the human-pathogenic niche after correction for multiple-hypothesis testing using the Bonferroni method.
In addition to the observed gene content variability, analyses of the core genome also indicated relatively frequent recombination among S. saprophyticus isolates (Fig. 2). We identified recombinant regions with Gubbins (20), which identifies regions with high densities of substitutions. These results indicated that 70% of sites in the S. saprophyticus alignment had been affected by recombination. Recombination can affect bacterial evolution both by introducing novel polymorphisms from outside the population and by reshuffling alleles without increasing overall diversity. Considering sites that are reshuffled within the S. saprophyticus sample to be recombinant sites, we estimated a ratio of recombinant to nonrecombinant single nucleotide polymorphisms (SNPs) of 3.4. Considering only the SNPs that introduce novel diversity to be recombinant SNPs, our estimate of the ratio of recombinant SNPs to nonrecombinant SNPs is 0.51. The mean recombination-per-mutation (r/m) value for branches in the phylogeny is 0.82 (range, 0 to 7.6) as estimated by Gubbins. Removal of recombinant SNPs did not affect the topology of the maximum likelihood phylogeny. We observed regional patterns in the amount of recombination inferred, and, as expected, recombination appears to be frequent at mobile elements such as the staphylococcal cassette chromosomes (SCC15305RM and SCC15305cap) and vSs15305 (10).
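The logic of a presence/absence association test followed by Bonferroni correction can be illustrated with a minimal sketch. This is a simplified stand-in written for illustration and is not Scoary's implementation; the function names and the 2x2 table layout (gene present or absent, human or nonhuman isolate) are assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative layout (an assumption): rows = gene present / absent,
    columns = human-associated / other isolates.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical counts for three accessory genes (not data from this study).
tables = {"geneA": (3, 0, 0, 3), "geneB": (1, 1, 1, 1), "geneC": (2, 1, 1, 2)}
raw = [fisher_exact_two_sided(*t) for t in tables.values()]
adjusted = dict(zip(tables, bonferroni(raw)))
```

With thousands of accessory genes tested, the Bonferroni multiplier is large, so only very strong presence/absence associations would remain significant.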
Adaptation to a new environment may be facilitated by advantageous mutations that quickly rise in frequency, leaving a characteristic genomic imprint: reduced diversity at the target locus and nearby linked loci (i.e., selective sweep [21,22]). In order for positive selection to be evident as a local reduction in diversity, there must be sufficient recombination for the target locus to be unlinked from the rest of the genome; for this reason, scans for sweeps have been used primarily for sexually reproducing organisms (23-26). As described above and in prior work (7), we found evidence of frequent recombination among S. saprophyticus isolates. We hypothesized that the transition of S. saprophyticus to the uropathogenic niche may have been driven by selection for one or more mutations that were advantageous in the new environment and that levels of recombination have been sufficient to preserve the signature of a selective sweep at loci under positive selection. We therefore used a sliding window analysis of diversity along the S. saprophyticus alignment as an initial screen for positive selection. We identified a marked regional decrease in nucleotide diversity (π) and in Tajima's D (TD) that is specific to lineage U (Fig. 3); the TD values for this window were −0.38 and 0.94 for non-lineage U clade P isolates and clade E isolates, respectively. The region with decreased π and TD corresponded to bp 1760000 to 1820000 in S. saprophyticus ATCC 15305 and had the lowest values of π and TD in the entire alignment. We investigated the sensitivity of our sliding window analyses to sampling by randomly subsampling lineage U isolates to the same size as clade E (n = 10); we found the results to be robust with respect to changes in sampling scheme and size.
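A sliding window scan of this kind can be sketched as follows. The code computes per-site nucleotide diversity and Tajima's D (using Tajima's standard variance constants) for windows along an alignment. It is a minimal illustration that assumes equal-length sequences with no gaps or missing data; the function names, window size, and toy alignment are hypothetical and are not the pipeline used in this study.

```python
from collections import Counter
from math import sqrt

def diversity_counts(seqs):
    """Return (mean pairwise differences, number of segregating sites)."""
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    mean_diffs = 0.0
    seg_sites = 0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Pairs of sequences that differ at this site.
            differing_pairs = (n * n - sum(c * c for c in counts.values())) / 2
            mean_diffs += differing_pairs / n_pairs
    return mean_diffs, seg_sites

def tajimas_d(seqs):
    """Tajima's D, or None when it is undefined (n < 4 or no variation)."""
    n = len(seqs)
    k, s = diversity_counts(seqs)
    if n < 4 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

def sliding_windows(seqs, window, step):
    """Yield (start, per-site pi, Tajima's D) for each window."""
    length = len(seqs[0])
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        k, _ = diversity_counts(chunk)
        yield start, k / window, tajimas_d(chunk)

# Toy alignment: variation in the first window, none in the second.
toy = ["ATAA", "ATAA", "TAAA", "TAAA"]
scan = list(sliding_windows(toy, window=2, step=2))
```

A sweep candidate would appear in such a scan as a window in which both values drop relative to the rest of the alignment; an excess of rare variants pushes D negative, whereas intermediate-frequency variants push it positive.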
To complement the sliding window analysis and pinpoint candidate variants under positive selection, we used an approach based on allele frequency differences between bacterial isolates from different niches. We calculated Weir and Cockerham's FST (27) for single nucleotide polymorphisms (SNPs) in the S. saprophyticus genome using human association and nonhuman association to define populations. The region of low π/TD included three nonsynonymous variants in the top 0.05% of the FST values (Table 1). One of these variants was fixed among human-associated isolates in lineage U (position 1811777 in ATCC 15305; FST = 0.48) and distinct from the ancestral (anc) allele found in basal lineages of clade P, including the ancient strain of S. saprophyticus Troy. This suggests that the variant may have been important in adaptation to the human urinary tract. To assess the significance of the FST value for this variant, we performed permutation analyses by randomly assigning isolates as human associated, and we did not achieve FST values higher than 0.28 in 100 permutations.
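The permutation logic described above can be sketched for a single biallelic site. For brevity, the sketch substitutes Hudson's FST estimator for the Weir and Cockerham estimator used in this study; the function names, the label-shuffling scheme, and the toy data are assumptions made for illustration.

```python
import random

def hudson_fst(pop1, pop2, derived):
    """Hudson's FST at one biallelic site (a swapped-in, simpler estimator)."""
    n1, n2 = len(pop1), len(pop2)
    p1 = sum(a == derived for a in pop1) / n1
    p2 = sum(a == derived for a in pop2) / n2
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else 0.0

def split_by_label(alleles, labels):
    """Partition alleles into human-associated and other isolates."""
    human = [a for a, is_human in zip(alleles, labels) if is_human]
    other = [a for a, is_human in zip(alleles, labels) if not is_human]
    return human, other

def permutation_null_max(alleles, labels, derived, n_perm=100, seed=0):
    """Return the observed FST and the largest FST among label permutations."""
    rng = random.Random(seed)
    observed = hudson_fst(*split_by_label(alleles, labels), derived)
    shuffled = list(labels)
    null_values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign "human associated" at random
        null_values.append(hudson_fst(*split_by_label(alleles, shuffled), derived))
    return observed, max(null_values)

# Toy site: derived allele "C" fixed in human isolates, absent elsewhere.
alleles = ["C"] * 6 + ["A"] * 6
labels = [True] * 6 + [False] * 6
observed, null_max = permutation_null_max(alleles, labels, derived="C")
```

An observed value that exceeds every permuted value, as reported for the variant at position 1811777, indicates that the niche association is unlikely to arise from random labeling of isolates.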
Selective sweeps may be evident as a longer-than-expected haplotype block, since neutral variants linked to the adaptive mutation also sweep to high frequencies (28). Given the evidence suggesting that there was a selective sweep at this locus, we used haplotype-based statistics to test for such a signature in the S. saprophyticus alignment. Haplotype-based methods are hypothesized to not be applicable to bacteria due to differences between crossing-over and bacterial patterns of recombination (29), but the methods had not been tested in a scenario akin to a classical sweep, in which local changes in diversity and in the site frequency spectrum (SFS) have been observed. We found that the variant at position 1811777 did show a signature of a sweep using the extended haplotype homozygosity (EHH) statistic (28) (Fig. 4). However, the variant did not have an extreme value of nSL, which compares haplotype homozygosity for ancestral and derived alleles (30), after normalization by the allele frequency.
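The idea behind EHH can be shown with a short sketch: among the isolates that carry a given core allele, EHH is the probability that two randomly chosen haplotypes are identical over an extended region around the core site. The version below is simplified in that it extends symmetrically in both directions and measures distance in SNP indices rather than base pairs; the function name and toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, reach):
    """Extended haplotype homozygosity for carriers of core_allele.

    haplotypes: equal-length sequences of alleles at consecutive SNPs.
    reach: number of SNPs included on each side of the core site.
    Returns None when fewer than two haplotypes carry the core allele.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    if len(carriers) < 2:
        return None
    lo = max(0, core_index - reach)
    hi = min(len(carriers[0]), core_index + reach + 1)
    # Group carriers by their extended haplotype.
    groups = Counter(tuple(h[lo:hi]) for h in carriers)
    identical_pairs = sum(comb(count, 2) for count in groups.values())
    return identical_pairs / comb(len(carriers), 2)

# Toy data: the "C" allele at index 2 sits on one long shared haplotype,
# whereas the "A" allele sits on varied backgrounds.
haps = ["AACGG", "AACGG", "AACGG", "TGAGT", "AGATG", "TAAGG"]
derived_decay = [ehh(haps, 2, "C", r) for r in range(3)]
ancestral_decay = [ehh(haps, 2, "A", r) for r in range(3)]
```

In this toy example the derived allele retains an EHH of 1 as the window widens, while the ancestral allele drops to 0 after one SNP, which is the qualitative contrast expected after a sweep.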
The variant of interest (aas_2206A>C) causes a threonine-to-proline change in the amino acid sequence of Aas, a bifunctional autolysin with a fibronectin binding domain (Fig. 5) (31). There are 8 additional nonsynonymous polymorphisms in the fibronectin binding domain; however, none are as highly associated with human-pathogenic isolates. Adhesins such as Aas are important in the pathogenesis of S. saprophyticus urinary tract infections, and the gene encoding Aas has been previously implicated as a virulence factor (9,31,32).
The Aas variant is in a region known to bind fibronectin (Fig. 5) (31) and could be under selection because it affects adhesion to this host protein. We used enzyme-linked immunosorbent assays (ELISAs) to investigate potential effects of aas_2206A>C on binding to fibronectin and thrombospondin-1, which binds to this region of the homologous AtlE amidase from Staphylococcus epidermidis (33). Staphylococcal autolysins contain 3 C-terminal repeats (R1 to R3), which can each be divided into two subunits (a and b) based on structural information (34). We confirmed that Aas R1ab binds fibronectin and discovered that it also binds thrombospondin (Fig. 6); there was no detectable difference between the ancestral and derived R1ab alleles in binding to fibronectin or thrombospondin (human or bovine). Interestingly, we observed several instances of recombination of the aas variant. In each case, the recombination event reinforced the association of the derived allele with human infection. Two of the non-human-associated bacterial isolates in lineage U, an isolate from a pig and a second from cheese rind, had evidence of a recombination event at the aas locus resulting in acquisition of the ancestral allele. Conversely, one of the human UTI isolates in clade E (for which the ancestral allele is otherwise fixed) acquired the derived aas variant.
Several human pathogens appear to have undergone recent population expansion (35-38). We wondered whether the uropathogenic lineage of S. saprophyticus might also have undergone a recent change in its effective size. The value for the genome-wide estimate of TD for lineage U was negative (−0.58), which is consistent with population expansion. We used the methods implemented in ∂a∂i (39) to identify the demographic model that best fit the observed synonymous SFS of lineage U (Fig. 7).
The synonymous SFS showed an unexpected excess of high-frequency derived alleles, which we hypothesized were the result of gene flow from populations with ancestral variants. Within-population recombination has been shown to have no effect on SFS-based methods of demographic inference in bacteria (40). However, external sources of recombination were not modeled in previous studies. We used SimBac (41) to simulate bacterial populations with a range of internal and external recombination rates. Similarly to previous studies, we found that within-population recombination had no effect on the value of Tajima's D. However, we did find that recombinant tracts from external sources resulted in positive values of Tajima's D (Fig. 8). Positive values of Tajima's D are also associated with population bottlenecks and balancing selection.
We used fastGEAR (42) to identify recombinant tracts that originated outside lineage U, and these sites were removed from the analysis prior to demographic inference. We compared five demographic models (constant size, instantaneous population size change, exponential population size change, instantaneous population size change followed by exponential population size change, and two instantaneous population size changes; Fig. 9) and used bootstrapping to estimate the uncertainty of the parameters and to adjust the composite likelihoods using the Godambe Information Matrix implemented in ∂a∂i (43). We found significant evidence of expansion in all models (Table 2). The best-fitting model was an instantaneous contraction followed by an instantaneous expansion, in which the population underwent a tight bottleneck followed by a 15-fold expansion without recovering to its ancestral (anc) size (ν, Ne/Nanc; τ, number of generations/Nanc; νA, 2.9 × 10^-2; νB, 4.5 × 10^-1; τA, 1.2 × 10^-1; τB, 3.1 × 10^-3). Recombination and positive selection are known to confound the inference of bacterial demography (40), so we used simulations to investigate their effects on our demographic inference performed for uropathogenic S. saprophyticus. We used SFS_CODE (44) to simulate positive selection (with a range of recombination rates) and to evaluate its effects on the accuracy of demographic inference with ∂a∂i. The method implemented in ∂a∂i relies on inference from the synonymous SFS, but it is possible for synonymous variation to be affected by selection, particularly at low rates of recombination (40,45). Neutral simulations with gene conversion did not affect demographic inference. We did find that positive selection can affect the synonymous SFS, resulting in inference of population size changes. In simulations of positive selection in a population of constant size, we found the spurious inference to be a bottleneck rather than an expansion. This suggests that the observed synonymous SFS of lineage U has been affected both by positive selection and by demographic expansion.
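The reported 15-fold expansion can be checked with simple arithmetic, taking the two magnitudes reported for the best-fitting model (2.9 × 10^-2 and 4.5 × 10^-1) as the population sizes during the bottleneck and after the expansion, each relative to the ancestral size. The variable names below are illustrative, and the assignment of the two values to the bottleneck and post-expansion epochs is an assumption based on the model description.

```python
# Relative population sizes from the best-fitting two-epoch model.
# Assumption: the smaller value is the bottleneck size and the larger
# value is the size after expansion, both scaled by the ancestral size.
nu_bottleneck = 2.9e-2
nu_final = 4.5e-1

fold_expansion = nu_final / nu_bottleneck       # roughly 15.5
recovered_ancestral_size = nu_final >= 1.0      # False: still below ancestral

print(f"fold expansion: {fold_expansion:.1f}")
print(f"recovered to ancestral size: {recovered_ancestral_size}")
```

Under this reading, the final population is about 15 times larger than the bottleneck population but less than half the ancestral size, in agreement with the description of an expansion that did not recover to the ancestral size.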

DISCUSSION
A central issue in the population biology of infectious diseases is how and why pathogenic traits emerge in microbes. Addressing this issue is important for understanding novel disease emergence and for identifying the genetic basis of virulence. Here we present evidence suggesting that a mutation in the S. saprophyticus aas gene, which binds host matrix proteins, is under positive selection and has enabled the emergence and spread of a human-pathogenic, UTI-associated lineage of this bacterium.
S. saprophyticus is familiar to medical microbiologists and clinicians as a common cause of UTI (46), which are associated with significant morbidity, economic costs, and severe complications (4). Despite its strong association with UTI in humans, S. saprophyticus can also be isolated from diverse environments, including livestock, food and food processing plants, and the environment (47,48). Our previous research suggested that pathogenicity to humans is a derived trait in the species (7).
That pattern was replicated here, where phylogenetic analyses linked human UTI with two lineages of S. saprophyticus that are nested among isolates from diverse, nonhuman niches (i.e., the free-living and food- and animal-associated niches). The aas mutation arose in lineage U, which contains most of the UTI isolates. Two lineages are basal to lineage U; one is bovine associated, and the other contains an ancient bacterial sequence from a pregnancy-related infection in Late Byzantine Troy. The Troy bacterium has the ancestral, bovine-associated aas allele, and we have previously hypothesized (7) that this lineage could be associated with human infections in regions where humans have close contacts with animals, e.g., sharing living quarters with livestock, as they did at Troy during that time.
A second cluster of UTI isolates appears in clade E. One isolate has acquired the derived aas allele, which parallels our finding that two nonhuman isolates in lineage U acquired the ancestral variant; all of the recombination events that we observed at this locus reinforced the idea of an association between aas_2206A>C and human infection. Several UTI isolates in clade E do not have the derived aas allele, and the clustering of UTI isolates suggests there may be a distinct adaptive path to virulence in this clade. Larger and more-comprehensive samples will be needed to investigate this hypothesis and to identify the factors shaping the separation of clades P and E.

FIG 9
Cartoon of fitted demographic models. The observed synonymous SFS was fitted to 5 demographic models, including constant size, instantaneous population size change, exponential population size change, instantaneous population size change followed by exponential population size change, and two instantaneous population size changes. The parameters for the instantaneous and exponential models are the magnitude of the population size change (ν = Ne/Nancestral) and the timing of the change (τ = number of generations/Nancestral). For the models with two population size changes, magnitudes are reported as νA = Nb/Nancestral and νB = Ne/Nancestral.

The aas mutation has characteristics associated with a classical selective sweep driven by positive selection, namely, a regional reduction in diversity (21) and in Tajima's D (22,49). With the exception of the interesting allelic replacements noted above, there was also relatively little recombination at this locus, consistent with it being functionally important. To our knowledge, this is the first description of a single nucleotide sweep in a bacterium.
Depending on the strength of selection and the recombination rate, positive selection in bacteria has been observed to affect the entire genome, resulting in clonal replacements, or to affect only specific regions of the genome (50). For example, multiple clonal replacements have occurred in Shigella sonnei populations in Vietnam due to acquisition of resistance to antimicrobials and environmental stress (51). Recurrent clonal replacements have also been observed within single hosts during chronic infection of cystic fibrosis patients by Pseudomonas aeruginosa (52). Environmental bacterial populations can also be subject to clonal replacements; a metagenomic time course study of Trout Bog found evidence of clonal replacement occurring in natural bacterial populations but not gene- or region-specific sweeps (53). However, large regions of low diversity were also observed, suggesting that gene-specific selective sweeps had occurred prior to the start of the study. Shapiro et al. identified genomic loci that differentiated Vibrio cyclitrophicus isolates that were associated with distinct niches but that had limited diversity within niches; they concluded that differentiation of these populations had been enabled by recombination events that reinforced the association of alleles with the niche in which they were advantageous (54).
The aas_2206A>C mutation is among the genetic variants that differentiate bacteria associated with human-pathogenic niches from those associated with other niches (i.e., it is an FST outlier). SNPs associated with specific clinical phenotypes in the pathogen Streptococcus pyogenes were described recently (55), which is consistent with our finding that clinical phenotypes can represent distinct niche spaces preferentially occupied by subpopulations of bacteria. There is also precedent for a single nucleotide polymorphism to affect host tropism of bacteria (56).
In sexually reproducing organisms, haplotype-based statistics are frequently used to identify selective sweeps because positively selected alleles also increase the frequency of nearby linked loci faster than recombination can disrupt linkage, producing longer haplotypes for selected alleles (28,57). We found that aas_2206A>C had a longer haplotype than the ancestral variant, but this difference was not extreme relative to the results seen with other regions of the genome (assessed with the nSL statistic). Haplotype-based statistics have been found to perform poorly in analyses of purebred dogs, where linkage across the genome is high (58). Relatively low levels of recombination may also contribute to a lack of sensitivity when haplotype-based detection methods are applied to bacteria; linkage of sites is also likely to be disrupted in a less predictable way by bacterial gene conversion than by crossing over (29). Based on our findings, we conclude that screening for regional decreases in diversity and distortions of the SFS (i.e., sliding window analyses) and identification of genetic variants with extreme differences in frequency between niches can be useful in identifying candidate sites of positive selection in bacteria.
S. saprophyticus encodes a number of adhesins, including UafA, UafB, SdrI, and Aas. UafA and Aas are found in all isolates, suggesting that they play important roles in the diverse niches occupied by S. saprophyticus. Aas has autolytic, fibronectin binding, and hemagglutinating functions (9,31,32,59). We identified a single, nonsynonymous polymorphism as a target of selection in the fibronectin binding repeats of Aas. This variant is predicted to affect the repeat's structure, as proline has a more rigid structure than other amino acids. Adhesins are plausible candidates for adaptation to the uropathogenic niche, as they are known to be important virulence factors in pathogens causing urinary tract infections (60). Fibronectin binding proteins, including Aas, have been identified as virulence factors in S. saprophyticus and Enterococcus faecalis (32,61,62). Adhesion to the uroepithelium is essential for uropathogens to establish themselves in the bladder, where they are subject to strong shear stress (63): we hypothesize that S. saprophyticus strains with the derived aas variant are better able to colonize the human bladder.
Invasion of the human urinary tract may provide a fitness advantage by allowing relative enrichment of S. saprophyticus in a site with little competition from other bacterial species and by providing a mechanism of dispersal in the environment. In analyses of selection in E. coli, another bacterium occupying diverse niches, residues in the FimH adhesin were found to be subject to positive selection in uropathogenic strains (64-66). FimH binds mannose, providing protection from shear stress through a catch bond mechanism (67). Interestingly, the vascular adherence and resistance to shear stress of Borrelia burgdorferi were recently found to be enabled by interactions between a bacterial adhesin and host fibronectin that also use a catch bond mechanism (68). There are also precedents in Staphylococcus aureus for polymorphisms in bacterial fibronectin-binding adhesins to affect the strength of binding and for these polymorphisms to associate with specific clinical phenotypes (69).
Further experiments are needed to investigate the effects of variation in Aas on S. saprophyticus biology. In our preliminary investigations of binding using ELISAs of recombinant bacterial peptides, we did not detect differences between ancestral and derived alleles in binding of the R1a1b repeat with respect to fibronectin. The variant could still affect fibronectin binding by altering the conformation of the protein in a manner analogous to that seen with FimH in E. coli (66). It is also possible that variants in the peptide affect binding under specific conditions that we did not test. Another possibility is that the variant affects autolysis or other as-yet-undescribed functions of Aas. The roles of adhesins and other virulence factors in the colonization by S. saprophyticus of niches in livestock and the environment are also interesting topics for further study.
Our demographic analysis of the uropathogenic lineage of S. saprophyticus showed evidence of a population bottleneck and subsequent expansion. Bottlenecks and expansion of drug-resistant clones have previously been shown to affect the population structure of Streptococcus agalactiae (70), demonstrating the effects of positive selection on the demographic trajectories of bacterial subpopulations. However, previous work has also shown that selection and recombination can produce spurious results from demographic inference in bacteria (40,71). We used an SFS-based method to reconstruct the demographic history of S. saprophyticus; the accuracy of demographic inference using these methods has been shown to be unaffected by within-population recombination (40), and this was confirmed in our analyses of simulated data. We found that recombination from external sources may result in an excess of intermediate frequency variants, which is also a signature of population bottlenecks, so we masked externally imported sites. However, the frequency of synonymous variants could still be affected by selection on linked nonsynonymous sites, including the selective sweep in aas that we have described. We performed simulations to address these potential confounders and to aid in the interpretation of our demographic inferences. Simulation of a single site under conditions of positive selection resulted in the inference of a bottleneck (Ne/Na = 0.01 to 0.42), indicating that, at the recombination rates that we simulated, diversity was lost from neutrally evolving sites due to their linkage to the site under selection. In inferences from our observed data, a bottleneck was followed by a 15-fold expansion, suggesting that lineage U has undergone both a selective sweep and demographic expansion.
Here we have described an adaptation of S. saprophyticus that may have enabled its expansion into a human-pathogenic niche. Mutation of a single nucleotide within the aas adhesin appears to have driven a selective sweep, and allele frequency differences at the locus are consistent with niche-specific adaptation. Lateral gene transfer events in aas reinforced the association of the positively selected allele with human infection. These results provide new insights into the emergence of virulence in bacteria and outline an approach for discovering the molecular basis of adaptation to the human-pathogenic niche.

MATERIALS AND METHODS
DNA extraction. After overnight growth in tryptic soy broth (TSB) at 37°C in a shaking incubator, cultures were pelleted and resuspended in 140 µl of Tris-EDTA (TE) buffer. Cells were incubated overnight with 50 units of mutanolysin. We used a MasterPure Gram-positive DNA purification kit (EpiCentre) for DNA extraction. For DNA precipitation, we used 1 ml 70% ethanol and centrifugation at 4°C for 10 min. We additionally used a SpeedVac for 10 min to ensure that pellets were dry before resuspending the pellet in 50 µl of water.
Library preparation and sequencing. For SSC01, SSC02, and SSC03, library preparation was performed using a modified Nextera protocol as described by Baym et al. (72) with a reconditioning PCR with fresh primers and polymerase for an additional 5 PCR cycles to minimize chimeras and two-step bead-based size selection with a target fragment size of 650 bp and sequencing on an Illumina HiSeq