Transcriptome-Level Signatures in Gene Expression and Gene Expression Variability during Bacterial Adaptive Evolution

Even initially sensitive bacteria can rapidly thwart antibiotic treatment through stress response processes known as adaptive resistance. Adaptive resistance fosters transient tolerance increases and the emergence of mutations conferring heritable drug resistance. In order to extend the applicable lifetime of new antibiotics, we must seek to hinder the occurrence of bacterial adaptive resistance; however, the regulation of adaptation is difficult to identify due to immense heterogeneity emerging during evolution. This study specifically seeks to generate heterogeneity by adapting bacteria to different stresses and then examines gene expression trends across the disparate populations in order to pinpoint key genes and pathways associated with adaptive resistance. The targets identified here may eventually inform strategies for impeding adaptive resistance and prolonging the effectiveness of antibiotic treatment.

was estimated with a pooled dispersion method, and genes were considered to be significantly DE if P<0.05 and q<0.30. To estimate differential gene expression variability (DV), the coefficient of variation (CV) for adapted and unadapted samples was calculated from the FPKM for each gene. Genes for which FPKM values were flagged by Cufflinks to have low or high data were removed prior to analysis, as were genes with a mean FPKM=0 across any two set of duplicates, and four genes that had FPKM=0 in at least one replicate in all conditions. A twotailed type two Student's t-test was used to compare the CV across three adapted and two unadapted growth conditions. The false discovery rate was controlled with Benjamini and Hochberg's algorithm (4). Genes were considered significantly DV if P<0.05 and q<0.30.
Furthermore, we examined the differentially expressed genes and transcript abundances for a correlation with distance from the origin, which would indicate that minor differences in optical density at time of sampling are resulting in different growth phases that impact the results of gene expression analysis. We compared the normalized expression values from all differentially expressed genes to their chromosomal position and their absolute distance from the origin. The Pearson correlation coefficient for chromosomal position versus expression value was -0.010 with a P-value of 0.71. The correlation coefficient for distance from origin versus expression values was -0.023 with a P-value of 0.41. We also examined all genes in each library using FPKM vs chromosomal location or distance from origin, and found no significant correlation between position and transcript abundance for any of the libraries. Pearson correlation coefficients ranged from -0.015 to 0.029 with P-values from 0.066 to 0.95. Thus, all correlation coefficients strongly indicate that differential expression observed in this work cannot be explained by chromosomal position or proximity to the origin. Bowtie 2 version 2.0.2 (5) was used to index the reference genome and generate alignment (sam) files using end-to-end alignment mode and default scoring from each FASTQ file.
Alignment sam files were converted to sorted bam files using SAMtools version 0.1.18 (6). Indels and single nucleotide polymorphisms (SNPs) were called using the Genome Analysis Toolkit version 2.4-9(7). Variants were called from sorted bam files from which PCR duplicates had been removed with SAMtools. SNP calls were filtered according to quality by depth (QD<2.0), mapping quality (MQ<40.0, MappingQualityRankSum< -12.5), strand bias (FS>60), and position of alternate allele in the read (ReadPosRankSum< -2.0). Indel calls were filtered according to quality by depth (QD<2.0), strand bias (FS>60), and position of alternate allele in the read (ReadPosRankSum< -2.0). The ReadPosRankSum requirement was made more stringent according to observations that many of the false positives called were located near the end of reads.
A custom Python script was used to add annotations (type of mutation, gene affected, synonymous or non-synonymous, amino acid change) in comparison to the Ensembl reference and gene annotation files. A custom MATLAB script was used to remove variants that exactly matched any variant in wild type samples, and find overlaps between populations. The Integrative Genomics Viewer(8) (IGV) was used to visualize all variants that passed the filter in DE genes, DV genes, and transcription factors regulating DE/DV genes. In many cases, false variants passed the filtering stage (e.g. variant in a minority of reads, and only located at the end of reads). Variants that appeared to be true calls or variant calls that were ambiguous were verified with Sanger sequencing, as follows: Glycerol stocks were streaked on LB agar and grown for 16 hours at 37⁰C.
Two colonies from each population were selected for sequencing (Quintarabio). Colony PCR was performed with Phusion DNA polymerase to amplify an approximately 400 bp region of genomic DNA flanking the purported polymorphism. Amplification was verified with gel electrophoresis.

Gene ontology classification.
For analysis of enriched gene ontology terms (in Fig. 1D, Fig. 3D, and File S1), we applied DAVID's functional annotation clustering tool (version 6.8) with the highest classification stringency (9). In the main text, we report the fold enrichment and the Pvalue from DAVID's EASE score, a more conservative Fisher's exact P-value. For Fig. 1D and 3D, we include the three most enriched functional classifications while in File S1, all unique enriched functional classifications are reported. For DE genes in File S1, we entered the 321 genes identified as significantly differentially under-expressed and the 434 genes significantly overexpressed in at least one of the ampicillin, tetracycline, or n-butanol adapted populations. For variability shifts in Fig. 3D and File S1, we entered 0-10 th percentile of ∆CV, including 418 genes that became more variable, and the 90-100 th percentile, which included 418 genes that became less variable.
Gene ontology information in Fig. 2 was derived from the Ecocyc and EcoliWiki annotation project file, validation date 10/31/2014 (10). We manually simplified the extensive gene ontology list to 27 categories, and allowed each gene to participate in only one category. Table S1. The

CRISRRi plasmid assembly. A list of plasmids used in this study is provided in
CRISPRi control plasmid targeting a DNA sequence not present in bacteria, pRFP-i, was first constructed by cloning the sgRNA portion from pgRNA-bacteria (Addgene plasmid 44251) into the pdCas9 vector (Addgene plasmid 44249) (11). Both plasmids were digested with the restriction enzymes XhoI and BsrBI (New England Biolabs). The 712 bp digestion product from pgRNAbacteria was gel-purified using Zymoclean™ Gel DNA Recovery Kit (Zymo Research Corporation). The 6497 bp digestion product from pdCas9 was also gel-purified in the same fashion. These products were ligated together using T4 DNA ligase (New England Biolabs) overnight. Ligations were transformed via electroporation into electrocompetent NEB 10-β cells.
Plasmid minipreps were performed using the Zyppy™ Plasmid Miniprep Kit (Zymo Research Corporation). The recovered plasmid was then transformed into chemically competent E. coli MG1655, and a miniprep was again performed and sequenced for validation of correct assembly product (GENEWIZ). A digestion confirmation was also performed by digesting the plasmid with EcoRI (New England Biolabs), in which three distinct bands were identified to confirm the product's assembly. The ultimate plasmid product includes aTc-inducible expression of dCas9, a chloramphenicol resistance marker, a constitutively-expressed sgRNA, and the strong rrnB terminator between the ORFs of dCas9 and sgRNA.
All CRISPRi plasmids perturbing target gene expression were derived from pRFP-i. Single guide RNA (sgRNA) targeting sequences were 20 nt long, and located immediately downstream of an Normalized gene expression was calculated according to the 2 -∆∆Cq method (12), using rrsA and gyrA as reference genes, and E. coli carrying a CRISPRi plasmid targeting RFP expression as a control strain (Fig. S5).  Mutation rates and 95% confidence intervals were determined via the Ma-Sandri-Sarkar maximum-likelihood method (14), implemented by the FALCOR web tool (15). Significance was assessed with Student's t-tests using the mutation rates and confidence intervals calculated by FALCOR.

Minimum inhibitory concentration (MIC) for
Swarming motility assay. We performed swarming motility assays on semi-solid plates (M9 minimal media with 0.4% glucose and 0.3% agar). We opted to use Keio collection strains (16) and E. coli BW25113 obtained from the Coli Genetic Stock Center (http://cgsc.biology.yale.edu/index.php), since these strains would not require aTc induction for 48 continuous hours to ensure a gene perturbation (as the CRISPRi strains would). We picked 5 colonies for each strain from LB agar plates and resuspended in 20 µL of sterile water. Plates were poured and dried for 30 minutes, then 1 µL of each colony suspension was stabbed into the center of each small plate. Three replicates were plated for each strain. Plates were incubated at 37°C for 48 hours. We photographed the plates using a Gel Doc EZ (Bio-Rad, Hercules, CA), and measured the area of the colonies using a custom pipeline in CellProfiler (17). Images were manually cropped, then colonies were detected using the IdentifyPrimaryObjects analysis module, with a global thresholding strategy and robust background thresholding method. Colony area was measured with the MeasureObjectSizeShape analysis module.
Resazurin metabolic rate assay. Overnight cultures in LB were diluted 1:100 into microplate wells with 40 µL LB, 40 ng/mL of aTc, 25 µg/mL of chloramphenicol, and a range of concentrations of gentamicin (up to 2 µg/mL of gentamicin). Four biological replicates were included for each strain. The plate was incubated for 20 hours at 37°C with 225 rpm shaking, then 4 µL of 10x resazurin was added to each well. Changes in florescence were monitored in a Tecan Genios (excitation 485 nm, emission 610 nm) in five minute intervals. The slope of the curve was determined using a custom MATLAB script. The most linear slope for each replicate was determined using a sliding window of 5 timepoints and an R 2 value (minimum R 2 value for any replicate was 0.98). Slopes were averaged across the replicates, and two-tailed t-tests were used to compare the slope of the RFP-i control to that of each CRISPRi strain. The 'clustergram' function from MATLAB's Bioinformatics toolbox was used to perform hierarchical clustering. Dendrograms were built with a Euclidean distance metric, optimal leaf ordering, and average linkage function.
Box plots were generated and other statistical analysis were performed with OriginPro 9.1 software (OriginLab Corporation, Northampton, MA). To compare variability in unadapted and adapted populations, the CV for each gene across wild type and n-hexane populations was compared to the CV for each gene across ampicillin, tetracycline, and n-butanol adapted populations. Significance was calculated with a two-tailed, type two t-test. To compare variability shifts in essential versus non-essential genes, the |∆CV| was calculated as |CVunadapted -CVadapted|. A two-tailed, type two ttest was used to compare the |∆CV| for 288 essential genes to that for 3828 non-essential genes.
The coefficient of variation for unadapted and adapted samples is the mean of FPKM across all unadapted or adapted populations respectively divided by the standard deviations across the same samples.
To examine the function of unknown genes, nucleotide BLAST was executed through NCBI's web interface at http://blast.ncbi.nlm.nih.gov/. We compared sequences from E. coli K12 MG1655 genes to the NCBI Chromosome database with the megablast algorithm (optimized for highly similar sequences).

Similarities & differences in gene expression levels for tetracycline adapted populations.
Using the log2(fold change) values depicted in Fig. 1C, there were 172 genes with ≥2-fold higher expression, and 218 genes with ≥2-fold lower expression in tetracycline population 1 but not 2.
There were 235 genes with ≥2-fold higher expression in and 251 genes with ≥2-fold lower expression in tetracycline population 2 but not 1. iraD, which controls σ S levels, was one of the over-expressed genes in population 2. Others included various transporters (dppB, trkG, sdaC) and redox associated genes (cyoABCDE, nuoA).

gspA, ydcT, sdaC, mdtO).
Mutation analysis. We called mutations in the RNA-sequencing data using a pipeline described in the Materials & Methods. We used IGV(8) to determine whether a mutation was called in any of the eleven commonly differentially expressed genes (tar, fiu, mntH, wzc, citC, entC, entE, fliA, amtB, yfiL, and yjjZ), the five differentially variable genes (ydiV, ybjG, yehS, ydhY, and yoeD), as well as the alternate sigma factors (σ S , σ E , σ N , σ F , σ H , and σ I ) and twenty-two transcription factors known to regulate the differentially expressed or variable genes (Fig. S4A). Out of all genes examined, mutations were called in five cases: in fiu and tyrR in ampicillin population 1, in wzc in ampicillin population 2, in entE in butanol population 2, and in tyrR in tetracycline population 2 ( Fig. S4B). We picked 2 colonies from the respective populations and used Sanger sequencing to examine areas of approximately 400 nt, including the called mutation (primers in Table S2). In all instances, the called variant does not appear in Sanger sequencing, suggesting that the called variants emerged during the library preparation or sequencing. Thus, there are no mutations in the common differentially expressed genes or the differentially variable genes, nor any mutations in known regulators of said genes. While there could be mutations in other unknown regulatory regions that promote changes in gene expression, it is clear that attributing gene expression changes to genomic changes is not straightforward. However, analyzing the significance of the gene expression changes can provide insight in the nature of the adaptive transcriptome.

Role of target genes with unknown function.
We applied NCBI's BLAST (18) and performed a literature search to examine potential roles for three genes without a predicted function: the differentially expressed gene yjjZ, and the differentially variable genes yoeD and yehS.
 yjjZ aligns well (100%) with annotated membrane proteins in a variety of Escherichia coli strains. In Shigella species, there is >96% alignment with hypothetical proteins. As mentioned in the main text, computational studies suggest that yjjZ encodes a small RNA (19).
 yoeD has more than 95% alignment with transposases in E. coli and Shigella species, Citrobacter rodentium, and Proteus vulgaris.
 yehS has 100% alignment with hypothetical proteins in various E. coli and Shigella species.
Interestingly, a 436 bp portion of the gene has 78% local alignment with DNA polymerase I in Klebsiella oxytoca KONIH1.