Gene Expression Changes with Minor Effects on the Population Average Have Major Effects on the Occurrence of Cells with Extreme Protein Concentrations

The cell-to-cell heterogeneity in a bacterial population provides a rich response to environmental changes and robust survival of an isogenic population. Especially, the rare, extreme phenotypes can be important for survival under transient lethal conditions.

function as ∂ t G(s, t) = α(s − 1)G(s, t) + (−(γ m + β)s + γ m )∂ s G(s, t) (0. 2) This can be solved using the method of characteristics, as described by Van Kampen [3]. Using t as the free parameter, the characteristics are: The solution to the first equation is given by the sum of the particular and the homogeneus solution, as described in [5]: By inserting 0.4 in equation 0.3, we obtain By applying the initial condition G(s, 0) = n s n P (n, 0) = n s n δ n,0 = 1, which corresponds to the assumption that all cells contain zero proteins at t = 0, to 0.5 we obtain G = exp α k γ m + β e (γm+β)t − 1 + −βt γ m + β (0.6) Now k is known from the expression 0.4: This is inserted into expression 0.6: The generating function is expanded in the variable s: So finally the probability for cells with zero proteins is given by: After a short transient, this expression can be approximated by: For typical parameter values the coefficient exp αβ (γm+β) 2 is approximately 1 since (γ m + β) 2 > β · α. So expression 0.8 is approximately given by: For high values of β this expression has a limit in exp(−αt) which corresponds to the probability for getting a particular time interval or more given a Poisson process. This is simply the probability of not producing an mRNA, which is to be expected for a high protein production rate.
The correction to the mRNA Poisson process, given by β γm+β in equation 0.9, is the probability to produce a protein once the mRNA is produced instead of degrading that mRNA.

Time spend in the zero-protein state
In the previous paragraph, an approximate analytical expression for the population level probability for zero proteins as a function of time was derived. In the following, we extend this by calculating the single cell probability to be in that state for a specified time interval τ c .
The fraction of cells with zero proteins at the start of the experiment can be given by the negative binomial approximation in equation 0.10 from [7].
The exit from the zero-protein state can be assumed to be governed by equation 0.9. So the probability of staying a certain time span τ c or longer is given by equation 0.11.
Finally, the fraction of surviving cells is obtained by multiplying expression 0.10 and 0.11, given in equation 0.12

The tail effect in the autoregulated system
The tail effect is also present in autoregulated systems. A small change in the binding of the protein to its own promoter can greatly increase or decrease the number of rare concentrations in the distribution. This is first shown by changing the mean with the dissociation constant K D = k of f kon , which in this study is done by changing the off-rate of the protein binding to its own promoter. This corresponds to a change in the binding affinity of the protein to the DNA and could easily be modified by small mutations in either the promoter of the protein binding region or in the protein itself.

Master equation for the autoregulated system
We model the autoregulated system by a two-state system, with the promoter being either unbound ("off"), which means the promoter is on, or bound ("on"), which means the promoter is off. This is similar to the system in Shahrezaei et al. [7]. The protein itself binds to and unbinds from the promoter at constant rates, given by k on and k of f . Since the probability of binding is constant per protein, the actual on-rate is given by n · k on . The off-rate is a constant, which means that only one gene copy is present. Such a system is described by the master equation 1.1.
When a protein bound to the promoter is diluted, the system changes from bound to unbound. This is represented by the term d · P + m,n+1 referred to as dilution of bound protein.

A change in the mean
Changing the mean by modifying the dissociation constant of the autoregulating protein gives a strong tail effect, as shown in figure S1. A difference of ±50 percent changes the probability for rare events with several orders of magnitude for both a high (n high = 480) and a low (n low = 80) threshold.
The values of the dissociation constant correspond to a somewhat weak regulation. The values presented in figure S1 are between 100 and 1000 molecules per cell, while the mean is between 165 and 324. The mean without regulation is 429. So even for a somewhat weak regulation, there is a major tail effect.
As can be observed in figure S1, the stronger the regulation, i.e. a lower dissociation constant, the narrower the distribution. The same notion was made by Friedman et al. [2], that stronger negative autoregulation gives less cell-to-cell variation in protein distributions. The problem is that a decrease in the mean in a constitutive system will also give a smaller variance, as seen for the variance in the negative binomial approximation σ 2 = µ(1 + b). So to compare the effect of autoregulation on tails, we chose to compare distributions with approximately the same mean. This is similar to the study by Rosenfeld et al. [6], where systems with the same mean were compared.

Constant mean but different shape
Protein distributions with the same mean, but different dissociation constants, were obtained as described in section 1.5. The solutions result in PMFs that are higher at the mean and decays faster for values away from the mean when the dissociation constant is low. This has a big effect on the tails and drastically influences the probability for rare concentrations, especially for the zero-protein phenotype as seen in figure S2B. The parameters to keep the constant mean are within α ∈ [0.05; 19] and K D ∈ [0.1; 1000].
Interestingly, below a limit around K D ≈ 1 the variance starts increasing, as shown in the figure S3.
This makes the probability for both the high concentrations and the zero-protein state increase, as seen in figure S2. This is probably due to burstiness in the mRNA production. The variance and the Fano factor for the mRNA distribution increases below K D ≈ 1 as seen in the figures S3BC.
The Fano factor approaches 1 as the dissociation constant gets weaker, which is to be expected for the unregulated promoter, which produces mRNA as a Poisson process [8].
For K D ≥ 1, the larger the K D the larger the variance, which increases both the probability of extremely low and high protein numbers.
Interestingly, the tail of high concentrations is less affected by changes in the dissociation constant. In the constitutive system, the far high concentration tail was more affected by changes to the translation step, than to the transcription step. Since autoregulation happens at the level of transcription, this could be related to that effect.

Transient overshoot during adaptation
The negatively autoregulated systems exhibit some transient phenomena after activation. The preceding results have shown phenomena related to steady-state distributions, but an autoregulated system has a new kind of tail effect during the evolution of the system from the zero molecules initial condition.
The time evolution of the mean, given by the master equation 1.1, is shown in figure S4A. Following the activation of the system, there is a significant overshoot in the mean protein level, before the steady state is reached. This is because there is some delay from the first production of mRNAs to the production of a protein, which means that too many mRNAs, relative to the mean level, can be produced before the system autoregulates.
As The maximum overshoot appears in the system with a very high mRNA production rate and very low dissociation constant, which means a strong autoregulation.
But the overshoot is only 2-fold, with the mean level transiently exceeding 100 proteins, as shown in figure S4A. On the other hand, if we choose a rare phenotype of n > τ = 450 and look at the probability as a function of time, given by P r(t, n ≥ τ ) = ∞ n=τ P (t, n), the result is drastically amplified. As shown in figure S4B, the probability for the occurrence of a rare concentration increases with orders of magnitude. For the most strongly regulated system, the difference in the occurrence of rare concentrations is more than 3 orders of magnitude different at its maximum value, as shown in figure S4C.

How to obtain the same mean for various dissociation constants
Since no analytical solution exists to equation 1.1, it is not easy to obtain the same mean for each of these distributions. First, an array of dissociation constants were chosen. Now, the task is to determine a corresponding array of transcription rates, α, that give the same mean. The transcription rate was obtained from Friedman's approximation [2]: First the expression was numerically integrated from zero to infinity to determine A K D ,α , i.e. the specific normalization constant associated with K D and α given by Next the mean was determined using the integral: Using these values for the exact numerical solution, gave somewhat imprecise results with a mean of 50 ± 2. To get more precision, a series of solutions to the appropriate master equation were obtained, with incremental changes in α. A linear fit to these values finally gave the target α value and yielded means of 50 ± 0.5.

The tail effect with soft thresholds
Probability distribution tails are in general more affected by small changes than the mean part of the distribution. As such, it is not surprising to observe this effect in protein distributions. However, the main task lies in how to relate the protein distribution tails to the formation of rare phenotypes. This requires a more careful consideration of what constitutes the threshold between a main and a rare phenotype.
The analysis in the main text relies on a hard threshold in the definition of a rare concentration, which means that the difference between a rare and a main concentration corresponds to a difference of one  [1]. The regulation by small RNAs can in some cases exhibit ultrasensitive regulation, once the level of sRNA exceeds the level of mRNA [4]. Depending on the system of interest, the threshold defining a rare phenotype might be a lot softer than the hard threshold described in the main text. Other possible softer thresholds might define a rare phenotype, such as a Hill function, an exponential function or a more coarse-grained log-scale threshold. These are investigated in the following sections, after the mathematical definition of a threshold is given.

Mathematical definition of thresholds
We define a threshold as a function that picks out a subset of the probability mass function. The probability for a cell to have a rare phenotype is then given in equation 2.1.
Where the function T (x | τ ) defines the threshold. The parameter τ defines the location or scale of the threshold between rare and regular phenotypes.
For a hard threshold the function T (x | τ ) is simply a step-function defined by equation 2.2.
High threshold: Hill function threshold High threshold: As seen in figure S5, for H = 8, this type of threshold do not lead to a big tail effect. A hill coefficient of 8 corresponds to eight proteins in the regulating complex, which can be considered a very high number and thus should lead to a sharp transition. A change of fifty percent in the protein level, leads to less than two orders of magnitude of probability. As such, a threshold defined by a Hill function do not lead to ultrasensitivity.

Exponential function threshold
An extension of the phage evasion example, where the presence of a receptor leads to phage infection, could be a softening of the hard threshold, by assigning a fixed probability to be infected per receptor. Given a fixed probability to infect per receptor 1 − p 0 , the probability to survive would be the probability p 0 to the power of the number of receptors(n): P survival = p n 0 = exp(n · log(p 0 )) = exp(−n · log(1/p 0 )). Depending on the probability of survival per receptor, this threshold would be more or less sharp. The lower the probability, the harder the threshold. This is shown in figure S6.

Log-scale threshold
The defining threshold for rare phenotypes could be more coarse-grained than the difference between plus minus a single protein. The threshold could be on a log-scale, where the importance was in the difference between e.g. a hundred and a thousand proteins. In such a case the tail effect is no longer present. For example, the probability for a cell to have zero proteins in an unregulated system was modelled by P (0) = 1 1+b a , which on a log-scale is only linearly dependent on the parameter a.