Enterobacterales Infection after Intestinal Dominance in Hospitalized Patients

Increasing antibiotic resistance has resulted in infections that are life-threatening and difficult to treat. Interventions that prevent these infections, particularly without using antibiotics, could save lives. Intestinal colonization by pathogens, including vancomycin-resistant Enterococcus and carbapenem-resistant Enterobacteriaceae (part of the order Enterobacterales) is associated with subsequent infection, and increased colonization density is associated with increased infection risk. Therefore, colonization offers a window of opportunity for infection prevention if (i) there are rapid and inexpensive assays to detect colonization, (ii) there are safe and effective interventions, and (iii) the risk of infection outweighs the risk of the treatment. Fecal transplants are proof of principle that manipulating the microbiome can reduce such colonization and prevent infections. This study demonstrates the feasibility of implementing rapid and inexpensive assays to quantify colonization and measures the strength of association between Enterobacterales dominance and subsequent infection. The approach described here could be a valuable tool in the prevention of antibiotic-resistant infections.

C olonization with Enterobacterales species, an order encompassing the Enterobacteriaceae family, including Escherichia coli, Klebsiella, and Enterobacter, and other families of Gram-negative pathogens (1), has been associated with subsequent infection, which is both a challenge and opportunity for infection prevention. As an example, for Klebsiella, 5.2% of colonized patients became infected compared to 1.3% of noncolonized patients (2). This association appears stronger in antibiotic-resistant Enterobacteriaceae-based on a review of 10 studies, carbapenem-resistant Enterobacteriaceae (CRE) infections occurred in 16.5% of colonized, hospitalized patients (3). Increasing colonization density may increase the risk of subsequent infection. In patients undergoing stem cell transplantation, a relative abundance of Proteobacteria above 30% was associated with subsequent Gram-negative bacteremia (hazard ratio, 5.2) (4). In long-term acute care patients, a relative abundance of CRE Klebsiella pneumoniae above 22% was associated with a relative risk of 4.2 for subsequent infection (5). This association of colonization with subsequent infection provides a window of opportunity for intervention with emerging therapies, including decolonization by antibiotics or manipulation of the microbiome. In fact, in 142 patients across 23 reports who underwent fecal transplant for decolonization of multidrug-resistant organisms (MDROs), 78% achieved microbiological clearance (6), and case reports suggest interruption of MDRO infections (7).
Measuring colonization density could be an effective tool for infection prevention if it could be deployed in clinical laboratories. To date, measures of colonization density have required massively parallel 16S rRNA gene sequencing that is expensive and slow. Quantitative PCR (qPCR) assays could be a rapid and less expensive approach to detect and quantify intestinal colonization by pathogens of concern in our hospitals. Since qPCR has become a standard tool in clinical microbiology, deploying these assays would not require additional expertise or equipment. To test the feasibility of qPCR measurement of colonization density and the association between Enterobacterales dominance and subsequent infection, we developed two qPCR assays, compared their performance to 16S sequencing, and assessed their association with subsequent Enterobacterales infection in a case-control study.

RESULTS
In silico analysis of primer and probe binding. The TaqMan assays were designed to use an Enterobacterales probe and a broad-range bacterial probe targeting different regions of the same bacterial 16S rRNA gene amplicon (8,9) (see Fig. S1A in the supplemental material). To do so, we identified a conserved region of the bacterial 16S rRNA gene amplicon in 35 clinically relevant Enterobacterales and designed two probes that accounted for a GG/AC polymorphism in this region. In Fig. S1B). The second assay also used a different universal Assay optimization. Despite differences in the 16S rRNA gene sequences between Enterobacterales and non-Enterobacterales, some cross-reactivity was observed in binding of the Enterobacterales probe. Although the amplification curves were not exponential, final fluorescence was strong enough to cross the fluorescence threshold used to quantify the PCR result. To increase specificity, a number of PCR additives were tried empirically. Addition of dimethyl sulfoxide (DMSO) (5% [vol/vol]) was able to decrease or eliminate reactivity with three non-Enterobacterales: Pseudomonas aeruginosa, Eubacterium rectale, and Lachnospiraceae (Fig. S2). This additive was used for all subsequent testing.
PCR using universal primers can be confounded by contamination of the reaction mixture from the laboratory or the reagents themselves (10). Indeed, we observed reactivity in no-template control reactions. To eliminate this contaminating DNA, which might alter the quantification of total bacterial DNA, we added a proprietary derivative of propidium monoazide, PMAxx (Biotium, Fremont, CA; 6 M) to the reaction mixture. PMA is an intercalating agent that binds readily to double-stranded DNA (dsDNA). Photolysis (460-nm-wavelength exposure) produces a nitrene that can then covalently bind to DNA, forming cross-links that prevent use as a PCR template (10)(11)(12). Addition of PMAxx eliminated reactivity seen in the no-template control while allowing robust amplification of a K. pneumoniae 16S template (Fig. S3).
Linearity, precision, and reportable range. To compare the linearity and reportable ranges of assay A and assay B, serial dilutions of various templates were performed, and linear regression was performed based on input DNA and cycle threshold (C T ) values (Table 2). Both assays were linear, although the slopes were closer to Ϫ3.3 for assay A, which would represent perfect PCR efficiency, and closer to each other for the Ent probe and the universal probe. Since the Ent probe is a mixture containing a GG dinucleotide (binds to K. pneumoniae among others) and AC (binds to Serratia marcescens among others), we tested linearity and reportable range in serial dilutions of S. marcescens genomic DNA (Table 2). Again, the slope for assay A was closer to Ϫ3.3 than assay B.
Rectal swabs are likely to contain mixtures of Enterobacterales with both the AC and GG polymorphisms in the probe target sequence. To assess linearity and reportable range from a mixture of Enterobacterales, overnight cultures of K. pneumoniae, E. coli, S. marcescens, Citrobacter freundii, and Enterobacter cloacae were mixed in equal volumes, serial dilutions were made, and six replicates were tested by assays A and B. As seen for individual bacteria, assay A had a slope close to Ϫ3.3 for both Ent and Uniprobe, whereas for assay B, the Ent probe slope (Ϫ3.5) diverged from the Uniprobe slope (Ϫ3.3; Fig. 1A and B; Table 2).
To calculate percent Enterobacterales, the delta-delta-C T (ddC T ) method was used, calculating the ratio of Ent and Uniprobe C T values and using the log 10 6 CFU/ml samples as the value representing 100% Enterobacterales. Assay A did not show significant bias with changing concentrations and showed good precision (standard deviation [SD] Ͻ 10) between log 10 4 to 8 CFU/ml ( Fig. 1C and E). In assay B, there was proportional bias in the calculated percent Enterobacterales with overestimation at higher total CFU and underestimation at lower CFU, likely attributable to the differences in PCR efficiency slopes ( Fig. 1D; Table 2). The precision was Ͻ10 between log 10 4 to 7, corresponding with the lowest bias ( Fig. 1D and E). The bacterial population on the rectal swabs will be a mixture of Enterobacterales and non-Enterobacterales species, and the total fecal content may vary from swab to swab. To determine the linearity and reportable range based on total bacterial CFU, the five Enterobacterales species were mixed with two non-Enterobacterales, Bacteroides ovatus and Eubacterium rectale. For the qPCR experiments, the percent Enterobacterales was calculated with the mean dC T from positive-control K. pneumoniae and S. marcescens genomic DNA (gDNA) defined as 100%. All assays were internally consistent in percent Enterobacterales calculated down to a total bacterial CFU of log 10 5 CFU/ml (Fig. S4 [the dashed line represents the lower limit of reportable range]). Taken together, the linearity and precision data above indicate that the reportable range of qPCR assays A and B are a total bacterial concentration of log 10 5 to 9 CFU/ml of sample. This corresponded to an average Uniprobe C T of 14.5 to 30.1 for assay A and 16.6 to 32.6 for assay B.
Accuracy. To assess the accuracy of each PCR assay compared to 16S rRNA gene sequencing (V4 region), we extracted DNA and performed sequencing and PCR assays A and B on patient rectal swabs. To test for detection of samples with known Enterobacterales, we included 50 rectal swabs that were culture positive for K. pneumoniae. Additionally, we included 100 samples that were culture negative for K. pneumoniae. The percent Enterobacterales of the 150 samples by 16S sequencing ranged from 0.332% to 96.2%. For qPCR analysis, K. pneumoniae genomic DNA was used as the calibrator to define the dC T that corresponds to 100% Enterobacterales. Spearman correlation was ϭ 0.826 (P Ͻ 0.01) for assay A and ϭ 0.883 (P Ͻ 0.01) for assay B ( Fig. 2A and B). The bias was ϩ3.4 for assay A and ϩ2.5 for assay B (Fig. 2C and D). Despite the predicted differences in inclusivity of the primers in each assay, the correlation between qPCR assays is excellent (r ϭ 0.9486, 95% CI of 0.9291 to 0.9629, and P Ͻ 0.0001). Overall, these data indicate that both qPCR assays are highly correlated with 16S sequencing results in calculating percent Enterobacterales from rectal swab samples.
Initial and unadjusted analyses using 16S data. The data set from 284 patient samples consisted of 95 cases and 189 matched controls (there was one case that could be matched to only a single control; see Data Set S1 in the supplemental material). Infections were caused by E. coli (33), Klebsiella (25), Enterobacter (13), Serratia (9), Proteus (6) and Citrobacter (2). These infections included 21 bloodstream, 29 urinary tract, and 45 respiratory infections. We had 155 males (54.6%), and mean age was 57.9 years (SD Ϯ16.3). Based on 16S rRNA gene sequencing (232 samples), the mean/ median relative abundance of Enterobacterales in cases versus controls were 31.1%/ 22.9% and 27.5%/19.6%, with considerable overlap between the two groups. The unadjusted analysis did not show that the difference in means here was statistically significant (P ϭ 0.322), and this was true even for other variable constructions such as dichotomization, which argued against a threshold effect in the overall cohort. Only eight patients in our cohort met criteria for neutropenia (absolute neutrophil count of Ͻ500 cells/l), and this was not associated with infection risk (P Ͼ 0.99). Initial and unadjusted analyses using PCR assays A and B. To independently assess the performance of both PCR assays on the novel task of predicting Enterobacterales infection from rectal swabs, we evaluated both relative abundance calculations and cycle thresholds (C T ) for the numerator (Ent) and denominator (uniprobe) components. For assays A and B, we observed a similar lack of significant differentiation between cases and controls by relative abundance (median/mean for cases versus controls 32.2%/35% versus 30.7%/33.8% for assay A [P ϭ 0.8, n ϭ 235] and 24.5%/35% versus 24.9%/33.5% for assay B [P ϭ 0.794, n ϭ 222]). However, for assay A, there was an interaction between the numerator C T and percent Enterobacterales, and when accounting for this, the association with infection became statistically significant (P ϭ 0.012 for interaction as a whole). That is, the absolute and relative abundance of Enterobacterales interacted with each other additively to increase the risk of infection. As C T values are the number of PCR cycles required to amplify the target to a detectable level, a lower C T indicates a higher abundance of target in the sample. For a specific example, at an Enterobacterales relative abundance of 35%, for every five-cycle decrease in C T (i.e., 32-fold increase in absolute Enterobacterales abundance), there is an increased odds of infection (OR, 3.36 [1.88, 5.98]; P ϭ 0.012). The denominator C T for assay A associated with infection-for every 5-unit decrease in C T (i.e., 32-fold increased total bacterial abundance), we see a 2.8-fold increased odds of invasive infection (OR, 2.8 [1.53, 5.2]; P Ͻ 0.001).
For assay B, percent Enterobacterales did not associate with infection (P ϭ 0.794). However, the numerator C T did show a significant association, but not as strong as for assay A (OR, 1.35 [1.09, 1.67]; P ϭ 0.007). The denominator C T value did associate with infection, but it was also not as strong as for assay A (OR, 1.9 [1.1-3.2]; P ϭ 0.012). Thus, only PCR assay A was considered for multivariable modeling, presented below.
Adjusted analyses using 16S rRNA sequence data. Stepwise regression constructed an adjusted multivariable model that included %Ent calculated from 16S sequence data (Table 3). This failed to show an association between %Ent and infection (P ϭ 0.229) but identified clinical variables, notably cephalosporin and pressor use, as putative predictors of infection. We also attempted to construct an adjusted model for the bloodstream infection subgroup; however, this was much more limited as a result of having only 21 cases. The only clinical variable retained in this model was the presence of a central line at baseline. After this adjustment, for those with Ͼ60% Enterobacterales dominance in the gut, there is a 7.28-fold increase in the odds of bloodstream infection of borderline significance (OR, 7.28 [0.81, 65.36]; P ϭ 0.076).
Adjusted analyses using PCR assay A. For adjusted analysis of PCR assay A, the interaction with numerator C T was carried forward. Denominator C T values were considered but did not make it into the final model and when forced in (either alone or with various interactions), were not significant and did not improve model fit (data not shown); thus, the denominator C T was not included in adjusted analyses. However, percent Enterobacterales, being a relative abundance metric, does implicitly carry some  (Table 4). We observed an increased risk of infection that depended on both the absolute and relative abundance of Enterobacterales as assessed by PCR assay A. Given the interaction term, the percent Enterobacterales and C T cannot be interpreted independently, so for these variables ␤ coefficients and standard errors are presented in the table in lieu of odds ratios and confidence intervals, and the P value is presented for both terms and the interaction together. The ␤ coefficients can be interpreted as increased risk (positive coefficient) or decreased risk (negative coefficient), but for the interaction terms, the coefficients must be summed and interpreted at a specific level. As an example of how to interpret the interaction results, after adjustment for clinical confounders and at the average numerator C T of 25, every 10% increase in percent Enterobacterales increases the odds of infection 2.8-fold (OR, 2.83 [1.47, 5.44]; P ϭ 0.061). Conversely, at a relative abundance of 35% Enterobacterales, every five-cycle decrease in C T (i.e., 32-fold increase in absolute Enterobacterales abundance), there is an increased odds of infection (Table 4; OR, 2.32 [1.21, 4.48], P ϭ 0.061).

DISCUSSION
The aims of this study were to develop a qPCR assay to measure Enterobacterales abundance on rectal swabs, validate this PCR assay against 16S rRNA sequence analysis as the gold standard, and measure the association between Enterobacterales abundance and subsequent infection. Towards these aims, we developed and validated two real-time PCR assays that measure Enterobacterales as a proportion of total bacterial 16S rRNA-encoding DNA in a sample. These assays are somewhat complementary, as assay A had better analytical performance in terms of precision, linearity, and reportable range, but assay B was inclusive of Bacteroides species. Compared to the percent Enterobacterales calculated from 16S sequencing as the gold standard, both assays correlated well with minimal bias in patient rectal swab samples. This was true both in specimens that were culture positive for Enterobacterales (K. pneumoniae) and in the specimens used in the case-control cohort. Although there were challenges in terms of eliminating contaminating DNA from reagents, minimizing probe cross-reactivity, and optimizing inclusivity, these results indicate that relative abundance of bacterial taxa in microbiome samples can be measured accurately by qPCR.
Measured by 16S sequencing, the relative abundance of Enterobacterales in rectal swabs had a limited association with subsequent Enterobacterales infection in our cohort of intensive care and hematology/oncology patients. Overall, there was no significant difference in relative abundance between cases and controls as measured by 16S sequencing or qPCR. In adjusted analysis, relative abundance calculated from 16S sequencing was not significantly associated with infection. Many patients had percent Enterobacterales values that met prior definitions of dominance (Ͼ30%) (4), suggesting that both cases and controls may have an abnormal microbiome. However, Enterobacterales of Ͼ60% was associated with bloodstream infection. With only 21 cases in our data set, this is a preliminary finding as the ability to adjust for clinical confounders was limited. Notably, though, this is consistent with prior findings that intestinal domination by Proteobacteria and certain Enterobacterales species is associated with subsequent Gram-negative bacteremia in allogeneic hematopoietic cell transplant patients (4,13). The potential association with bloodstream infections outside of transplant patients would be important to explore further, as bacteremia and sepsis are devastating clinical outcomes.
Although these qPCR assays were designed to measure relative abundance, our findings indicate that absolute abundance is important information that can be obtained from qPCR but not from 16S sequencing and that both absolute and relative abundance are best considered together in assessing the risk of infection. In unadjusted analysis, there was an interaction between C T values, a measure of absolute abundance, and percent Enterobacterales. Accounting for this interaction revealed a significant association between assay A qPCR results and subsequent infection. The C T value of the denominator in assay A was also associated with infection in unadjusted analysis. The design of these assays may be critical, as assay B, which sacrificed some analytical performance for increased inclusivity, was less informative. It is also unclear to what degree this superior performance was contingent upon the infections studied and whether this would change for other infections, which should be explored in future research. Adjustment for clinical variables still supported an association between increasing gut Enterobacterales and subsequent Enterobacterales infection. Our final clinical model for adjustment of qPCR variables included the presence of a central line, cephalosporin antibiotics, and albumin as associated with subsequent infection. In this context, the denominator C T value did not improve the performance of the model, but the interacting variables of relative and absolute Enterobacterales abundance remained informative.
Compared to 16S sequencing that provides relative abundance, qPCR can provide at least two additional measurements of the microbiota in patient samples: the absolute abundance of particular taxa and the total microbiota density in the sample. This approach may have broader applications, as reduced gut microbiota density has been associated with inflammatory bowel disease and recurrent Clostridioides difficile infection and is reversed by fecal transplantation (14). If the overall gut microbiota density varies between patients, measurement of both relative and absolute abundance of specific taxa may be required to assess associations with human health and disease. qPCR assays that predict infections based on measurements of the microbiota could be translated into practice, since many clinical laboratories are proficient in the validation and implementation of these assays. In summary, these results demonstrate that using qPCR to measure intestinal dominance by specific bacterial taxa is feasible, provides additional information compared to 16S rRNA gene sequence analysis, and suggests that increased intestinal Enterobacterales levels are associated with subsequent infection.

MATERIALS AND METHODS
Case definitions. We collected discarded swabs sent for vancomycin-resistant enterococcus (VRE) screening during a 3-month period from patients in the intensive care units (ICUs) and select wards (hematology, oncology, and hematopoietic stem cell transplant) at Michigan Medicine. We then observed whether patients developed a bloodstream, respiratory, or urinary infection from a member of the Enterobacterales within 90 days after swab collection (including infections that occurred after hospital discharge) and identified them as putative cases. The putative cases were confirmed via manual chart review by the study team to decide whether they met clinical criteria derived from the Infectious Diseases Society of America, National Health Safety Network, and/or the American Thoracic Society guidelines (15)(16)(17)(18)(19). Following this, cases were matched 1:2 to controls based on age (Ϯ5 years), gender, and number of days (Ϯ30) between swab collections (to account for any temporal trends). This study was approved by the University of Michigan Institutional Review Board.
Samples for PCR and 16S rRNA gene sequence analysis. Rectal swab cultures were collected using the ESwab liquid Amies collection and transport system (Copan Diagnostics, Inc., Murrieta, CA), which elutes the swab sample into 1 ml of liquid Amies medium. The Amies medium samples were retrieved from Ϫ80°C storage, thawed, and extracted for DNA as described in Text S1 in the supplemental material. 16S rRNA gene sequencing and PCR assays A and B (described below) were performed for each sample from the same extracted DNA. Klebsiella pneumoniae and Serratia marcescens were cultured overnight in Luria-Bertani broth. The following species and strain were selected for a simple contrived community representative of common gut inhabitants: Eubacterium rectale, Bacteroides ovatus, Roseburia intestinalis, and Lachnospiraceae strain DW28. Strains were grown anaerobically in a Coy vinyl anaerobic chamber on modified or enriched brain heart infusion (BHI) liquid medium until reaching an optical density of 0.5 to 1, at which point individually grown strains were combined into an aliquot consisting of 40% Lachnospiraceae strain DW28, 30% E. rectale, 15% R. intestinalis, and 15% B. ovatus (Text S1). This contrived community was combined with different ratios of K. pneumoniae for "spiked" community analysis. Individual, combined, and spiked communities were used for DNA extractions.
DNA extraction. Samples were added to wells of the bead plate included in the MagAttract PowerMicrobiome DNA/RNA kit (catalog no. 27500-4-EP; Qiagen) (formerly known as the MoBio Pow-erMag microbiome DNA/RNA kit) designed to be used in conjunction with the automated liquid handling Eppendorf EPmotion 5075 TMX (catalog no. 960020033; Eppendorf). Bacterial cultures were extracted from 200 l of culture. Rectal swab samples were extracted from 250 l of liquid Amies transport medium in the E-swab transport system. Subsequent steps to DNA extraction were conducted following the manufacturer's directions using the Eppendorf EPmotion 5075 TMX. The resulting DNA was aliquoted into two replicate plates of 50 l each, one stored for archive and the other used to generate the 16S rRNA gene-based sequencing.
Primer design and evaluation. Candidate primer and probe assays were optimized using Visual OMP software (DNA Software, Ann Arbor, MI), which uses secondary structures and thermodynamics of DNA hybridization to predict the melting temperatures of primers and probes compared to those of unintended dimerization products and calculate the expected efficiency of the PCR. Primer and probe coverage and specificity were evaluated using the Ribosomal Database Project's Probe Match (RDP database release 11, update 5) (2).
PCR assay protocol. To balance amplification efficiency and inclusivity of the universal probe, two assays (A and B) were designed. Real-time PCR was performed using primers (IDT) and probes (Thermo Fisher Scientific) with sequences and concentrations listed in Table 1 in combination with TaqMan Gene Expression Master Mix (Thermo Fisher Scientific), 5% dimethyl sulfoxide (DMSO) (Sigma), 6 M PMAxx (Biotium) (a proprietary derivative of propidium monoazide), and 10 l template in a final reaction volume of 50 l. Prior to template addition, the reaction mixture was incubated for 10 min at room temperature and then treated in a Biotium PMA-Lite LED (light-emitting diode) photolysis device for 10 min. PCR conditions were 50°C for 2 min, 95°C for 10 min, and then 40 cycles with 1 cycle consisting of 95°C for 15 s and 62°C for 1 min on a QuantStudio 3 real-time thermocycler (Thermo Fisher Scientific). As positive controls, genomic DNA from Klebsiella pneumoniae was used for the MGB_ENT_GG_Probe, Serratia marcescens was used for the MGB_ENT_AC_Probe and Eubacterium rectale was used for MG-B_Uniprobe_p1 and Uniprobe_B. DNA-free water (Qiagen) was used as a negative control.
16S rRNA gene sequencing and analysis. For details of the 16S rRNA gene sequencing and analysis, see Text S1 in the supplemental material.
Statistical methods. (i) Initial analyses. Initial descriptive statistics such as measures of central tendency/spread and counts were used to examine closely the primary variables of interest (percent Enterobacterales by 16S rRNA gene sequence analysis [hereafter referred to as %Ent] and the two PCR assays [assay A and assay B]). These data were used to explore different variable constructions such as log transformation, dichotomization, and categorization. This was done to ensure that we were not missing important associations due to nonnormal distributions in the assay results or nonlinear associations with threshold effects. For the PCR assays, we examined the cycle threshold (C T ) values for both the numerator and denominator, as this reflected absolute abundance.
(ii) Unadjusted analyses. Following initial analyses, we conducted unadjusted analysis of cases versus controls and the relative abundance of Enterobacterales from 16S data and the two Enterobacterales PCR assays via conditional logistic regression. This was repeated for the subgroups of bloodstream, respiratory, urinary tract infections, and extended-spectrum ␤-lactamase-producing (ESBLϩ) infections to assess whether the associations differed by type of infection. For the PCR assays, we also assessed the C T values, and for the numerators, we looked for interactions with the Enterobacterales relative abundance. We then conducted unadjusted analysis of the clinical variables versus cases and arrived at a set of clinical variables that were significantly associated with infection.
(iii) Multivariable modeling. These unadjusted analyses were then used to construct adjusted models that included the assays of interest (along with any significant interactions) as well as the potential clinical confounders that were significant by unadjusted analysis. This was again done by conditional logistic regression. Models were built using stepwise addition. Starting with a null model, the likelihood ratio test with a cutoff of P Ͻ 0.05 was used to add variables with the lowest P value iteratively until no additional candidates existed. If the assay of interest was not selected during this procedure, then it was forced back into the model. The equation to model the log odds of infection conditional upon stratum j (a matched set of cases and controls), with independent and interacting variables was specified as follows: logit ͑ infection|stratum j͒ ϭ iϭ1 n ␤ 0 ϩ ␤ i X ij ϩ ␤ p X pj ϩ ␤ q X qj ϩ ␤ pq X pj X qj where X i is each independent variable (i.e., predictor) from 1 to n variables, X p and X q are the interacting variables, and the modeled log of the odds of infection (i.e., logit) is conditioned upon stratum j. If no interaction is present, the X p and X q variables are omitted. For statistical inference regarding the interaction (i.e., the P value), an analysis of variance (ANOVA) was performed comparing the models with/without the interaction term, X p ϫ X q .
(iv) Missing data. There were many missing clinical data in our set, especially with infrequently ordered clinical laboratory results such as albumin. There were also rectal swab samples where insufficient DNA was extracted for 16S rRNA gene sequencing and PCR. Thus, prior to multivariable modeling, we applied random forest multiple imputation implemented in the R package missForest to fill in missing data and utilize the whole data set, as this method has shown superiority to others for building biomarker-based predictive models (20,21).
Data availability. All sequence files have been deposited in the Sequence Read Archive under BioProject accession numbers PRJNA631262 and PRJNA556249.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, DOCX file, 0.02 MB.