Characterization of polybacterial clinical samples using a set of group-specific broad-range primers targeting the 16S rRNA gene followed by DNA sequencing and RipSeq analysis

The standard use of a single universal broad-range PCR in direct 16S rDNA sequencing from polybacterial samples leaves the minor constituents at risk of remaining undetected because all bacterial DNA will be competing for the same reagents. In this article we introduce a set of three broad-range group-specific 16S rDNA PCRs that together cover the clinically relevant bacteria and apply them in the investigation of 25 polybacterial clinical samples. Mixed DNA chromatograms from samples containing more than one species per primer group were analysed using RipSeq Mixed (iSentio, Norway), a web-based application for the interpretation of chromatograms containing up to three different species. The group-specific PCRs reduced complexity in the resulting DNA chromatograms and made the assay more sensitive in situations with unequal species concentrations. Together this allowed for identification of a significantly higher number of bacterial species than did standard direct sequencing with a single universal primer pair and RipSeq analysis (95 vs 51). The method could improve microbiological diagnostics for important groups of patients and can be established in any laboratory with experience in direct 16S rDNA sequencing.


INTRODUCTION
Detection and identification of bacteria directly from clinical samples by broad-range PCR targeting the 16S rRNA gene followed by DNA sequencing (direct 16S rDNA sequencing) is a valuable tool in clinical microbiology (Goldenberger et al., 1997; Harris & Hartley, 2003; Petti et al., 2008b). RipSeq (iSentio, Bergen, Norway) is a web-based application for the analysis of mixed DNA chromatograms that enables the use of this method also on typical polybacterial specimens such as abscesses and empyema (Hartmeyer & Justesen, 2010; Kommedal et al., 2009). However, the use of a single universal first PCR limits the potential of this approach because the various bacterial DNAs in a sample will compete for the same reagents, leaving those present at the lowest concentrations at risk of remaining undetected. We have found that this will occur when the difference in concentrations exceeds 1 : 10, with some variation dependent on primer affinity for the different targets and the number of 16S rRNA gene copies in the respective bacteria. Also, the RipSeq algorithm has been validated for chromatograms containing up to three different species only (Kommedal et al., 2008). The aim of this study was to increase the number of detectable species in polybacterial samples by replacing the single universal PCR with a set of group-specific broad-range PCRs.
Based on careful examination of the 16S rRNA gene we identified a semi-conserved area splitting a wide selection of human bacterial pathogens into three variant groups: group A, Gram-positive cocci in chains and non-anaerobic Gram-positive rods; group B, non-anaerobic Gram-negative bacteria; group C, Staphylococcus spp. and anaerobic bacteria not covered in group A. Further research allowed for the design of three group-specific broad-range primer pairs (A, B and C), each matching one of the three groups. All primer pairs amplified approximately the first 500 bp of the gene. In addition to reducing competition for reagents, they also increased the theoretical number of species that can be detected in a sample using RipSeq from three to nine (three species for each of the three primer pairs). The group-specific primers were used to reinvestigate 50 clinical samples, of which 25 were known to be polymicrobial and 25 were assumed to be sterile based on previous investigations and the clinical information available.

METHODS
Primer cross-reactivity. For each primer pair (A, B and C; see Table 1 for primer sequences) cross-reactivity with the other bacterial groups was investigated by amplifying series of mixes with a falling ratio of target to non-target DNA (1 : 1 to 1 : 1000). In PCRs with primer pair A and primer pair C cloned 16S rDNA fragments were used. In reactions with primer pair B genomic DNA was used instead due to residual Escherichia coli DNA from the cloning process. Cross-reactivity with human DNA was tested for each primer pair by amplifying 1000 pg, 100 pg, 10 pg, 1 pg and 0 pg of target bacterial DNA together with a constant amount of 100 ng human DNA (Promega).
Cloning and plasmid isolation. The 16S rRNA genes from relevant bacterial strains were amplified using primers 4F+pH/1542r (Table 1). The PCR products were cloned into pCR 2.2 TOPO vector and transformed into competent E. coli cells using the TOPO TA cloning kit (Invitrogen), following the manufacturer's protocol. E. coli containing verified clones were cultured aerobically overnight. Plasmids were isolated from 2 ml E. coli culture using PureLink Quick Plasmid Miniprep kit (Invitrogen). The isolated plasmids were diluted to 10^10 ml^-1 and frozen at −20 °C until they were used.
Pre-PCR treatment of clinical samples. The samples had been extracted using the following protocol. Between 200 and 800 µl sample material was added to a tube containing a mixture of glass and ceramic beads (SeptiFast Lysis kit, Roche), together with 400 µl Bacterial Lysis Buffer (BLB, Roche). Eight hundred microlitres was the maximum capacity of the bead-tube, and this volume was used for liquid samples with low viscosity. For other samples 400 µl was used if available. Two hundred microlitres was the lowest volume that would still provide 400 µl supernatant for the subsequent DNA purification and was the lowest volume accepted for all specimens. A negative control containing lysis buffer and 400 µl PCR-grade water (Qiagen) was included in every batch of samples. The samples were run for 2×45 s in a FastPrep machine (Cepheid) at speed 6.5. After a short spin (15 800 g, 3 min), 400 µl supernatant was transferred to a MagNa Pure Compact automated extractor (Roche) and DNA was extracted and purified using the 'Total_NA_plasma_100_400' programme according to the manufacturer's instructions. The resulting 50 µl of eluate was stored at −80 °C until used.
PCR and PCR conditions. PCR primers are listed in Table 1. All PCRs were performed in 25 µl reaction tubes on a real-time SmartCycler apparatus (Cepheid) with a final reaction volume of 25 µl. The universal PCR mixture consisted of 12.5 µl ExTaq SYBR master mix (TaKaRa), 0.4 µM F-primer and 0.4 µM R-primer, 8.5 µl PCR-grade water and 2 µl template. All group-specific PCR mixtures consisted of 12.5 µl ExTaq SYBR master mix, 0.8 µM F-primer and 0.4 µM R-primer, 7.5 µl PCR-grade water and 2.0 µl template. The PCR thermal profile was identical for all reactions and included an initial polymerase activation step of 10 s at 95 °C followed by 40 cycles of 15 s at 95 °C, 10 s at 64 °C and 20 s at 72 °C.
Negative controls. For all PCRs every sample was run in parallel with its negative extraction and purification control. For the universal PCR a sample was defined as positive if the fluorescence threshold value (Ct) was reached three or more cycles ahead of the negative control. For the group-specific PCRs the negative control that became positive first for a given sample defined the cut-off for positivity for all three group-specific PCRs for that sample (i.e. three or more cycles before the earliest negative control).
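The positivity rule described above can be illustrated with a short sketch. It is not the authors' software: the function names and the example Ct values are hypothetical, and only the three-cycle cut-off and the use of the earliest negative control are taken from the text.

```python
from typing import Dict

CUTOFF_CYCLES = 3.0  # from the text: three or more cycles ahead of the negative control


def is_positive(sample_ct: float, control_ct: float,
                cutoff: float = CUTOFF_CYCLES) -> bool:
    """Universal PCR rule: positive if the sample reaches Ct at least
    `cutoff` cycles before its negative control."""
    return control_ct - sample_ct >= cutoff


def classify_group_pcrs(sample_cts: Dict[str, float],
                        control_cts: Dict[str, float],
                        cutoff: float = CUTOFF_CYCLES) -> Dict[str, bool]:
    """Group-specific rule: the negative control that becomes positive first
    (lowest Ct among PCRs A, B and C) sets the cut-off for all three PCRs."""
    earliest_control = min(control_cts.values())
    return {group: is_positive(ct, earliest_control, cutoff)
            for group, ct in sample_cts.items()}


# Hypothetical Ct values for one sample and its negative controls.
sample = {"A": 18.0, "B": 31.0, "C": 27.5}
controls = {"A": 35.0, "B": 32.0, "C": 36.0}
print(classify_group_pcrs(sample, controls))
```

In this invented example the control for PCR 'B' becomes positive first (cycle 32.0), so PCRs 'A' and 'C' are classified as positive and PCR 'B' as negative.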
Sequencing. The amplicons from the positive PCRs were collected by centrifugation from the SmartCycler reaction tubes into a 1.5 ml Eppendorf tube and cleaned up using the ExoSAP-IT enzymic degradation kit (Affymetrix). Primers used in the cycle sequencing reactions are listed in Table 1. Sequencing was performed in a core facility using the ABI PRISM 1.1 Big-Dye sequencing kit and a 3730 DNA Analyser (Applied Biosystems).

Table 1. Complete list of primers used in this study
Capital letters indicate a locked nucleic acid (LNA). Ambiguous bases: K=G or T, M=A or C, R=A or G.
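As an illustration of the ambiguity codes in the legend, the following sketch lists every unambiguous sequence encoded by a degenerate primer. The function name and the example sequence are hypothetical and are not taken from Table 1. The sketch converts the input to upper case and therefore ignores the LNA notation.

```python
from itertools import product

# Ambiguity codes from the Table 1 legend; plain bases map to themselves.
AMBIGUITY = {"K": "GT", "M": "AC", "R": "AG",
             "A": "A", "C": "C", "G": "G", "T": "T"}


def expand_primer(primer: str) -> list:
    """Return all unambiguous sequences encoded by a degenerate primer."""
    choices = [AMBIGUITY[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical five-base example with two ambiguous positions (K and R).
print(expand_primer("ACKTR"))  # ['ACGTA', 'ACGTG', 'ACTTA', 'ACTTG']
```

Each ambiguous position doubles the number of primer variants present in the synthesized oligonucleotide mixture.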

Ø. Kommedal and others
Interpretation of chromatograms. Mixed DNA chromatograms were analysed using the RipSeq Mixed web application (http://www.ripseq.com/login/login.aspx). The RipSeq Mixed algorithm searches against the '16S human pathogen iSentio' database currently containing about 1500 reference sequences. The definition of a positive identification with the RipSeq program has been described previously (Kommedal et al., 2008). Non-mixed chromatograms were analysed with both RipSeq and a standard BLAST search against the GenBank database (http://blast.ncbi.nlm.nih.gov). For the BLAST searches interpretation criteria given by the Clinical and Laboratory Standards Institute were followed (Petti et al., 2008a).

RESULTS
Validation of new primers for PCR
The group-specific primers were aligned against 160 high-quality GenBank reference sequences representing all 129 different genera present in the '16S human pathogen iSentio' database.
The distribution of genera among the different primer pairs was as follows: primer pair A, 42 genera [Gram-positive cocci in chains (including some anaerobic species) and non-anaerobic Gram-positive rods (including the aerotolerant genera Actinomyces and Propionibacterium)]; primer pair B, 46 genera (non-anaerobic Gram-negative bacteria); primer pair C, 41 genera (Staphylococcus spp. and anaerobic bacteria except those covered by primer pair A). This distribution is to be viewed as a rule of thumb, and exceptions exist in particular for groups A and C. A more detailed overview is provided in Supplementary Table S1, available with the online version of this paper.
The highest tendency for cross-reactivity was between primer pair A (PCR 'A') and bacterial group C and between primer pair C (PCR 'C') and bacterial group A. For these combinations a 1000-fold higher copy number of non-target DNA was needed to produce peaks as high as those of the target DNA in the resulting chromatograms. For primer pair B (PCR 'B') co-amplification of group 'A' and 'C' DNA was not detectable even when present at a 1000-fold higher concentration than group 'B' target DNA (Supplementary Table S2). No cross-reactivity with human DNA was observed with any of the primer pairs. In the solutions with 100 ng human DNA and no added bacterial DNA, only background bacterial DNA from the reagents was amplified.

Clinical samples
The polybacterial specimens contained high levels of bacterial DNA, and with the universal PCR the Ct distance between a sample and the corresponding negative control ranged from −20 to −8 cycles. For the specimens positive by PCR 'A' the Ct distance to the negative control ranged from −17 to −4 cycles. For the specimens negative by PCR 'A' the Ct distance ranged from −2.4 to +3.9. The corresponding intervals for PCR 'B' were −16 to −7 and −2.8 to +0.9 and for PCR 'C' −17 to −6 and −2.0 to +4.4. (− indicates before the negative control and + indicates after.) Eighty per cent of the samples were affected by antibiotics administered prior to specimen collection, and both standard direct 16S rDNA sequencing and direct sequencing using group-specific primers detected a higher number of bacteria than did culture. In total 37 species were found by culture, 51 by standard direct sequencing and 95 by the group-specific primers. A detailed comparison between these results is given in Table 2. All bacteria identified with the standard universal primer pair were also detected with the group-specific primers. For six samples the two different sequencing protocols yielded identical results. In the remaining 19 samples a total of 44 additional bacteria were found by the group-specific primers, ranging from one to five species per sample. Only 32 out of the 95 bacteria identified by group-specific direct sequencing were cultivable. Four samples contained one or more isolates found exclusively by culture. These were a strain of Staphylococcus haemolyticus isolated from sample 7, scarce growth of a coagulase-negative Staphylococcus together with a diphtheroid in sample 23 and two colonies of Streptococcus parasanguinis in sample 24. In addition an isolate of Klebsiella sp. from sample 6 could not be read from chromatogram 6B.
Some of these isolates might represent sample contamination, but others are probably true findings that have not been detected by sequencing either because they had to compete for reagents with a more dominant species belonging to the same primer group or because direct sequencing can have lower sensitivity than culture in samples with living bacteria.
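The species counts reported above can be cross-checked with simple arithmetic. The sketch below uses only totals stated in the text. The variable names are ours, and the calculation assumes that every culture isolate is either among the 32 cultivable species also found by sequencing or among the isolates found exclusively by culture.

```python
# Totals reported in the text for the 25 polybacterial samples.
by_culture = 37
by_universal = 51
by_group_specific = 95
cultivable_among_sequenced = 32

# Additional bacteria found only with the group-specific primers.
additional_by_group_specific = by_group_specific - by_universal

# Isolates found by culture but not by group-specific sequencing
# (assumption: culture total = shared species + culture-only isolates).
culture_only_isolates = by_culture - cultivable_among_sequenced

# Ratio of species found by group-specific sequencing to species found by culture.
ratio_vs_culture = round(by_group_specific / by_culture, 2)

print(additional_by_group_specific)  # 44
print(culture_only_isolates)         # 5
print(ratio_vs_culture)              # 2.57
```

The five culture-only isolates agree with the organisms listed for samples 6, 7, 23 and 24.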
The detailed results from the group-specific PCRs for the polybacterial specimens are given in Table 3. Four cases of cross-reactivity between primer groups were observed. In sample 7, a group A species was detected as low secondary peaks in chromatogram 7C. In both samples 11 and 12 a species belonging to group C was also detectable as the lower peaks in the A chromatograms. In sample 16, Enterococcus faecium and Finegoldia magna were amplified by both PCR 'A' and 'C', but PCR 'A' became positive 8.6 cycles prior to PCR 'C'. Three of the group-specific chromatograms were found to be too complex to be completely resolved, indicating that these samples still might contain one or more unidentifiable species.
Among the 25 clinical specimens assumed to be negative (Table 4) one sample (ID 35) was found to be positive by the universal PCR. It reached Ct at cycle 31.5, exactly three cycles before the negative control. Melt-point analysis showed no distinct peak and it was negative by the group-specific PCRs. Sequencing was unsuccessful and the PCR result was defined as a false positive. Another sample (ID 50) was positive by PCR 'B'. This reached Ct at cycle 30.6, 3.1 cycles before the closest negative control. Melt-point analysis gave an irregular peak. Sequencing produced a complex chromatogram dominated by a Pseudomonas sp. and an Acidovorax sp. This corresponds to what we typically find in our negative controls (Acidovorax spp., Acinetobacter spp., Comamonas spp., Pseudomonas spp. and Sphingomonas spp.) and the PCR was judged to be a false positive. The remaining 98 PCRs were negative as expected.

DISCUSSION
Detection and identification of bacteria directly from polymicrobial clinical samples by broad-range PCR followed by DNA sequencing can be accomplished using RipSeq mixed chromatogram analysis. In this study we replaced the standard universal 16S rRNA gene PCR with a set of three separate group-specific PCRs, reducing the risk of species present at lower concentrations becoming outcompeted in the amplification step. The importance of this factor is illustrated by the fact that only eight of the chromatograms obtained with the universal primers were judged to contain more than three bacterial sequences (Table 2).
The group-specific PCRs significantly increased the number of identified bacteria as compared to amplification with a universal PCR (95 versus 51, P<0.05 by Wilcoxon signed-rank test; Altman, 1991). All bacteria identified by standard universal direct sequencing and RipSeq analysis were also found with the group-specific primers and RipSeq analysis. Since the species combinations in the chromatograms obtained with the universal versus the group-specific primers for the most part were different, this is a confirmation of RipSeq's ability to analyse mixed DNA chromatograms. In addition 18 of the bacteria identified as part of a mixed chromatogram with the universal primer pair were recovered as non-mixed chromatograms with one of the group-specific primer pairs.
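For readers unfamiliar with the paired test named above, a minimal pure-Python sketch of the Wilcoxon signed-rank test is given here. It uses the normal approximation with tie correction and no continuity correction, which is only one of several accepted variants and may differ from the procedure used for the reported P value. The per-sample counts in the example are invented toy values, not the data in Table 2.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, two-sided p) for paired samples x and y.

    Zero differences are dropped and tied absolute differences receive
    average ranks. The p value uses the normal approximation.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return w_plus, w_minus, p


# Invented toy counts of species per sample (universal vs group-specific).
universal = [2, 1, 3, 2, 1, 2]
group_specific = [4, 1, 5, 3, 4, 3]
print(wilcoxon_signed_rank(universal, group_specific))
```

With so few pairs the normal approximation is crude; an exact test would be preferable for small sample sizes.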
Discrimination between the three groups of bacteria was based on the forward primers alone, and the difference between forward primers A and C was a single nucleotide at the 3'-terminal position only (Table 1). In order to obtain sufficient primer discrimination we used locked nucleic acids (LNAs). LNAs increase the impact of a mismatch in the actual and the subsequent base position. When placed in the penultimate 3'-end position an LNA protects the 3'-end terminal base from the proofreading 3'-5' exonuclease activity of the TaKaRa polymerase. In combination with this primer design, polymerases without 3'-5' exonuclease activity have been reported to provide less robust discrimination (Di Giusto & King, 2004; Rupp et al., 2006). LNAs were also useful for harmonizing primer annealing temperatures.
In a set of group-specific primers the number of primer pairs will be a compromise between the possibilities given by the chosen gene target, sensitivity and practical considerations. For maximum effect, relevant bacterial species should distribute evenly between the primer pairs and cross-reactivity should be low. We found an almost perfect distribution for the species included in the pre-study alignments, and we also saw a good distribution in the study: 42 species were targeted by primer pair A, 20 by primer pair B and 33 by primer pair C. Provided that two targets belong to different primer groups, the primers described here will accommodate ratios of 1 : 1000 or higher. With two targets from the same group the situation will be equivalent to what we see with a single universal primer pair, where successful detection of both targets cannot be expected with ratios higher than 1 : 10.
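The two ratio limits mentioned above can be summarized in a small decision sketch. The function and constant names are hypothetical, and the limits are the approximate figures given in the text. The sketch deliberately ignores primer affinity, 16S rRNA gene copy number and the cross-reactivity observed between groups A and C.

```python
from typing import Optional

SAME_GROUP_LIMIT = 10     # text: same-group targets not reliably detected beyond 1 : 10
CROSS_GROUP_LIMIT = 1000  # text: different-group targets tolerate 1 : 1000 "or higher"


def minor_species_expected(major_group: str, minor_group: str,
                           fold_excess: float) -> Optional[bool]:
    """Rough expectation for detecting the minor species in a two-species mix.

    fold_excess is how many times more abundant the major species is.
    Returns True (expected), False (not expected) or None (ratio is
    beyond the range described in the text).
    """
    if fold_excess < 1:
        raise ValueError("fold_excess must be >= 1")
    if major_group == minor_group:
        return fold_excess <= SAME_GROUP_LIMIT
    return True if fold_excess <= CROSS_GROUP_LIMIT else None


print(minor_species_expected("A", "A", 100))   # False: same primer pair, ratio above 1 : 10
print(minor_species_expected("A", "B", 1000))  # True: different primer pairs
```

Returning None for cross-group ratios above 1 : 1000 reflects that the text gives no upper limit for that situation.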
A negative control containing a mixed DNA population in low concentration could give low reproducibility with the group-specific primers due to random distribution of the different DNA targets in the respective reaction tubes. We also considered that low-level DNA contamination in a specimen could have a different composition from low-level DNA contamination in the negative control (due to e.g. sample collection and transportation tubes). Consequently, in reactions where the negative control did not contain any target DNA a poor amplification would lead to a very high Ct value that subsequently could define a negative sample as positive. For these reasons, although the negative control was included in all group-specific PCRs, the PCR in which it became positive first defined the cut-off for a positive sample for all three reactions. With the definition of a positive PCR used in this study, we got two false-positive PCRs. Although none of the false-positive PCRs resulted in findings that would have been reported, we suggest including a second criterion stating that in order to be defined as positive, a sample has to reach Ct   Tables 2 and 3.
By using group-specific primers the cost per analysis will see some increase. We kept the increase in both price and workload as low as possible by sequencing in one direction only, and the average number of sequencing reactions in our material was 2.55 per positive sample. There are several publications based on mono-directional sequencing, and it is also used in a commercially available 16S rDNA sequencing kit (Bosshard et al., 2003, 2004; Wellinghausen et al., 2009). With the high fidelity of today's PCR and Sanger sequencing reactions the main reason for bidirectional sequencing is to obtain maximum read lengths and better discrimination between similar species. If we ignore the primer-binding areas, which are of no value for identification, a reduction of 40-50 readable bases can be expected with mono-directional versus bidirectional sequencing. With our primers this led to a 40-50 % reduction in the number of readable bases in the variable area V3, whereas the variable areas V1 and V2 remained intact (Baker et al., 2003). We find the potential reduction in resolution to the species level defensible when compared to the significant increase in identifiable bacteria. The only visible consequence of mono-directional sequencing in this study was found in sample 12, where the better discrimination between Prevotella histicola and P. melaninogenica in the forward direction allowed for an 'and' instead of an 'and/or' identification with the universal protocol. A rational approach will be to use a single universal primer pair for specimens expected to be monobacterial (e.g. cerebrospinal fluid and synovial fluid) and reserve the group-specific primers for specimens known to frequently contain multiple organisms (e.g. abscesses, pleural fluid and bile).
It is clear from this and other studies that the results achieved by routine culture-based diagnostics alone are not sufficient when it comes to patients who have received one or more doses of antibiotics prior to sample collection, or the investigation of anaerobic infections (Al Masalma et al., 2009; Goldenberger et al., 1997; Kommedal et al., 2009; Petti et al., 2008b; Schuurman et al., 2004; Senn et al., 2005). Infectious-disease physicians are well aware of these limitations. As a consequence microbiological results are currently used less often to tailor individual antibiotic treatment than to look for unexpected organisms that should eventually lead to a deviation from standard empirical therapy. Compared to culture, sequencing with group-specific primers detected on average 2.5 times as many species per sample, including important pathogens such as Actinomyces meyeri, Campylobacter gracilis (de Vries et al., 2008; Johnson et al., 1985), Escherichia coli, Fusobacterium nucleatum and HACEK group bacteria (Haemophilus parainfluenzae, Aggregatibacter aphrophilus and Eikenella corrodens). Failure to detect these organisms can lead to inadequate antimicrobial coverage, in particular in situations where first-line empirical treatment cannot be used or when patients are to be transferred to oral treatment. For the brain abscesses our findings correlate well with what was described in a study by Al Masalma et al. (2009) using cloning and second-generation sequencing. Although these methods can provide even higher resolution in some situations, today they remain research tools due to long investigation time and high costs.
For the uncultured organisms there will be no antimicrobial susceptibility results to guide treatment. Since some of these species may be unfamiliar to clinicians, adding comments on their general susceptibility patterns to the laboratory reports may help to ensure adequate patient treatment and to prevent unnecessary use of broad-spectrum antibiotics. Many of the species detected exclusively by sequencing in this study (e.g. the anaerobic bacteria) have relatively unproblematic susceptibility patterns.
The purposes of this study were to demonstrate the potential of the group-specific PCRs and to establish a robust protocol. The approach can contribute to better characterization of polybacterial clinical samples and has the advantage that it can be readily implemented in any diagnostic laboratory with experience in standard direct sequencing.