Sequence-based typing of genetic targets encoded outside of the O-antigen gene cluster is indicative of Shiga toxin-producing Escherichia coli serogroup lineages

Serogroup classifications based upon the O-somatic antigen of Shiga toxin-producing Escherichia coli (STEC) provide significant epidemiological information on clinical isolates. Each O-antigen determinant is encoded by a unique cluster of genes present between the gnd and galF chromosomal genes. Alternatively, serogroup-specific polymorphisms might be encoded in loci that are encoded outside of the O-antigen gene cluster. Segments of the core bacterial loci mdh, gnd, gcl, ppk, metA, ftsZ, relA and metG for 30 O26 STEC strains have previously been sequenced, and comparative analyses to O157 distinguished these two serogroups. To screen these loci for serogroup-specific traits within a broader range of clinically significant serogroups, DNA sequences were obtained for 19 strains of 10 additional STEC serogroups. Unique alleles were observed at the gnd locus for each examined STEC serogroup, and this correlation persisted when comparative analyses were extended to 144 gnd sequences from 26 O-serogroups (comprising 42 O : H-serotypes). These included O157, O121, O103, O26, O5 : non-motile (NM), O145 : NM, O113 : H21, O111 : NM and O117 : H7 STEC; and furthermore, non-toxin encoding O157, O26, O55, O6 and O117 strains encoded distinct gnd alleles compared to STEC strains of the same serogroup. DNA sequencing of a 643 bp region of gnd was, therefore, sufficient to minimally determine the O-antigen of STEC through molecular means, and the location of gnd next to the O-antigen gene cluster offered additional support for the co-inheritance of these determinants. The gnd DNA sequence-based serogrouping method could improve the typing capabilities for STEC in clinical laboratories, and was used successfully to characterize O121 : H19, O26 : H11 and O177 : NM clinical isolates prior to serological confirmation during outbreak investigations.


INTRODUCTION
Shiga toxin-producing Escherichia coli (STEC) are bacterial pathogens that cause both outbreak-associated and sporadic cases of human disease and mortality. Symptoms can include bloody and non-bloody diarrhoea, and children are susceptible to renal failure due to haemolytic uraemic syndrome. STEC are transmitted to humans by consumption of contaminated food or water, person-to-person contact or animal-to-person contact, where natural reservoirs include cattle, pigs and sheep (Karch et al., 2005). Serogroup classifications based upon the O-somatic or H-flagellar antigens of STEC provide significant epidemiological information on clinical isolates, and this measure can provide the first indication of relatedness between strains during outbreak investigations. The serogroup is also indicative of the overall genetic relatedness between E. coli strains, including virulence gene content, such as the locus for enterocyte effacement (LEE) pathogenicity island, and the stx1 and stx2 loci encoding Shiga toxins (Prager et al., 2005; Girardeau et al., 2005; Karmali et al., 2003).
The predominant O-serogroup of STEC that is observed clinically in North America is O157 (Johnson et al., 2006); however, biased sampling likely results from the availability of clinical media and detection reagents that target this serogroup. Directed studies for the isolation and characterization of both O157 and non-O157 STEC from clinical samples have indicated that the proportion of non-O157 in North America is likely higher than clinical records have indicated (Thompson et al., 2005; Jelacic et al., 2003; Fey et al., 2000). In Canada, over 90 % of STEC strains detected are serotype O157 : H7 or O157 : non-motile (NM) (Woodward et al., 2002). The global prevalence of non-O157 includes significant outbreaks of O26, O121, O103, O111 and O145, and in some countries it is recognized that these serogroups exceed the prevalence of O157 STEC (Karch et al., 2005). Furthermore, non-O157 strains have been identified along with O157 strains in clinical samples (Paton et al., 1996), so it is possible that a diagnostic bias towards O157 may prevent the detection of the aetiological STEC serogroup during human illness.
Molecular methods for the characterization and identification of O-antigen determinants have been devised using restriction profiling and allele-specific PCR. The entire O-antigen-encoding gene cluster could be amplified using primers that targeted conserved regions in the neighbouring gnd sequence (encoding 6-phosphogluconate dehydrogenase) and JUMPstart sequence, and enzymic digestion of this amplicon identified RFLPs correlating to O-antigen determinants (Coimbra et al., 2000). This method was problematic due to the length of the amplicon (upwards of 20 kbp) and the absence of unique restriction profiles for all serotypes. Within the O-antigen gene cluster the wzx and wzy loci encode the O-antigen flippase and polymerase, respectively, and distinct alleles corresponding to each O-serogroup have been used for molecular serogrouping of O103, O157, O26, O113 and O111 strains (Perelle et al., 2005; DebRoy et al., 2004; Paton & Paton, 1999a; Fratamico et al., 2005; D'Souza et al., 2002). It has been suggested that these assays could replace traditional serological methods; however, the individual tests currently detect only one to three O-serogroups. In the absence of a priori knowledge of a serogroup, a large number of reagents may be required to confirm serogroup identity with these methods. Robust platforms such as DNA microarrays containing wzx and wzy probes targeting up to four E. coli serogroups are currently being investigated (Liu & Fratamico, 2006), and broad subtyping of STEC has been achieved using allelic variants of a LEE-encoded determinant (Gilmour et al., 2006).
Multilocus sequence typing has been attempted for each of the STEC serotypes O26 : H11, O121 : H19, O103 : H2 and O157 : H7, but this method was not appropriate for subtyping because very few polymorphisms were observed between strains of the same serotype (Gilmour et al., 2005; Tarr et al., 2002; Noller et al., 2003; Beutin et al., 2005). The genetic differentiation and subtyping of E. coli serotype O26 : H11 was attempted by sequencing 10 loci for 30 strains encoding stx1, or both stx1 and stx2 (Gilmour et al., 2005). Amongst the O26 : H11 strains all loci were identical, with the exception of three alleles of mdh and two alleles of ppk that each differed by a single point mutation. Notably, comparative analyses of the mdh, gnd, gcl, ppk, metA, ftsZ, relA and metG alleles encoded by O26 : H11 STEC cumulatively distinguished this serotype from O157 : H7 (Gilmour et al., 2005). The conservation of these loci between O26 : H11 strains, and the genetic distance from the other E. coli serotypes, suggested that sequence-based typing of additional STEC might reveal serotype-specific alleles. In this study, additional DNA sequence data at these loci were obtained for a range of STEC, and a single locus was observed to encode allelic variants correlating to individual STEC O-serogroups. We therefore present a simple molecular method for the identification of STEC serogroups, including both O157 and non-O157 strains.

METHODS
Bacterial strains. STEC strains (Table 1) were obtained from the reference stocks of the Enteric Diseases Program at the National Microbiology Laboratory that originated from human sources at various Canadian provincial health laboratories during 1985-2005, or were recent clinical isolates obtained from the Alberta Provincial Laboratory for Public Health (nomenclature XX-YYYY, where XX generally refers to the year of isolation). During the course of these studies, five outbreak-associated STEC isolates were provided by Nova Scotia Public Health, Halifax, Nova Scotia, Canada. Confirmation of O : H serotype was completed with antisera prepared at the National Microbiology Laboratory (Ewing, 1986).
PCR and sequencing. Template DNA was prepared by centrifuging 1 ml exponential phase culture grown in brain heart infusion broth, resuspending the pellet in 1 ml TE buffer (Sigma; 10 mM Tris/HCl, 1 mM EDTA, pH 8.0) and boiling the cells for 15 min. Boiled cells were pelleted, and the supernatant was removed and used as the DNA template in PCR.

Sequence typing correlates to O-antigen serogroups
The alleles of mdh, gnd, gcl, ppk, metA, ftsZ, relA and metG encoded by O26 : H11 STEC cumulatively distinguished this serotype from O157 : H7 (Gilmour et al., 2005), and the corresponding segments of these loci were sequenced for STEC serotypes O111 :  (Table 1). This sequence dataset was compared to previously published sequence data for STEC serotypes O157 : H7 and O26 : H11, as well as non-toxin-producing O26 : H32, O26 : H6, K12 and O6 : H1 (strain CFT073) strains using the 4464 nucleotide concatenate of the eight genetic determinants (Fig. 1). Each of the examined serogroups had distinct sequence types; the NM STEC strains of O121 and O157 were 99.8 and 99.9 % identical to O121 : H19 and O157 : H7 strains, respectively. The observed phylogenetic separation between serogroups, and homogeneity within strains of the same serogroup, indicated that these genetic traits have been acquired by and vertically inherited within individual STEC serogroup lineages.
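The comparison described above can be sketched in Python as follows. This is a minimal illustration, not the analysis pipeline used in the study: the function names and the toy sequences are assumptions, and real input would be pre-aligned segments of equal length for every strain.

```python
# Fixed locus order so that every strain is concatenated identically.
LOCI = ["mdh", "gnd", "gcl", "ppk", "metA", "ftsZ", "relA", "metG"]


def concatenate(alleles):
    """Join the per-locus segments of one strain in the fixed LOCI order."""
    return "".join(alleles[locus] for locus in LOCI)


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy data (hypothetical): two strains that differ at one position in gnd.
strain_a = {locus: "ACGT" for locus in LOCI}
strain_b = dict(strain_a, gnd="ACGA")

identity = percent_identity(concatenate(strain_a), concatenate(strain_b))
print(f"{identity:.3f} % identical")
```

At a concatenate length of 4464 nucleotides, an identity of 99.8 % corresponds to roughly nine differing positions.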

Molecular-based serogrouping with four loci
Additional sequencing was performed at selected loci in an expanded panel of strains to determine if the phylogenetic separation observed between serogroups was maintained in a larger dataset (Table 1). The genetic determinants that contributed the majority of the observed genetic diversity (gnd and gcl;  (Gilmour et al., 2006).

strains from the serotypes represented in Fig. 1, as well as seropathotype D and non-toxin encoding E. coli strains recovered from paediatric stool samples (L. Chui, unpublished data). The overall genetic distinction between STEC serogroups (as determined in the eight locus scheme) was also represented amongst these four loci, and the additional strains and serogroups (Fig. 2).

Molecular-based serogrouping with the gnd locus
The gnd locus was the most genetically diverse of all examined loci (Table 3), and notably, this determinant is immediately adjacent to the O-antigen gene cluster. Additional sequencing of the 643 bp region of gnd was performed (Table 1), and gnd sequence data available in GenBank for O157, O113 and O111 STEC, as well as nontoxin encoding O157 and O55 strains, were also included in comparative analyses. In total, gnd DNA sequences were collected from 144 strains and 26 O-serogroups (comprising 42 O : H-serotypes). The overall genetic distinction between serogroups (as determined in the eight and four loci schemes) was also represented in this single locus, as each examined STEC O-serogroup encoded a unique gnd allele (Fig. 3). For some of the most clinically significant STEC serogroups (O157, O26, O121, O145, O111 and O103) the gnd DNA sequences were compared between multiple strains (from 5 to 43 sequences), and for each serogroup all STEC strains encoded an identical gnd allele (Fig. 3). The only exception was O157 : H7 strain 87-16 (GenBank accession no. AF176360), which encoded a single nucleotide polymorphism compared to the other O157 strains, but otherwise the gnd alleles were conserved within STEC serogroup classifications.
Fig. 1. Phylogeny of the concatenated segments of mdh, gnd, gcl, ppk, metA, ftsZ, relA and metG encoded by E. coli. This is based upon a neighbour-joining tree constructed with Hasegawa-Kishino-Yano (HKY85) distance correction. Sequences obtained from GenBank are identified in Methods. The serotype of strain K-12 was not designated, and the serotype of uropathogenic strain CFT073 was O6 : K2 : H1. Shiga toxin-producing serotypes are indicated in black type, and strains not encoding stx are indicated in grey. The number of sequences per serotype is indicated in parentheses. Bar, scale of the distance score.
relA ACAATACGTACCGCACGCACATC
Furthermore, nontoxin encoding strains of O157, O26, O55, O6 and O117 encoded distinct gnd alleles compared to STEC strains of the same serogroup. Sequence typing of gnd was, therefore, a promising molecular method correlating minimally with the O-serogroup of clinical STEC strains. The O111 : NM STEC and non-toxin-producing O55 strains encoded gnd sequences outlying from the main cluster (Fig. 3) and these were homologous to Citrobacter spp. gnd alleles (Nelson & Selander, 1994). However, since pure bacterial isolates are preferred for preparation of DNA sequencing template, all isolates undergoing gnd DNA sequence-based serogrouping should previously be classified as STEC.
During the course of this study, outbreak-related isolates of non-O157 STEC were sent to the National Microbiology Laboratory for serotyping and genetic characterization. The gnd sequence data for each of isolates 05-6541 to 05-6543 clustered with known O121 strains (Fig. 3). A concurrent non-O157 sporadic isolate (05-6544) was also examined at gnd and this sequence clustered with known O26 : H11 strains (Fig. 3). Strain 06-5121 was isolated from a hospitalized patient with haemolytic uraemic syndrome and the gnd sequence of this strain was 99.8 % identical to a known O177 : NM isolate (Fig. 3). In correlation with these molecular data, subsequent serotyping using traditional methodologies characterized these isolates as O121 : H19, O26 : H11 and O177 : NM. The gnd DNA sequence-based serogrouping method therefore provided an advantageous alternative to O-specific immunoreagents during these crises. Over 55 serogroups of STEC have been reported to be associated with human disease (Johnson et al., 2006), and an international panel of STEC strains from each serogroup, including the emerging sorbitol-fermenting O157, will be required to further validate this method.
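The assignment of an unknown isolate to the serogroup with the most similar gnd sequence can be sketched as below. This is a hedged illustration only: `assign_serogroup`, the `min_identity` cut-off and the short toy sequences are hypothetical and are not taken from the study, which compared 643 bp gnd segments by phylogenetic clustering (Fig. 3).

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def assign_serogroup(query, references, min_identity=99.0):
    """Return (serogroup, identity) for the closest reference gnd allele.

    `references` maps a serogroup label to its aligned gnd sequence.
    `min_identity` is an assumed cut-off; when the best match falls
    below it, the serogroup is reported as None.
    """
    best_group, best_identity = None, -1.0
    for group, ref_seq in references.items():
        identity = percent_identity(query, ref_seq)
        if identity > best_identity:
            best_group, best_identity = group, identity
    if best_identity < min_identity:
        return None, best_identity
    return best_group, best_identity


# Toy reference alleles (hypothetical 10-nt stand-ins for gnd segments).
references = {"O121": "ACGTACGTAC", "O26": "ACGTTTGTAC"}
result = assign_serogroup("ACGTACGTAA", references, min_identity=85.0)
print(result)
```

A closest-match rule of this kind only suggests a serogroup; as in the outbreak investigations described above, the result would still be confirmed by serotyping.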
The proportion of synonymous and nonsynonymous mutations was calculated for each locus from the accumulated DNA sequence data (Table 3). As expected for core loci, the majority of mutations were synonymous (dN/dS < 1), but the gnd locus had the greatest number of nonsynonymous sites. This locus has already been identified as a polymorphic E. coli locus compared to other core loci (Bisercic et al., 1991; Nelson & Selander, 1994; Dykhuizen & Green, 1991). A comparable ratio of synonymous versus nonsynonymous mutations was also reported by Bisercic et al. (1991). Genetic diversity at gnd arose in parallel to the extensive diversity and recombination that occurred at the neighbouring O-antigen gene cluster, and it is likely that these two genetic traits were co-inherited between lineages (Tarr et al., 2000; Nelson & Selander, 1994). To our knowledge, there is no indication that O-serogroups that encode similar gnd alleles (e.g. STEC O121 and O55) also encode similar O-antigen gene clusters, nor are the antigens themselves similar. The potential utility of a locus subject to recombination between genera might be seemingly limited for the purpose of molecular-based serogrouping; however, we currently observed conserved STEC serogroup-specific genetic polymorphisms at the gnd locus. Between strains of an individual STEC O-serogroup we observed conserved gnd alleles, and no serogroup encoded a gnd allele that was identical to another serogroup. This study provides a simple method for molecular-based serogrouping of E. coli strains encoding stx, which can be detected by a wealth of molecular reagents (Gilmour et al., 2006; Hsu et al., 2005; Nielsen & Andersen, 2003; Reischl et al., 2002; Wang et al., 2002). This method was used to characterize O121 : H19, O26 : H11 and O177 : NM clinical isolates prior to serological confirmation during an outbreak investigation, and could, therefore, improve the scope of STEC molecular diagnostics beyond the O157 serogroup.
Fig. 2. Phylogeny of the concatenated segments of gnd, gcl, ppk and relA encoded by E. coli. This is based upon a neighbour-joining tree constructed with Hasegawa-Kishino-Yano (HKY85) distance correction. Sequences obtained from GenBank are identified in Methods. Shiga toxin-producing serotypes are indicated in black type, and strains not encoding stx are indicated in grey. The number of sequences per serotype is indicated in parentheses. Bar, scale of the distance score.
Fig. 3. Phylogeny of the gnd locus encoded by E. coli. This is based upon a neighbour-joining tree constructed with Hasegawa-Kishino-Yano (HKY85) distance correction. Sequences obtained from GenBank are identified in Methods. Shiga toxin-producing serotypes are indicated in black type, and strains not encoding stx are indicated in grey. The number of sequences per serotype is indicated in parentheses. Strain identification numbers are indicated for outbreak-associated clinical isolates. The dotted line indicates outlying gnd sequences, which are presented in relation to the entire dataset in the inset. Bar, scale of the distance score.
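The distinction between synonymous and nonsynonymous changes summarized in Table 3 can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it only counts differing codons between two aligned, in-frame coding sequences using the standard genetic code, and it does not normalize by the number of synonymous and nonsynonymous sites as a full dN/dS estimate would. The function name and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Count codons that differ synonymously or nonsynonymously.

    Both sequences are assumed to be aligned, in frame, free of gaps
    and ambiguity codes, and a multiple of three in length.
    Returns a (synonymous, nonsynonymous) tuple of codon counts.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal-length and in frame")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# CTG and CTA both encode leucine; AAA (lysine) to GAA (glutamate) does not.
print(count_codon_changes("CTGAAA", "CTAAAA"))
print(count_codon_changes("CTGAAA", "CTGGAA"))
```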