The data presented here are those available for download from Pourcel et al., BMC Microbiology 2004, with the following minor modifications:

1- ms46 was incorrectly considered a 17 bp repeat unit locus, whereas it varies by a 7 bp repeat unit. The data have been recoded accordingly (see Li et al., PLoS ONE 2009 for more details).

Consequently, the allele-calling code is now ms46_7bp_252bp_5U. This means that the locus is treated as a 7 bp repeat unit locus, and that the 252 bp PCR product obtained for this locus from the sequenced CO92 genome (accession number AL590842), as produced for instance by the primer set published in Le Flèche 2001, is coded 5U.
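The allele-calling code packs a linear size-to-allele rule into its name: each repeat unit added or lost changes the PCR product by one unit length relative to the CO92 reference product. The sketch below decodes such a name and converts a measured product size into a repeat-unit count. It assumes the flanking sequence is constant, so that size differences come only from whole repeat units, and it rounds to the nearest unit to absorb sizing error. The function names are hypothetical and are not part of any published tool.

```python
def parse_locus_code(code: str) -> tuple[str, int, int, int]:
    """Split a code such as 'ms46_7bp_252bp_5U' into
    (locus, unit_bp, reference_product_bp, reference_units)."""
    locus, unit, ref_size, ref_units = code.split("_")
    return (
        locus,
        int(unit.removesuffix("bp")),
        int(ref_size.removesuffix("bp")),
        int(ref_units.removesuffix("U")),
    )


def units_from_size(product_bp: float, code: str) -> int:
    """Hypothetical helper: estimate the allele (number of repeat units)
    from a measured PCR product size, assuming the size changes by exactly
    one unit length per repeat unit relative to the reference product."""
    _, unit_bp, ref_bp, ref_units = parse_locus_code(code)
    # Round to the nearest whole unit to tolerate small sizing errors.
    return ref_units + round((product_bp - ref_bp) / unit_bp)


code = "ms46_7bp_252bp_5U"
print(units_from_size(252, code))  # reference product -> 5
print(units_from_size(266, code))  # two 7 bp units longer -> 7
```

Requires Python 3.9 or later for `str.removesuffix`.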

2- ms51 was incorrectly coded as an 18 bp repeat unit tandem repeat with 7 U in the CO92 genome. This interpretation is not correct: ms51 is now coded as a 21 bp repeat unit locus, with 2 units in the CO92 genome. Only two alleles are observed at this locus (1 and 2, instead of 6 and 7 as initially reported).
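When reusing profiles scored with the original coding, the ms51 values have to be translated. The sketch below assumes the correspondence old 7 → new 2 (the CO92 allele, as stated above) and, by order, old 6 → new 1. The table and function names are hypothetical; any other old value is rejected rather than guessed.

```python
# Assumed mapping: original ms51 codes (18 bp unit reading) -> revised codes
# (21 bp unit reading). 7 -> 2 is the CO92 allele; 6 -> 1 is inferred by order.
MS51_RECODE = {6: 1, 7: 2}


def recode_ms51(old_units: int) -> int:
    """Hypothetical helper: translate an original ms51 allele into the revised coding."""
    try:
        return MS51_RECODE[old_units]
    except KeyError:
        raise ValueError(
            f"unexpected ms51 allele {old_units}; only 6 and 7 were reported"
        ) from None


print([recode_ms51(u) for u in (7, 6, 7)])  # [2, 1, 2]
```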

3- ms09 was recoded using the "lowres" (low-resolution) convention. This convention takes advantage of the observation that the large alleles at this locus fall into two non-overlapping populations, one centered around the 25 repeat unit allele and the other around the 33 repeat unit allele. When low-resolution sizing methods such as agarose gel electrophoresis are used, these relatively large alleles cannot always be resolved with confidence: with the Le Flèche 2001 primers the 33U allele is 682 bp long, and since the repeat unit is 18 bp long, it may be difficult to correctly assign ±1 U variants on agarose gels. The "lowres" convention is a good compromise. The corresponding dataset is called ms09lowres. The ms09 and ms09lowres datasets are kept separately, and either can be used depending on the data available.
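The arithmetic behind the resolution problem can be made explicit. Assuming the product size changes linearly by 18 bp per unit from the 682 bp 33U allele, the 25U allele is about 538 bp, and a ±1 U difference near 682 bp is only about 2.6% of the fragment length, which is hard to call on agarose. The sketch below also illustrates one possible way to bin a large allele into its population: assign the size-based unit estimate to the nearer of the two centers (25U or 33U). This nearest-center rule and the class labels are assumptions for illustration only; the actual codes used in ms09lowres are not specified here.

```python
UNIT_BP = 18                # ms09 repeat unit length
REF_UNITS, REF_BP = 33, 682  # 33U allele = 682 bp with the Le Flèche 2001 primers
CENTERS = (25, 33)           # centers of the two large-allele populations


def ms09_size_bp(units: int) -> int:
    """Expected product size, assuming a linear 18 bp change per unit."""
    return REF_BP + (units - REF_UNITS) * UNIT_BP


def ms09_lowres_from_size(product_bp: float) -> int:
    """Hypothetical binning: map a measured size of a large allele to the
    nearer population center. Ties go to the first center (25)."""
    estimated_units = REF_UNITS + (product_bp - REF_BP) / UNIT_BP
    return min(CENTERS, key=lambda c: abs(estimated_units - c))


print(ms09_size_bp(25))                               # 538
print(round(100 * UNIT_BP / ms09_size_bp(33), 1))     # 2.6 (% size change per unit)
print(ms09_lowres_from_size(700))                     # 33
print(ms09_lowres_from_size(560))                     # 25
```

Because the two populations do not overlap, a sizing error of one or two units still leaves an allele closer to its own center, which is why a coarse two-class coding can remain informative.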

