Interlaboratory comparison of fig (Ficus carica L.) microsatellite genotyping data and determination of reference alleles

Microsatellites have been identified as the marker of choice in plant genotyping projects. However, due to length discrepancies obtained between different laboratories for the same allele, interlaboratory comparison of fingerprinting results is often a difficult task. The objectives of this study were to compare genotyping results of two laboratories, to evaluate genetic parameters of microsatellite markers and to determine reference allele sizes for fig cultivars from the Istrian peninsula. Genotyping results of ninety fig (Ficus carica L.) accessions were comparable between the laboratories despite differences observed when comparing electropherograms of different capillary electrophoresis systems. Differences in lengths of the same alleles were detected due to different PCR methods and laboratory equipment, but the distances between alleles of the same locus were preserved. However, locus FSYC01 exhibited one allele dropout which led to misidentification of 28 heterozygotes as homozygote individuals suggesting this locus as unreliable. Allele dropout was assigned to the tail PCR technology or to a touchdown PCR protocol. Genotypes of twenty-four reference cultivars from the Istrian peninsula were confirmed by both laboratories. These results will contribute to the usage of markers with greater reliability, discrimination power and consequently, to more reliable standardization with other fig genotyping projects.


INTRODUCTION
The integration of DNA molecular marker technology into fingerprinting studies of agricultural plants is extremely widespread and has become a standard procedure. The techniques are independent of environmental factors and of the phenotypic stage of the plant under investigation and are thus complementary to traditional approaches that often include laborious morphological evaluations.
Microsatellite markers combine several properties of an ideal molecular marker, including high polymorphism in the number of tandem repeats, co-dominant inheritance, abundance in the genome, excellent reproducibility and ease of use. They are also considered a marker of choice in plant genetic research for many applications (e.g. diversity studies, paternity testing, mapping and fingerprinting studies) (Nybom et al., 2014). The employment of fluorescently labeled microsatellite markers in genotyping procedures significantly improves throughput and automation and lowers the error rate (Wenz et al., 1998). However, the use of microsatellites can be costly due to the high price of fluorescent tags, which must be carried by one of the primers in the primer pair. To overcome this problem, an inexpensive and flexible procedure was introduced with the three-primer protocol, which incorporates a modified locus-specific primer and the universal fluorescently labelled M13 (-21) primer (Schuelke, 2000). This method was used in multiple genotyping projects (Bandelj et al., 2004; Kyung-Ho et al., 2009; Mandel et al., 2011; Soriano et al., 2011) and was recognized as a good economic alternative to the conventional method.
Simple numerical output makes microsatellite technology very attractive for exchanging data among laboratories and for the establishment of global genotyping databases (De Valk et al., 2009), but several authors discuss the problem of consistency of microsatellite genotyping data in different laboratories and suggest standardization procedures for allele sizing (Cryer et al., 2006; De Valk et al., 2009; Deemer & Nelson, 2010; Jones et al., 2008; Vemireddy et al., 2007). Variation in results among laboratories could be due to human factors, differing methodologies, technological limitations, poor DNA quality or locus-specific properties, since some microsatellite markers are more prone to errors and produce more stutters (Deemer & Nelson, 2010; Doveri et al., 2008; Ellis et al., 2011; Sutton et al., 2011; This et al., 2004).
Genotyping errors are often neglected even though they affect the data and can markedly influence the biological conclusions (Pompanon et al., 2005).

DNA extraction
The SI research group extracted genomic DNA from leaves of 84 fig accessions by a modified cetyl trimethylammonium bromide (CTAB) method following the procedure described by Kump & Javornik (1996). The FR research group extracted genomic DNA from leaves of six accessions with the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the supplier's instructions and a minor modification described by Achtak et al. (2010). DNA concentration was determined using the Invitrogen Qubit® Fluorometer (Turner Biosystems, Sunnyvale, CA, USA) and the Qubit dsDNA BR Assay Kit (Molecular Probes, Thermo Fisher Scientific, Carlsbad, CA, USA) by SI researchers and spectrofluorometry (GENios Plus, TECAN, Grödig, Austria) by the FR research group. Dilutions of DNA with a concentration of 50 ng/µl were prepared and exchanged between research groups. Both research groups analysed the same DNA of 90 accessions as described in the following sections.

Microsatellite assay
Six primer pairs from different sets of the developed microsatellites were selected for the genotyping procedure: MFC1, MFC2, MFC3 (Khadari et al., 2001), MFC9 (Khadari B., Hochu I., Santoni S., unpublished data), LMFC30 (Giraldo et al., 2005) and FSYC01 (Ahmed et al., 2007). According to each group's laboratory equipment and preferences for different chemicals, various individual strategies for optimization and generalization of PCR conditions were employed. In general, the FR group used a conventional method with each primer pair labeled with the required dye, while the SI group used the economic three-primer method developed by Schuelke (2000). PCR and electrophoresis conditions for each microsatellite locus are summarized in Table S1 and Table S2. Primer sequences used for the conventional and economic methods are listed in Table S3.

Data analysis
The software packages GeneMapper version 3.7 (FR) and 4.1 (SI) (Applied Biosystems, Foster City, CA, USA) were used for determination of allele sizes, peak intensities (in relative fluorescence units, rfu), banding patterns, and number of amplified alleles per primer pair. SPSS (IBM Corp. Released 2010. IBM SPSS Statistics for Windows, Version 19.0. Armonk, NY: IBM Corp.) was used to illustrate differences in peak balance (also called peak height ratio or heterozygote balance). Peak balance was calculated for each heterozygous combination (at least five individuals per allele combination) according to Method #2, developed by Leclair et al. (2004), and defined as the ratio of the peak height of the longer allele over that of the shorter allele. For comparison of allele sizes, the standard deviation and range were calculated for each marker using Microsoft Excel (2010).
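The peak balance definition used here (longer-allele peak height divided by shorter-allele peak height) can be illustrated with a minimal Python sketch. The function name and the peak heights below are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
from statistics import median


def peak_balance(longer_height, shorter_height):
    """Peak height of the longer allele divided by that of the shorter allele."""
    if shorter_height <= 0:
        raise ValueError("shorter-allele peak height must be positive")
    return longer_height / shorter_height


# Hypothetical (longer, shorter) peak heights in rfu for five heterozygotes
# that share the same allele combination.
heights = [(820, 1000), (450, 600), (900, 1200), (700, 1000), (640, 800)]
balances = [peak_balance(longer, shorter) for longer, shorter in heights]
median_balance = median(balances)
```

Values below one indicate that the shorter allele produced the stronger signal.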
Genetic parameters were calculated for 90 samples over all six analyzed microsatellite loci. Expected heterozygosity (He), observed heterozygosity (Ho), probability of identity (PI) and polymorphic information content (PIC) were calculated, and deviation from Hardy-Weinberg equilibrium (HWE) was tested across all loci (chi-square (χ²) test; the p-value was assessed using Bonferroni correction (Kalinowski et al., 2007)), using the CERVUS 3.0.7 and Identity 1.0 (Wagner & Sefc, 1999) programs. Frequency of null alleles (F null) was calculated with FreeNA (Chapuis & Estoup, 2007) and Ne was computed using GenAlEx 6.5 (Peakall & Smouse, 2006, 2012). The mean error rate per locus (e_l = m_l / nt) was calculated as the ratio between m_l, the number of single-locus genotypes including at least one allelic mismatch, and nt, the number of replicated single-locus genotypes (Pompanon et al., 2005).
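For orientation, the textbook definitions of three of these parameters can be sketched in Python from the allele frequencies of a single locus. This is an illustrative sketch only: the function names are hypothetical, and the programs named above may apply sample-size corrections or other variants, so their output need not match these uncorrected formulas exactly.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def probability_of_identity(freqs):
    """PI = sum(p_i^4) + sum over i<j of (2 * p_i * p_j)^2."""
    same_homozygote = sum(p ** 4 for p in freqs)
    same_heterozygote = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return same_homozygote + same_heterozygote


# Hypothetical locus with two equally frequent alleles.
example = [0.5, 0.5]
he = expected_heterozygosity(example)
pic = polymorphic_information_content(example)
pi = probability_of_identity(example)
```

For two equally frequent alleles the sketch gives He = 0.5 and PIC = PI = 0.375.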
The identification of the minimum number of markers required to distinguish all the observed multilocus genotypes was performed with the AMaCAID program written in the R language, using model one (Caroli et al., 2011).
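The idea of a minimum marker set can be sketched as an exhaustive search over subsets of loci. This is not the AMaCAID implementation or its model one; it is a brute-force illustration in Python with hypothetical sample names, locus names and genotypes.

```python
from itertools import combinations


def distinct_profiles(genotypes, loci):
    """Count the distinct multilocus profiles seen when only `loci` are used."""
    return len({tuple(g[locus] for locus in loci) for g in genotypes.values()})


def minimal_marker_set(genotypes, loci):
    """Return the first smallest subset of loci that separates every profile
    that the full set of loci can separate."""
    target = distinct_profiles(genotypes, loci)
    for size in range(1, len(loci) + 1):
        for subset in combinations(loci, size):
            if distinct_profiles(genotypes, subset) == target:
                return subset
    return tuple(loci)


# Hypothetical genotypes written with letter-coded alleles.
samples = {
    "s1": {"locus1": "AB", "locus2": "CC", "locus3": "AD"},
    "s2": {"locus1": "AB", "locus2": "CD", "locus3": "AD"},
    "s3": {"locus1": "AA", "locus2": "CC", "locus3": "AD"},
    "s4": {"locus1": "AA", "locus2": "CC", "locus3": "AD"},  # identical to s3
}
best = minimal_marker_set(samples, ["locus1", "locus2", "locus3"])
```

In this toy data set `locus3` is monomorphic, so two loci already separate the three distinct genotypes. Exhaustive search is practical only for a small number of loci, such as the six used here.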

Visual and morphological characterization of amplified alleles
In order to assess visual characteristics of alleles amplified in both laboratories caused by instrument resolution, peak signal strength and peak morphology were examined. Altogether 33 different alleles were identified over six microsatellite loci in both laboratories.
The shape of the peaks and number of stutter bands were nearly the same for all alleles regardless of the methods used in each laboratory. The exceptions were the alleles of the LMFC30 locus, produced by the economic method, which exhibited more stuttering and an additional n+1 peak (where n indicates allele length), and the MFC9 locus, where higher stutter bands were observed. Alleles of the FSYC01 locus exhibited a single stutter band in the FR laboratory, while the procedure in the SI lab yielded no stuttering but showed an n+1 peak (Figure 1).
The peak signal strength for each locus resulting from two different amplification techniques showed noticeable differences, with much lower intensity values recorded in the SI laboratory (Table 1).
Table 1 abbreviations: n (number of alleles), Ne (effective number of alleles), Ho (observed heterozygosity), He (expected heterozygosity), PIC (polymorphic information content), PI (probability of identity), F null (frequency of null alleles), HWE (Hardy-Weinberg equilibrium), SI (calculated for Slovenian genotyping data), rfu (relative fluorescence units), * and ***: p < 0.05 and p < 0.001 (chi-square test, significance with Bonferroni correction).
The peak balance value was compared between laboratories to see the influence that distance between alleles in a heterozygote has on peak balance. In general, intensities were higher in shorter alleles for the majority of comparisons (peak balances lower than one). At two loci, LMFC30 and MFC3, the peak balance value decreased as the difference in allelic lengths increased (Figure 2). A similar pattern with similar peak balance values was observed in both the SI and FR laboratories.

Comparison of allele lengths among laboratories
Actual allele sizes determined by the GeneMapper software were sorted according to their length to determine the groups of alleles which differ by less than 1 bp. Alleles were also manually reviewed and final sizes were rounded to the nearest full number representing the final called allele length. For easier comparison of genotypes between laboratories, letters were also assigned to alleles, as suggested by Doveri et al. (2008), where A represents the shortest allele of the locus (Table 2). As expected, sizes of alleles between the laboratories differ (from 14.65 bp to 21.44 bp for actual lengths and 15 bp to 21 bp for called allele lengths) due to the distinct PCR technology, dye analysis matrices and internal standards used in analyses. The differences were consistent between alleles of the same locus.
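The binning and lettering steps described above can be sketched in Python. The sketch assumes that consecutive sorted sizes less than 1 bp apart belong to the same allele and that the called length is the rounded mean of each group; both choices, the function names and the fragment sizes are illustrative assumptions, and the manual review step is not modelled.

```python
from string import ascii_uppercase


def bin_alleles(sizes, max_gap=1.0):
    """Group sorted fragment sizes; a new group starts when the gap to the
    previous size is max_gap bp or more."""
    bins = []
    for size in sorted(sizes):
        if bins and size - bins[-1][-1] < max_gap:
            bins[-1].append(size)
        else:
            bins.append([size])
    return bins


def call_alleles(sizes):
    """Map letters (A = shortest allele) to called lengths in whole bp."""
    calls = {}
    # zip stops at 26 groups, which is ample for the loci considered here.
    for letter, group in zip(ascii_uppercase, bin_alleles(sizes)):
        mean_size = sum(group) / len(group)
        calls[letter] = int(mean_size + 0.5)  # round half up to whole bp
    return calls


# Hypothetical actual sizes (bp) reported by sizing software for one locus.
observed = [191.62, 191.80, 197.71, 197.55, 210.93, 211.10]
called = call_alleles(observed)
```

Because the letters depend only on the rank of each allele within the locus, two laboratories with a consistent size offset obtain the same letter-coded genotypes.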
The range between the minimum and maximum allele lengths was calculated as the simplest measure of variability. The highest difference of 1.02 bp was observed at locus LMFC30 in FR data for allele 261 bp (H), while for SI data a difference of 0.71 bp was encountered at locus MFC1 for allele 212 (D). The range of actual allele sizes within an allele was relatively low, with an average of 0.33 bp and 0.23 bp for the FR and SI teams, respectively.
Further comparison showed that the standard deviations for actual sizes of individual alleles were relatively low, but varied among teams. Standard deviations were between 0.021 and 0.405 for alleles genotyped by the FR team, while somewhat lower standard deviations, between 0.005 and 0.151, were calculated for alleles scored by the SI team (Table 2).
Table 2 List of alleles with average actual size, letter designation, total number of individual alleles used for calculation (n), standard deviation (SD), range, differences between average actual sizes obtained by Slovenian (SI) and French (FR) research group, differences between actual sizes after the removal of elongated primer sequence M13 (-21) and differences between allele sizes rounded to nearest integer with and without primer elongation sequences.

Discriminatory power of microsatellite loci
In order to estimate the discriminatory power of the loci used in the study, several variability parameters were calculated: the number of alleles, Ho, He, PI and PIC. All six microsatellite loci were polymorphic, revealing a total of 33 alleles with an average number of 5.5 alleles and an average of 3.1 effective alleles per locus (Table 1). The highest number of alleles (eight) was amplified at locus LMFC30, seven alleles were characteristic of locus MFC3, five alleles were found at loci MFC2 and FSYC01, and four alleles were characteristic of loci MFC9 and MFC1. Only one taxon-specific allele, A (104 bp (FR) / 123 bp (SI)), was found at locus MFC3 and was characteristic of the LBS16 fig genotype. In general, the number of effective alleles was relatively low, indicating that both rare and frequent alleles are present in the examined population of samples. The highest number of effective alleles (5.25) was observed at locus LMFC30, where the frequencies of alleles were more evenly distributed.
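The link between allele frequency distribution and the effective number of alleles follows from the standard definition Ne = 1 / sum(p_i^2). A minimal Python sketch with hypothetical frequencies (not taken from Table 1) shows how a skewed distribution lowers Ne even when the number of observed alleles is unchanged.

```python
def effective_number_of_alleles(freqs):
    """Ne = 1 / sum(p_i^2) for the allele frequencies of one locus."""
    return 1.0 / sum(p * p for p in freqs)


# Hypothetical loci, each with four observed alleles.
even = [0.25, 0.25, 0.25, 0.25]   # equally frequent alleles
skewed = [0.70, 0.10, 0.10, 0.10]  # one frequent and three rare alleles

ne_even = effective_number_of_alleles(even)
ne_skewed = effective_number_of_alleles(skewed)
```

With equal frequencies Ne equals the number of alleles (4), whereas the skewed locus gives an Ne of roughly 1.9.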
At two loci, MFC3 and MFC9, three alleles were observed in the cultivar 'Belica' (SI / FR accession code: 19F / SLCV06). At MFC3 the third allele length was 97 bp (FR) / 117 bp (SI). Since this allele was discovered only in this accession, it was discarded. At MFC9 all three alleles (SI allele lengths 209 / 215 / 227 bp; FR allele lengths 192 / 198 / 211 bp) were identified more than once, therefore we decided to exclude the longest allele from this analysis.
The observed heterozygosity was higher than expected at four loci (MFC3, MFC9, LMFC30, and FSYC01 at the FR laboratory), showing an excess of heterozygotes, while an excess of homozygotes was found at loci MFC1 and MFC2. An excess of homozygotes and a statistically significant deviation between expected and observed heterozygosity was also noted at locus FSYC01 (χ² = 14.35 (using Yates correction), p < 0.001) calculated for the SI data set, where allele D (152 bp / 172 bp) was not amplified and thus influences the variability statistics of this locus. A statistically significant deviation from HWE was observed for MFC1 as well (χ² = 12.73 (using Yates correction), p < 0.05). As expected, the frequencies of null alleles for FSYC01 SI data and for locus MFC1 were higher due to the deviation from HWE. Since the null allele frequencies of MFC1 and FSYC01 for the SI data were between 0.05 and 0.2, both loci were classified into the moderate class (Chapuis & Estoup, 2007), while the null allele frequencies calculated for the other loci were negligible (F null < 0.05).
Calculated PIC values ranged from 0.506 to 0.782 and classified all loci as informative markers (PIC > 0.5) and locus LMFC30 as suitable for mapping (PIC > 0.7). Regarding the probability of identity, the highest values were observed at loci MFC2, MFC9, and FSYC01. The minimum PI value (0.118) was calculated for locus LMFC30. The overall probability that two samples in our study share the same genetic profile by chance was 2.66 × 10⁻⁴ (calculated for the FR data).
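An overall probability of identity is conventionally obtained by multiplying the per-locus PI values, which assumes that the loci are independent. The Python sketch below uses hypothetical per-locus values and a hypothetical function name; it illustrates the arithmetic only and does not reproduce the figures in Table 1.

```python
from math import prod


def combined_probability_of_identity(per_locus_pi):
    """Product of per-locus PI values, assuming independent loci."""
    return prod(per_locus_pi)


# Hypothetical per-locus PI values for six loci.
hypothetical_pi = [0.30, 0.35, 0.20, 0.32, 0.12, 0.33]
overall = combined_probability_of_identity(hypothetical_pi)
```

Each additional informative locus multiplies the overall value by a factor below one, which is why loci with low PI contribute most to discrimination.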

Fingerprinting and identification of reference cultivars
The genotyping data of twenty-four Istrian cultivars over five microsatellite loci are presented in Table S4 (Ahmed et al., 2007; Giraldo et al., 2005; Khadari et al., 2001).
To introduce as much experimental variation as possible, each laboratory was allowed to optimize its own PCR conditions and amplification protocols with its preferred supplier of chemicals (Table S1 and Table S2).

Comparison of genotyping results
Since instrument sensitivity is extremely important for interpretation, poor signal strength can result in poor morphology and potential for errors in sizing (Koumi et al., 2004). With the aim of assessing the similarity of electropherograms of the SI and FR groups, a comparison of peak morphology, signal strength, and peak balance was performed (Table 1, Figure 1, Figure 2). The lower signal intensity obtained by the SI group in comparison with the FR group may be due to the different PCR amplification protocols, different electrophoresis settings (e.g., injection time) and fluorescent dyes used for microsatellite labelling. Use of different fluorescent dyes has a strong impact on the results due to their different relative intensity values. Lower intensities are also associated with the three-primer protocol (Culley et al., 2013), where part of the amplified fragments remains unlabelled.
Lower fluorescence values did not influence allele calling on the SI laboratory electropherograms. The differences observed in the allelic patterns of the electropherograms between the SI and the FR laboratory did not influence genotyping either, since results were comparable and the distances between alleles of the same locus were consistent.
Peak balances of different allele pairs at each locus were comparable between laboratories, despite the different fluorescence values. Comparable peak balances were also obtained by Koumi et al. (2004), who analysed the comparability of the results of the STR multiplex AmpFLSTR™ SGMplus™ (Thermo Fisher Scientific) (a multiplex assay for human identification applications) between three different electrophoresis instruments (ABI 377, ABI 3700, ABI 3100).
Peak balance can be used in genetic studies as a threshold for accepting two heterozygous alleles as a possible genotype, where values of 50 % or 60 % are typically used (calculated by dividing the weaker intensity allele peak height by the stronger intensity allele peak height) (Butler, 2014). Debernardi et al. (2011) found a threshold of 60 % to be too stringent when analysing genotypes obtained with the AmpFLSTR™ Identifiler™ STR kit (Thermo Fisher Scientific), while in our study even a threshold of 50 % would be too stringent at loci FSYC01, LMFC30, MFC2 and MFC3. However, at loci MFC1 and MFC9 a threshold of 60 % could be applied.
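A threshold of this kind can be applied as a simple check on each pair of peaks. The following Python sketch uses the weaker-over-stronger ratio described above; the function name and the peak heights are hypothetical.

```python
def passes_balance_threshold(height_a, height_b, threshold=0.6):
    """True if the weaker peak reaches at least `threshold` of the stronger peak."""
    weaker, stronger = sorted((height_a, height_b))
    if stronger <= 0:
        raise ValueError("peak heights must be positive")
    return weaker / stronger >= threshold


# Hypothetical heterozygote with peak heights of 550 and 1000 rfu (ratio 0.55).
accepted_at_50 = passes_balance_threshold(550, 1000, threshold=0.5)
accepted_at_60 = passes_balance_threshold(550, 1000, threshold=0.6)
```

The same pair of peaks is accepted under the 50 % threshold and rejected under the 60 % threshold, which illustrates why the choice of threshold matters for loci with unbalanced amplification.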
Lower peak balance values indicate preferential amplification of the shorter allele. This phenomenon was most noticeable at loci LMFC30 and MFC3, with greater differences between short and long allelic combinations in heterozygous individuals. Such a phenomenon can lead to dropout of the longer allele (Tvedebrink et al., 2012), caused by non-amplification of the allele. Analysis of peak balance is not common practice in plant SSR genotyping studies but, in our opinion, it could improve the genotyping process because it helps to identify samples with larger deviations from the median, which should then be rechecked with greater caution.

Comparison of allele length
Comparison of the allele lengths (after removal of 17 or 18 bp from the SI called allele lengths) showed differences between 2 bp and 4 bp, which are in the range reported in previous investigations. This et al. (2004) compared microsatellite profiles of grape cultivars obtained from different laboratories; mostly similar alleles were obtained, although in some cases the raw data of identical alleles differed by as much as 5 bp. The differences are mainly attributable to the use of different dyes, which contribute different molecular weights to the final PCR products, and to the use of different molecular standards.
Standard deviation values of allele lengths were low, indicating that the sizing of identical alleles was very reproducible; differences among research groups could be assigned to the different platform technologies used in the analysis. Differences in allele size are observed even if the same allele is repeatedly typed by the same CE machine (Pasqualotto et al., 2007). Very similar results were obtained by Haberl & Tautz (1999) in comparative allele sizing of microsatellites of honey bees (0.05-0.17).

Genotyping discrepancies
Altogether twenty-eight discrepancies were observed, but all were a consequence of dropout of allele D at locus FSYC01. We assume that this is an experimental problem associated with the PCR protocol, in which the tailed primers create conditions that encourage competition between alleles and prevent amplification of some alleles. Different amplification temperature profiles, i.e. the touchdown protocol used by the SI group, could also influence the amplification of the D allele, since the increased specificity of a touchdown PCR protocol could cause allele dropout due to polymorphism in primer-binding sites (Mullins et al., 2007).
However, since long-allele dropout was observed in one sample provided by the FR group, with DNA extracted with the DNeasy Plant Mini Kit (Qiagen), we assume that the different DNA extraction methods did not influence the D allele dropout, although it is known that different methods of DNA extraction can cause different results (Benjak et al., 2006).
According to Pompanon et al. (2005), error rates between 0.5 % and 1 % are common in many laboratories. In our study, this measure was calculated for locus FSYC01 only, where the allelic dropout was the main cause of error. Due to the high mean error rate (15.56 %) associated with locus FSYC01, it should be considered error-prone and thus its use in identification studies is unreliable.
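The reported error rate can be checked against the definition of Pompanon et al. (2005), in which the number of mismatched single-locus genotypes is divided by the number of replicated single-locus genotypes. In the Python sketch below, the numerator of 28 is the number of discrepancies reported for FSYC01, whereas the denominator of 180 is an assumption: it is the value at which 28 mismatches reproduce the reported 15.56 %, and it is not stated explicitly in the text.

```python
def mean_error_rate(mismatched_genotypes, replicated_genotypes):
    """Mismatched single-locus genotypes divided by replicated single-locus genotypes."""
    if replicated_genotypes <= 0:
        raise ValueError("number of replicated genotypes must be positive")
    return mismatched_genotypes / replicated_genotypes


mismatches = 28   # discrepancies reported for locus FSYC01
replicated = 180  # assumed denominator, inferred from the reported percentage
error_rate_percent = round(100 * mean_error_rate(mismatches, replicated), 2)
```

Under this assumption the sketch returns 15.56 %, far above the 0.5 % to 1 % range cited as common.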

Diversity parameters of selected microsatellite loci
Six microsatellite loci used in this study were chosen based on their confirmed discriminatory power and ease of scoring as observed in previous studies. In the study of Achtak et al. (2009)

Reference allele sizes for Istrian cultivars
Genotyping results of 24 reference cultivars from the Istrian peninsula, confirmed by the SI and FR research groups, can serve as a reference for identification purposes, fig collection management, and for standardising fig genotyping projects from the Balkan and surrounding regions.
A minimum set of loci (LMFC30, MFC1, MFC9) was determined to be sufficient to identify all 17 genotypes among the 24 reference cultivars examined in this study. This set of loci can be used for preliminary screening of fig genetic resources, while for discrimination of all 24 reference cultivars additional microsatellite loci should be utilized.

CONCLUSIONS
The analysis performed in this study showed that the comparability of allele sizes between two laboratories was very good and deviations in allele sizes were in the expected range, although different PCR technology, chemicals, and laboratory machinery were used.

SUPPLEMENTARY MATERIAL
The supplementary material for this article can be found online at the repository at University of Primorska.

Figure 2: Box plots of peak balance values calculated for the Slovenian research group (SI) and French research group (FR) laboratories. Heterozygous samples with the same allele combinations were grouped together.

MATERIAL AND METHODS
Plant material
The diversity of figs reflects their domestication process, their complex pollination biology, exchange of cultivars between the growing regions, clonal propagation, and the coexistence of wild, feral, and cultivated forms in natural and agro-ecosystems. A common germplasm database to support fig research should be available in order to resolve the confusion in naming varieties (synonyms, homonyms), support management of fig genetic resources with genetic tools across all growing regions, and facilitate the exchange of genotyping data among different laboratories. Several properties of figs as agricultural products support this need: (1) the economic potential of fig fruits (figs are widely cultivated in North African and Middle Eastern countries, where they represent a significant source of agricultural income); (2) the nutritional value and functional properties (high antioxidant content (Solomon et al., 2006), rich fibre content, vitamins, and minerals (Vinson et al., 2005)); and (3) special pollination biology (mutualism with fig wasps, different sexual systems, distinct flower formation, parthenocarpy, and the existence of all these forms in fig production).
In the present study we compared microsatellite genotyping data of fig trees generated in two different laboratories using their own protocols with the aim of comparing the data and the suitability of the used fig microsatellite loci for genotyping purposes.Ninety fig samples representing cultivars, feral, and wild figs were included in the analysis and genotyping was performed at six microsatellite loci proven to be suitable for discrimination of fig samples and genotyping cultivars Petrovka', 'Črnica' / 'Rovinj', 'Pinčica' / 'Zelenka', 'Termenjača' / 'Zuccherina', 'Vodenjača' / 'Bružetka bela' and 'Cikulina' / 'Kanora' / 'Grška črna'.
Coding alleles with letters after standardisation of the results simplified the comparison of genotypes. The allele sizes of 24 reference cultivars from the Istrian peninsula (north-east Adriatic coast) published in this work will serve for standardisation of new genotyping projects. The defined minimum subset of markers represents a step toward efficient identification of fig genetic resources in other fig-growing countries. Such studies are essential because they enable identification of error-prone loci under different PCR technologies. The high error rate encountered with locus FSYC01 (15.56 %) indicated that it should be excluded from genotyping projects. All other loci were identified as reliable for fig genotyping studies. A very important goal which can be achieved with standardized molecular identification tools is the identification of unique local genotypes. This would support management of fig collections and promote cultivation and breeding of new and interesting cultivars, either by exchanging and introducing different cultivars in new regions or by giving importance to newly identified unique local cultivars. Traditional local products with protected designation of origin or other quality schemes are in great demand, and genetic analysis can help to identify cultivars which are characteristic of a specific geographical region.
This work is part of the bilateral Slovenia-France project and was supported by the Slovenian Research Agency [BI-FR / 09-10-INRA-001].