GBS data and SNP detection

The sequencing produced an average of 1,700,000 reads per
sample (range 203,996 to 5,897,687 of good barcoded reads); from these reads a
total of 1,804,668 tags across all 640 plant samples were identified. The SNP
analysis in Tassel5, using the red clover genome assembly(De Vega, et al., 2015) as a reference, resulted in
264,927 SNP across all 640 plants. After substantial clean-up of the data (see
SNP discovery in the methods section), a panel of 12,577 robust polymorphic SNP
were produced, and 10 samples were lost due to poor sequencing results. Of
these 12,577 SNP, 8,118 were present in the seven chromosomes, and 4,459 were
located in the unmapped contigs of the assembly. For information the SNP were
identified as both transition and transversion SNP (Table 4). Overall the, SNP
markers showed reasonable levels of heterozygosity; however a small proportion
had greater than expected levels of homozygosity (Figure 2). This is most
likely due to the GBS analysis, where there is a slight tendency to
underestimate heterozygosity. It is a matter of sequencing depth, and poorly
sequenced variants can lead to under estimation of true heterozygosity. The
data represented in the rest of this paper concentrates on the 8,118 SNP
present in the seven chromosomes.

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!

order now

Phylogenetic relationships of the ecotype panel

The 8,118 SNP data were grouped according to UPGMA in the
package Cluster (Figure 3a, Table 5). The change in slope angle in Figure 3a
indicated four groups to the data. These groups were Asia (including Iran),
Mainland Europe, UK, and the Iberian Peninsula. Using these groupings, a relationship
tree was produced in the same package (Figure 3b). The relationship tree did
not show any distinction between the varieties and the ecotypes; Milvus, Britta
and AberRuby were allocated to group 2 (Mainland Europe), and Broadway and
Crossway to group 4 (Iberian Peninsula). However, there was a secondary change
in slope angle at k=2, dividing the population into Asia and Europe. There were
two anomalies in the data classification that of the Italian ecotypes Aa4445
and Aa4441 in the clusters represented by Asia and UK respectively (Table 5).

A principal component analysis of the data was coloured by
accession as defined by the relationship tree. The PCA plot (Figure 3c) clearly
showed that the Asian accessions were separate to the European ones, and that the
Iberian and UK populations were distinct from Mainland Europe; with the two
anomalous Italian accessions coloured to their proposed groups. The PCA
inferred that accession Aa4445 was not part of the Asian cluster, and is genetically
related to Mainland European accessions (red spots in European cluster).

STRUCTURE analysis assumed nine groups of ancestry to the
population, as inferred by ?K
(Figure 3f, Table 6). STRUCURE also described a large secondary peak at K=2, again
subdividing the population into European and Asian accessions. The nine group
population (Figure 2e) could also be classified in general into four sets, of
Asia (group 3), Iberia (group 2 and 8), the UK (group 5), and more ambiguously
Mainland Europe (group 1, 4, 6, 7, 9). Group one consisted in the main of
Croatian and Bosnian accession; group four of Italian and Slovakian accessions;
interestingly group seven contained only one accession that of Italy (Aa4445),
and accession Aa4441 again was predicted to have shared ancestry with the UK
population; group nine was a mix of accessions from Hungary, Poland, The Czech
Republic and Slovakia. These four groups appeared to be related by geographic
distribution, the collection sites all being fairly close together. However,
group six, which contained three of the varieties (Milvus, Britta and AberRuby),
consisted of accessions from Northern Europe, Eastern Europe and Italy. There
were four oddly placed accessions; these were Aa4441 from Italy in group five
(UK), and Aa4480 and Aa4487 from Spain and Aa4406 from Pakistan in group six
(Europe 3). There were four accessions, from Bosnia (Aa4280), Bulgaria (Aa355,
Aa4358) and The Czech Republic (Aa4304), which showed mixed populations of
individuals. Individuals from these accessions covered groups 1, 4, 6 and 9.

AMOVA was used to test the hypothesis that the varieties
were indistinguishable, in terms of genetic diversity and differentiation, from
the ecotypes. The AMOVA (Table 7) supports this hypothesis, and the
domestication process has not significantly altered the genetic diversity of
the varieties in comparison to the ecotypes, and that they are
indistinguishable from each other. However,
the AMOVA also described the individual accessions to be relatively well
differentiated as shown by the ? score = 0.14 and ?2 =345.55, and this accounted for 13.66%
of the variation seen in the population. Therefore, the AMOVA showed that there
is more variation within the different accessions rather than between them.
This is actually typical of obligate outbreeding populations and was not

Genetic diversity of the ecotype panel

Population genetics parameters

Table 8 shows the basic statistics produced during the
analysis in Pegas. We evaluated the
possibility of a bottleneck during domestication by comparing observed to
expected heterozygosity and using the FIS values with a null
hypothesis of random mating within each accession. The varieties total showed
no decrease in the HO as compared to the HE (HO=
0.364, HE=0.279), this indicated no bottleneck during domestication,
and the FIS supported no inbreeding depression as caused by small
populations during the breeding process. This was also the case for the
analysis of Milvus, Britta and AberRuby and Broadway and Crossway, when
assessed separately.

A breakdown
analysis of the varieties and the ecotypes (Table 8) revealed that the
varieties were more related to each other than were the ecotypes (FST
= 0.039 v 0.076 Varieties v Ecotypes total), and there were similar levels of genetic
diversity in both the varieties and ecotypes total (DST = 0.011 v
0.02). The analysis also showed that there was very low levels of genetic
diversity within the samples of Milvus, Britta and AberRuby (DST=0.007)
and that they were related populations (FST = 0.022, DEST
= 0.014), this was also the case for Broadway and Crossway (DST=0.004,
FST = 0.013, DEST = 0.012) although this was expected as
they are related varieties.

The four groups, as
defined in UPGMA, (Table 8) revealed that the ecotypes from Asia were the least
related to each other (FST = 0.093, DEST = 0.045) and had
the highest levels of genetic diversity (DST = 0.029). Cluster two
(Europe) was made up of related individuals (FST = 0.047, DEST
= 0.019), that had low levels of genetic diversity (DST = 0.013).
Cluster three (UK) mirrored the European analysis in terms of relatedness and
gene diversity. Cluster four (Iberian Peninsula) proved to be relatively well
differentiated (FST = 0.06, DEST = 0.029), however with
low levels of gene diversity within the samples (DST = 0.029). STRUCTURE
analysis inferred two Iberian populations, this may account for the level of
differentiation in these accessions.

Regions under selection: via outlier detection

Three methods were used to detect outlier markers; basic R
identified 869 loci, BayeScan identified 548 loci and Sam?ada identified 1,021
loci. In basic R, outlier SNP of interest included position Chr7_5219303
(Figure 4a). This SNP is within gene number 38211, a 7 transmembrane MLO family
protein that is involved in mildew resistance. There is evidence in the field
to suggest that the Iranian accessions do have a degree of mildew resistance
over other ecotype accessions. However, this has not been verified under
experimental conditions.

In BayeScan the 584 loci (Figure 4b.) were found in gene
models that have been identified amongst others as transcription factors
regulating growth responses, senescence, seed dormancy, flowering and disease
resistance. Transcription factors are key regulators of gene expression and may
have a direct effect on a plants ability to respond to its environment. These
are therefore important factors when considering selective advantage.

In Sam?ada
1,021 SNP loci significantly correlated to the geography of the collection
sites, after Bonferroni correction for both the Wald and G Score. The majority
of the SNP (85.6% of the 1,020 outliers) were deemed to be under the selective
influence of longitude to some extent (Table 9).

Regression analysis was performed using the principal
components against the geographic co-ordinates. This analysis used all of the
SNP data not just the outliers, and a very high strength of the regression line
(r = 0.93) between PC1 and longitude of the collection sites was found (Figure
5), the other two components of latitude and altitude were not so strongly
correlated (data not shown). This is contributable evidence to the outliers
being under the influence of longitude as a selective force. There were 56
common SNP to all three analyses (Figure 4c., Table 10); these SNP are
potentially adaptive loci and are involved in many diverse functions such as
RNA silencing, photosynthesis, floral development, plant defence and biotic and
abiotic stress.

Haplotype reconstruction

Haplotypes were examined from the SNP that were found to be
outliers, potentially under selective pressure. There were too many haplotype
variates for phastPHASE to identify common haplotypes within the ecotype panel,
therefore the outlier SNP were analysed by PCA to distinguish haplotype
patterns. The selection of SNP on chromosome 7, that were proposed to be
involved in mildew resistance were analysed by PCA and graphed. As a control a
random selection of 100 SNP, from chromosome 7 were also graphed. The PCA
analysis was able to separate the haplotypes of accession Aa3507, from the rest
of the accessions (Figure 6a.), the control SNP showed a clustering pattern that
reflected the population structure PCA (Figure 6.b). These graphs could indicate
further evidence that the haplotypes for the Iranian accessions are different
to those from the rest of the ecotype panel (for these selective SNP). This
process was repeated for the 56 common SNP, however due to the strong
association of longitude no informative clustering was produced. Finally the 548
SNP identified from BayeScan were also graphed by PCA, but again no informative
clustering was seen. In both these cases the pattern of SNP reflected the
population structure PCA graph.

Linkage disequilibrium

The decay plots and heatmaps (Figure 7) show r2
values plotted against physical distance on each of the chromosomes for the
entire population. There was a significant peak in the heatmap for chromosome
seven, and further analysis in Synbreed using the varieties and ecotypes
separately, revealed that this peak was evident in the ecotypes only (figure 8).
This region is close to SNP Chr7_5219303 that was proposed to be under
selection for mildew resistance in the Iranian accessions Aa3506 and Aa3507.

The critical r2 for each chromosome was
calculated for the varieties and the ecotypes separately. For the varieties
this value ranged from 0.026 – 0.032 per chromosome (Table 11), however in the
ecotypes this value was much lower at 0.007 – 0.008 (Table 11). In the
varieties some 0.1% (418 of 3219126) of pairs were completely linked with an r2
= 1 (Table 12); there were no completely linked pairs in the ecotypes. The
average global LD for the varieties was 0.028 with 80% of the markers pairs
being in LD. The average global LD for the ecotypes was 0.008 again with 80% of
the markers pairs being in LD.

Phenotypic assessment

During the experiment, there was considerable loss of plant
material (Table 13). This loss of material was due to poor winter survival and
the inability of the ecotypes to survive the cutting regime. The plants were
given a lax cut at the end of each year to over winter. During the second year,
two harvests were collected. The number of plants that survived into year three
(2017) was little over 30% and so the experiment was scrapped and no further
data were collected. This meant that phenotype measurements were collected in
one year only; however, two sets of measurements were collected where possible.

Flowering date

From the 514 plants that survived into year two, flowering
dates were collected. Of these plants, 415 flowered within the time constraint
and a further 99 had flowered by day 100, the day of harvest. There was a large
range of flowering across the ecotype panel, with the average being at day 70
(Figure 9). All of the plants in three accessions (Britta, Denmark and Finland)
did not flower before the cut off day 81; however, they had flowered by day
100. These are all Scandinavian in origin, and Britta was developed as a late
flowering variety to maximise vegetative growth (Lundin, et al., 1974). As the data were not
normally distributed, it was log10 transformed to improve
homogeneity, and subjected to a GWAS analysis in GAPIT. GAPIT produced a
kinship matrix and this was used along with the PCA components as covariates
within the GWAS analysis to account for relatedness and population structure
(Figure 10a). The analysis produced one SNP of interest above the FDR (Figure
10b.). This SNP was identified as chr3_22736556, which lies within gene 10558
according to the genome assembly, this was a basic leucine zipper transcription
factor. The sequence was BLAST searched at NCBI (Figure 11) and was found to be
synonymous to the VEG2 gene in pea, which is involved in flowering time and
inflorescence development (Sussmilch, et al., 2015). An investigation into the
effects of the alleles at this locus was undertaken, and it was found that the
minor allele had the greatest effect reducing the time to flowering by 11 days.
The average flowering date for homozygous major allele was 73.3 days, for
heterozygotes was 66.6 days and for homozygous minor allele was 62.4 days.
Britta, Denmark and Finland genotypes were all homozygous for the major allele
for this SNP.

Vegetative analysis

In year two of the experiment two sets of measurements were
taken for plant height and width, and stem number. Box plots were drawn to
represent the spread of the data per country. To investigate the degree of
prostrateness in the panel a width:height ratio was calculated (Figure 12). The
data described Broadway and Crossway varieties to be quite prostrate, but not
as prostrate as the Portuguese accession. The UK and Spanish accessions showed
the next highest average prostrateness. The accessions from Spain and Portugal,
contain ecotypes that had the lowest stem heights recorded. The average number
of stems per plant per country was recorded in Figure 13. Broadway and Crossway
varieties had the greatest number of stems per plant; this probably reflects
the plants ability to regrow after cutting. These are average values from two
cuttings in year two.