Genotype calling in a tetraploid diversity panel with polyRAD

Packages for this tutorial

If you don’t already have the following packages, install them:

CRAN_packages <- c("polyRAD", "BiocManager", "qqman")
Bioconductor_packages <- c("pcaMethods", "VariantAnnotation")

BiocManager::install(Bioconductor_packages)
install.packages(CRAN_packages)

Then load them into your R workspace:

library(polyRAD)
library(VariantAnnotation)
library(qqman)

As another quick setup item, I’ll set the seed for random number generation to ensure that your results are identical to mine.

set.seed(50819)

Import the VCF

The VCF for this tutorial was generated by the GBSv2 pipeline in TASSEL 5. A large diversity panel of Miscanthus sacchariflorus was sequenced using a RAD-seq approach with the enzymes PstI and MspI. To make a file small enough for this tutorial, VariantAnnotation was used to filter the dataset by missing data rate and minor allele frequency, as well as subset it to only contain tetraploid individuals, and only SNPs on chromosome 5. The Miscanthus sinensis v7 reference from Phytozome was used.

We can get some basic information about the VCF, including the names of the 268 samples:

vcf_file <- "Msa_tetraploids_Chr05.vcf.bgz"
hdr <- scanVcfHeader(vcf_file)
hdr
## class: VCFHeader
## samples(268): PMS-458 KMS-widespread ... KMS410 JM2014-H-7
## meta(4): fileDate fileformat Tassel contig
## fixed(1): FILTER
## info(3): NS DP AF
## geno(5): GT AD DP GQ PL
samples(hdr)
##   [1] "PMS-458"         "KMS-widespread"  "UI10-00117"     
##   [4] "UI11-00005"      "UI11-00027"      "IGR-2011-005"   
##   [7] "PMS-457"         "JY186"           "JY192"          
##  [10] "JY182"           "JY009"           "KMS408"         
##  [13] "JY008"           "JY127"           "KMS417"         
##  [16] "JY006"           "JY129"           "JM2014-S-3"     
##  [19] "JM2014-M-3"      "JY183"           "JM2014-M-4"     
##  [22] "JY178"           "JM2014-H-68"     "JM2014-H-11"    
##  [25] "JM2014-H-41"     "JPN-2010-009"    "JY172"          
##  [28] "JMShiraN24"      "JY193"           "JY143"          
##  [31] "JPN-2010-014"    "KMS419"          "JY176"          
##  [34] "JM2014-H-67"     "JM2014-H-72"     "JM2014-M-14"    
##  [37] "JPN-2010-004"    "JM2014-K-8"      "JY141"          
##  [40] "JPN-2011-012"    "JPN-2011-004"    "JM11-037"       
##  [43] "JY185"           "JM2014-H-24"     "JPN-2011-019"   
##  [46] "JY168"           "JY166"           "KMS423"         
##  [49] "UI11-00007"      "UI11-00032"      "JPN-2011-001"   
##  [52] "JM1N24"          "JM2014-H-23"     "JPN-2011-023"   
##  [55] "JMMiyakonojoFSC" "JPN-2010-001"    "UI11-00039"     
##  [58] "JY165"           "JM2014-H-52"     "JY179"          
##  [61] "JM2014-H-28"     "KMS351"          "KMS268"         
##  [64] "JM2014-H-30"     "JPN-2010-008"    "KMS263"         
##  [67] "KMS262"          "JPN-2011-020"    "JM2014-H-58"    
##  [70] "JY162"           "JM2014-S-6"      "JY171"          
##  [73] "KMS446"          "JM2014-H-48"     "JPN-2011-003"   
##  [76] "KMS422"          "JY126"           "KMS451"         
##  [79] "KMS329"          "KMS468"          "KMS400"         
##  [82] "JM11-013"        "JY134"           "JPN-2011-013"   
##  [85] "JM11-027"        "JM2014-K-7"      "KMS405"         
##  [88] "JM2014-H-69"     "JPN-2010-002"    "JM2014-H-13"    
##  [91] "JM11-019"        "JPN-2010-012"    "RU2012-055"     
##  [94] "JM2014-H-42"     "JPN-2010-010"    "JPN-2011-011"   
##  [97] "JM2014-H-77"     "JPN-2011-025"    "JPN-2010-005"   
## [100] "JMKikuchi1N24"   "JM2014-M-5"      "JM2014-H-27"    
## [103] "JM2014-H-25"     "KMS448"          "JM2014-S-2"     
## [106] "KMS399"          "KMS257"          "JM2014-S-5"     
## [109] "JPN-2011-002"    "JM2014-H-22"     "RU2012-023"     
## [112] "JPN-2011-022"    "UI11-00037"      "KMS324"         
## [115] "JM2014-H-19"     "JM2014-M-7"      "JPN-2010-015"   
## [118] "JM2014-H-3"      "JM2014-M-2"      "JM2014-H-44"    
## [121] "JY173"           "JY140"           "JM2014-H-16"    
## [124] "JPN-2011-014"    "JM11-040"        "JM2014-H-29"    
## [127] "JM2014-H-71"     "KMS467"          "JPN-2011-010"   
## [130] "JM2014-K-4"      "JM11-006"        "KMS347"         
## [133] "JPN-2011-015"    "JY177"           "KMS278"         
## [136] "JPN-2010-007"    "JM2014-H-51"     "JY010"          
## [139] "JM2014-K-13"     "JPN-2011-005"    "JM2014-H-70"    
## [142] "JM2014-H-66"     "KMS450"          "KMS412"         
## [145] "JM2014-H-43"     "KMS453"          "KMS435"         
## [148] "JM2014-H-26"     "KMS392"          "KMS319"         
## [151] "JM2014-H-18"     "KMS444"          "JY137"          
## [154] "JM2014-M-12"     "JPN-2011-008"    "JPN-2010-006"   
## [157] "JPN-2010-003"    "JM2014-H-12"     "JM2014-S-1"     
## [160] "JPN-2010-013"    "JM2014-H-55"     "UI11-00033"     
## [163] "JM2014-H-47"     "UI11-00035"      "JPN-2011-016"   
## [166] "JM2014-H-46"     "UI11-00036"      "JPN-2011-017"   
## [169] "JM2014-H-78"     "KMS407"          "KMS341"         
## [172] "KMS345"          "JM2014-H-76"     "JPN-2011-006"   
## [175] "KMS427"          "JM2014-H-73"     "KMS452"         
## [178] "JM2014-K-15"     "JM2014-H-14"     "JPN-2010-011"   
## [181] "KMS395"          "KMS253"          "JM2014-H-63"    
## [184] "JM2014-H-10"     "JM2014-K-2"      "JY013"          
## [187] "JY159"           "JM2014-H-4"      "JM11-002"       
## [190] "KMS316"          "UI11-00038"      "KMS420"         
## [193] "JM2014-K-5"      "KMS413"          "JM2014-H-5"     
## [196] "KMS430"          "JM2014-H-75"     "JM2014-M-9"     
## [199] "JPN-2011-018"    "JM2014-H-20"     "JM2014-H-50"    
## [202] "JM2014-H-57"     "KMS330"          "KMS403"         
## [205] "JM2014-H-62"     "KMS354"          "KMS348"         
## [208] "JM2014-K-6"      "JM2014-H-2"      "KMS418"         
## [211] "JM2014-K-11"     "JM2014-H-8"      "KMS380"         
## [214] "JM2014-H-64"     "KMS449"          "KMS328"         
## [217] "KMS414"          "JM2014-H-56"     "JM2014-M-11"    
## [220] "JM2014-K-10"     "JY118-1"         "JY174"          
## [223] "JY152"           "JY191"           "KMS415"         
## [226] "JY175"           "JM2014-H-9"      "JM2014-M-1"     
## [229] "JM2014-H-54"     "JM2014-H-49"     "JY138"          
## [232] "KMS390"          "KMS465"          "KMS346"         
## [235] "KMS332"          "KMS402"          "KMS411"         
## [238] "JMNaganumaFSC"   "JM2014-H-6"      "JM11-024-1"     
## [241] "JM11-024-2"      "RU2012-056.1"    "UI11-00034"     
## [244] "KMS393"          "KMS397"          "JM2014-H-45"    
## [247] "KMS275"          "KMS409"          "JM2014-H-53"    
## [250] "JM2014-H-60"     "JY136"           "JY146"          
## [253] "JY149"           "KMS434-47"       "KMS437"         
## [256] "JM2014-M-6"      "JM2014-M-10"     "JM2014-M-13"    
## [259] "KMS352"          "JMKamishoN24"    "JMTomatoN24"    
## [262] "JM2014-H-21"     "KMS387"          "JY144"          
## [265] "KMS340"          "KMS342"          "KMS410"         
## [268] "JM2014-H-7"

The VCF can be imported into a RADdata object for genotype calling in polyRAD. There can be a lot of missing data using RAD-seq, so we’ll set the filtering threshold to only retain markers where at least 140 individuals (slightly more than half) have non-missing data. We’ll also say that we want at least five individuals to have the minor allele, in order to prevent false positives that can result from individuals with extreme phenotypes being the only ones to posess a rare allele. Using the possiblePloidies argument, we will indicate that the organism is autotetraploid. Since I know that this is a small dataset, I am also setting expectedAlleles and expectedLoci below their default values to save computation time and memory.

myRAD <- VCF2RADdata(vcf_file,
                     min.ind.with.reads = 140,
                     min.ind.with.minor.allele = 5,
                     possiblePloidies = list(4),
                     expectedAlleles = 5e4,
                     expectedLoci = 2e4)
## Reading file...

## Unpacking data from VCF...

## Filtering markers...

## Phasing 4268 SNPs on chromosome 05

## Reading file...

## Unpacking data from VCF...

## Filtering markers...

## Phasing 4333 SNPs on chromosome 05

## Reading file...

## Unpacking data from VCF...

## Filtering markers...

## Phasing 4322 SNPs on chromosome 05

## Reading file...

## Unpacking data from VCF...

## Filtering markers...

## Phasing 883 SNPs on chromosome 05

## Reading file...

## 5097 loci imported.

## Building RADdata object...

## Merging rare haplotypes...

## 5097 markers retained out of 5097 originally.
myRAD
## ## RADdata object ##
## 268 taxa and 5097 loci
## 54160327 total reads
## Assumed sample cross-contamination rate of 0.001
##
## Possible ploidies:
## Autotetraploid (4)

We can see that we imported 5097 loci. The default behavior of VCF2RADdata is also to phase SNPs that appeared to originate from the same tag location, assuming a reduced-representation sequencing technique. This means that for many loci, we will have multiallelic haplotype-based markers rather than biallelic SNPs. The following code gives us a sense of how many alleles per locus we tend to have.

table(table(myRAD$alleles2loc))
##
##    2    3    4    5    6    7    8    9   10   11   12   13   15   16   17
## 1829 1230  844  536  289  168   85   47   33    9    9    4    2    2    1
##   19   20   21
##    3    3    3

Most markers are biallelic or triallelic, but there are many with more alleles.

We can also get a quick picture of population structure.

myRAD <- AddPCA(myRAD)
plot(myRAD)

There appear to be three major groups, which makes sense given previous findings indicating mainland Asian tetraploids, S Japan tetraploids, and N Japan tetraploids being distinct populations. We can see which individuals are in which group.

sort(GetTaxa(myRAD)[myRAD$PCA[,1] < -5]) # N Japan
##  [1] "JM1N24"        "JM2014-H-10"   "JM2014-H-12"   "JM2014-H-13"  
##  [5] "JM2014-H-14"   "JM2014-H-18"   "JM2014-H-19"   "JM2014-H-2"   
##  [9] "JM2014-H-20"   "JM2014-H-21"   "JM2014-H-22"   "JM2014-H-23"  
## [13] "JM2014-H-24"   "JM2014-H-25"   "JM2014-H-26"   "JM2014-H-27"  
## [17] "JM2014-H-28"   "JM2014-H-29"   "JM2014-H-3"    "JM2014-H-30"  
## [21] "JM2014-H-4"    "JM2014-H-41"   "JM2014-H-42"   "JM2014-H-43"  
## [25] "JM2014-H-44"   "JM2014-H-45"   "JM2014-H-46"   "JM2014-H-47"  
## [29] "JM2014-H-48"   "JM2014-H-49"   "JM2014-H-5"    "JM2014-H-50"  
## [33] "JM2014-H-51"   "JM2014-H-52"   "JM2014-H-53"   "JM2014-H-54"  
## [37] "JM2014-H-55"   "JM2014-H-56"   "JM2014-H-57"   "JM2014-H-58"  
## [41] "JM2014-H-6"    "JM2014-H-60"   "JM2014-H-62"   "JM2014-H-63"  
## [45] "JM2014-H-66"   "JM2014-H-67"   "JM2014-H-68"   "JM2014-H-69"  
## [49] "JM2014-H-7"    "JM2014-H-70"   "JM2014-H-71"   "JM2014-H-72"  
## [53] "JM2014-H-73"   "JM2014-H-75"   "JM2014-H-76"   "JM2014-H-77"  
## [57] "JM2014-H-78"   "JM2014-H-8"    "JM2014-H-9"    "JMKamishoN24"
## [61] "JMNaganumaFSC" "JMTomatoN24"   "JPN-2010-003"  "JPN-2010-005"
## [65] "JPN-2011-008"  "JPN-2011-010"
sort(GetTaxa(myRAD)[myRAD$PCA[,1] > -5 & myRAD$PCA[,1] < 1]) # S Japan
##  [1] "IGR-2011-005"    "JM11-027"        "JM11-040"       
##  [4] "JM2014-H-11"     "JM2014-H-64"     "JM2014-K-10"    
##  [7] "JM2014-K-11"     "JM2014-K-13"     "JM2014-K-15"    
## [10] "JM2014-K-2"      "JM2014-K-4"      "JM2014-K-5"     
## [13] "JM2014-K-8"      "JM2014-M-1"      "JM2014-M-10"    
## [16] "JM2014-M-11"     "JM2014-M-12"     "JM2014-M-13"    
## [19] "JM2014-M-14"     "JM2014-M-2"      "JM2014-M-3"     
## [22] "JM2014-M-4"      "JM2014-M-5"      "JM2014-M-6"     
## [25] "JM2014-M-7"      "JM2014-M-9"      "JM2014-S-1"     
## [28] "JM2014-S-5"      "JM2014-S-6"      "JMKikuchi1N24"  
## [31] "JMMiyakonojoFSC" "JMShiraN24"      "JPN-2010-001"   
## [34] "JPN-2010-002"    "JPN-2010-004"    "JPN-2010-006"   
## [37] "JPN-2010-007"    "JPN-2010-008"    "JPN-2010-009"   
## [40] "JPN-2010-010"    "JPN-2010-011"    "JPN-2010-012"   
## [43] "JPN-2010-013"    "JPN-2010-014"    "JPN-2010-015"   
## [46] "JPN-2011-001"    "JPN-2011-002"    "JPN-2011-003"   
## [49] "JPN-2011-004"    "JPN-2011-005"    "JPN-2011-006"   
## [52] "JPN-2011-011"    "JPN-2011-012"    "JPN-2011-013"   
## [55] "JPN-2011-014"    "JPN-2011-015"    "JPN-2011-016"   
## [58] "JPN-2011-017"    "JPN-2011-018"    "JPN-2011-019"   
## [61] "JPN-2011-020"    "JPN-2011-022"    "JPN-2011-023"   
## [64] "JPN-2011-025"    "UI10-00117"      "UI11-00005"     
## [67] "UI11-00007"      "UI11-00027"      "UI11-00032"     
## [70] "UI11-00033"      "UI11-00034"      "UI11-00035"     
## [73] "UI11-00036"      "UI11-00037"      "UI11-00038"     
## [76] "UI11-00039"
sort(GetTaxa(myRAD)[myRAD$PCA[,1] > 1]) # mainland and a few from Japan
##   [1] "JM11-002"       "JM11-006"       "JM11-013"       "JM11-019"      
##   [5] "JM11-024-1"     "JM11-024-2"     "JM11-037"       "JM2014-H-16"   
##   [9] "JM2014-K-6"     "JM2014-K-7"     "JM2014-S-2"     "JM2014-S-3"    
##  [13] "JY006"          "JY008"          "JY009"          "JY010"         
##  [17] "JY013"          "JY118-1"        "JY126"          "JY127"         
##  [21] "JY129"          "JY134"          "JY136"          "JY137"         
##  [25] "JY138"          "JY140"          "JY141"          "JY143"         
##  [29] "JY144"          "JY146"          "JY149"          "JY152"         
##  [33] "JY159"          "JY162"          "JY165"          "JY166"         
##  [37] "JY168"          "JY171"          "JY172"          "JY173"         
##  [41] "JY174"          "JY175"          "JY176"          "JY177"         
##  [45] "JY178"          "JY179"          "JY182"          "JY183"         
##  [49] "JY185"          "JY186"          "JY191"          "JY192"         
##  [53] "JY193"          "KMS-widespread" "KMS253"         "KMS257"        
##  [57] "KMS262"         "KMS263"         "KMS268"         "KMS275"        
##  [61] "KMS278"         "KMS316"         "KMS319"         "KMS324"        
##  [65] "KMS328"         "KMS329"         "KMS330"         "KMS332"        
##  [69] "KMS340"         "KMS341"         "KMS342"         "KMS345"        
##  [73] "KMS346"         "KMS347"         "KMS348"         "KMS351"        
##  [77] "KMS352"         "KMS354"         "KMS380"         "KMS387"        
##  [81] "KMS390"         "KMS392"         "KMS393"         "KMS395"        
##  [85] "KMS397"         "KMS399"         "KMS400"         "KMS402"        
##  [89] "KMS403"         "KMS405"         "KMS407"         "KMS408"        
##  [93] "KMS409"         "KMS410"         "KMS411"         "KMS412"        
##  [97] "KMS413"         "KMS414"         "KMS415"         "KMS417"        
## [101] "KMS418"         "KMS419"         "KMS420"         "KMS422"        
## [105] "KMS423"         "KMS427"         "KMS430"         "KMS434-47"     
## [109] "KMS435"         "KMS437"         "KMS444"         "KMS446"        
## [113] "KMS448"         "KMS449"         "KMS450"         "KMS451"        
## [117] "KMS452"         "KMS453"         "KMS465"         "KMS467"        
## [121] "KMS468"         "PMS-457"        "PMS-458"        "RU2012-023"    
## [125] "RU2012-055"     "RU2012-056.1"

Preliminary genotype calling and estimation of overdispersion

We will do a quick genotype estimation so that we can test how overdispersed the distribution of sequencing reads is. We may want to adjust the model if it seems like read depth diverges very far from the expected ratios.

myRAD_pre <- IterateHWE(myRAD, tol = 0.001)
## Starting iteration 1

## Mean difference in allele frequencies of 0.00774877963203997

## Starting iteration 2

## Mean difference in allele frequencies of 0.00212604559575315

## Starting iteration 3

## Mean difference in allele frequencies of 0.000878928453068841
testod <- TestOverdispersion(myRAD_pre, to_test = 7:11)

The we can visualize the overdispersion results.

qq(testod[["7"]])

qq(testod[["8"]])

qq(testod[["9"]])

qq(testod[["10"]])

qq(testod[["11"]])

It seems that the dots follow the red line most closely for 8 and 9, so we will keep the default of overdispersion = 9. The lower the value that we choose, the messier the data are assumed to be.

Calling the genotypes

Now we will use the population structure pipeline to call the genotypes, since we know there is population structure. Miscanthus are wild, outcrossing grasses with very little linkage disequilibrium, so we will use the model without linkage disequilibrium.

myRAD <- IteratePopStruct(myRAD)
## Performing initial PCA and allele frequency estimation.

## Starting iteration 1

## PCs used: 4

## Mean difference in allele frequencies of 0.0110771855801767

## Starting iteration 2

## PCs used: 5

## Mean difference in allele frequencies of 0.0076788298312621

## Starting iteration 3

## PCs used: 6

## Mean difference in allele frequencies of 0.00728463433866155

## Starting iteration 4

## PCs used: 7

## Mean difference in allele frequencies of 0.00518479027794337

## Starting iteration 5

## PCs used: 8

## Mean difference in allele frequencies of 0.00476464299499636

## Starting iteration 6

## PCs used: 9

## Mean difference in allele frequencies of 0.00188961685124673

## Starting iteration 7

## PCs used: 9

## Mean difference in allele frequencies of 0.00112218001790855

## Starting iteration 8

## PCs used: 9

## Mean difference in allele frequencies of 0.000792682568668029

We can plot population structure again and see that it has tightened up now that we have estimated allele dosage more accurately.

plot(myRAD)

Exploring the results

We can look at the allele depth matrix, and see that the first four alleles belong to the same locus.

myRAD$alleleDepth[1:10,1:5]
##                S05_51928_TTC S05_51928_TTA S05_51928_TCC S05_51928_ATC
## PMS-458                   35             0             0             4
## KMS-widespread            62            25             0            25
## UI10-00117                 5             0             0             0
## UI11-00005                 0             0             0             6
## UI11-00027                 2             0             0             0
## IGR-2011-005              14             0             0             0
## PMS-457                   62             0             0             5
## JY186                      0             0             0             0
## JY192                      3             0             0             0
## JY182                      0             0             0             0
##                S05_81981_GC
## PMS-458                  13
## KMS-widespread           20
## UI10-00117                2
## UI11-00005                0
## UI11-00027                0
## IGR-2011-005              1
## PMS-457                  24
## JY186                     0
## JY192                     0
## JY182                     2

polyRAD not only estimated the overall frequency of these alleles in the population, but also the frequency in the local subpopulation that each individual originated from.

myRAD$alleleFreq[1:4]
## S05_51928_TTC S05_51928_TTA S05_51928_TCC S05_51928_ATC
##    0.79352144    0.04363243    0.03595037    0.11726594
myRAD$alleleFreqByTaxa[1:10,1:4]
##                S05_51928_TTC S05_51928_TTA S05_51928_TCC S05_51928_ATC
## PMS-458            0.7009366    0.06689353  0.0149683629    0.21720151
## KMS-widespread     0.7275313    0.05904669  0.0001000000    0.21332200
## UI10-00117         0.8117656    0.06977349  0.0272840775    0.09117678
## UI11-00005         0.8105773    0.09030955  0.0001000000    0.09901317
## UI11-00027         0.8334041    0.00010000  0.0541324680    0.11236340
## IGR-2011-005       0.8649998    0.00010000  0.0001000000    0.13480015
## PMS-457            0.6992147    0.06920085  0.0124894197    0.21909507
## JY186              0.7239559    0.07083529  0.0005745996    0.20463421
## JY192              0.7360167    0.06884988  0.0001000000    0.19503338
## JY182              0.7129562    0.07785057  0.0001000000    0.20909326

These local allele frequencies were used for estimating local genotype frequencies under Hardy-Weinberg equilibrium (although we could have also specified a self-fertilization rate).

myRAD$priorProb[[1]][,1:5,1:4]
## , , S05_51928_TTC
##
##      PMS-458 KMS-widespread  UI10-00117  UI11-00005   UI11-00027
## 0 0.00781727    0.005525035 0.001279212 0.001345205 0.0007583654
## 1 0.07389110    0.058960734 0.021939358 0.022715621 0.0152462297
## 2 0.26191504    0.235951078 0.141103098 0.143844112 0.1149417231
## 3 0.41261614    0.419660921 0.403335705 0.404834056 0.3851327316
## 4 0.24376046    0.279902232 0.432342628 0.427261005 0.4839209502
##
## , , S05_51928_TTA
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 7.604873e-01   7.827834e-01 7.481228e-01 6.804546e-01 9.996001e-01
## 1 2.155099e-01   1.976932e-01 2.251651e-01 2.749926e-01 3.998800e-04
## 2 2.290201e-02   1.872290e-02 2.541328e-02 4.167487e-02 5.998800e-08
## 3 1.081677e-03   7.880833e-04 1.274787e-03 2.807017e-03 3.999600e-12
## 4 1.915812e-05   1.243948e-05 2.397983e-05 7.090012e-05 1.000000e-16
##
## , , S05_51928_TCC
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 9.439547e-01   9.996001e-01 8.952131e-01 9.996001e-01 8.016712e-01
## 1 5.483903e-02   3.998800e-04 1.004782e-01 3.998800e-04 1.822031e-01
## 2 1.194702e-03   5.998800e-08 4.229104e-03 5.998800e-08 1.552910e-02
## 3 1.156770e-05   3.999600e-12 7.911200e-05 3.999600e-12 5.882400e-04
## 4 4.200159e-08   1.000000e-16 5.549677e-07 1.000000e-16 8.355915e-06
##
## , , S05_51928_ATC
##
##       PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 0.376123140    0.383329282 6.801985e-01 6.571191e-01 0.6215713693
## 1 0.416640878    0.415355963 2.751711e-01 2.909200e-01 0.3138448335
## 2 0.173071266    0.168771912 4.174468e-02 4.829857e-02 0.0594251427
## 3 0.031952552    0.030478766 2.814602e-03 3.563793e-03 0.0050008401
## 4 0.002212164    0.002064077 7.116462e-05 9.861022e-05 0.0001578145

Genotype likelihoods were estimated from allelic read depth for each individual, under a beta-binomial distribution using our selected overdispersion parameter.

myRAD$genotypeLikelihood[[1]][,1:5,1:4]
## , , S05_51928_TTC
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 3.298648e-10   2.680012e-07 1.126415e-06 9.961945e-01 7.991885e-05
## 1 4.680064e-05   3.543697e-03 6.646217e-03 2.598653e-01 8.154920e-02
## 2 4.348873e-03   1.927306e-02 6.654827e-02 4.495268e-02 2.752936e-01
## 3 6.093434e-02   8.683630e-03 3.107182e-01 3.417380e-03 5.813131e-01
## 4 2.143139e-04   4.193993e-07 9.991412e-01 1.035744e-07 9.996077e-01
##
## , , S05_51928_TTA
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 9.993247e-01   2.273458e-06 9.998185e-01 9.997904e-01 9.999171e-01
## 1 1.592011e-02   2.417721e-02 3.109856e-01 2.610470e-01 5.815493e-01
## 2 1.243339e-04   6.254557e-03 6.662716e-02 4.523083e-02 2.754566e-01
## 3 2.891469e-07   1.422989e-04 6.659240e-03 3.451559e-03 8.163895e-02
## 4 7.278120e-13   1.687061e-09 1.361707e-06 4.871612e-07 9.645993e-05
##
## , , S05_51928_TCC
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 9.994436e-01   9.991426e-01 9.998504e-01 9.998273e-01 9.999317e-01
## 1 1.592228e-02   1.900323e-03 3.109982e-01 2.610592e-01 5.815604e-01
## 2 1.243543e-04   1.653809e-06 6.663088e-02 4.523369e-02 2.754642e-01
## 3 2.892095e-07   3.980257e-10 6.659854e-03 3.451910e-03 8.164319e-02
## 4 7.338719e-13   9.652075e-17 1.372842e-06 4.911514e-07 9.724142e-05
##
## , , S05_51928_ATC
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 1.216964e-04   6.115610e-06 9.995122e-01 5.871567e-08 0.9997772071
## 1 6.096898e-02   2.417742e-02 3.108647e-01 3.413332e-03 0.5814424802
## 2 4.354402e-03   6.249826e-03 6.659148e-02 4.491968e-02 0.2753828659
## 3 4.689598e-05   1.420821e-04 6.653349e-03 2.597250e-01 0.0815983641
## 4 3.680480e-10   1.553405e-09 1.255141e-06 9.957674e-01 0.0000889747

Together, the genotype priors and likelihoods were used to estimate the genotype posterior probabilities.

myRAD$posteriorProb[[1]][,1:5,1:4]
## , , S05_51928_TTC
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 9.790863e-11   1.762603e-07 2.542070e-09 8.878984e-02 8.184699e-08
## 1 1.313028e-04   2.487151e-02 2.572437e-04 3.911142e-01 1.679025e-03
## 2 4.324811e-02   5.413214e-01 1.656608e-02 4.284285e-01 4.273156e-02
## 3 9.546370e-01   4.337929e-01 2.210954e-01 9.166447e-02 3.023399e-01
## 4 1.983552e-03   1.397387e-05 7.620813e-01 2.932085e-06 6.532494e-01
##
## , , S05_51928_TTA
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 9.955020e-01   3.632878e-04 9.125000e-01 9.022793e-01 9.997674e-01
## 1 4.494244e-03   9.757086e-01 8.542407e-02 9.520784e-02 2.326081e-04
## 2 3.729983e-06   2.390518e-02 2.065622e-03 2.500010e-03 1.652822e-08
## 3 4.096941e-10   2.289264e-05 1.035622e-05 1.284970e-05 3.266049e-13
## 4 1.826483e-17   4.284058e-12 3.983534e-11 4.580918e-11 9.648408e-21
##
## , , S05_51928_TCC
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 9.990752e-01   9.999992e-01 9.659718e-01 9.998956e-01 8.790577e-01
## 1 9.246638e-04   7.608570e-07 3.372350e-02 1.044412e-04 1.161987e-01
## 2 1.573292e-07   9.933346e-14 3.041073e-04 2.714750e-09 4.690967e-03
## 3 3.542815e-12   1.593946e-21 5.686043e-07 1.381273e-14 5.266538e-05
## 4 3.264185e-20   9.664215e-33 8.222260e-13 4.913815e-23 8.910376e-10
##
## , , S05_51928_ATC
##
##        PMS-458 KMS-widespread   UI10-00117   UI11-00005   UI11-00027
## 0 1.746851e-03   2.111270e-04 8.850054e-01 9.216317e-06 7.572093e-01
## 1 9.694351e-01   9.044040e-01 1.113516e-01 2.371982e-01 2.223532e-01
## 2 2.876083e-02   9.499486e-02 3.618612e-03 5.182391e-01 1.994017e-02
## 3 5.718591e-05   3.900037e-04 2.437696e-05 2.210983e-01 4.972172e-04
## 4 3.107204e-11   2.887637e-10 1.162730e-10 2.345520e-02 1.710942e-08

From these, we can estimate posterior mean genotypes that give us a rough idea of allele copy number for each individual and allele.

posteriormean <- GetWeightedMeanGenotypes(myRAD, maxval = 4,
                                          omit1allelePerLocus = FALSE)
posteriormean[1:10,1:4]
##                S05_51928_TTC S05_51928_TTA S05_51928_TCC S05_51928_ATC
## PMS-458             2.958473  4.501705e-03  9.249785e-04    1.02712836
## KMS-widespread      2.408949  1.023588e+00  7.608572e-07    1.09556374
## UI10-00117          3.745001  8.958638e-02  3.433342e-02    0.11866192
## UI11-00005          1.522976  1.002464e-01  1.044467e-04    2.03079216
## UI11-00027          3.607160  2.326412e-04  1.257386e-01    0.26372531
## IGR-2011-005        3.945351  3.636579e-05  3.636630e-05    0.05456639
## PMS-457             2.984761  1.617130e-03  2.678331e-04    1.00924317
## JY186               2.897223  2.821807e-01  1.829205e-03    0.81876693
## JY192               3.478811  1.295151e-01  1.000935e-03    0.37765718
## JY182               2.852301  3.108272e-01  4.000000e-04    0.83647140

Exporting the genotypes

We can export to GAPIT format so that we may perform GWAS next.

gapit_geno <- ExportGAPIT(myRAD)

Since many genotypes are uncertain, ExportGAPIT exports the posterior mean genotypes, that is, numeric genotypes averaged across the posterior probability distribution for allele copy number. To match the GAPIT documentation, homozygotes are coded as 0 and 2, with heterozygotes (or possible heterozygotes) as intermediate numbers.

gapit_geno$GD[1:20,1:6]
##                          taxa S05_51928_TTA S05_51928_TCC S05_51928_ATC
## PMS-458               PMS-458  2.250853e-03  4.624893e-04    0.51356418
## KMS-widespread KMS-widespread  5.117938e-01  3.804286e-07    0.54778187
## UI10-00117         UI10-00117  4.479319e-02  1.716671e-02    0.05933096
## UI11-00005         UI11-00005  5.012321e-02  5.222333e-05    1.01539608
## UI11-00027         UI11-00027  1.163206e-04  6.286932e-02    0.13186265
## IGR-2011-005     IGR-2011-005  1.818289e-05  1.818315e-05    0.02728319
## PMS-457               PMS-457  8.085649e-04  1.339165e-04    0.50462159
## JY186                   JY186  1.410903e-01  9.146026e-04    0.40938347
## JY192                   JY192  6.475757e-02  5.004677e-04    0.18882859
## JY182                   JY182  1.554136e-01  2.000000e-04    0.41823570
## JY009                   JY009  4.737684e-02  4.609314e-02    0.12572257
## KMS408                 KMS408  4.498019e-02  6.221143e-05    0.13935576
## JY008                   JY008  1.064737e-01  1.425136e-01    0.31704991
## JY127                   JY127  5.168304e-02  7.390955e-03    0.16899229
## KMS417                 KMS417  1.516569e-01  2.000000e-04    0.30250595
## JY006                   JY006  7.228857e-02  8.725464e-02    0.17663356
## JY129                   JY129  8.087092e-02  2.623120e-02    0.23759003
## JM2014-S-3         JM2014-S-3  1.395775e-01  1.813536e-01    0.17557857
## JM2014-M-3         JM2014-M-3  1.679849e-01  5.440694e-02    0.16818876
## JY183                   JY183  5.497007e-01  9.255229e-05    0.21393899
##                S05_81981_GG S05_81981_AC
## PMS-458           0.1059852  0.019013812
## KMS-widespread    0.7728801  0.568991397
## UI10-00117        0.9107656  0.051898185
## UI11-00005        0.7991817  0.172066627
## UI11-00027        0.6194249  0.137090278
## IGR-2011-005      0.7927086  0.060544777
## PMS-457           0.0446502  0.007844267
## JY186             1.0826404  0.119715338
## JY192             0.7864860  0.196589833
## JY182             0.4784271  0.085799790
## JY009             0.6868319  0.107967309
## KMS408            0.7262018  0.155990387
## JY008             0.7061634  0.135149795
## JY127             0.6742985  0.113454773
## KMS417            0.5145202  0.255603213
## JY006             0.6913453  0.118995214
## JY129             1.2165416  0.101933127
## JM2014-S-3        0.9370831  0.148361062
## JM2014-M-3        0.8034489  0.115272754
## JY183             0.7462249  0.148613452
gapit_geno$GM[1:5,]
##            Name Chromosome Position
## 1 S05_51928_TTA          5    51928
## 2 S05_51928_TCC          5    51928
## 3 S05_51928_ATC          5    51928
## 4  S05_81981_GG          5    81981
## 5  S05_81981_AC          5    81981

We don’t see many values near 2, since the common allele is omitted for each marker. A value around 0.5 indicates an AAAB genotype, 1 an AABB genotype, and 1.5 an ABBB genotype.

We’ll export this data to CSV.

write.csv(gapit_geno$GD, file = "Msa_tetraploids_Chr05_GD.csv",
          row.names = FALSE)
write.csv(gapit_geno$GM, file = "Msa_tetraploids_Chr05_GM.csv",
          row.names = FALSE)

For genomic prediction, we will export to rrBLUP. Similarly, posterior mean genotypes will be exported and common alleles omitted. Genotypes range from -1 to 1.

rrb_geno <- Export_rrBLUP_Amat(myRAD)
rrb_geno[1:20,1:5]
##                S05_51928_TTA S05_51928_TCC S05_51928_ATC S05_81981_GG
## PMS-458           -0.9977491    -0.9995375   -0.48643582  -0.89401482
## KMS-widespread    -0.4882062    -0.9999996   -0.45221813  -0.22711990
## UI10-00117        -0.9552068    -0.9828333   -0.94066904  -0.08923438
## UI11-00005        -0.9498768    -0.9999478    0.01539608  -0.20081827
## UI11-00027        -0.9998837    -0.9371307   -0.86813735  -0.38057506
## IGR-2011-005      -0.9999818    -0.9999818   -0.97271681  -0.20729142
## PMS-457           -0.9991914    -0.9998661   -0.49537841  -0.95534980
## JY186             -0.8589097    -0.9990854   -0.59061653   0.08264042
## JY192             -0.9352424    -0.9994995   -0.81117141  -0.21351398
## JY182             -0.8445864    -0.9998000   -0.58176430  -0.52157289
## JY009             -0.9526232    -0.9539069   -0.87427743  -0.31316810
## KMS408            -0.9550198    -0.9999378   -0.86064424  -0.27379817
## JY008             -0.8935263    -0.8574864   -0.68295009  -0.29383659
## JY127             -0.9483170    -0.9926090   -0.83100771  -0.32570147
## KMS417            -0.8483431    -0.9998000   -0.69749405  -0.48547984
## JY006             -0.9277114    -0.9127454   -0.82336644  -0.30865466
## JY129             -0.9191291    -0.9737688   -0.76240997   0.21654157
## JM2014-S-3        -0.8604225    -0.8186464   -0.82442143  -0.06291690
## JM2014-M-3        -0.8320151    -0.9455931   -0.83181124  -0.19655112
## JY183             -0.4502993    -0.9999074   -0.78606101  -0.25377510
##                S05_81981_AC
## PMS-458          -0.9809862
## KMS-widespread   -0.4310086
## UI10-00117       -0.9481018
## UI11-00005       -0.8279334
## UI11-00027       -0.8629097
## IGR-2011-005     -0.9394552
## PMS-457          -0.9921557
## JY186            -0.8802847
## JY192            -0.8034102
## JY182            -0.9142002
## JY009            -0.8920327
## KMS408           -0.8440096
## JY008            -0.8648502
## JY127            -0.8865452
## KMS417           -0.7443968
## JY006            -0.8810048
## JY129            -0.8980669
## JM2014-S-3       -0.8516389
## JM2014-M-3       -0.8847272
## JY183            -0.8513865

Since this is a matrix, we will save it to an R workspace to simplify import later.

save(rrb_geno, file = "Msa_tetraploids_Chr05_rrb.RData")