Cost-effective genomic selection in aquaculture breeding programmes: optimizing genotype imputation and incorporation of functional annotation
2024
Kriaridou, Christina | Robledo, Diego | Fraslin, Clemence | Gorjanc, Gregor | University of Edinburgh: Principal’s Career Development PhD Scholarship
Genomic selection has the potential to significantly enhance genetic progress in aquaculture breeding programmes. However, its widespread adoption has been limited to large companies and a few high-value species, mainly because of the high cost of genotyping. Genotype imputation is a promising solution: it can reduce genotyping costs by predicting ungenotyped variants in low-density and low-coverage datasets. Additionally, it has been hypothesized that prioritising functional variants could allow accurate selection across distant relatives, which could reduce the need for genotyping every generation. This study explores genotype imputation strategies and variant prioritisation with the goal of democratizing genomic selection in aquaculture. In the first chapter, I investigated genotype imputation of low-density panels across four aquaculture species: Atlantic salmon, turbot, common carp, and Pacific oyster. Eight low-density panels, ranging from 300 to 6000 SNPs, were constructed in silico for each species, and imputation to high density was tested using three genotype imputation programs (AlphaImpute v.2, FImpute v.3 and findhap v.4). Subsequently, the genomic prediction accuracy of the various densities was evaluated for each species using the imputed genotypes. Results revealed that FImpute v.3 was the best-performing software for parents-to-offspring imputation in aquaculture populations. In terms of prediction accuracy, the low-density and imputed panels generally performed comparably to the high-density panels in the fish species. For Pacific oyster, however, both imputation accuracy and genomic prediction accuracy were significantly lower.
Nonetheless, the optimisation of SNP selection for the design of low-density panels may be sufficient to achieve near-maximum prediction accuracy in most fish species and populations, suggesting an opportunity to reduce genotyping costs. In the second chapter, I examined the accuracy of imputation from low-coverage re-sequencing to whole-genome sequencing in a Nile tilapia population. For the target offspring, I tested six down-sampled datasets representing sequencing depths from 0.1X to 5X. These datasets were imputed with GLIMPSE v.1 against two reference panels with whole-genome sequence data: one at 5X sequencing depth and another at 26X sequencing depth. Finally, the cost of the different low-coverage whole-genome sequenced datasets was compared to that of SNP arrays in a hypothetical scenario involving 140 parents and 2100 offspring. Results revealed that imputation accuracy and the number of retained SNPs were higher with the 26X reference panel than with the 5X one. Additionally, imputation accuracy exceeded 90% for all down-sampled target datasets, with higher accuracy at homozygous than at heterozygous sites for coverages below 5X. While imputation of low-coverage whole-genome sequencing is cheaper than whole-genome sequencing and holds potential benefits for discovering causative variants, as well as for other genomic analyses between populations, it remains more expensive than SNP arrays and is therefore probably not a viable strategy for aquaculture breeding programmes. In the final experimental chapter, a dataset from a turbot population exposed to the parasite Philasterides dicentrarchi was analysed. The genotypes of the offspring were imputed to whole-genome genotypes using the whole-genome re-sequenced parents as reference. Various approaches for integrating functional information into genomic prediction were explored to assess their potential advantages.
The methodology involved assigning markers to functional annotation categories according to their overlap with genomic regions that potentially influence protein function, or with promoter and enhancer regions. Two Bayesian models were tested with and without annotation and compared with GBLUP. Integrating functional annotation data did not improve genomic prediction accuracy, with BayesR and GBLUP showing superior or, in certain scenarios, comparable performance. Future investigations should explore different approaches to defining annotation categories for traits with different architectures in order to optimize the predictive models.
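The in silico low-density design and the imputation-accuracy comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state how SNPs were chosen for the low-density panels or which accuracy metric was used, so the code assumes evenly spaced SNPs and reports two common metrics (genotype concordance and the mean per-SNP Pearson correlation). The function names `make_low_density_panel` and `imputation_accuracy` are hypothetical, and the imputation step itself (performed in the thesis with AlphaImpute, FImpute or findhap) is not reproduced here.

```python
import numpy as np


def make_low_density_panel(n_snps_hd, n_snps_ld):
    """Return indices of evenly spaced SNPs kept in a low-density panel.

    Assumption: even spacing along the high-density panel; the thesis may
    have used a different selection rule.
    """
    idx = np.linspace(0, n_snps_hd - 1, n_snps_ld).round().astype(int)
    return np.unique(idx)


def imputation_accuracy(true_geno, imputed_geno, masked_cols):
    """Compare true and imputed genotypes (coded 0/1/2) at the masked SNPs.

    Returns (concordance, mean per-SNP Pearson correlation). SNPs with no
    variation in either matrix are skipped for the correlation.
    """
    t = true_geno[:, masked_cols].astype(float)
    i = imputed_geno[:, masked_cols].astype(float)
    concordance = float(np.mean(t == i))
    cors = [
        np.corrcoef(t[:, j], i[:, j])[0, 1]
        for j in range(t.shape[1])
        if t[:, j].std() > 0 and i[:, j].std() > 0
    ]
    mean_r = float(np.mean(cors)) if cors else float("nan")
    return concordance, mean_r


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_geno = rng.integers(0, 3, size=(50, 1000))  # toy data: 50 animals, 1000 SNPs
    kept = make_low_density_panel(1000, 100)         # SNPs left on the low-density panel
    masked = np.setdiff1d(np.arange(1000), kept)     # SNPs that would need imputing
    # Stand-in for real imputation output: the truth with 5% of calls corrupted.
    imputed = true_geno.copy()
    errors = rng.random(imputed.shape) < 0.05
    imputed[errors] = (imputed[errors] + 1) % 3
    print(imputation_accuracy(true_geno, imputed, masked))
```

In practice the `imputed` matrix would come from the imputation software, and the same comparison would be repeated for each panel density and each program.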
Bibliographic information
This bibliographic record was provided by the University of Edinburgh