A comparison of bioinformatic workflows for herbivore dietary analysis using a DNA metabarcoding approach
2023
Botha, Danielle | Barnard, S. | Siebert, F. | Du Plessis, M.G. | 11289856 - Barnard, Sandra (Supervisor) | 21074968 - Siebert, Frances (Supervisor)
MSc (Environmental Sciences), North-West University, Potchefstroom Campus
Dietary analysis based on faecal DNA metabarcoding depends on the taxonomic coverage and quality of the records available in DNA barcode reference databases. It also depends on the bioinformatics pipeline subsequently chosen to filter degraded DNA sequences from herbivore faecal samples and assign them to the correct taxa. The aims of this study were to (i) create a comprehensive, study-area-specific reference database containing the plant barcodes for all potential herbivore food plants in an eastern semi-arid South African savanna, (ii) compare bioinformatics approaches applied to faecal samples of cattle foraging in this savanna, and (iii) quantify these diets to determine the relative importance of food items. Reference sequences for the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene and the intron of the transfer RNA leucine (trnL) UAA gene, two barcodes commonly used in herbivore diet metabarcoding studies, were obtained from GenBank and BOLDSystems using a study-area-specific plant species list. The taxonomic reliability of these reference libraries was evaluated by testing for the presence of a barcode gap, selecting an appropriate identification threshold, and assessing identification accuracy with primary distance-based criteria. The performance of common bioinformatics pipelines (VSEARCH, OBITOOLS, and DADA2) and of in-silico filtering parameters was compared using a mock community, and each approach was evaluated on its ability to recover the mock species in their initial abundances (high precision and low root mean square error (RMSE) values). The chosen method was then applied to next-generation sequencing (NGS) datasets obtained from cattle foraging at two sites in the eastern savanna ecosystems of South Africa: the Syferkuil experimental farm in Limpopo and the rural village of Welverdiend in Mpumalanga.
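The reference-library checks described above (barcode gap and distance-based identification) can be sketched in Python. This is a minimal illustration under stated assumptions, not the analysis used in the thesis: it assumes pre-aligned sequences, uses uncorrected p-distances (many barcoding studies use K2P distances instead), and treats the k-nn criterion as a 1-nearest-neighbour rule in which a tie with another species counts as a failure. The species names and sequences are hypothetical toy data.

```python
from itertools import combinations

Record = tuple[str, str]  # (species name, aligned barcode sequence)


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length.

    Alignment columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return diffs / compared


def distance_matrix(records: list[Record]) -> list[list[float]]:
    """Symmetric matrix of pairwise p-distances."""
    n = len(records)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(records[i][1], records[j][1])
    return d


def barcode_gap(records: list[Record], d: list[list[float]]) -> dict[str, bool]:
    """A species has a barcode gap when its smallest interspecific distance
    is strictly larger than its largest intraspecific distance."""
    gaps = {}
    for sp in sorted({s for s, _ in records}):
        own = [i for i, (s, _) in enumerate(records) if s == sp]
        other = [i for i, (s, _) in enumerate(records) if s != sp]
        # Singletons have no intraspecific pairs, so their maximum defaults to 0.
        max_intra = max((d[i][j] for i, j in combinations(own, 2)), default=0.0)
        min_inter = min((d[i][j] for i in own for j in other), default=float("inf"))
        gaps[sp] = min_inter > max_intra
    return gaps


def nn_success_rate(records: list[Record], d: list[list[float]]) -> float:
    """1-nearest-neighbour rule: an identification succeeds only when every
    sequence tied for closest to the query belongs to the query's species."""
    correct = 0
    for i, (sp, _) in enumerate(records):
        others = [j for j in range(len(records)) if j != i]
        best = min(d[i][j] for j in others)
        nearest_species = {records[j][0] for j in others if d[i][j] == best}
        correct += nearest_species == {sp}
    return correct / len(records)


# Hypothetical toy reference library.
records = [
    ("Species A", "ACGTACGTAC"),
    ("Species A", "ACGTACGTAA"),
    ("Species B", "TCGTACGAAC"),
    ("Species B", "TCGTACGAAA"),
    ("Species C", "ACGTACGTCC"),
]
d = distance_matrix(records)
print(barcode_gap(records, d))      # {'Species A': False, 'Species B': True, 'Species C': True}
print(nn_success_rate(records, d))  # 0.6
```

In this toy library, Species A has no barcode gap because its closest heterospecific sequence (Species C) is as close as its own conspecific. The singleton Species C can never be identified correctly under a nearest-neighbour rule, so singletons may need to be excluded or reported separately when success rates are calculated.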
The reference database evaluation revealed barcode gaps for 76% of rbcL species and 68% of trnL species. Using the k-nn criterion, the identification success rates of the rbcL and trnL datasets were 85,86% and 73,72%, respectively. In the mock community comparison, the DADA2 pipeline with frequency- and abundance-based filters and post-clustering curation performed best, with precision values of 0,6 for both the rbcL and trnL datasets and RMSE values of 11,2 and 9,6 for rbcL and trnL, respectively. Applying this pipeline to the rbcL datasets led to the recovery of 8 families and 12 genera for Welverdiend, and 9 families and 13 genera for Syferkuil. The trnL datasets performed better without the abundance-based filter, identifying 15 families and 9 genera for Welverdiend and 8 families and 10 genera for Syferkuil. Frequency of occurrence and relative read abundance calculations revealed that forbs (non-graminoid herbaceous plants) may be of greater importance in cattle diets than previously recorded. This study provides a study-area-specific rbcL and trnL reference database, which should be used with both barcodes in combination to identify semi-arid eastern savanna flora in South Africa, as well as a suitable bioinformatics approach for processing DNA amplified from faecal samples with the rbcL and trnL barcodes. This reference database and bioinformatics pipeline will improve our understanding of the composition of the South African savanna flora and thereby support conservation and management measures targeted at these ecosystems.
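The evaluation and diet metrics reported above can also be sketched in Python. The abstract does not state the exact formulas, so this sketch assumes common definitions: precision as the proportion of reported taxa that are genuine mock-community members, RMSE computed over the expected mock members (with a missed member scored as 0 %), frequency of occurrence (FOO) as the percentage of samples in which a taxon has any reads, and relative read abundance (RRA) as the mean of per-sample read proportions. All taxon names, abundances and read counts are hypothetical.

```python
import math


def precision(observed: dict[str, float], mock: dict[str, float]) -> float:
    """Share of taxa reported by a pipeline that are true mock members: TP / (TP + FP)."""
    reported = set(observed)
    if not reported:
        return 0.0
    return len(reported & set(mock)) / len(reported)


def rmse(mock: dict[str, float], observed: dict[str, float]) -> float:
    """Root mean square error between expected and observed relative abundances (%),
    taken over the mock members; a member the pipeline missed counts as 0 %."""
    sq_errors = [(mock[t] - observed.get(t, 0.0)) ** 2 for t in mock]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def foo_and_rra(samples: list[dict[str, int]]) -> tuple[dict[str, float], dict[str, float]]:
    """Frequency of occurrence (% of samples containing a taxon) and relative read
    abundance (mean per-sample read share, %) from per-sample read counts."""
    samples = [s for s in samples if sum(s.values()) > 0]  # drop samples with no reads
    n = len(samples)
    taxa = sorted({t for s in samples for t, reads in s.items() if reads > 0})
    foo = {t: 100 * sum(1 for s in samples if s.get(t, 0) > 0) / n for t in taxa}
    rra = {t: 100 * sum(s.get(t, 0) / sum(s.values()) for s in samples) / n for t in taxa}
    return foo, rra


# Hypothetical mock community (expected %) and pipeline output (observed %).
mock = {"Taxon A": 50.0, "Taxon B": 30.0, "Taxon C": 20.0}
pipeline_out = {"Taxon A": 60.0, "Taxon B": 30.0, "Taxon X": 10.0}
print(precision(pipeline_out, mock))       # 2 of the 3 reported taxa are real
print(round(rmse(mock, pipeline_out), 2))  # 12.91

# Hypothetical read counts from four faecal samples.
samples = [
    {"Grass": 80, "Forb": 20},
    {"Grass": 50, "Shrub": 50},
    {"Forb": 10, "Shrub": 30},
    {"Forb": 40},
]
foo, rra = foo_and_rra(samples)
print(foo)  # Forb occurs in 75 % of the samples
print(rra)
```

This RMSE ignores false-positive taxa such as the hypothetical Taxon X, which are penalised only through precision; computing RMSE over the union of expected and observed taxa is an alternative design choice. Presence for FOO may also be defined with a minimum read threshold rather than any read at all, which would change the FOO values.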
Masters