DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.
2009
Austerlitz, Frédéric | David, Olivier | Schaeffer, Brigitte | Bleakley, Kevin | Olteanu, Madalina | Leblois, Raphael | Veuille, Michel | Laredo, Catherine | Ecologie Systématique et Evolution (ESE) ; Université Paris-Sud - Paris 11 (UP11)-AgroParisTech-Centre National de la Recherche Scientifique (CNRS) | Unité de recherche Mathématiques et Informatique Appliquées (MIA) ; Institut National de la Recherche Agronomique (INRA) | Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe ; Mines Paris - PSL (École nationale supérieure des mines de Paris) ; Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM) | Centre de Bioinformatique (CBIO) ; Mines Paris - PSL (École nationale supérieure des mines de Paris) ; Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL) | Institut Curie [Paris] | Origine, structure et évolution de la biodiversité (OSEB) ; Muséum national d'Histoire naturelle (MNHN)-Centre National de la Recherche Scientifique (CNRS) | Institut de Systématique, Evolution, Biodiversité (ISYEB) ; Muséum national d'Histoire naturelle (MNHN)-Université Pierre et Marie Curie - Paris 6 (UPMC)-École Pratique des Hautes Études (EPHE) ; Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Centre National de la Recherche Scientifique (CNRS) | Laboratoire de Probabilités et Modèles Aléatoires (LPMA) ; Université Pierre et Marie Curie - Paris 6 (UPMC)-Université Paris Diderot - Paris 7 (UPD7)-Centre National de la Recherche Scientifique (CNRS) | IFORA
International audience
BACKGROUND: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential trans-specific polymorphism. In this context, we examine several assignment methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. RESULTS: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter that most influenced the performance of the various methods was the molecular diversity of the data. Adding genetically independent loci (nuclear genes) improved the predictive performance of most methods. CONCLUSION: The study implies that taxonomists can influence the quality of their analyses either by choosing the method best adapted to the configuration of their sample or, given a certain method, by increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
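As an illustration of the "one nearest neighbour" idea named in the abstract, the following minimal Python sketch assigns a query sequence to the species of its closest reference sequence. It is not the authors' implementation: the distance used here (the proportion of differing sites between aligned sequences), the function names and the toy sequences are all assumptions made for the example.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbour_species(query: str, reference: list[tuple[str, str]]) -> str:
    """Return the species label of the reference sequence closest to the query.

    `reference` is a list of (sequence, species) pairs; ties are resolved in
    favour of the first reference sequence encountered.
    """
    _, species = min(reference, key=lambda item: p_distance(query, item[0]))
    return species


# Toy reference set (invented sequences and labels, for illustration only).
reference = [
    ("ACGTACGTAC", "species_A"),
    ("ACGTACGTAT", "species_A"),
    ("TCGTTCGTAC", "species_B"),
    ("TCGTTCGAAC", "species_B"),
]
query = "ACGTACGAAC"

print(nearest_neighbour_species(query, reference))
```

With real barcode data the sequences would first need to be aligned, and a different distance could be substituted for `p_distance` without changing the assignment rule.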
Bibliographic information
This record was provided by the Institut national de la recherche agronomique