SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines
2017
Audoux, Jérôme | Salson, Mikaël | Grosset, Christophe | Beaumeunier, Sacha | Holder, Jean-Marc | Commes, Thérèse | Philippe, Nicolas | SeqOne [CHRU Montpellier] ; Centre Hospitalier Régional Universitaire [Montpellier] (CHRU Montpellier)-Hôpital Saint Eloi [CHU Montpellier] ; Centre Hospitalier Régional Universitaire [Montpellier] (CHRU Montpellier) | Institut de Biologie Computationnelle (IBC) ; Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS) | Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL) ; Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS) | Biothérapies des maladies génétiques et cancers ; Université Bordeaux Segalen - Bordeaux 2-Institut National de la Santé et de la Recherche Médicale (INSERM) | JA is a doctoral fellow of the Fondation pour la Recherche Médicale (FRM, BIOINFO2013 call, grant no. DBI20131228566). SB is supported by a fellowship from SATT AxLR, la Région Occitanie, and Montpellier Métropole Méditerranée. Institute of Computational Biology, Investissement d’Avenir. | ANR-10-SATT-0006,SATT AxLR,SATT AxLR (EX LANGUEDOC ROUSSILLON)(2010) | ANR-11-BINF-0002,IBC,Institut de biologie Computationnelle(2011)
International audience
BACKGROUND: The evolution of next-generation sequencing (NGS) technologies has led to an increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with its own performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools best suit their specific needs and how those tools should be configured. To help answer these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for the systematic evaluation and comparison of performance, so that users can make well-informed choices.
RESULTS: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that approximate specific real biological conditions as closely as possible, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run on a SimCT dataset with the simulated genomic and transcriptional variations the dataset contains, giving an accurate evaluation of the pipeline's performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. The results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that combining the output of certain pipelines could yield substantial performance improvements.
CONCLUSION: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated in order to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process.
Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of datasets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. The SimBA software and datasets are available at http://cractools.gforge.inria.fr/softwares/simba/.
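The comparison step described in the abstract, in which a pipeline's output is checked against the variations inserted into the simulated data, can be illustrated with a minimal sketch. This is not the BenchCT implementation or its interface: the `Event` tuple, the `evaluate` function, exact-match scoring, and the toy data are all assumptions made for illustration. The last two lines show one simple way the output of two pipelines might be combined (union and intersection of their calls) and re-scored.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# Hypothetical event representation: (chromosome, position, event type).
Event = Tuple[str, int, str]


class Score(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def evaluate(truth: Iterable[Event], predicted: Iterable[Event]) -> Score:
    """Score predicted events against the simulated truth using exact matching."""
    truth_set: Set[Event] = set(truth)
    predicted_set: Set[Event] = set(predicted)
    tp = len(truth_set & predicted_set)   # simulated events that were found
    fp = len(predicted_set - truth_set)   # reported events that were never simulated
    fn = len(truth_set - predicted_set)   # simulated events that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Score(tp, fp, fn, precision, recall)


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "SNV"),
    ("chr1", 250, "SNV"),
    ("chr2", 40, "insertion"),
    ("chr3", 75, "deletion"),
}
pipeline_a = {("chr1", 100, "SNV"), ("chr1", 250, "SNV"), ("chr2", 99, "SNV")}
pipeline_b = {("chr1", 100, "SNV"), ("chr2", 40, "insertion"), ("chr3", 10, "SNV")}

score_a = evaluate(truth, pipeline_a)
score_b = evaluate(truth, pipeline_b)

# Combining pipelines: the union favours recall, the intersection favours precision.
score_union = evaluate(truth, pipeline_a | pipeline_b)
score_intersection = evaluate(truth, pipeline_a & pipeline_b)
```

On this toy data the union recovers more of the simulated events than either pipeline alone, while the intersection reports no false positives. A real evaluation would likely need tolerant matching (for example, allowing a small positional offset), which this sketch leaves out.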
Bibliographic information
This bibliographic record has been provided by Institut national de la recherche agronomique