Unsupervised multiple kernel learning to integrate various metagenomic sources
2017
Mariette, Jérôme | Villa-Vialaneix, Nathalie
In metagenomic analysis, the integration of various sources of information is a difficult task, since the datasets produced are often of heterogeneous types. These datasets can be composed of species counts, interaction networks or phylogenetic information. Combining all these types of data has been shown to improve the comparison between communities. Standard integration methods (such as PLS) can take advantage of external information, but they do not allow heterogeneous multi-omics datasets to be analysed in a generic way. We propose to use similarity functions, called kernels, to integrate multiple datasets of various types into a single exploratory analysis. Kernels can be computed for various data types, such as numerical vectors and phylogenetic trees, as well as from any diversity index. They can also be combined into a single meta-kernel. In this work, we provide several solutions to learn either a consensual meta-kernel or a meta-kernel that preserves the original topology of the datasets. The meta-kernel is subsequently used in kernel PCA to provide a fast and accurate visualisation of similarities between samples, in a nonlinear space and from the point of view of the multiple sources. A generic procedure is also proposed to improve the interpretability of the kernel PCA with regard to the original data. We applied our framework to the multiple metagenomic datasets collected during the TARA Oceans expedition. We demonstrated that our method is able to retrieve previous findings in a single analysis, as well as to provide a new picture of the sample structure when a larger number of datasets from different sources is included in the analysis.
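The pipeline described in the abstract (one kernel per dataset, a combined meta-kernel, then kernel PCA) can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the record: the two datasets are random toy data, both kernels are linear, and the meta-kernel is a plain uniform average of cosine-normalised kernels. That average is a simple stand-in, not the consensual or topology-preserving meta-kernels the authors learn. All function names are hypothetical.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of a samples-by-features table (linear kernel)."""
    return X @ X.T

def cosine_normalize(K):
    """Rescale a kernel so every sample has unit self-similarity."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine_kernels(kernels, weights=None):
    """Convex combination of kernels.

    Assumption: uniform weights by default, a naive stand-in for the
    learned meta-kernels mentioned in the abstract.
    """
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))

def kernel_pca(K, n_components=2):
    """Sample coordinates from the eigendecomposition of the centred kernel."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(vals)

# Toy data (assumption): 6 samples described by two heterogeneous tables.
rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(6, 4)).astype(float)   # e.g. species counts
env = rng.normal(size=(6, 3))                        # e.g. environmental variables

kernels = [cosine_normalize(linear_kernel(X)) for X in (counts, env)]
K_meta = combine_kernels(kernels)
coords = kernel_pca(K_meta, n_components=2)
print(coords.shape)
```

Any positive semi-definite similarity matrix computed on the same samples, for instance one derived from a phylogenetic dissimilarity, could replace the linear kernels here, since `combine_kernels` and `kernel_pca` only see the kernel matrices.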
This bibliographic record has been provided by Institut national de la recherche agronomique