Análisis y caracterización de secuencias de nucleótidos como series temporales [Analysis and characterisation of nucleotide sequences as time series]
2021
Sánchez Díaz, Manuel | García-Gutiérrez Báez, Carlos | San José Martínez, Fernando
Time series analysis has proved particularly useful in several fields (such as economic forecasting, budgetary analysis and yield processing), for which a significant number of tools have been developed. Thanks to advances in sequencing techniques, there are now databases holding millions of nucleotide sequences with hardly any characterisation. An algorithm exists that converts nucleotide sequences into time series. This work studies the time series obtained from nucleotide sequences and the networks derived from those series. It also investigates methods, based on machine learning and deep learning, that could characterise the sequences through several mathematical parameters. Several time series parameters are studied. First, we studied the correlations in the time series using Detrended Fluctuation Analysis. Second, we parameterised the time series as autoregressive, moving-average and integrated processes. Third, the time series were transformed into networks in order to study and quantify further properties: network connectivity and the small-world effect. We examined the parameters obtained from the DNA sequences and their applicability as classifiers for several groupings: intron content; the realm of the organism; the type of nucleic acid (DNA or RNA); functionality (coding or structural); and the genetic compartment where the sequence is located. Furthermore, we implemented different classification methods based on machine learning and deep learning, which classify the sequences according to the time series and network parameters described in this work. First, we grouped the sequences with an unsupervised method, k-means. Then, the sequences were classified with a supervised method, K-Nearest Neighbors.
Success rates increased between 50% and 70% relative to the previous method. Finally, an Artificial Neural Network was trained with the aforementioned parameters. This network correctly characterised up to 95% of the sequences for certain features. Future work arising from this study aims to increase the number of sequences in order to raise the confidence level of the classification. It also aims to study other properties of the networks and time series, such as fractal properties. Moreover, the number of features predicted from these parameters could also be increased.
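As an illustration of the first step described in the abstract, the following is a minimal sketch of first-order Detrended Fluctuation Analysis applied to a nucleotide sequence. The abstract does not say which sequence-to-series algorithm the authors used, so the purine/pyrimidine mapping here (A, G to +1; C, T, U to -1) is an assumption chosen only for illustration. The function names `purine_pyrimidine_steps` and `dfa_exponent` and the choice of window sizes are likewise hypothetical and do not come from the original work.

```python
import numpy as np

def purine_pyrimidine_steps(seq):
    """Map a nucleotide string to +1/-1 steps (assumed mapping, not the authors')."""
    # Purines (A, G) -> +1, pyrimidines (C, T, U) -> -1; other symbols are skipped.
    return np.array(
        [1.0 if base in "AG" else -1.0 for base in seq.upper() if base in "ACGTU"]
    )

def dfa_exponent(x, scales):
    """Estimate the DFA scaling exponent of series x using linear detrending."""
    x = np.asarray(x, dtype=float)
    # Integrated profile of the mean-centred series.
    profile = np.cumsum(x - x.mean())
    fluctuations = []
    for n in scales:
        n_windows = len(profile) // n
        # Non-overlapping windows of length n, one window per column.
        windows = profile[: n_windows * n].reshape(n_windows, n).T
        t = np.arange(n)
        # Least-squares line fitted independently in every window.
        slope, intercept = np.polyfit(t, windows, 1)
        trend = np.outer(t, slope) + intercept
        # Root-mean-square fluctuation around the local trends.
        fluctuations.append(np.sqrt(np.mean((windows - trend) ** 2)))
    # Slope of log F(n) against log n is the scaling exponent.
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic random sequence; an uncorrelated series is expected to give
    # an exponent near 0.5, while long-range correlations push it higher.
    seq = "".join(rng.choice(list("ACGT"), size=5000))
    alpha = dfa_exponent(purine_pyrimidine_steps(seq), scales=[16, 32, 64, 128, 256])
    print(f"alpha = {alpha:.2f}")
```

An exponent estimated this way could serve as one of the per-sequence parameters fed to the classifiers mentioned above, alongside the ARIMA and network parameters, which this sketch does not cover.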
Bibliographic information
This bibliographic record was provided by Universidad Politécnica de Madrid