Browsing by Subject "Metagenomics"
Showing 1 - 10 of 10
Item: Binning application in low-dimensional metagenomic sequences: performance of Barnes-Hut t-Stochastic Neighbor Embeddings, assessment of internal cluster validity indices
(Universidad EAFIT, 2019) Ceballos Cano, Julián; Quintero Montoya, Olga Lucía; Pinel Peláez, Nicolás; Ariza Jiménez, Leandro Fabio
Metagenomic studies aim to reconstruct the structure of microbial communities from DNA sequence data of complex composition. To this end, they generally embed multidimensional data into low-dimensional spaces and then apply a binning process. The performance of the dimensionality reduction techniques, the clustering methods, and the internal cluster validity indices varies with the biological, statistical, and computational features of the metagenomic analysis, yet it is seldom evaluated systematically. This problem was explored through an unsupervised binning of metagenomic DNA sequences, based on the Subtractive and Fuzzy c-means algorithms applied to two- and three-dimensional embeddings of metagenomic sequences obtained via the Barnes-Hut t-Stochastic Neighbor Embedding (BH-SNE) algorithm in conjunction with Principal Component Analysis (PCA). The aims were to assess the performance of BH-SNE with and without a preliminary PCA, and to assess four Internal Cluster Validity Indices (ICVIs) that conditioned the clustering procedure. The assessment of the ICVIs showed that the Silhouette index performed best based on the median values of the F-measure. The Silhouette index was also the most consistent, obtaining the highest median F values in both the two- and three-dimensional treatments. In the high average amino acid identity (AAI) ranges, the Silhouette index matched the Calinski-Harabasz index in terms of the highest median F values in the three-dimensional treatment, although their performance differed in the two-dimensional treatments. In particular, the Dunn index performed worst at low AAI percentages, while the Davies-Bouldin index performed worst at high AAI percentages. Additionally, the Dunn and Davies-Bouldin indices most consistently produced the lowest median F values. The results also suggest that the biology of the metagenomic sequences could influence which ICVIs perform best. Finally, the highest median F values were obtained by the ICVIs in 3D embeddings, with equal results for BH-SNE with and without a preliminary PCA. There was no significant difference between the results obtained with and without a preliminary PCA. In terms of consistency, it was not possible to determine which treatment (2D or 3D embedding with BH-SNE, with or without a preliminary PCA) most consistently led the ICVIs to the best and worst median F results.

Item: Entropy-based graph construction methods for unsupervised data structure detection
(Universidad EAFIT, 2021) Ariza Jiménez, Leandro Fabio; Quintero Montoya, Olga Lucía; Pinel Peláez, Nicolás

Item: IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses
(BioMed Central Ltd., 2016-12-16) Narayanasamy, S.; Jarosz, Y.; Muller, E.E.L.; Heintz-Buschart, A.; Herold, M.; Kaysen, A.; Laczny, C.C.; Pinel, N.; May, P.; Wilmes, P.; Universidad EAFIT. Departamento de Ciencias; Biodiversidad, Evolución y Conservación
Existing workflows for the analysis of multi-omic microbiome datasets are lab-specific and often result in sub-optimal data usage. Here we present IMP, a reproducible and modular pipeline for the integrated and reference-independent analysis of coupled metagenomic and metatranscriptomic data.
IMP incorporates robust read preprocessing, iterative co-assembly, analyses of microbial community structure and function, automated binning, and genomic signature-based visualizations. The IMP-based data integration strategy enhances data usage, output volume, and output quality, as demonstrated using relevant use cases. Finally, IMP is encapsulated within a user-friendly implementation using Python and Docker. IMP is available at http://r3lab.uni.lu/web/imp/ (MIT license).

Item: IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses
(BioMed Central Ltd., 2016-12-16) Narayanasamy, S.; Jarosz, Y.; Muller, E.E.L.; Heintz-Buschart, A.; Herold, M.; Kaysen, A.; Laczny, C.C.; Pinel, N.; May, P.; Wilmes, P.; Universidad EAFIT. Departamento de Ciencias; Ciencias Biológicas y Bioprocesos (CIBIOP)

Item: Standardized Approaches for Assessing Metagenomic Contig Binning Performance from Barnes-Hut t-Stochastic Neighbor Embeddings
(SPRINGER, 2020-01-01) Ceballos J.; Ariza-Jiménez L.; Pinel N.; Universidad EAFIT. Departamento de Ciencias; Ciencias Biológicas y Bioprocesos (CIBIOP)
The performance of unsupervised methods for metagenomic binning is often assessed using simulated microbial communities. The lack of well-characterized evaluation protocols, and of approaches to community construction that are cognizant of biological realities, impedes the rigorous assessment and standardization of the binning process. This work attempted to standardize performance evaluation using benchmark communities constructed according to the genome similarity metric average amino acid identity (AAI). This approach allowed us to extend and deepen our previous research on the unsupervised binning of metagenomic sequence fragments based on low-dimensional embeddings of pentamer frequency profiles. Experimental results showed that our method for binning metagenomic contigs has the potential to become an alternative to state-of-the-art methods such as MetaCluster 3.0. © 2020, Springer Nature Switzerland AG.

Item: Standardized Approaches for Assessing Metagenomic Contig Binning Performance from Barnes-Hut t-Stochastic Neighbor Embeddings
(SPRINGER, 2020-01-01) Ceballos J.; Ariza-Jiménez L.; Pinel N.; Universidad EAFIT. Departamento de Ciencias; Modelado Matemático

Item: Standardized Approaches for Assessing Metagenomic Contig Binning Performance from Barnes-Hut t-Stochastic Neighbor Embeddings
(SPRINGER, 2020-01-01) Ceballos J.; Ariza-Jiménez L.; Pinel N.; Universidad EAFIT. Departamento de Ciencias; Biodiversidad, Evolución y Conservación

Item: Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings
(Institute of Electrical and Electronics Engineers Inc., 2018-01-01) Ariza-Jimenez L.; Quintero O.L.; Pinel N.; Universidad EAFIT. Departamento de Ciencias; Ciencias Biológicas y Bioprocesos (CIBIOP)
Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities.
In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. One such approach uses a combination of the Barnes-Hut approximation and the t-Stochastic Neighbor Embedding (BH-SNE) algorithm for dimensionality reduction of pentamer profiles of DNA sequence data, followed by demarcation of groups based on Gaussian mixture models within human-imposed boundaries. We found that genome demarcation from three-dimensional BH-SNE embeddings consistently results in more accurate binnings than two-dimensional embeddings. We further addressed the lack of a priori information on the number of populations by developing an unsupervised binning approach based on the Subtractive and Fuzzy c-means (FCM) clustering algorithms combined with internal clustering validity indices. Lastly, we addressed the shared membership of individual data objects in a mixed community by assigning a degree of membership to individual objects using the FCM algorithm, and discriminated between confidently binned and uncertain sequence data objects from the community for subsequent biological interpretation. The binning of metagenome sequence fragments according to thresholds in the degree of membership opens the door to the identification of horizontally transferred elements and other genomic regions of uncertain assignment in which biologically meaningful information resides. The reported approach improves the unsupervised genome demarcation of populations within complex communities, increases the confidence in the coherence of the binned elements, and enables the identification of evolutionary processes ignored in hard-binning approaches in shotgun metagenomic studies.
© 2018 IEEE.

Item: Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings
(Institute of Electrical and Electronics Engineers Inc., 2018-01-01) Ariza-Jimenez L.; Quintero O.L.; Pinel N.; Universidad EAFIT. Departamento de Ciencias; Biodiversidad, Evolución y Conservación

Item: Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings
(Institute of Electrical and Electronics Engineers Inc., 2018-01-01) Ariza-Jimenez L.; Quintero O.L.; Pinel N.; Universidad EAFIT. Escuela de Ciencias; Modelado Matemático
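
The fuzzy binning abstract above describes assigning each sequence fragment a degree of membership with the FCM algorithm and separating confidently binned fragments from uncertain ones by thresholding that membership. The idea can be sketched in Python using the standard fuzzy c-means membership formula. The function names, the fuzzifier m = 2, the 0.8 threshold, and the toy coordinates are illustrative assumptions, not values or code from the listed works.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the Euclidean
    distance from the point to center j and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with it).
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        n = sum(on_center)
        return [1.0 / n if hit else 0.0 for hit in on_center]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


def assign_bin(memberships, threshold=0.8):
    """Return the index of the best bin, or None if the top membership is below the threshold."""
    best = max(range(len(memberships)), key=memberships.__getitem__)
    return best if memberships[best] >= threshold else None


# Hypothetical cluster centers and fragments in a 3-D embedding.
centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
near_first = fcm_memberships((1.0, 0.0, 0.0), centers)
midway = fcm_memberships((5.0, 0.0, 0.0), centers)

print(near_first, assign_bin(near_first))  # dominated by the first cluster
print(midway, assign_bin(midway))          # equidistant, left unassigned
```

A fragment left unassigned here corresponds to the "uncertain" objects that the abstract sets aside for later biological interpretation. A suitable threshold would have to be chosen for the data at hand.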