Repositorio Institucional Universidad EAFIT :: Browsing by Subject "Clustering"

Browsing by Subject "Clustering"

Showing 1 - 20 of 22

A new segmentation approach using dynamic variables on individuals
(Universidad EAFIT, 2021) Prieto Escobar, Nicolás; Laniado Rodas, Henry; Monroy Osorio, Juan Carlos
An Entropy-Based Graph Construction Method for Representing and Clustering Biological Data
(SPRINGER, 2019-10-01) Ariza-Jiménez L.; Pinel N.; Villa L.F.; Quintero O.L.; Universidad EAFIT. Departamento de Ciencias; Biodiversidad, Evolución y Conservación
Unsupervised learning methods are commonly used to perform the non-trivial task of uncovering structure in biological data. However, conventional approaches rely on methods that make assumptions about data distribution and reduce the dimensionality of the input data. Here we propose the incorporation of entropy related measures into the process of constructing graph-based representations for biological datasets in order to uncover their inner structure. Experimental results demonstrated the potential of the proposed entropy-based graph data representation to cope with biological applications related to unsupervised learning problems, such as metagenomic binning and neuronal spike sorting, in which it is necessary to organize data into unknown and meaningful groups. © 2020, Springer Nature Switzerland AG.
Análisis de discurso basado en modelos grandes de lenguaje [Discourse analysis based on large language models]
(Universidad EAFIT, 2024) Jiménez Jaimes, Edgar Leandro; Montoya Múnera, Edwin Nelson
This thesis explores the implementation of natural language processing techniques and large language models (LLMs) to support discourse analysis tasks in the context of the "Tenemos que hablar Colombia" program. Techniques such as topic modeling, sentiment analysis, clustering, visualization, and the creation of a conversational assistant based on Retrieval Augmented Generation (RAG) have been addressed using advanced text modeling, vector embeddings, and prompt engineering approaches. A text classification model focused on predicting the label of the verbal indicator variable, assigned manually by the interviewer, is also presented, although this model is not directly applied to discourse analysis. This work adds to the studies of the "Tenemos que hablar Colombia" program, where other authors have contributed through computational linguistics analysis and machine learning techniques. Using advanced NLP techniques, we have sought to improve the interpretation of text data and its application in discourse analysis. The results have shown improvements in the accuracy of data classification and analysis through the techniques explored, providing a better understanding of citizen perceptions.
Analizando patrones de éxito en YouTube : un sistema de recomendación para creadores de contenidos educativos [Analyzing success patterns on YouTube: a recommender system for creators of educational content]
(Universidad EAFIT, 2024) Osorio Urrea, Vanessa; Ortiz Arias, Santiago; del Castillo Cortázar, Francisco Javier
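Aplicación de técnicas no-lineales de reducción de dimensionalidad y clustering para detección de observaciones anómalas multidimensionales [Application of nonlinear dimensionality reduction and clustering techniques for detecting multidimensional anomalous observations]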
(Universidad EAFIT, 2024) Romero Cardona, Daniel; Ortiz Arias, Santiago
Binning application in low-dimensional metagenomic sequences: performance of Barnes-Hut t-Stochastic Neighbor Embeddings, assessment of internal cluster validity indices
(Universidad EAFIT, 2019) Ceballos Cano, Julián; Quintero Montoya, Olga Lucía; Pinel Peláez, Nicolás; Ariza Jiménez, Leandro Fabio
Metagenomic studies aim to reconstruct the structure of microbial communities from DNA sequence data of complex composition. To this end, they generally embed multidimensional data into low-dimensional spaces and then apply a binning process. The performance of the dimensionality reduction techniques, the clustering methods, and the internal cluster validity indices varies with the biological, statistical, and computational features of the metagenomic analysis, yet it is seldom evaluated systematically. This problem was explored through an unsupervised binning of metagenomic DNA sequences, based on the Subtractive and Fuzzy c-means algorithms applied to the two- and three-dimensional representations obtained via the Barnes-Hut t-Stochastic Neighbor Embedding (BH-SNE) algorithm in conjunction with Principal Component Analysis (PCA). The aim was to assess the performance of BH-SNE with and without a preliminary PCA, and to assess four Internal Cluster Validity Indices (ICVIs) that conditioned the clustering procedure. The assessment of the ICVIs showed that the Silhouette index performed best based on the median values of the F measure. The Silhouette index was also the most consistent index, obtaining the highest median F values in both two- and three-dimensional treatments. For high AAI ranges, the Silhouette index matched the Calinski-Harabasz index in terms of the highest median F values in the three-dimensional treatment, although their performance differed in the two-dimensional treatments. In particular, the Dunn index performed worst at low AAI percentages, while the Davies-Bouldin index performed worst at high AAI percentages. Additionally, the Dunn and Davies-Bouldin indices most consistently produced the lowest median F values.
Moreover, the results of this research suggest that the biology of the metagenomic sequences could influence which ICVIs perform best. Finally, the highest median F values were obtained by the ICVIs in 3D embeddings, with equal results for BH-SNE with and without a preliminary PCA. Furthermore, there was no significant difference between the results obtained with and without a preliminary PCA. In terms of consistency, it was not possible to determine which treatment (2D or 3D embedding with BH-SNE, with or without a preliminary PCA) most consistently led the ICVIs to the best and worst median F results.
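Clasificación de inventarios multicriterio mediante el uso de Modelos de Aprendizaje Automático (ML) en la industria automotriz [Multi-criteria inventory classification using machine learning (ML) models in the automotive industry]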
(Universidad EAFIT, 2024) Vesga Vesga, Luis Rodrigo; Castro Zualuaga, Carlos Alberto
Dinámica espacio temporal en la superposición y concentración de delitos : un caso aplicado para Medellín [Spatio-temporal dynamics in the overlap and concentration of crimes: a case study for Medellín]
(Universidad EAFIT, 2020) Peláez Romero, Andrea Julieth; Gómez Toro, Catalina; Forgivable scholarship awarded by Universidad EAFIT
An Entropy-Based Graph Construction Method for Representing and Clustering Biological Data
(SPRINGER, 2019-10-01) Ariza-Jiménez L.; Pinel N.; Villa L.F.; Quintero O.L.; Universidad EAFIT. Escuela de Ciencias; Modelado Matemático
An Entropy-Based Graph Construction Method for Representing and Clustering Biological Data
(SPRINGER, 2019-10-01) Ariza-Jiménez L.; Pinel N.; Villa L.F.; Quintero O.L.; Universidad EAFIT. Departamento de Ciencias; Ciencias Biológicas y Bioprocesos (CIBIOP)
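Estudio de la relación entre los valores sociales y la aceptación de sobornos como conducta corrupta : un estudio con modelos SEM y datos de la encuesta mundial de valores [Study of the relationship between social values and the acceptance of bribes as corrupt behavior: a study using SEM models and World Values Survey data]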
(Universidad EAFIT, 2024) Gómez Convers, Giovanny Hernando; Castrillón-Orrego, Sergio A.; Almonacid Hurtado, Paula María
In a global context of rapid social change, investigating the relationship between social values and corruption has become increasingly urgent and significant. Which behaviors are desirable? Which do we manifest in daily life? The World Values Survey (WVS) serves as a crucial data source for understanding social values in various contexts. However, how these values influence the acceptance of bribery, and thus corruption, has not been sufficiently explored. This study examines the underlying patterns in response clusters and systematically analyzes them using the holistic possibilities offered by the institutionalism theoretical framework. The objective is to identify the most significant causalities and influences in the relationship between social values and corruption. Through robust data analysis, imputation techniques, dimensionality reduction, clustering analysis, and SEM modeling, we identify the main factors impacting the acceptance of bribery. The results demonstrate that the three pillars of institutionalism provide a valuable approach to understanding corruption by simultaneously considering key variables and components. When internalized, social values facilitate the acceptance of bribery in certain contexts, highlighting the influence of the cognitive dimension. Although legal frameworks can enhance transparency, cultural environment and customs have a more determining influence on the acceptance of corrupt practices. These findings underscore the need to foster a strong ethical culture and implement educational programs that promote integrity and transparency to effectively mitigate corruption.
Fuzzy nonlinear regression model for railways ride quality
(Universidad EAFIT, 2007) Raigosa Montoya, Dorian Wilmer; Maya Toro, Jairo; Castañeda Heredia, Leonel Francisco; Hennequin, Sophie
Incorporating a predictive component in a dynamic segmentation approach
(Universidad EAFIT, 2021) Saldarriaga Aristizábal, Pablo Andrés; Laniado, Henry; Monroy, Juan Carlos
Information retrieval on documents methodology based on entropy filtering methodologies
(Inderscience Enterprises Ltd., 2015-01-01) Montoya, O.L.Q.; Villa, L.F.; Muñoz, S.; Arenas, A.C.R.; Bastidas, M.; Universidad EAFIT. Escuela de Ciencias; Modelado Matemático
An information retrieval problem arises when the target information does not appear literally in the set of documents. When the goal is to find such 'hidden' information, it is important to develop hybrid methodologies, or to improve existing ones and design new ones. In this work the authors identify the most informative pieces of data in a collection of documents in order to obtain the best result in a subsequent fuzzy clustering stage. The aim is to find similarities between the documents and a reference target, and to establish relationships based on a non-literal feature. We propose applying the well-known entropy term weighting scheme and then present several subsequent procedures for correctly selecting the data of interest. This approach captures the largest amount of information within the smallest amount of data. Applying a specific selection procedure to a group of words after the entropy weighting gives more information with which to differentiate and separate the documents. This has a considerable effect on processing time and on the correct fuzzy clustering of the document collection. Copyright © 2015 Inderscience Enterprises Ltd.
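Modelación probabilística y dinámica de la ansiedad mediante técnicas de clustering y modelos ocultos de Márkov [Probabilistic and dynamic modeling of anxiety using clustering techniques and hidden Markov models]
(Universidad EAFIT, 2026) Giraldo Tirado, Diego Alexander; Peña Palacio, Juan Alejandro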
Anxiety constitutes a growing mental health issue with significant impacts on individual well-being, workplace productivity, and the costs associated with disability. Despite advances in analytics applied to mental health, most existing approaches address anxiety from a static perspective, limiting themselves to detection or one-time classification tasks. This work proposes a probabilistic framework to model anxiety as a dynamic and stochastic process, integrating unsupervised learning techniques, Hidden Markov Models (HMM), and analysis of non-normal distributions. Based on psychological and behavioral variables, observable profiles are identified through clustering, and latent anxiety states are inferred, along with their transition probabilities and long-term behavior. Additionally, a continuous distribution is fitted to the transformed psychological well-being indicator, and Value at Risk (VaR)-type metrics are used to characterize extreme risk. The results show a dynamic dominated by moderate and high anxiety states, with low well-being stability, and demonstrate the usefulness of the proposed approach for understanding and managing psychological risk in workplace contexts.
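The "long-term behavior" of a set of latent states with known transition probabilities is commonly summarized by the stationary distribution of the Markov chain. The following is a minimal sketch of that idea using power iteration, not the thesis's own implementation. The function name `stationary_distribution`, the three state labels, and every number in the matrix `P` are hypothetical and chosen only for illustration.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi such that pi = pi P, for a row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi  # best estimate if the tolerance was not reached


# Hypothetical transition matrix over three latent states (low, moderate, high).
# These values are illustrative assumptions, not results from the thesis.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
pi = stationary_distribution(P)
print([round(p, 4) for p in pi])
```

Power iteration is used here only because it needs no external libraries; it converges when the chain is irreducible and aperiodic, which is an assumption about the illustrative matrix above.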
Modelo de segmentación y monitoreo SARLAFT en el Grupo BIOS [SARLAFT segmentation and monitoring model at Grupo BIOS]
(Universidad EAFIT, 2021) Pineda Gómez, Juan Esteban; Gómez Salazar, Elkin Arcesio
Morfología urbana y patrones de movilidad : un análisis topológico y espacial de redes [Urban morphology and mobility patterns: a topological and spatial network analysis]
(Universidad EAFIT, 2025) Riascos Goyes, Juan Fernando; Ospina Zapata, Juan Pablo; Guarín-Zapata, Nicolás
Urban morphology has long been recognized as a factor shaping human mobility, yet comparative and formal classifications of urban form across metropolitan areas remain limited. Building on theoretical principles of urban structure and advances in unsupervised learning, we systematically classified the built environment of nine U.S. metropolitan areas using structural indicators such as density, connectivity, and spatial configuration. The resulting morphological types were linked to mobility patterns through descriptive statistics, marginal effects estimation, and post hoc statistical testing. Here we show that distinct urban forms are systematically associated with different mobility behaviors, such as reticular morphologies being linked to significantly higher public transport use and reduced car dependence, while organic forms are associated with increased car usage, and substantial declines in public transport and active mobility. These effects are statistically robust, highlighting that the spatial configuration of urban areas plays a fundamental role in shaping transportation choices. Our findings extend previous work by offering a reproducible framework for classifying urban form and demonstrate the added value of morphological analysis in comparative urban research. These results suggest that urban form should be treated as a key variable in mobility planning and provide empirical support for incorporating spatial typologies into sustainable urban policy design.
Nueva Metodología Para Clasificar Datos de Series Temporales usando el Algoritmo Biclustering [New methodology for classifying time series data using the biclustering algorithm]
(Universidad EAFIT, 2013) Cogollo F. M.; Palacios, Alejandro; Universidad EAFIT. Escuela de Ciencias. Grupo de Investigación Modelado Matemático
On a Combination of Skewness and Kurtosis Matrices for Projection Pursuit Exploratory Cluster Analysis
(Universidad EAFIT, 2025) Jaramillo Osorio, Esteban; Ortiz Arias, Santiago
Skewness and kurtosis are statistical measures critical for understanding distribution characteristics, particularly in normality testing, clustering, and outlier detection. While kurtosis has been widely explored in the literature, skewness remains underutilized despite its potential for identifying asymmetrical patterns in data. Combining these measures could create a robust tool for exploratory data analysis (EDA). This research proposes a novel approach by developing a convex combination of skewness and kurtosis matrices. Using iterative procedures to maximize or minimize this combination, we aim to construct a matrix serving as a projection index for a projection pursuit algorithm. This matrix can identify clusters and outliers more effectively than either measure alone. To validate the methodology, experiments on artificial datasets and real-world data demonstrate the benefits of this combined approach in detecting non-normal features, evaluating clustering performance, and enhancing outlier detection.
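Patrones espacio–acústicos en sonidos respiratorios multicanal : integración de procrustes, clustering y modelos supervisados en pacientes con EPOC [Spatio-acoustic patterns in multichannel respiratory sounds: integrating Procrustes, clustering, and supervised models in patients with COPD]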
(Universidad EAFIT, 2025-03-03) Escobar Pajoy, Sebastián; Fonseca Valero, Diego Fernando
Auscultation is a fundamental tool for assessing respiratory conditions; however, its interpretation is limited by examiner subjectivity and by the strong influence that recording location exerts on the acoustic properties of lung sounds. This work proposes a multichannel analysis strategy aimed at characterizing the spatial organization of respiratory sounds in patients with COPD and evaluating the relative contribution of different thoracic regions to the detection of adventitious events. Using simultaneous recordings from seven chest locations in the ICBHI 2017 dataset, a collection of respiratory segments was constructed and described through spectral and cepstral features. Multichannel configurations were projected into low-dimensional spaces and aligned using Procrustes analysis, enabling comparable geometric representations across subjects and breathing cycles. An unsupervised clustering scheme applied to these representations revealed recurrent spatial patterns associated with different distributions of wheezes and crackles. In addition, supervised Random Forest models were trained for adventitious sound detection, incorporating feature-importance analyses and channel-ablation experiments to examine the contribution of each thoracic region. The results indicate that multichannel spatial information contains structured patterns that can be leveraged both to group thoracic configurations and to enhance the interpretability of classification models, contributing to more robust and explainable representations of pulmonary acoustics.
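The Procrustes alignment step mentioned in the abstract above can be illustrated with a minimal sketch, assuming 2D point configurations and a rotation-only alignment after centering (no scaling or reflection). This is not the thesis's pipeline; the function names `center` and `procrustes_rotation_2d` and the example coordinates are hypothetical.

```python
import math


def center(points):
    """Translate a list of (x, y) points so that their centroid is the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(x - cx, y - cy) for x, y in points]


def procrustes_rotation_2d(source, target):
    """Rotate centered `source` onto centered `target` in the least-squares sense.

    Points are matched by index. Returns the aligned source points and the
    rotation angle in radians.
    """
    src, tgt = center(source), center(target)
    # In 2D the optimal rotation has a closed form:
    # theta = atan2(sum of cross products, sum of dot products).
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(src, tgt))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(src, tgt))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(c * x - s * y, s * x + c * y) for x, y in src]
    return aligned, theta


# Hypothetical configurations: `target` is `source` rotated and shifted.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
aligned, theta = procrustes_rotation_2d(source, target)
print(round(math.degrees(theta), 2))
```

For configurations in more than two dimensions, or when scaling and reflection should also be handled, the rotation is usually obtained from a singular value decomposition instead of this closed form.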
