Browsing by Subject "Clustering algorithms"
Showing 1 - 4 of 4
Item: Aprendizaje automático para la identificación mineralógica de material particulado - Bogotá, Cali y Valle de Aburrá (Colombia) [Machine learning for the mineralogical identification of particulate matter - Bogotá, Cali and Valle de Aburrá (Colombia)] (Universidad EAFIT, 2023) Gutiérrez Silva, Juan Alberto; Duque Trujillo, José Fernando
Identifying the mineral components present in particulate matter can be of great help in understanding the dynamics of air pollution, especially in detecting minerals that are dangerous to inhale (such as asbestos). This work develops a methodology for clustering chemical data obtained through scanning electron microscopy with energy dispersive spectroscopy (SEM-EDX) from samples collected in Bogotá, Cali and Valle de Aburrá (Colombia). Rausch et al. (2022) and Avellaneda et al. (2020) developed and applied a methodology based on random forest algorithms to separate categories of particles, including minerals. In this work, a generalized algorithm based on DBSCAN is proposed as a complement. It made it possible to analyze a set of 3716 samples previously classified as "mineral". The results reveal the presence of at least 15 different minerals. Despite a relatively low classification effectiveness (~20%), this work represents a significant advance in this area, as precedents are few or non-existent for this type of application. Notably, serpentine (antigorite variety) was detected in Medellín. The findings reveal that most of the particles correspond to quartz, calcite, kaolinite and plagioclase. Despite its limitations, the algorithm demonstrates its effectiveness in mineral identification, although improvements that could increase its accuracy are recognized.
Overall, this study establishes a starting point for future chemical characterization analyses of particulate matter.

Item: An Automatic Merge Technique to Improve the Clustering Quality Performed by LAMDA (Institute of Electrical and Electronics Engineers Inc., 2020-01-01) Morales, Luis; Aguilar, Jose; Universidad EAFIT. Departamento de Ingeniería de Sistemas; I+D+I en Tecnologías de la Información y las Comunicaciones
Clustering is a research challenge focused on discovering knowledge from data samples; its goal is to build good-quality partitions. This paper proposes an approach based on LAMDA (Learning Algorithm for Multivariable Data Analysis), whose most important features are: a) it is a non-iterative fuzzy algorithm that can work with online data streams, b) it does not require the number of clusters to be specified, c) it can generate new partitions from objects that are not similar enough to the preexisting clusters (incremental learning). However, in some applications the number of partitions created does not match the number of desired clusters, and can be excessive or impractical for the expert. Therefore, our contribution is the formalization of an automatic merge technique that updates the cluster partition produced by LAMDA to improve the quality of the clusters, together with a new methodology to compute the Marginal Adequacy Degree that enhances the individual-cluster assignment. The proposal, called LAMDA-RD, is applied to several benchmarks, and its results are compared against the original LAMDA and other clustering algorithms to evaluate performance on different metrics. Finally, LAMDA-RD is validated in a real case study on the identification of production states in a gas-lift well, using a data stream. The results show that LAMDA-RD achieves competitive performance with respect to the other well-known algorithms, especially in unbalanced benchmarks and benchmarks with an overlap of around 9%.
In these cases, our algorithm is the best, reaching a Rand Index (RI) above 98%. Moreover, it is consistently among the best for all metrics considered (Silhouette coefficient, modification of the Silhouette coefficient, WB-index, Performance Coefficient, among others) in all case studies analyzed in this paper. Finally, in the real case study, it is better in all the metrics.

Item: Comparison on the estimation of the biomass of a batch bioreactor through fuzzy systems, neural networks and adaptive neuro-fuzzy inference system (2011-01-01) Muñoz, A.A.G.; Quintero, O.L.; Universidad EAFIT. Departamento de Ciencias; Modelado Matemático
The estimation of biomass in the production of δ-endotoxins of Bacillus thuringiensis (Bt), used as bio-insecticides, is a major problem in biotechnological processes. It has been addressed with different methodologies, such as extended Kalman filters (EKF) and phenomenological observers, among others. This paper compares the estimation of the biomass concentration of δ-endotoxins of Bacillus thuringiensis (Bt) using Mamdani fuzzy inference systems (FIS), neural networks (NN) and adaptive neuro-fuzzy inference systems (ANFIS) trained with different clustering algorithms, and compares the associated outcomes. © 2011 IEEE.

Item: Memberships Networks for High-Dimensional Fuzzy Clustering Visualization (Springer Verlag, 2019-01-01) Ariza-Jiménez L.; Villa L.F.; Quintero O.L.; Universidad EAFIT. Escuela de Ciencias; Modelado Matemático
Visualizing the cluster structure of high-dimensional data is a non-trivial task that must be able to deal with the large dimensionality of the input data. Visualizing fuzzy clusterings is less straightforward than visualizing hard clusterings, because soft clustering algorithms yield more complex clustering structures.
Here we introduce the concept of membership networks: undirected weighted networks constructed from the fuzzy partition matrix that represents a fuzzy clustering. This simple network-based method makes it possible to understand visually how the elements involved in this kind of complex clustering structure interact with each other, without relying on a visualization of the input data themselves. Experimental results demonstrated the usefulness of the proposed method for the exploration and analysis of clustering structures on the Iris flower data set and on two large, unlabeled financial datasets, which describe the financial profiles of customers of a local bank. © 2019, Springer Nature Switzerland AG.
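
As a rough illustration of the membership-network idea in the last abstract, the sketch below builds an undirected weighted network from a fuzzy partition matrix. The abstract does not state how edge weights are computed, so the rule used here (the sum over clusters of the smaller of the two membership degrees) is an assumption made only for illustration, and the names `membership_network` and `threshold` are hypothetical.

```python
import numpy as np

def membership_network(U, threshold=0.0):
    """Build a symmetric weight matrix from a fuzzy partition matrix.

    U has shape (n_elements, n_clusters); row i holds the membership
    degrees of element i in each cluster.
    ASSUMPTION: the edge weight between elements i and j is
    sum_k min(U[i, k], U[j, k]). The source abstract does not specify
    the weighting rule; this is one plausible choice, not the authors'.
    """
    U = np.asarray(U, dtype=float)
    # Pairwise element-wise minimum of membership vectors, summed over clusters.
    W = np.minimum(U[:, None, :], U[None, :, :]).sum(axis=2)
    np.fill_diagonal(W, 0.0)   # no self-loops
    W[W < threshold] = 0.0     # optionally drop weak edges
    return W

# Toy fuzzy partition: three elements, two clusters (rows sum to 1).
U = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9]])
W = membership_network(U)
```

Under this assumed rule, elements that share most of their membership in the same cluster are joined by heavy edges, so the network can be drawn and inspected without plotting the high-dimensional input data.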