Browsing by Subject "DATOS ESTADÍSTICOS" (Statistical Data)
Showing 1 - 12 of 12

Item: Análisis del efecto que tienen los subsidios a la demanda para la adquisición de vivienda nueva en los ingresos monetarios de los beneficiarios [Analysis of the effect of demand-side subsidies for the purchase of new housing on beneficiaries' monetary income] (Universidad EAFIT, 2022) Betancur Londoño, David; Dávalos Álvarez, Eleonora

Item: Aplicación de técnicas no-lineales de reducción de dimensionalidad y clustering para detección de observaciones anómalas multidimensionales [Application of nonlinear dimensionality reduction and clustering techniques for detecting multidimensional anomalous observations] (Universidad EAFIT, 2024) Romero Cardona, Daniel; Ortiz Arias, Santiago

Item: Assessing the effects of Multivariate Functional outlier identification and sample robustification on identifying critical PM2.5 air pollution episodes in Medellín, Colombia (Universidad EAFIT, 2022) Roldán Alzate, Luis Miguel; Zuluaga Díaz, Francisco Iván
Identification of critical episodes of environmental pollution, both as an outlier identification problem and as a classification problem, is a usual application of multivariate functional data analysis. This article addresses the effects of robustifying multivariate functional samples on the identification of critical pollution episodes in Medellín, Colombia. To do so, it compares 18 depth-based outlier identification methods and highlights the best options in terms of precision through simulation. It then applies the two best-performing methods to robustify a real dataset of air pollution (PM2.5 concentration) in the Metropolitan Area of Medellín, Colombia, and compares the effects of robustifying the samples on the accuracy of supervised classification through the multivariate functional DD-classifier. Our results show that 10 out of 20 methods reviewed perform better for at least one kind of outlier.
Nevertheless, no clear positive effects of robustification were identified with the real dataset.

Item: Definición de una metodología para análisis de discurso basado en lingüística computacional y técnicas de aprendizaje de máquina [Definition of a methodology for discourse analysis based on computational linguistics and machine learning techniques] (Universidad EAFIT, 2023) Fajardo Becerra, Daian Paola; Montoya Múnera, Edwin Nelson; Ariza Jiménez, Leandro Fabio
The actions carried out by a state regulatory body generate multiple opinions among citizens and give rise to debates in which people agree, disagree, or partially agree with the proposed decisions or strategies. To learn citizens' opinions, a project called "Tenemos que hablar Chile" (We have to talk Chile) was created in Chile. It asked structured questions to a group of citizens, and each person's answer was classified by the moderator. This label was used for different discourse analyses that began to be developed without any specific order. The project was replicated in Colombia under the same dynamics, also to learn citizens' opinions; however, the techniques used were different from those of the Chilean project. As a result, although both projects had the same dynamics and sought a similar result, it was not possible to reuse the techniques developed in the Chilean project in Colombia. For this reason, this master's project proposes a methodology that allows the use of different discourse analysis techniques based on computational linguistics and machine learning. It will provide the team of analysts with a scheme of stages, each with natural language processing (NLP) tools and techniques, to improve the efficiency of this type of project.
The project draws on the director's extensive experience in machine learning (ML) and NLP, the co-director's broad understanding of the project "Tenemos que Hablar Colombia" (TQHC), and the student's training in the Master's in Data Science and Analytics to carry out research on NLP techniques.

Item: Desarrollo de un algoritmo de aprendizaje por refuerzo profundo para resolver el despacho hidrotérmico colombiano considerando escenarios hidrológicos y de demanda bajo incertidumbre [Development of a deep reinforcement learning algorithm to solve the Colombian hydrothermal dispatch problem considering hydrological and demand scenarios under uncertainty] (Universidad EAFIT, 2022) Ramírez Arango, Alejandro; Aguilar Castro, José Lisandro
Economic dispatch is a widely analyzed optimization problem in the electricity sector, which seeks to make the best use of available resources to meet demand at minimum cost. The problem is difficult to solve because many of its parameters are uncertain; hydrological uncertainty is of special interest in the Colombian case because of the country's heavy dependence on hydroelectric plants. In this paper, we view economic dispatch as a multistage decision-making problem and propose a reinforcement learning algorithm to solve the Colombian economic dispatch problem considering hydrological scenarios, chosen for its ability to handle uncertainty and sequential decisions. The policy performance of our algorithm is compared with that of a classic deterministic method. The main advantage of our method is that it can learn a robust policy for dealing with the inflow and load demand scenarios.

Item: Nonparametric Generation of Synthetic Data Using Copulas (Universidad EAFIT, 2023) Restrepo Lopera, Juan Pablo; Laniado Rodas, Henry; Rivera Agudelo, Juan Carlos
This article presents a novel nonparametric approach to generating synthetic data using copulas, which are functions that describe the dependence structure of the real data.
The proposed method addresses several challenges faced by existing synthetic data generation techniques, such as the preservation of complex multivariate structures present in real data. Because it uses all the information in the real data and verifies, through homogeneity tests, that the generated synthetic data behave like the real data, our method is a significant improvement over existing techniques. Our method is easy to implement and interpret, making it a valuable tool for solving class imbalance problems in machine learning models, improving the generalization capabilities of deep learning models, and anonymizing information in the finance and healthcare domains, among other applications.

Item: On a robust linear discriminant analysis version based on shrinkage estimators (Universidad EAFIT, 2022) Goez Mora, Luis Miguel; Ortiz Arias, Santiago; Laniado Rodas, Henry

Item: Precision Matrix Estimation using Preconditioned Conjugate Gradient with Regularization (Universidad EAFIT, 2024) Barrientos Osorio, Alejandro; Ortiz Arias, Santiago

Item: Predicción de pacientes crónicos de alto costo con marca de riesgo cardiovascular a través de técnicas de aprendizaje estadístico [Prediction of high-cost chronic patients flagged for cardiovascular risk using statistical learning techniques] (Universidad EAFIT, 2023) Escobar Bedoya, José Ignacio; Ortiz, Santiago; Arias Hernández, Olga Liliana

Item: Revisión de técnicas estadísticas bayesianas para la coincidencia de entidades en conjuntos de datos grandes [Review of Bayesian statistical techniques for entity matching in large datasets] (Universidad EAFIT, 2024) López Valencia, Sebastián; Suárez Sierra, Biviana Marcela
In the context of data analysis, entity matching is a crucial task that involves identifying and pairing records that represent the same entity across different data sources. This work reviews various statistical techniques, with a particular focus on Bayesian methods, to address this problem in large datasets. The theoretical framework and state of the art review various matching techniques, including rule-based methods, text distance functions, and machine learning-based methods.
Several optimization strategies are also presented to reduce the computational cost associated with entity matching, including heuristics, less complex distance measures, and fast-converging learning algorithms. A notable approach is to group entities first and then compare them only within groups, which significantly reduces the number of comparisons required. The data description section details the procedures for data acquisition and preprocessing, which are fundamental to ensuring the quality and relevance of the datasets used in the experiments. The work methodology is described in detail, covering everything from business knowledge to data acquisition, understanding, and modeling. Finally, the development of methods and results section presents the findings obtained by applying the techniques reviewed and proposed in this thesis. The conclusions highlight the effectiveness of Bayesian techniques and suggest areas for future research.

Item: Rutas seguras de transporte de residuos hospitalarios en Medellín [Safe routes for transporting hospital waste in Medellín] (Universidad EAFIT, 2022) Urcuqui Henao, Ana Cristina; Valencia Díaz, Édison

Item: Segmentación de los flujos migratorios en Colombia : identificación de subgrupos y características comunes [Segmentation of migratory flows in Colombia: identification of subgroups and common characteristics] (Universidad EAFIT, 2024) Aguirre Marín, Cindy Vanessa; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
The increase in global migration has intensified migratory flows, which have become a relevant phenomenon for global, regional, and national policies. In Colombia, since 2015, Venezuelan migration has sparked interest in migratory flows. This study analyzes migratory flows to Colombia in 2023 using Machine Learning techniques. K-Means was applied to segment data from Migration Colombia, while UMAP was used to reduce the dimensionality of the data. The results reveal four main clusters, defined by the region of origin, reason for travel, host region, and month of arrival.
Most flows correspond to tourists, suggesting that the data from official migration points primarily reflect tourist movements and not necessarily other types of migration. Machine Learning techniques proved effective in uncovering complex patterns in categorical data, and interpretation using the SmartExplainer from Shapash facilitated the understanding of these patterns. This study not only adequately segmented migratory flows but also provided interpretative tools for future analyses of categorical data.