Maestría en Ciencias de los Datos y Analítica (tesis)

Permanent URI for this collection

https://hdl.handle.net/10784/17488

Showing 1 - 20 of 202

A Dynamic Approach to Modeling Count Data Based on Intensity Functions of Non-Homogeneous Poisson Processes and Functional Data Techniques
(Universidad EAFIT, 2024) Chavarría Serna, Juan Esteban; Ortiz Arias, Santiago; Velasco, Henry
A Multivariate Outlier Detection Methodology Based on S-Orthogonal DOBIN Projections
(Universidad EAFIT, 2024) Cano Campiño, Andrés Mauricio; Ortiz Arias, Santiago
A new segmentation approach using dynamic variables on individuals
(Universidad EAFIT, 2021) Prieto Escobar, Nicolás; Laniado Rodas, Henry; Monroy Osorio, Juan Carlos
A predictive approach based on fuzzy cognitive maps with federated learning
(Universidad EAFIT, 2023) Garatejo Vargas, Edison Camilo; Aguilar Castro, José Lizandro; Hoyos, William
A retail demand forecasting system of product groups characterized by time series based on “ensemble machine learning” techniques with feature engineering
(Universidad EAFIT, 2022) Mejía Chitiva, Santiago; Aguilar Castro, José Lisandro
A Robust Version of a Risk-Inverse Weighing Methodology for Portfolio Selection
(Universidad EAFIT, 2024) Renza Chavarría, Juan Felipe; Ortiz Arias, Santiago
Agente de inteligencia artificial para el apoyo a la primera impresión diagnóstica a partir de descripciones sintomáticas expresadas en lenguaje natural
(Universidad EAFIT, 2025-11-24) Bertel Morales, Juan Pablo; Jaramillo Múnera, Yomin Estiven
This thesis proposes the development of an artificial intelligence (AI) agent capable of supporting the generation of an initial diagnostic impression from symptoms expressed in natural language. The project starts from the recognition that medical diagnosis is a complex, error-prone task, particularly when it relies on subjective and unstructured descriptions. To support clinical decision-making, natural language processing and machine learning techniques were applied following the CRISP-DM methodology. The model was trained on the synthetic DDxPlus dataset, which made it possible to simulate clinical scenarios without compromising real patient information. In the process, symptoms were transformed into synthetic anamneses through semantic normalization and then vectorized using several biomedical embedding models. These representations were used to train a supervised model that associates each narrative with the confirmed diagnosis. As an additional evaluation, a “stress test” was conducted in a simulated environment, in which a healthcare professional interacted directly with the system to assess its ability to interpret real symptomatic descriptions and to generate preliminary diagnostic suggestions in a coherent, consistent, and safe manner.
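The abstract says structured symptoms were turned into synthetic anamneses through semantic normalization before embedding. A minimal sketch of that idea, assuming a simple lookup-based normalizer and a fixed sentence template, is shown below; the lexicon, the symptom codes, and the helper names (`SYMPTOM_LEXICON`, `normalize_symptom`, `build_anamnesis`) are hypothetical and are not taken from the thesis or from DDxPlus.

```python
# Hypothetical lexicon mapping raw symptom codes to normalized clinical phrases.
# These codes are illustrative only; they are not DDxPlus identifiers.
SYMPTOM_LEXICON = {
    "fever": "fever",
    "sob": "shortness of breath",
    "cp": "chest pain",
    "cough": "cough",
}


def normalize_symptom(code: str) -> str:
    """Map a symptom code to a normalized phrase, falling back to a cleaned-up code."""
    key = code.strip().lower()
    return SYMPTOM_LEXICON.get(key, key.replace("_", " "))


def build_anamnesis(age: int, sex: str, symptoms: list[str]) -> str:
    """Render a structured symptom record as a one-sentence synthetic anamnesis."""
    phrases = [normalize_symptom(s) for s in symptoms]
    if not phrases:
        findings = "no reported symptoms"
    elif len(phrases) == 1:
        findings = phrases[0]
    else:
        findings = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    subject = {"M": "male", "F": "female"}.get(sex.upper())
    prefix = f"{age}-year-old {subject} patient" if subject else f"{age}-year-old patient"
    return f"{prefix} presenting with {findings}."
```

In a pipeline like the one described, each generated sentence would then be passed to an embedding model, and the resulting vectors would feed the supervised classifier.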
Ajuste fino de un modelo LLM para realizar reportes resumidos de expertos en trading, con integración de datos desde redes sociales
(Universidad EAFIT, 2025) Restrepo Acevedo, Andrés Felipe; Martínez Vargas, Juan David
The contemporary financial market is characterized by high complexity and by the massive volume of structured and unstructured data generated daily, which poses significant challenges for individual investors in terms of analysis and informed decision-making. This project proposes fine-tuning a Small Language Model (SLM) integrated into a tool capable of generating financial analysis reports similar to those produced by experts. For the proof of concept (PoC), transcripts of financial analysis videos published by experts on their YouTube channels are used. The SLM is fine-tuned with instruction-based techniques and the LoRA (Low-Rank Adaptation) method, with the aim of extracting and summarizing the key information relevant to individual investors. The main objective of this tool is to assist individual investors by generating efficient and accessible reports, facilitating access to valuable information in natural language, and enhancing their ability to make data-driven decisions from unstructured data, all with minimal investment of time and resources. Experimental results demonstrate the viability of using fine-tuned SLMs to generate high-quality financial reports. Specifically, the selected model, finetune qlora unsloth llama 3.1 8B Instruct bnb 4bit v2 Q8 0, achieved an average score of 5.67 out of 10 in the evaluation conducted by a judge LLM, with an average cosine distance of 0.159 to the reference summaries generated by the pretrained foundation model GPT-4.1. This represents a 97.5% improvement over the same base model, Llama 3.1 8B Instruct, without fine-tuning. Qualitatively, the model shows high fidelity and coherence when extracting and synthesizing key information from moderately long contexts, although it struggles with thematic interpretation of considerably lengthy transcripts.
Additionally, implementation of this tool is projected to save 560 hours annually for individual investors, along with an estimated annual reduction in API costs ranging from 7.52 to 25 for the channels analyzed in the proof of concept.
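The abstract reports an average cosine distance of 0.159 between generated and reference summaries. A minimal sketch of that metric, assuming each summary has already been turned into an embedding vector (the embedding model is not named here, so the vectors are plain inputs), could look like this; `mean_cosine_distance` is a hypothetical helper name.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 - cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def mean_cosine_distance(candidates: list[list[float]], references: list[list[float]]) -> float:
    """Average distance between each generated summary and its reference summary."""
    pairs = list(zip(candidates, references))
    if not pairs:
        raise ValueError("need at least one candidate/reference pair")
    return sum(cosine_distance(c, r) for c, r in pairs) / len(pairs)
```

A distance of 0 means the two embeddings point in the same direction; values near 0.159 indicate summaries that are semantically close to, but not identical with, the references.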
Algoritmo evolutivo para resolver el problema de enrutamiento de vehículos tiempo dependiente con ventanas de tiempo en una compañía del sector de alimentos y bebidas en Colombia
(Universidad EAFIT, 2023) Ramírez Guilombo, Camilo; Rivera Agudelo, Juan Carlos
Análisis comparativo de modelos predictivos para la estimación de PM2.5 : un enfoque basado en aprendizaje automático y predicción conformal
(Universidad EAFIT, 2024) Camelo Valera, Matías; Martínez Vargas, Juan David; Sepúlveda Cano, Lina Maria
Fine particulate matter (PM2.5) pollution poses a significant environmental and public health challenge, requiring accurate predictive models for its monitoring and control. This study compares several machine learning approaches, including Linear Regression, Random Forest, and XGBoost, with and without mobility variables, to estimate PM2.5 levels. In addition, inductive conformal prediction is implemented to quantify the uncertainty of the estimates and to provide prediction intervals with α = 0.05. The results show that although XGBoost's training performance deteriorates when mobility variables are included, it achieves the best validation performance, with the lowest mean absolute error and the highest coefficient of determination. Conformal prediction yielded intervals with 89.26% coverage, close to the expected 95%, supporting the model's reliability across different spatial and temporal scenarios. In conclusion, combining machine learning models with advanced validation and calibration techniques such as conformal prediction improves the accuracy and reliability of PM2.5 estimation. However, the quality of the input variables, particularly mobility-related data, remains a challenge, highlighting the need to incorporate meteorological information and improve data resolution. These findings contribute to the development of more reliable predictive tools for environmental management and air-quality policy decision-making.
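Inductive (split) conformal prediction, mentioned in the abstract, calibrates a fitted model on held-out data and turns its point predictions into intervals. The sketch below assumes absolute residuals |y - ŷ| as the nonconformity score, which is a common choice but not necessarily the one used in the thesis; the function names are hypothetical.

```python
import math


def conformal_quantile(cal_residuals: list[float], alpha: float) -> float:
    """Split-conformal threshold from absolute residuals |y - y_hat| on a calibration set."""
    n = len(cal_residuals)
    # Rank ceil((n + 1)(1 - alpha)); the tiny epsilon guards against float round-off.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_residuals)[k - 1]


def prediction_interval(y_hat: float, q: float) -> tuple[float, float]:
    """Symmetric interval around a point prediction."""
    return (y_hat - q, y_hat + q)


def empirical_coverage(y_true: list[float], y_hat: list[float], q: float) -> float:
    """Fraction of test targets that fall inside their intervals."""
    hits = sum(1 for y, p in zip(y_true, y_hat) if p - q <= y <= p + q)
    return hits / len(y_true)
```

With α = 0.05 the nominal target is 95% coverage; `empirical_coverage` is the quantity that, on the thesis data, came out at 89.26%.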
Análisis de discurso basado en modelos grandes de lenguaje
(Universidad EAFIT, 2024) Jiménez Jaimes, Edgar Leandro; Montoya Múnera, Edwin Nelson
This thesis explores the use of natural language processing techniques and large language models (LLMs) to support discourse analysis tasks in the context of the "Tenemos que hablar Colombia" program. Topic modeling, sentiment analysis, clustering, visualization, and a conversational assistant based on Retrieval Augmented Generation (RAG) were developed using advanced text modeling, vector embeddings, and prompt engineering approaches. A text classification model that predicts the label of the verbal indicator variable, assigned manually by the interviewer, is also presented, although this model is not directly applied to discourse analysis. This work adds to earlier studies of the "Tenemos que hablar Colombia" program, to which other authors have contributed through computational linguistics analysis and machine learning techniques. Advanced NLP techniques were used to improve the interpretation of text data and its application to discourse analysis. The results show improvements in the accuracy of data classification and analysis with the techniques explored, providing a better understanding of citizen perceptions.
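The RAG assistant described above retrieves relevant passages before asking an LLM to answer. A minimal sketch of the retrieve-then-prompt step is below. As an assumption for self-containment, a toy bag-of-words vector stands in for the neural embeddings the thesis uses, the prompt wording is hypothetical, and the LLM call itself is omitted.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a neural embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a grounded prompt from the retrieved excerpts (hypothetical wording)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the interview excerpts below.\n"
        f"Excerpts:\n{joined}\n"
        f"Question: {query}"
    )
```

Restricting the model to retrieved excerpts is what lets the assistant's answers be traced back to specific interview passages.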
Análisis de discurso de los máximos responsables de las empresas participantes en el COLCAP
(Universidad EAFIT, 2024) Cuervo Garcia, Dairo Alberto; Pantoja Robayo, Javier Orlando; Ceballos Cañón, Johan Armando
Análisis de explicabilidad en modelos predictivos basados en técnicas de aprendizaje automático sobre el riesgo de re-ingresos hospitalarios
(Universidad EAFIT, 2023) Lopera Bedoya, Juan Camilo; Aguilar Castro, José Lisandro
Big Data and medical care are essential for analyzing the risk of re-hospitalization of patients with chronic diseases and can even help prevent their deterioration. By leveraging this information, healthcare institutions can deliver accurate preventive care and thus reduce hospital admissions. Calculating the level of risk makes it possible to plan in-patient care spending, so that medical spaces and resources are available to those who need them most. This work presents several supervised models that predict the risk that a patient will be hospitalized again after discharge. In addition, an explainability analysis is carried out on the predictive models to extract information about the predictions they make, for example to determine how important each predictor (descriptor) is. The aim is to make the results more understandable for healthcare staff.
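The abstract does not name the explainability method. As one common model-agnostic option, and explicitly as a swapped-in illustration rather than the thesis's own technique, permutation feature importance measures how much a model's accuracy drops when a single predictor's values are shuffled. The `predict` callable and the list-of-rows data layout are assumptions.

```python
import random


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Share of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X: list[list[float]], y: list[int],
                           n_repeats: int = 10, seed: int = 0) -> list[float]:
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A predictor that the model ignores gets an importance of zero, while one the model relies on shows a clear accuracy drop, which is the kind of ranking clinicians can read directly.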
Análisis de la tendencia de la solución de una interacción con un Chatbot de atención al cliente, basado en análisis de sentimiento y otras variables
(Universidad EAFIT, 2023) Flórez Salazar, Luz Stella; Montoya Múnera, Edwin Nelson
A chatbot is a program built with artificial intelligence. In customer service, chatbots hold conversations with customers and are trained to resolve their queries, problems, and complaints. Enabling a chatbot to recognize when it is not meeting a customer's request is a challenge for companies that currently use this technology. One strategy to keep customers from abandoning the conversation for this reason is to escalate, or transfer, the conversation to a human agent, so it is essential to detect when that escalation should happen. This project evaluates different Natural Language Processing (NLP) techniques, rule-based labeling algorithms, classical supervised machine learning models, and a simple neural network for classification, applied to interactions between a customer service chatbot and its users. The goal is to find a mechanism for automatically labeling the data and to build a model that decides whether the customer should continue interacting with the chatbot or be transferred to a human agent. The labeling mechanism could also be used to label historical data for later model training. The models and techniques are compared, and those that perform best at detecting conversations that should be escalated to a human agent are presented.
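The abstract mentions rule-based labeling algorithms for deciding when a conversation should be escalated. The sketch below shows what such a rule might look like, under stated assumptions: the English cue patterns, the repeated-request rule, and the `max_repeats` threshold are all hypothetical (the actual conversations and rules are not given here).

```python
import re

# Hypothetical frustration/escalation cues; a real rule set would be derived from the data.
ESCALATION_PATTERNS = [
    r"\b(human|agent|advisor|person)\b",
    r"\bnot (helping|working|what i asked)\b",
    r"\b(useless|frustrat\w*|annoy\w*)\b",
]


def label_conversation(user_turns: list[str], max_repeats: int = 2) -> int:
    """Return 1 if the conversation should be escalated to a human agent, else 0."""
    text = " ".join(user_turns).lower()
    if any(re.search(p, text) for p in ESCALATION_PATTERNS):
        return 1
    # Repeating the same request many times suggests the chatbot is not resolving it.
    counts: dict[str, int] = {}
    for turn in user_turns:
        key = turn.strip().lower()
        counts[key] = counts.get(key, 0) + 1
    return int(any(c > max_repeats for c in counts.values()))
```

Labels produced this way can be applied to historical conversations and then used as training targets for a supervised classifier, which is the role the abstract assigns to the labeling mechanism.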
Análisis de la utilidad potencial del mercado colombiano a través de modelos de segmentación y customer life value para una empresa originadora de créditos de libranza
(Universidad EAFIT, 2022) González Cano, Juan José; Montoya Cano, Jorge Esteban; Ochoa, Natalia
Companies currently define their target market in order to focus on certain individuals and population groups; however, they often fail to understand in depth the future economic benefit these market niches represent, and therefore whether their business model is financially attractive. This project focuses on the Colombian financial sector and seeks to contribute directly to the way companies in this sector analyze and define the economic potential of their target market. It uses analytical and financial tools, namely segmentation models and Customer Lifetime Value (CLV) analysis, to estimate the profit each niche may represent for the company, allowing it to outline a business strategy that remains sustainable over time and in the market. Drawing on the project team's combined skills, segmentation techniques that handle different types of variables will be used to find groups whose members are highly homogeneous but which are highly heterogeneous from one another, and thus to identify which clusters will bring the company the greatest benefit.
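The abstract combines segmentation with Customer Lifetime Value but does not give the CLV formula. A minimal sketch, assuming the textbook discounted-margin form with a constant per-period margin $m$, retention rate $r$, and discount rate $d$ over $T$ periods, $\mathrm{CLV} = \sum_{t=1}^{T} m\, r^{t-1} / (1+d)^{t}$, is shown below. The segment dictionary format and `segment_potential` are hypothetical.

```python
def customer_lifetime_value(margin: float, retention: float,
                            discount: float, periods: int) -> float:
    """Discounted expected margin over a finite horizon of `periods` periods."""
    return sum(
        margin * retention ** (t - 1) / (1 + discount) ** t
        for t in range(1, periods + 1)
    )


def segment_potential(segments: dict[str, tuple[int, float]]) -> dict[str, float]:
    """Total value per segment: number of customers times per-customer CLV."""
    return {name: n * clv for name, (n, clv) in segments.items()}
```

Multiplying each cluster's size by its average CLV gives a rough ranking of which segments are worth targeting; a payroll-loan (libranza) business would likely replace the constant margin with loan-specific cash flows.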
Análisis de los resultados de la aplicación del instrumento para la evaluación docente de la universidad EAFIT
(Universidad EAFIT, 2024) Fernández Carmona, Laura Catalina; Guarín Zapata, Nicolás; Mola Ávila, José Antonio; Universidad EAFIT
Análisis de patrones de violencia armada en la frontera de Colombia con Venezuela usando algoritmos de aprendizaje automático
(Universidad EAFIT, 2025) Lopera Pai, Daniela; Aguilar Castro, José Lisandro
Análisis de patrones espaciales emergentes de lluvia en la ciudad de Medellín
(Universidad EAFIT, 2025) Rangel Velásquez, Diego; Olarte Hernández, Tomás; Sepúlveda Berrio, Julián
Making use of meteorological data is important for responding promptly to emergencies caused by weather phenomena. Climate monitoring systems provide valuable information for risk management, but their usefulness depends closely on the predictive models that can be built from this information. In this work, rain-gauge measurements were analyzed to identify emerging spatial patterns in rainfall peaks that can lead to emergencies requiring intervention by disaster-response agencies. Although studies on precipitation distribution exist, there is potential to develop models better suited to the environmental conditions of specific cities. This research developed a model adapted to the specific conditions of the Aburrá Valley that anticipates the arrival of torrential rain in specific risk areas. It was found that the evolution of precipitation can be anticipated in specific scenarios of conventionally high rainfall.
Análisis de quiebra empresarial ante escenarios de contracción de la oferta y la demanda ocasionados por el Covid-19 : un estudio del sector comercio colombiano
(Universidad EAFIT, 2021) Urán González, Ana María; Arjona, Mateo
Análisis de registros de mantenimiento de centrales de generación de energía con técnicas de procesamiento de lenguaje natural
(Universidad EAFIT, 2024) Ocampo Davila, Andrés Alonso; Salazar Martínez, Carlos Andres

Browsing Maestría en Ciencias de los Datos y Analítica (tesis) by Title