Master's in Data Science and Analytics (Maestría en Ciencias de los Datos y Analítica): theses

Recent submissions

Showing 1 - 20 of 198
  • Publication
    Pronóstico de aportes hídricos para operación energética a partir de modelos directos (ML-TS)
    (Universidad EAFIT, 2026-03-09) Díaz Giraldo, Harold Nolberto; Saldarriaga Aristizábal, Pablo Andrés; Not applicable
  • Publication
    Predicción de direcciones de activos financieros basados en la volatilidad en series temporales utilizando machine learning
    (Universidad EAFIT, 2026-02-24) Holguín Carvalho, Mateo; Velasco Vera, Henry Giovanny
    Identifying effective trading signals in financial assets is a challenge that draws attention across multiple disciplines because of the volatile and dynamic nature of financial markets. The complexity investors face stems from the wide range of factors that influence asset prices, including macroeconomic variables, corporate decisions, and unexpected events, which makes it difficult to estimate future movements precisely. This is particularly relevant for investors seeking to build portfolios that maximize returns. In this context, some variables exhibit stronger relationships with market-driven factors, making them useful indicators for anticipating price direction. In addition, recent advances in computing and in Machine Learning and Deep Learning techniques have enabled more sophisticated models that facilitate this task. This study compares time-series-based machine learning methodologies, specifically LSTM neural networks and LightGBM gradient-boosted decision-tree models, and incorporates Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models to improve the classification of buy and sell signals in financial instruments, accounting for both historical patterns and external variables affecting asset behavior. The results show that LightGBM achieved the best predictive performance, with an F1 score of 0.823 and an AUC-ROC of 0.923 in validation, whereas LSTM delivered the best financial performance, reaching a cumulative return of 28.05% and a Sharpe ratio of 0.70 and clearly outperforming a buy-and-hold strategy. These findings suggest that although daily directional prediction is inherently complex, advanced Machine Learning models can transform weak signals into profitable trading strategies.
  • Publication
    Detección de anomalías visuales en terrenos propensos a deslizamientos mediante análisis multitemporal de imágenes de punto fijo
    (Universidad EAFIT, 2026-02-10) Sánchez Martínez, Fabián David; Saldarriaga Aristizábal, Pablo Andrés; Arbeláez Estrada, Juan Carlos
  • Publication
    Predicción del precio óptimo de compra de sacos de papel en la industria cementera : un enfoque basado en modelos SARIMAX, ARIMA y red neuronal LSTM
    (Universidad EAFIT, 2025-12-09) Morales Martínez, Andrés Felipe; Fonseca Valero, Diego Fernando
    This research addresses the problem of forecasting the purchase price of kraft paper sacks in the Colombian cement industry, a context characterized by high price uncertainty and the absence of financial hedging instruments. Using internal transactional data extracted from the ERP system for the 2014–2023 period, together with public exogenous variables such as the exchange rate (TRM) and the stock prices of global paper-industry suppliers, a reproducible dataset was built for purchase price forecasting. The study compared the performance of four approaches, ARIMA, SARIMAX, LSTM, and LightGBM, under a 90/10 temporal validation protocol with strict control of information leakage. As a central part of the methodology, a daily time series was reconstructed using the LOCF (last observation carried forward) technique, and feature engineering was incorporated to better represent the stepwise nature of the price series. The results show that the linear models and the LSTM applied to the original series produced high forecasting errors, whereas the best performance was achieved by nonlinear models applied to the transformed series. In particular, the LSTM on the transformed daily series achieved the best overall result (MAE = 4.57 COP), followed by LightGBM (MAE = 8.03 COP), clearly outperforming ARIMA and SARIMAX. The study concludes that an adequate representation of the time series is as important as the choice of predictive model, and that combining internal data, exogenous variables, and nonlinear methods can generate useful operational signals to support more timely, objective, and data-driven purchasing decisions in industrial sourcing.
  • Publication
    Modelación y análisis de la relación entre el aerobioma y el material particulado (PM2.5) en un periodo de 18 meses en el Valle de Aburrá, Antioquia - Colombia
    (Universidad EAFIT, 2025-06-04) Puerta González, Andrés; Almonacid Hurtado, Paula María; Cuesta Astroz, Yesid; The Sistema General de Regalías (Colombia's General Royalties System), for funding the research project with SIGP code 75842. The executing institutions, the Politécnico Colombiano Jaime Isaza Cadavid and the Universidad de Antioquia,
  • Publication
    Predicción de concentración de SO2 en el aire usando machine learning
    (Universidad EAFIT, 2026-02-17) Gómez Jiménez, José Manuel; Saldarriaga Aristizábal, Pablo Andrés
  • Publication
    Inteligencia del mercado laboral colombiano : detección automatizada de habilidades mediante modelos grandes de lenguaje (LLM) y recuperación aumentada (RAG)
    (Universidad EAFIT, 2025-12-03) Zapata Posada, Jorge Mario; Álvarez Barrera, Claudia Patricia; Padilla Buritica, Jorge Iván
    The demand for skills in the labor market has evolved significantly in recent decades, driven by changes in the economic environment and constant technological advances. In this context, the detailed description of each job offer posted on employment web portals provides accurate, real-time information on the specific skills the market requires. Labor Market Intelligence (LMI) research uses this data together with machine learning algorithms to anticipate trends and understand how talent demand evolves. Despite advances in artificial intelligence and the availability of large data volumes, a gap remains in adapting these technologies to local contexts. Regional markets such as Colombia require customized approaches so that technological solutions respond to the specific needs of the labor market and effectively align talent supply and demand. This study analyzes data from the Talent.com employment platform for Colombia using a state-of-the-art approach based on Large Language Models (LLM) combined with Retrieval Augmented Generation (RAG) to identify emerging, traditional, technical, and soft skills. In the first stage, a multilingual LLM extracts skill mentions from job descriptions. In the second stage, a semantic retrieval module queries the European Commission's open ESCO skills taxonomy to propose standardized candidate labels; the LLM then selects the most appropriate label and returns validated, structured JSON output. Preliminary results show improvements in precision, coverage, and auditability compared to purely supervised approaches, reducing hallucinations through candidate-constrained selection and standardizing categories using the ESCO skill classification.
This framework provides valuable insights that, in future work, may support universities in designing academic programs aligned with labor market needs, thus facilitating strategic decision-making for employers, policymakers, and educators, and contributing to talent development and the reduction of unemployment in Colombia.
  • Publication
    Modelación probabilística y dinámica de la ansiedad mediante técnicas de clustering y modelos ocultos de Márkov
    (Universidad EAFIT, 2026-01-30) Giraldo Tirado, Diego Alexander; Peña Palacio, Juan Alejandro
    Anxiety is a growing mental health problem with significant impacts on individual well-being, workplace productivity, and disability-related costs. Despite advances in analytics applied to mental health, most existing approaches treat anxiety from a static perspective, limiting themselves to detection or one-time classification tasks. This work proposes a probabilistic framework that models anxiety as a dynamic, stochastic process, integrating unsupervised learning techniques, Hidden Markov Models (HMM), and the analysis of non-normal distributions. Based on psychological and behavioral variables, observable profiles are identified through clustering, and latent anxiety states are inferred along with their transition probabilities and long-term behavior. Additionally, a continuous distribution is fitted to the transformed psychological well-being indicator, and Value at Risk (VaR)-type metrics are used to characterize extreme risk. The results show dynamics dominated by moderate and high anxiety states, with low well-being stability, and demonstrate the usefulness of the proposed approach for understanding and managing psychological risk in workplace contexts.
  • Publication
    Pronóstico de la inflación con modelos MIDAS : evidencia para Colombia
    (Universidad EAFIT, 2025-11-28) Hurtado Rivera, Isaac; Almonacid Hurtado, Paula María
    Including variables sampled at different frequencies is an empirical challenge in economics. While macroeconomic series are typically released monthly or quarterly, financial series are available daily. A common practice is to aggregate or average the higher-frequency variables (e.g., monthly or daily data) so that they can be incorporated into a single lower-frequency model. However, doing so can discard information and distort the temporal dynamics across variables. MIDAS (Mixed Data Sampling) regressions address this problem while controlling parameter proliferation and yielding unbiased and efficient estimators. Using an application to monthly inflation in Colombia, this study empirically assesses whether high-frequency information improves forecasting performance and whether the use of MIDAS-type models is warranted. The results suggest that monthly inflation is a low-persistence process that can be adequately modeled with a restricted MIDAS specification, whereas U-MIDAS (unrestricted MIDAS) tends to overfit. In addition, forecasts become more accurate and less volatile when forecast combination methods are used.
  • Publication
    Sistema multi-agente para la preselección de candidatos en vacantes públicas de empleo utilizando inteligencia artificial generativa
    (Universidad EAFIT, 2025-10-22) Blandón Londoño, Cristian Mauricio; Álvarez Barrera, Claudia Patricia; Martínez Vargas, Juan David
    Historically, recruitment processes have been carried out manually, with candidates passing through a series of filters that vary with the specific requirements of each vacancy. Because these procedures are not standardized, they can take longer, which increases the number of unfilled positions and, consequently, hurts organizational competitiveness. Identifying the ideal candidate for a job vacancy demands both time and resources. Today, this is a significant challenge for organizations in the Human Resources sector, where each day spent searching for the right talent translates into operational costs; as a result, recruitment delays directly affect the achievement of strategic organizational goals. Despite the growing adoption of Applicant Tracking Systems (ATS), these tools often face semantic limitations and do not easily adapt to local contexts, especially in countries like Colombia, where a significant portion of the population is employed informally. This reality not only hinders the objective assessment of candidate suitability but also increases the likelihood of evaluative bias. Recent studies have begun exploring the integration of multi-agent architectures with Large Language Models (LLMs) to automate pre-screening. In line with this, the present project proposes a multi-agent system for candidate evaluation. By combining Natural Language Processing (NLP) techniques with LLMs, the system aims to analyze applicant and job posting data to support human resources professionals in determining candidate-job fit. The system will be designed to streamline evaluation procedures in recruitment, with the goal of reducing the average time required for candidate assessment and selection.
Furthermore, the implementation seeks to minimize manual operations and mitigate bias in the evaluation process, thereby contributing to the sustainable development of human capital. The solution is expected to increase the efficiency of recruitment workflows and promote greater alignment between the skills demanded by employers and those offered in the labor market. This, in turn, is expected to benefit both employers and job seekers and to indirectly support efforts to bridge the skills gap in the Colombian labor market, particularly in a context of high informality, by establishing fair, competency-based evaluation criteria.
  • Publication
    Exploración de modelos de ML para la identificación de defectos en materiales dieléctricos a partir de la interpretación de patrones de fase resuelta de descargas parciales
    (Universidad EAFIT, 2025-06-25) Santiago Castañeda, Gabriel Arcángel; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
  • Publication
    Comparación de algoritmos de control tradicionales con control por medio de IA en un entorno simulado 2D
    (Universidad EAFIT, 2025) Penagos Ramírez, Mateo Fernando; Puerta Echandía, Alejandro
    In automation and control technology, the efficiency of control algorithms is crucial for the performance and safety of complex systems. Traditionally, Proportional–Integral–Derivative (PID) controllers have been the cornerstone of regulating these systems because of their simplicity and robustness, and they are used in more than 90% of industrial applications. Methods such as PID work very well in environments that are easy to model mathematically and have little variability, such as industrial plants and production equipment. However, in dynamic and nonlinear environments, performance issues or difficulties in tuning their parameters may arise. This project compares the performance of traditional control algorithms, specifically PID, with controllers based on neural networks trained using reinforcement learning. To this end, a simulated 2D environment will be developed to replicate the dynamic behavior of a nonlinear system, in this case a drone in flight. This nonlinear environment will allow both types of controllers to be evaluated under a range of conditions and operational challenges, including stabilization, trajectory tracking, and response to external disturbances. The research will measure the accuracy, efficiency, and adaptability of each algorithm, providing an objective basis for comparison using key performance metrics. In addition, the inherent advantages and limitations of each approach will be analyzed, including implementation complexity, computational requirements, and scalability.
  • Publication
    Análisis de patrones espaciales emergentes de lluvia en la ciudad de Medellín
    (Universidad EAFIT, 2025) Rangel Velásquez, Diego; Olarte Hernández, Tomás; Sepúlveda Berrio, Julián
    Making use of meteorological data is important for a prompt response to emergencies caused by weather phenomena. Climate monitoring systems provide valuable information for risk management, but how much value that information delivers depends closely on the predictive models that can be built from it. In this case, rain-gauge measurements were analyzed to identify emerging spatial patterns in rainfall peaks that can lead to emergencies requiring intervention by disaster response agencies. Although studies on rainfall distribution exist, there is potential to develop models that better fit the environmental conditions of specific cities. This research developed a model adapted to the specific conditions of the Aburrá Valley that anticipates the arrival of torrential rain in specific risk areas. It was found that the evolution of rainfall can be anticipated in specific scenarios of conventionally high precipitation.
  • Publication
    Implementación de redes neuronales recurrentes LSTM para el pronóstico de material particulado PM2,5 en el Valle de Aburrá
    (Universidad EAFIT, 2025) González, Javier; Coy Coy, Jairo Iván; Suarez Sierra, Biviana Marcela; SIATA
    Air pollution is an environmental issue of growing global relevance. Poor air quality, particularly in urban areas, has become a scourge that not only has a direct impact on health but also contributes significantly to higher mortality rates in various parts of the world. In this context, the metropolitan region of the Aburrá Valley is particularly affected because of its topographical and meteorological conditions, combined with a high level of industrialization and a continuously growing vehicle fleet, which favor the accumulation of gases and particulate matter in the atmosphere. This project aims to develop a simple and robust model based on LSTM neural networks to forecast PM2.5 concentrations. The main purpose of this tool is to contribute to risk management during critical air quality episodes in the metropolitan region, helping to mitigate adverse effects on public health.
  • Publication
    Extracción de información no estructurada de extractos bancarios a partir de modelos de lenguaje de gran tamaño
    (Universidad EAFIT, 2025) Acevedo Jaramillo, Jairo David; Valencia Díaz, Edison
  • Publication
    Integración de modelos estadísticos y de aprendizaje automático para predecir y mitigar la rotación voluntaria de empleados
    (Universidad EAFIT, 2025) González Ruiz, John Jairo; Almonacid Hurtado, Paula María
  • Publication
    Discriminación étnica en préstamos hipotecarios en Estados Unidos : un análisis predictivo con métodos causales y de aprendizaje automático
    (Universidad EAFIT, 2025) Galeano Naranjo, Juan Pablo; Almonacid Hurtado, Paula María; Álvarez Franco, Pilar Beatriz; Cruz Castañeda, Vivian
  • Publication
    Identificación de riesgos emergentes por medio del análisis de sentimientos con técnicas de aprendizaje automático
    (Universidad EAFIT, 2025) Merizalde Maya, Pablo; Peña Palacio, Juan Alejandro
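For the trading-signal thesis (Holguín Carvalho), the abstract describes combining GARCH models with LightGBM and LSTM classifiers. The Python sketch below shows one way a GARCH(1,1) conditional variance could be turned into a feature without look-ahead. The parameters omega, alpha and beta are assumed fixed here for illustration; in practice they would be estimated by maximum likelihood (for example with the `arch` package). The function names are hypothetical and this is not the thesis's code.

```python
from typing import List


def garch11_variance(returns: List[float], omega: float,
                     alpha: float, beta: float) -> List[float]:
    """Conditional variances of a GARCH(1,1) model with fixed parameters.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]

    sigma2[t] only uses returns up to t-1, so it can serve as a feature for
    predicting the direction of returns[t] without look-ahead. The recursion
    starts at the unconditional variance omega / (1 - alpha - beta).
    Returns are assumed to be demeaned.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    if not returns:
        return []
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def direction_labels(returns: List[float]) -> List[int]:
    """Binary targets: 1 if the return is positive (a 'buy' day), else 0."""
    return [1 if r > 0 else 0 for r in returns]
```

A large shock raises the next period's variance: with omega=0.1, alpha=0.1, beta=0.8 and a return of 2.0, the following variance is 0.1 + 0.1·4 + 0.8·1.0 = 1.3.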
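The kraft paper sack thesis (Morales Martínez) reconstructs a daily series with LOCF, splits it 90/10 in time order and reports MAE. The Python sketch below illustrates those three pieces under stated assumptions: purchase prices arrive as a hypothetical date-to-price mapping, and days before the first purchase are dropped because there is nothing to carry forward. It shows the techniques only and is not the thesis's pipeline.

```python
from datetime import date, timedelta
from typing import Dict, List, Sequence, Tuple


def locf_daily(observations: Dict[date, float], start: date,
               end: date) -> List[Tuple[date, float]]:
    """Daily series from irregular observations (last observation carried forward)."""
    series: List[Tuple[date, float]] = []
    last = None
    day = start
    while day <= end:
        if day in observations:
            last = observations[day]  # a new purchase price was observed
        if last is not None:
            series.append((day, last))  # otherwise the last price is carried forward
        day += timedelta(days=1)
    return series


def temporal_split(values: Sequence[float], train_frac: float = 0.9):
    """Chronological split with no shuffling, so validation data is strictly later."""
    cut = int(len(values) * train_frac)
    return list(values[:cut]), list(values[cut:])


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the series (e.g. COP)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Because LOCF produces flat segments between purchases, the reconstructed series has the stepwise shape the abstract mentions, which is why its representation matters as much as the model.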
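The labor-market thesis (Zapata Posada) retrieves candidate ESCO labels and lets the LLM choose only among them. The Python sketch below illustrates that candidate-constrained step under loud simplifications: a bag-of-words cosine similarity stands in for the semantic (embedding-based) retriever, the skill labels are illustrative strings rather than entries taken from ESCO, and `choose` is a hypothetical placeholder for the LLM call. Any answer outside the retrieved candidates is rejected, which is the mechanism the abstract credits with reducing hallucinations.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(mention: str, labels: List[str], k: int = 3) -> List[str]:
    """Top-k taxonomy labels by similarity to an extracted skill mention."""
    query = bag_of_words(mention)
    ranked = sorted(labels, key=lambda lbl: cosine(query, bag_of_words(lbl)),
                    reverse=True)
    return ranked[:k]


def constrained_select(mention: str, candidates: List[str],
                       choose: Callable[[str, List[str]], str]) -> Optional[str]:
    """Accept the selector's answer only if it is one of the retrieved candidates."""
    answer = choose(mention, candidates)
    return answer if answer in candidates else None
```

Returning `None` for an out-of-list answer makes rejected outputs explicit and auditable instead of silently storing an invented label.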
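The anxiety-modelling thesis (Giraldo Tirado) estimates transition probabilities between latent states, studies their long-term behavior and applies VaR-type metrics to a well-being indicator. Two of those computations can be sketched in Python: the long-run (stationary) distribution of a transition matrix by power iteration, and an empirical lower-tail quantile of a well-being score, where low values are the risky ones. The transition matrix used below is hypothetical, and estimating the HMM itself (for example with the Baum-Welch algorithm) is not shown.

```python
import math
from typing import List, Sequence


def stationary_distribution(P: List[List[float]], tol: float = 1e-12,
                            max_iter: int = 100_000) -> List[float]:
    """Long-run state probabilities pi satisfying pi = pi P, by power iteration.

    Assumes P is row-stochastic and the chain is irreducible and aperiodic,
    so the iteration converges to a unique stationary distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


def lower_tail_var(values: Sequence[float], level: float = 0.05) -> float:
    """Empirical VaR-style threshold at the lower tail.

    For a well-being score (higher is better), roughly a fraction `level`
    of the observations fall at or below the returned value.
    """
    if not values or not 0.0 < level < 1.0:
        raise ValueError("need non-empty values and 0 < level < 1")
    ordered = sorted(values)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]
```

For the two-state matrix [[0.9, 0.1], [0.5, 0.5]], balancing flows gives 0.1·pi1 = 0.5·pi2, so the chain spends 5/6 of the time in the first state in the long run.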
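The MIDAS thesis (Hurtado Rivera) contrasts a restricted MIDAS specification with U-MIDAS and with forecast combination. One common way to restrict MIDAS is the exponential Almon lag polynomial, which collapses many high-frequency lags into a single weighted regressor governed by two parameters. The listing does not say which weighting scheme the thesis used, so treat this Python sketch as illustrative; the function names are hypothetical. With both parameters at zero the weights are equal and the regressor reduces to the simple average that the abstract warns can discard information.

```python
import math
from typing import List, Sequence


def exp_almon_weights(n_lags: int, theta1: float, theta2: float) -> List[float]:
    """Normalized exponential Almon weights w_k for lags k = 1..n_lags.

    w_k = exp(theta1*k + theta2*k**2) / sum_j exp(theta1*j + theta2*j**2)
    """
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_regressor(high_freq_lags: Sequence[float], theta1: float,
                    theta2: float) -> float:
    """Weighted sum of high-frequency lags (most recent first) for one
    low-frequency period; it enters y_t = b0 + b1 * x_t + e_t.
    A U-MIDAS model would instead give every lag its own coefficient."""
    w = exp_almon_weights(len(high_freq_lags), theta1, theta2)
    return sum(wk * xk for wk, xk in zip(w, high_freq_lags))


def combine_forecasts(forecasts: Sequence[float]) -> float:
    """Equal-weight forecast combination (one simple combination scheme)."""
    return sum(forecasts) / len(forecasts)
```

The restricted version estimates only b0, b1, theta1 and theta2 regardless of how many daily lags are used, whereas U-MIDAS adds one coefficient per lag, which is consistent with the overfitting the abstract reports.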
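The control thesis (Penagos Ramírez) uses PID as the traditional baseline. A minimal discrete-time PID controller in Python is sketched below, closed around a stand-in first-order plant dx/dt = -x + u integrated with explicit Euler steps. That plant and the gains are assumptions for illustration, not the 2D nonlinear drone simulation or the reinforcement-learning controller from the thesis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float) -> float:
        """Control signal u = kp*e + ki*integral(e) + kd*de/dt."""
        error = setpoint - measurement
        self.integral += error * self.dt  # rectangle-rule integral of the error
        # No derivative term on the first call: there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_first_order(pid: PID, setpoint: float, steps: int,
                         x0: float = 0.0) -> float:
    """Closed loop with the plant dx/dt = -x + u (explicit Euler); returns final x."""
    x = x0
    for _ in range(steps):
        u = pid.update(setpoint, x)
        x += pid.dt * (-x + u)
    return x
```

Setting ki=0 in the same loop leaves a steady-state offset (for this plant x settles at kp/(1+kp) times the setpoint), which is the usual argument for the integral term; tuning all three gains for a nonlinear plant is where the abstract says PID becomes difficult.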
Anyone consulting this repository may copy excerpts of the text, always citing the source, that is, the title of the work and the author. This authorization does not imply a waiver of the author's right to publish the work in whole or in part.
The University shall not be liable for any claim by third parties asserting authorship of the work submitted by the author.
All rights reserved.