Master's in Data Science and Analytics (theses)

Recent submissions

Showing 1 - 20 of 188
  • Publication
    Multi-agent system for pre-screening candidates for public job vacancies using generative artificial intelligence
    (Universidad EAFIT, 2025-10-22) Blandón Londoño, Cristian Mauricio; Álvarez Barrera, Claudia Patricia; Martínez Vargas, Juan David
    Historically, recruitment processes have been carried out manually. In such processes, candidates go through a series of filters that vary with the specific requirements of each vacancy. Because these procedures are not standardized, they can take a long time, which increases the number of unfilled positions and, consequently, hurts organizational competitiveness. Identifying the ideal candidate for a job vacancy demands both time and resources. Today this is a significant challenge for organizations in the Human Resources sector, where each day spent searching for the right talent translates into operational costs. As a result, delays in recruitment directly affect the achievement of strategic organizational goals. Despite the growing adoption of Applicant Tracking Systems (ATS), these tools often face semantic limitations and do not easily adapt to local contexts, especially in countries like Colombia, where a significant portion of the population is employed informally. This reality not only hinders the objective assessment of candidate suitability but also increases the likelihood of evaluative bias. Recent studies have begun exploring the integration of multi-agent architectures with Large Language Models (LLMs) to automate pre-screening. In line with this, the present project proposes the implementation of a multi-agent system for candidate evaluation. By combining Natural Language Processing (NLP) techniques with LLMs, the system aims to analyze applicant and job posting data to help human resources professionals determine candidate-job fit. The system will be designed to streamline evaluation procedures in recruitment, with the goal of reducing the average time required to assess and select candidates.
Furthermore, the implementation seeks to minimize manual operations and mitigate bias in the evaluation process, thereby contributing to the sustainable development of human capital. This solution is expected to increase the efficiency of recruitment workflows and promote closer alignment between the skills employers demand and those offered in the labor market. This, in turn, should benefit both employers and job seekers and indirectly support efforts to bridge the skills gap in the Colombian labor market, particularly in a context of high informality, by establishing fair, competency-based evaluation criteria.
  • Publication
    Exploring ML models to identify defects in dielectric materials by interpreting phase-resolved partial discharge patterns
    (Universidad EAFIT, 2025-06-25) Santiago Castañeda, Gabriel Arcángel; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
  • Publication
    Comparing traditional control algorithms with AI-based control in a simulated 2D environment
    (Universidad EAFIT, 2025) Penagos Ramírez, Mateo Fernando; Puerta Echandía, Alejandro
    In automation and control technology, the efficiency of control algorithms is crucial for the performance and safety of complex systems. Traditionally, Proportional–Integral–Derivative (PID) controllers have been the cornerstone of regulating these systems due to their simplicity and robustness, being used in more than 90% of industrial applications. Methods such as PID work very well in environments that are easy to model mathematically and have little variability; such systems can be found in industrial plants and production equipment. However, in dynamic and nonlinear environments, performance issues or difficulties in tuning their parameters may arise. This project aims to compare the performance of traditional control algorithms—specifically PID—with those based on neural networks trained using reinforcement learning techniques. To this end, a simulated 2D environment will be developed to replicate the dynamic behavior of a nonlinear system; in this case, a drone in flight was chosen. This nonlinear environment will allow evaluation of both types of controllers under a series of conditions and operational challenges, including stabilization, trajectory tracking, and response to external disturbances. The research will focus on measuring the accuracy, efficiency, and adaptability of each algorithm, providing an objective basis for comparison that considers key performance metrics. In addition, the inherent advantages and limitations of each approach will be analyzed, including implementation complexity, computational requirements, and scalability.
  • Publication
    Analysis of emerging spatial rainfall patterns in the city of Medellín
    (Universidad EAFIT, 2025) Rangel Velásquez, Diego; Olarte Hernández, Tomás; Sepúlveda Berrio, Julián
    Making use of meteorological data is important for responding promptly to emergencies caused by weather phenomena. Climate monitoring systems provide valuable information for risk management, but the value of that information depends closely on the predictive models that can be built from it. In this work, rain-gauge measurements were analyzed to identify emerging spatial patterns in rainfall peaks that can lead to emergencies requiring intervention by disaster-response agencies. Although studies on precipitation distribution exist, there is potential to develop models better suited to the environmental conditions of specific cities. This research developed a model adapted to the specific conditions of the Aburrá Valley that anticipates the arrival of torrential rain in specific risk zones. It was found that the evolution of rainfall can be anticipated in specific scenarios with conventionally high precipitation.
  • Publication
    Implementing LSTM recurrent neural networks to forecast PM2.5 particulate matter in the Aburrá Valley
    (Universidad EAFIT, 2025) González, Javier; Coy, Jairo; Suarez, Biviana; Suarez Sierra Biviana Marcela; Coy Coy Jairo Iván; SIATA
    Air pollution is one of the environmental issues of growing global relevance. Poor air quality, particularly in urban areas, has become a scourge that, in addition to having a direct impact on health, significantly contributes to rising mortality rates in various parts of the world. In this context, the metropolitan region of the Aburrá Valley is particularly affected due to its topographical and meteorological conditions, combined with a high level of industrialization and the continuous growth of the vehicle fleet, which favors the accumulation of gases and particulate matter in the atmosphere. This project aims to develop a simple and robust model based on LSTM neural networks to forecast PM2.5 pollutant concentrations. The main purpose of this tool is to contribute to the management of risks associated with critical air quality episodes in the metropolitan region, helping to mitigate adverse effects on public health.
  • Publication
    Extracting unstructured information from bank statements using large language models
    (Universidad EAFIT, 2025) Acevedo Jaramillo, Jairo David; Valencia Díaz, Edison
  • Publication
    Integrating statistical and machine learning models to predict and mitigate voluntary employee turnover
    (Universidad EAFIT, 2025) González Ruiz, John Jairo; Almonacid Hurtado, Paula María
  • Publication
    Ethnic discrimination in mortgage lending in the United States: a predictive analysis using causal and machine learning methods
    (Universidad EAFIT, 2025) Galeano Naranjo, Juan Pablo; Almonacid Hurtado, Paula María; Álvarez Franco, Pilar Beatriz; Cruz Castañeda, Vivian
  • Publication
    Identifying emerging risks through sentiment analysis with machine learning techniques
    (Universidad EAFIT, 2025) Merizalde Maya, Pablo; Peña Palacio, Juan Alejandro
  • Publication
    Fine-tuning an LLM to produce summarized expert trading reports, integrating data from social networks
    (Universidad EAFIT, 2025) Restrepo Acevedo, Andrés Felipe; Martínez Vargas, Juan David
    The contemporary financial market is characterized by high complexity and a massive daily volume of structured and unstructured data, which poses significant challenges for individual investors in terms of analysis and informed decision-making. This project proposes fine-tuning a Small Language Model (SLM) integrated into a tool capable of generating financial analysis reports similar to those produced by experts. For the proof of concept (PoC), transcripts of financial analysis videos published by experts on their YouTube channels are used. The SLM is fine-tuned with instruction-based techniques and the LoRA (Low-Rank Adaptation) method, with the aim of extracting and summarizing key information relevant to individual investors. The main objective of this tool is to assist individual investors by generating efficient and accessible reports, facilitating access to valuable information in natural language, and enhancing their ability to make data-driven decisions from unstructured data, all with a minimal investment of time and resources. Experimental results demonstrate the viability of using fine-tuned SLMs to generate high-quality financial reports. Specifically, the selected model, finetune qlora unsloth llama 3.1 8B Instruct bnb 4bit v2 Q8 0, achieved an average score of 5.67 out of 10 in an evaluation conducted by a judge LLM, with an average cosine distance of 0.159 from the reference summaries generated by the pretrained foundation model GPT-4.1. This represents a 97.5% improvement in performance over the same base model, Llama 3.1 8B Instruct, without fine-tuning. Qualitatively, the model shows high fidelity and coherence when extracting and synthesizing key information from moderately long contexts, although it struggles with thematic interpretation of considerably lengthy transcripts.
Additionally, implementing this tool is projected to save individual investors 560 hours annually, along with an estimated annual reduction in API costs ranging from 7.52 to 25 for the channels analyzed in the proof of concept.
  • Publication
    Optimization strategies for physical transactional channels in the Colombian banking sector
    (Universidad EAFIT, 2025) Zapata Jiménez, John Fredy; López Moreno, Ana María
    Digital transformation is rapidly reshaping the landscape of traditional banking, creating a dilemma for financial institutions: integrate new digital channels or improve the distribution of existing physical ones. This thesis explores how multi-objective optimization techniques, such as integer linear programming and discrete-event stochastic simulation, can help address this dilemma within the context of the Colombian financial system. In an environment where customer habits and distribution models are constantly evolving, decision-makers must consider the impact of technology, implementation costs, and the adaptability of channels. This research addresses these challenges by developing a theoretical framework based on heuristic modeling and advanced techniques such as clustering and NLP algorithms. The aim is to provide recommendations for optimizing the distribution of transactional channels to enhance operational efficiency and customer experience. The thesis focuses on four specific objectives: retrieving and storing transaction data from distribution channels; preparing this information for clustering modeling; developing an optimization model for the distribution of physical channels; and analyzing the information to segment channels according to the optimization model. Optimizing distribution channels is essential to maintaining a competitive advantage in an increasingly digital environment. By effectively combining digital and physical channels, the banking system can improve operational efficiency, broaden its reach, and respond more agilely to market demands. This study offers a comprehensive perspective and practical solutions to address current challenges in the distribution of physical transactional channels in the Colombian banking sector.
  • Publication
    Sales prediction model for a textile company using Machine Learning techniques
    (Universidad EAFIT, 2025) Lezcano Echeverri, Jhon Wilder; Puerta Puerta, Henry Daniel
    This study explores the implementation of sales forecasting models in a Colombian textile company, combining traditional techniques with Machine Learning-based approaches. Daily sales data from 187 stores between 2021 and 2025 were analyzed. The methodology followed five stages: (1) exploratory analysis, (2) feature engineering, (3) model implementation, (4) model optimization and fine-tuning, and (5) comparative validation. The models implemented were Prophet, XGBoost, Random Forest, and regularized Linear Regression. Prophet achieved the best overall performance for units sold (R² = 0.7121), standing out for its ability to capture complex seasonal patterns and adapt to store-level variability. XGBoost demonstrated high accuracy in non-linear scenarios, Random Forest showed robustness to noise, and Linear Regression provided greater interpretability. Feature engineering produced 83 variables, including temporal components, trends, volatility, and special effects. A cross-sectional analysis revealed common patterns such as peak underestimation, higher error in smaller stores and on weekends, and lower accuracy in predicting monetary values than units. The findings confirm that sales forecasting using Machine Learning offers substantial improvements over traditional methods, enhancing operational efficiency, inventory optimization, and financial planning. Prophet is recommended as the primary model, along with monthly recalibration cycles to maintain accuracy.
  • Publication
    Topic detection with machine learning for identifying emerging risks
    (Universidad EAFIT, 2025) Hernández Martínez, Felipe; Peña Palacio, Juan Alejandro
  • Publication
    On a Combination of Skewness and Kurtosis Matrices for Projection Pursuit Exploratory Cluster Analysis
    (Universidad EAFIT, 2025) Jaramillo Osorio, Esteban; Ortiz Arias, Santiago
    Skewness and kurtosis are statistical measures critical for understanding distribution characteristics, particularly in normality testing, clustering, and outlier detection. While kurtosis has been widely explored in the literature, skewness remains underutilized despite its potential for identifying asymmetrical patterns in data. Combining these measures could create a robust tool for exploratory data analysis (EDA). This research proposes a novel approach by developing a convex combination of skewness and kurtosis matrices. Using iterative procedures to maximize or minimize this combination, we aim to construct a matrix serving as a projection index for a projection pursuit algorithm. This matrix can identify clusters and outliers more effectively than either measure alone. To validate the methodology, experiments on artificial datasets and real-world data demonstrate the benefits of this combined approach in detecting non-normal features, evaluating clustering performance, and enhancing outlier detection.
  • Publication
    Answering questions about lease contracts under ASC (Accounting Standards Codification) 842 using large language models
    (Universidad EAFIT, 2025) Armendáriz Peña, David Adrián; Olarte Hernández, Tomás
    The ASC 842 standard, part of US GAAP (Generally Accepted Accounting Principles), establishes rules for recording leases in financial statements, enhancing transparency and comparability. However, its implementation poses significant challenges, such as interpreting complex contracts and extracting key information, tasks often performed manually, resulting in high costs and errors. This thesis develops an automated system that answers relevant questions about lease contracts using Natural Language Processing, Large Language Models, and Retrieval Augmented Generation. The goal is to reduce reliance on external consultants by automatically identifying the information needed to draft technical accounting memos. The GenAI Lifecycle methodology was employed, including text vectorization with embedding models and data storage in vector databases such as Pinecone. Using lease contracts obtained from the Securities and Exchange Commission (SEC), the system was developed to answer key questions about items such as dates, purchase options, and renewal terms, achieving at least 70% accuracy. The results show that the system significantly reduces the time and cost of contract analysis and improves accuracy in complying with ASC 842. This approach has practical implications for the accounting industry, offering a scalable solution that democratizes access to advanced artificial intelligence tools, enabling companies to efficiently manage their regulatory processes. This work represents a significant step forward in integrating artificial intelligence to solve real-world accounting problems, fostering innovation in the extraction and analysis of regulatory information.
  • Publication
    Characterization of Phytosanitary Risks in Agricultural Crops using Multispectral Images
    (Universidad EAFIT, 2025) García Montenegro, Michell; Peña Palacio, Juan Alejandro; Martínez Vargas, Juan David; Royal Academy of Engineering, Distinguished International Associates Program and RiSE group.
  • Publication
    Predictive model to optimize the selection of talent scholarship applicants at Universidad EAFIT
    (Universidad EAFIT, 2025) Acosta Ospina, Juan Pablo; Tabares Betancur, Marta Silvia del Socorro
  • Publication
    Comparing machine learning methods in time series analysis for exchange rate prediction
    (Universidad EAFIT, 2025) Restrepo Vallejo, Stevens; Almonacid Hurtado, Paula María
    The study of global financial markets represents a complex field of research, characterized by high competitiveness and volatility. The analysis of exchange rates serves as a focal point for investors and firms aiming to maximize profitability while minimizing risks. Although various techniques currently exist for estimating exchange rate price changes, the inherent stochastic nature of the market, coupled with the influence of political-economic factors, continues to pose significant challenges for precise and reliable data analysis. This study addresses the prediction of the prices of some of the most significant exchange rates in this market. Machine learning methods, which have demonstrated outstanding performance in the literature on time series forecasting, are compared and evaluated against a baseline linear model. The study primarily employs Random Forest models, Long Short-Term Memory (LSTM) neural networks, and a hybrid model combining Convolutional Neural Networks (CNNs) with LSTMs. Additionally, the robustness of these models is explored in the presence of outliers, with the aim of mitigating the risks associated with predictions involving highly variable data behaviors. The goal is to develop an adaptable analytical framework that enables investors and financial analysts to anticipate market movements, thereby enhancing their ability to make data-driven, informed decisions.
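The thesis comparing PID and reinforcement-learning controllers (Penagos Ramírez; Puerta Echandía) uses PID as its traditional baseline. The following is a minimal sketch of a discrete PID update, not the thesis's implementation: the fixed time step, the gains, and the toy first-order plant in the hypothetical `simulate` helper are assumptions chosen only to show how the proportional, integral, and derivative terms combine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete PID controller: u = kp*e + ki*integral(e) + kd*de/dt."""

    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        # Rectangular (Euler) approximation of the integral term.
        self.integral += error * self.dt
        # No derivative on the first call: there is no previous error to difference.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller: PID, setpoint: float, steps: int) -> float:
    """Hypothetical plant dx/dt = -x + u, integrated with explicit Euler steps."""
    x = 0.0
    for _ in range(steps):
        u = controller.update(setpoint, x)
        x += (-x + u) * controller.dt
    return x


# Usage: illustrative gains, 30 simulated seconds at dt = 0.01.
final_state = simulate(PID(kp=2.0, ki=1.0, kd=0.1, dt=0.01), setpoint=1.0, steps=3000)
print(round(final_state, 3))
```

The integral term is what removes steady-state error on this toy plant; with proportional action alone the output would settle below the setpoint. A nonlinear drone model, as in the thesis, is exactly where fixed gains like these become harder to tune.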
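The PM2.5 forecasting thesis (González, Coy, Suarez; SIATA) builds its model on LSTM networks. As a rough illustration of what one LSTM step computes, the sketch below implements a single-unit LSTM cell with scalar input in plain Python, plus a hypothetical `make_windows` helper that turns a series into (past window, next value) pairs, a common supervised framing for one-step-ahead forecasting. The weights, readings, and scaling factor are arbitrary placeholders rather than trained values or real data, and nothing here reflects the thesis's actual architecture, features, or horizon.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Each gate is parameterized by (input weight, recurrent weight, bias).
Gate = Tuple[float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class ScalarLSTMCell:
    forget_gate: Gate
    input_gate: Gate
    output_gate: Gate
    candidate: Gate

    @staticmethod
    def _pre(gate: Gate, x: float, h: float) -> float:
        w_x, w_h, b = gate
        return w_x * x + w_h * h + b

    def step(self, x: float, h: float, c: float) -> Tuple[float, float]:
        """One time step: returns the new (hidden state, cell state)."""
        f = sigmoid(self._pre(self.forget_gate, x, h))  # how much old memory to keep
        i = sigmoid(self._pre(self.input_gate, x, h))  # how much new content to write
        o = sigmoid(self._pre(self.output_gate, x, h))  # how much memory to expose
        g = math.tanh(self._pre(self.candidate, x, h))  # candidate content
        c_new = f * c + i * g
        h_new = o * math.tanh(c_new)
        return h_new, c_new

    def run(self, sequence: Sequence[float]) -> float:
        """Feed a whole sequence from a zero state and return the final hidden state."""
        h, c = 0.0, 0.0
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def make_windows(series: Sequence[float], window: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` past values with the value that follows it."""
    return [(list(series[t - window : t]), series[t]) for t in range(window, len(series))]


cell = ScalarLSTMCell(
    forget_gate=(0.5, 0.1, 1.0),
    input_gate=(0.3, -0.2, 0.0),
    output_gate=(0.2, 0.2, 0.0),
    candidate=(1.0, 0.5, 0.0),
)
# Hypothetical hourly PM2.5 readings, divided by an arbitrary 50.0 to keep inputs small.
series = [18.0, 22.0, 35.0, 41.0, 38.0, 30.0]
for window, target in make_windows([v / 50.0 for v in series], window=3):
    print(round(cell.run(window), 4), target)
```

In a trained forecaster the final hidden state would typically pass through a dense output layer to produce the predicted concentration, and the gate weights would be learned rather than set by hand.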
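The trading-report thesis (Restrepo Acevedo; Martínez Vargas) fine-tunes Llama 3.1 8B Instruct with LoRA. In LoRA, the pretrained weight matrix W stays frozen and only a low-rank update is trained, so a layer computes h = Wx + (alpha / r) BAx, where A is r × d_in and B is d_out × r. The plain-Python sketch below shows that forward pass and the trainable-parameter count; the matrices are toy values and the rank of 16 in the usage lines is a hypothetical choice, not the thesis's configuration. QLoRA, which the model name in the abstract suggests, additionally stores the frozen base weights in 4-bit precision; that part is not shown.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """h = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""
    r = len(a)  # the rank is the number of rows of A
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h0 + scale * dh for h0, dh in zip(base, update)]


def trainable_params(d_out: int, d_in: int, r: int) -> int:
    """LoRA trains A (r x d_in) and B (d_out x r) instead of the full d_out x d_in matrix."""
    return r * (d_in + d_out)


# B is conventionally initialized to zeros, so before training the adapted layer equals the base layer.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.5, -0.5]]
b_init = [[0.0], [0.0]]
print(lora_forward(w, a, b_init, [2.0, 3.0], alpha=16.0))

# Full vs. LoRA trainable parameters for a 4096 x 4096 projection at hypothetical rank 16.
print(4096 * 4096, trainable_params(4096, 4096, 16))
```

The parameter count is why LoRA suits limited hardware: for that single projection, rank 16 trains 131,072 values instead of 16,777,216.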
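The textile sales thesis (Lezcano Echeverri; Puerta Puerta) reports R² = 0.7121 for Prophet on units sold. For reference, R² compares a model's squared errors with those of always predicting the mean, R² = 1 − SS_res / SS_tot, so 0 means no better than the mean and 1 means a perfect fit. The helpers below are a plain-Python sketch of that definition together with mean absolute error; the numbers in the usage lines are made up and are not the thesis's data.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute deviation, in the same units as the data (e.g. units sold)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up daily units for one store; the last forecast undershoots a sales peak.
actual = [120, 95, 130, 160, 210]
forecast = [115, 100, 125, 150, 180]
print(round(r_squared(actual, forecast), 4), mean_absolute_error(actual, forecast))
```

Reporting MAE alongside R² helps with the store-level comparison the abstract describes, since R² alone hides the absolute size of errors in small versus large stores.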
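The projection pursuit thesis (Jaramillo Osorio; Ortiz Arias) builds a convex combination of skewness and kurtosis matrices. The sketch below is only a simplified scalar analogue, not the thesis's matrix method: it scores a one-dimensional projection by λ·skew² + (1 − λ)·(excess kurtosis)², with 0 ≤ λ ≤ 1, and grid-searches directions in 2D for the highest score. Squaring both terms and using a grid search instead of an iterative procedure are assumptions made for brevity.

```python
import math
from typing import List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def skewness(values: Sequence[float]) -> float:
    z = standardize(values)
    return sum(v**3 for v in z) / len(z)


def excess_kurtosis(values: Sequence[float]) -> float:
    z = standardize(values)
    return sum(v**4 for v in z) / len(z) - 3.0


def combined_index(values: Sequence[float], lam: float) -> float:
    """Convex combination of squared skewness and squared excess kurtosis."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * skewness(values) ** 2 + (1.0 - lam) * excess_kurtosis(values) ** 2


def best_direction(
    points: Sequence[Tuple[float, float]], lam: float, n_angles: int = 180
) -> Tuple[float, float]:
    """Grid search over unit directions (cos t, sin t), t in [0, pi); returns (t, score)."""
    best_theta, best_score = 0.0, -math.inf
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        ux, uy = math.cos(theta), math.sin(theta)
        projection = [ux * x + uy * y for x, y in points]
        score = combined_index(projection, lam)
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score


# Two well-separated clusters along the x-axis, spread out along y.
points = [(sign * 3.0, i - 4.5) for i in range(10) for sign in (-1.0, 1.0)]
theta, score = best_direction(points, lam=0.5)
print(round(math.degrees(theta), 1), round(score, 3))
```

Here the index peaks along the x-axis: that projection is bimodal, giving the most negative excess kurtosis possible (−2), which is how kurtosis-based indices reveal cluster structure. Only angles in [0, π) are searched because u and −u give the same squared moments.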
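The ASC 842 thesis (Armendáriz Peña; Olarte Hernández) uses Retrieval Augmented Generation over lease contracts stored in a vector database. The core retrieval step can be sketched in plain Python: rank stored chunk embeddings by cosine similarity to the question's embedding, then place the top chunks into the prompt sent to the LLM. The three-dimensional vectors, chunk texts, and prompt wording below are made-up placeholders; a real system would obtain embeddings from an embedding model and query a vector database such as Pinecone through its own client, which is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Ground the LLM's answer in the retrieved lease excerpts."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the lease excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical chunk texts and toy embeddings.
chunks = {
    "renewal": "The Lessee may renew this Lease for one additional term of five years.",
    "purchase": "The Lessee has no option to purchase the Premises.",
    "rent": "Base rent is payable monthly in advance.",
}
embeddings = {
    "renewal": [0.9, 0.1, 0.0],
    "purchase": [0.2, 0.9, 0.1],
    "rent": [0.0, 0.2, 0.9],
}
question = "Does the lease include renewal terms?"
query_embedding = [0.8, 0.3, 0.0]  # stands in for the embedded question
hits = top_k(query_embedding, embeddings, k=2)
print(build_prompt(question, [chunks[cid] for cid, _ in hits]))
```

Restricting the model to retrieved excerpts is what lets answers about dates, purchase options, or renewal terms be traced back to specific contract text.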
Anyone consulting this repository may copy excerpts of the text, provided they always cite the source, that is, the title of the work and the author. This authorization does not imply that the author waives the right to publish the work in whole or in part.
The University shall not be liable for any claim by third parties asserting authorship of the work submitted by the author.
All rights reserved.