Master's in Data Sciences and Analytics (theses)

Permanent URI for this collection

https://hdl.handle.net/10784/17488

Showing 1 - 20 of 185

Comparison of traditional control algorithms with AI-based control in a simulated 2D environment
(Universidad EAFIT, 2025) Penagos Ramírez, Mateo Fernando; Puerta Echandía, Alejandro
In automation and control technology, the efficiency of control algorithms is crucial for the performance and safety of complex systems. Traditionally, Proportional–Integral–Derivative (PID) controllers have been the cornerstone of regulating these systems due to their simplicity and robustness, being used in more than 90% of industrial applications. Methods such as PID work very well in environments that are easy to model mathematically and have little variability; such systems can be found in industrial plants and production equipment. However, in dynamic and nonlinear environments, performance issues or difficulties in tuning their parameters may arise. This project aims to compare the performance of traditional control algorithms—specifically PID—with those based on neural networks trained using reinforcement learning techniques. To this end, a simulated 2D environment will be developed to replicate the dynamic behavior of a nonlinear system; in this case, a drone in flight was chosen. This nonlinear environment will allow evaluation of both types of controllers under a series of conditions and operational challenges, including stabilization, trajectory tracking, and response to external disturbances. The research will focus on measuring the accuracy, efficiency, and adaptability of each algorithm, providing an objective basis for comparison that considers key performance metrics. In addition, the inherent advantages and limitations of each approach will be analyzed, including implementation complexity, computational requirements, and scalability.
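The PID baseline described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the one-dimensional altitude model, the unit mass, the gains, and the names `PID` and `simulate_altitude_hold` are all assumptions chosen only to show the proportional, integral, and derivative terms working together.

```python
from dataclasses import dataclass
from typing import Optional

GRAVITY = 9.81  # m/s^2; the vehicle is assumed to have unit mass


@dataclass
class PID:
    """Textbook discrete PID controller (hypothetical gains, no saturation)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        # Integral term accumulates the error over time.
        self.integral += error * dt
        # Derivative term is zero on the first call, when no history exists.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_altitude_hold(target: float = 1.0, steps: int = 2000,
                           dt: float = 0.01) -> float:
    """Drive a point mass from altitude 0 toward `target`; return final altitude."""
    pid = PID(kp=10.0, ki=5.0, kd=5.0)  # assumed gains, not tuned for a real drone
    z, vz = 0.0, 0.0
    for _ in range(steps):
        thrust = pid.update(target - z, dt)
        # Semi-implicit Euler step: update velocity, then position.
        vz += (thrust - GRAVITY) * dt
        z += vz * dt
    return z
```

The integral term is what lets the controller cancel the constant pull of gravity; a reinforcement-learning controller would replace `PID.update` with a learned policy evaluated on the same simulated plant.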
Analysis of emerging spatial rainfall patterns in the city of Medellín
(Universidad EAFIT, 2025) Rangel Velásquez, Diego; Olarte Hernández, Tomás; Sepúlveda Berrio, Julián
Making use of meteorological data is important for responding promptly to emergencies caused by weather phenomena. Climate monitoring systems provide valuable information for risk management, but their usefulness depends closely on the predictive models that can be built from that information. In this work, rain gauge measurements were analyzed to identify emerging spatial patterns in rainfall peaks that can lead to emergencies requiring intervention by disaster response agencies. Although studies on precipitation distribution exist, there is potential to develop models better suited to the environmental conditions of specific cities. This research developed a model adapted to the specific conditions of the Aburrá Valley that anticipates the arrival of torrential rain in specific risk zones. It found that the evolution of precipitation can be anticipated in specific scenarios of conventionally high precipitation.
Implementation of LSTM recurrent neural networks for forecasting PM2.5 particulate matter in the Aburrá Valley
(Universidad EAFIT, 2025) González, Javier; Coy, Jairo; Suarez, Biviana; Suarez Sierra Biviana Marcela; Coy Coy Jairo Iván; SIATA
Air pollution is one of the environmental issues of growing global relevance. Poor air quality, particularly in urban areas, has become a scourge that, in addition to having a direct impact on health, significantly contributes to the increase in mortality rates in various parts of the world. In this context, the metropolitan region of the Aburrá Valley is particularly affected due to its topographical and meteorological conditions, combined with a high level of industrialization and the continuous growth of the vehicle fleet, which favors the accumulation of gases and particulate matter in the atmosphere. This project aims to develop a simple and robust model based on LSTM neural networks, in order to forecast PM$_{2.5}$ pollutant concentrations. The main purpose of this tool is to contribute to the risk management associated with critical air quality episodes in the metropolitan region, helping to mitigate the adverse effects on public health.
Extraction of unstructured information from bank statements using large language models
(Universidad EAFIT, 2025) Acevedo Jaramillo, Jairo David; Valencia Díaz, Edison
Integration of statistical and machine learning models to predict and mitigate voluntary employee turnover
(Universidad EAFIT, 2025) González Ruiz, John Jairo; Almonacid Hurtado, Paula María
Ethnic discrimination in mortgage lending in the United States: a predictive analysis with causal and machine learning methods
(Universidad EAFIT, 2025) Galeano Naranjo, Juan Pablo; Almonacid Hurtado, Paula María; Álvarez Franco, Pilar Beatriz; Cruz Castañeda, Vivian
Identification of emerging risks through sentiment analysis with machine learning techniques
(Universidad EAFIT, 2025) Merizalde Maya, Pablo; Peña Palacio, Juan Alejandro
Fine-tuning an LLM to produce summarized reports from trading experts, with integration of data from social networks
(Universidad EAFIT, 2025) Restrepo Acevedo, Andrés Felipe; Martínez Vargas, Juan David
The contemporary financial market is characterized by its high complexity and the massive volume of structured and unstructured data generated daily, posing significant challenges for individual investors in terms of analysis and informed decision-making. This project proposes the fine-tuning of a Small Language Model (SLM) integrated into a tool capable of generating financial analysis reports similar to those produced by experts. For the proof of concept (PoC), transcripts from financial analysis videos published by experts on their YouTube channels are used. The SLM is fine-tuned using instruction-based techniques and the LoRA (Low-Rank Adaptation) method, with the aim of extracting and summarizing key information relevant to individual investors. The main objective of this tool is to assist individual investors by generating efficient and accessible reports, facilitating access to valuable information in natural language, and enhancing their ability to make data-driven decisions from unstructured data, all with minimal investment of time and resources. Experimental results demonstrate the viability of using fine-tuned SLMs for the generation of high-quality financial reports. Specifically, the selected model, finetune qlora unsloth llama 3.1 8B Instruct bnb 4bit v2 Q8 0, achieved an average score of 5.67 out of 10 in the evaluation conducted by a judge LLM, with an average cosine distance of 0.159 compared to the reference summaries generated by the foundational pretrained model GPT-4.1. This improvement represents a 97.5% increase in performance compared to the same base model, Llama 3.1 8B Instruct, without fine-tuning. Qualitatively, the model exhibits high fidelity and coherence in the extraction and synthesis of key information in moderately long contexts, although it faces challenges in thematic interpretation when dealing with considerably lengthy transcripts. Additionally, implementation of this tool is projected to save 560 hours annually for individual investors, along with an estimated annual reduction in API costs ranging from 7.52 to 25 for the channels analyzed in the proof of concept.
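The cosine distance reported in this abstract compares embedding vectors of generated and reference summaries. The sketch below shows only the metric itself; the embedding model used in the thesis is not stated here, so the vectors are hypothetical and the function names `cosine_distance` and `mean_cosine_distance` are illustrative.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means identical direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def mean_cosine_distance(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average the distance over (generated, reference) embedding pairs."""
    distances = [cosine_distance(a, b) for a, b in pairs]
    return sum(distances) / len(distances)
```

A lower average indicates that the generated summaries point in nearly the same direction as the references in embedding space, regardless of vector length.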
Optimization strategies for physical transactional channels in the Colombian banking sector
(Universidad EAFIT, 2025) Zapata Jiménez, John Fredy; López Moreno, Ana María
Digital transformation is rapidly reshaping the landscape of traditional banking, creating a dilemma for financial institutions: integrate new digital channels or improve the distribution of existing physical ones. This thesis explores how multi-objective optimization techniques, such as integer linear programming and discrete-event stochastic simulation, can help address this dilemma within the context of the Colombian financial system. In an environment where customer habits and distribution models are constantly evolving, decision-makers must consider the impact of technology, implementation costs, and the adaptability of channels. This research addresses these challenges by developing a theoretical framework based on heuristic modeling and advanced techniques such as clustering and NLP algorithms. The aim is to provide recommendations for optimizing the distribution of transactional channels to enhance operational efficiency and customer experience. The thesis focuses on four specific objectives: retrieving and storing transaction data from distribution channels; preparing this information for clustering modeling; developing an optimization model for the distribution of physical channels; and analyzing the information to segment channels according to the optimization model. Optimizing distribution channels is essential to maintaining a competitive advantage in an increasingly digital environment. By effectively combining digital and physical channels, the banking system can improve operational efficiency, broaden its reach, and respond more agilely to market demands. This study offers a comprehensive perspective and practical solutions to address current challenges in the distribution of physical transactional channels in the Colombian banking sector.
Sales forecasting model for a textile company using Machine Learning techniques
(Universidad EAFIT, 2025) Lezcano Echeverri, Jhon Wilder; Puerta Puerta, Henry Daniel
This study explores the implementation of sales forecasting models in a Colombian textile company, combining traditional techniques with Machine Learning-based approaches. Daily sales data from 187 stores between 2021 and 2025 were analyzed. The methodology followed five stages: (1) exploratory analysis, (2) feature engineering, (3) model implementation, (4) model optimization and fine-tuning, and (5) comparative validation. The models implemented were Prophet, XGBoost, Random Forest, and regularized Linear Regression. Prophet achieved the best overall performance for units sold (R² = 0.7121), standing out for its ability to capture complex seasonal patterns and adapt to store-level variability. XGBoost demonstrated high accuracy in non-linear scenarios, Random Forest showed robustness to noise, and Linear Regression provided greater interpretability. Feature engineering resulted in 83 variables, including temporal components, trends, volatility, and special effects. A cross-sectional analysis revealed common patterns such as peak underestimation, higher error in smaller stores and on weekends, and lower accuracy in predicting monetary values compared to units. The findings confirm that sales forecasting using Machine Learning offers substantial improvements over traditional methods, enhancing operational efficiency, inventory optimization, and financial planning. Prophet is recommended as the primary model, along with the establishment of monthly recalibration cycles to maintain accuracy.
Topic detection with machine learning for the identification of emerging risks
(Universidad EAFIT, 2025) Hernández Martínez, Felipe; Peña Palacio, Juan Alejandro
On a Combination of Skewness and Kurtosis Matrices for Projection Pursuit Exploratory Cluster Analysis
(Universidad EAFIT, 2025) Jaramillo Osorio, Esteban; Ortiz Arias, Santiago
Skewness and kurtosis are statistical measures critical for understanding distribution characteristics, particularly in normality testing, clustering, and outlier detection. While kurtosis has been widely explored in the literature, skewness remains underutilized despite its potential for identifying asymmetrical patterns in data. Combining these measures could create a robust tool for exploratory data analysis (EDA). This research proposes a novel approach by developing a convex combination of skewness and kurtosis matrices. Using iterative procedures to maximize or minimize this combination, we aim to construct a matrix serving as a projection index for a projection pursuit algorithm. This matrix can identify clusters and outliers more effectively than either measure alone. To validate the methodology, experiments on artificial datasets and real-world data demonstrate the benefits of this combined approach in detecting non-normal features, evaluating clustering performance, and enhancing outlier detection.
Optimization of manufacturing batches in a cosmetics company to maximize GMROI: an integrated approach of machine learning algorithms and ARIMA
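The idea of a convex combination of skewness and kurtosis matrices can be sketched as follows. The abstract does not define the matrices, so the definitions here are assumptions: for whitened data z, the kurtosis matrix is taken as E[(zᵀz) z zᵀ] and the skewness matrix as M₃ᵀM₃, where M₃ is the p² × p matrix of third moments. The sketch also swaps in a single eigendecomposition in place of the iterative maximization or minimization the abstract mentions; all function names are illustrative.

```python
import numpy as np


def whiten(X: np.ndarray) -> np.ndarray:
    """Center X and rescale it so its sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric inverse square root
    return Xc @ W


def kurtosis_matrix(Z: np.ndarray) -> np.ndarray:
    """Assumed definition: sample mean of (z'z) z z' over whitened rows z."""
    r2 = np.sum(Z ** 2, axis=1)
    return (Z * r2[:, None]).T @ Z / len(Z)


def skewness_matrix(Z: np.ndarray) -> np.ndarray:
    """Assumed definition: M3' M3, with M3 the p^2 x p third-moment matrix."""
    n, p = Z.shape
    M3 = np.einsum("ni,nj,nk->ijk", Z, Z, Z) / n
    M3 = M3.reshape(p * p, p)
    return M3.T @ M3


def combined_matrix(Z: np.ndarray, alpha: float) -> np.ndarray:
    """Convex combination alpha * S + (1 - alpha) * K, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * skewness_matrix(Z) + (1.0 - alpha) * kurtosis_matrix(Z)


def projection_directions(X: np.ndarray, alpha: float = 0.5):
    """Eigenvalues (descending) and eigenvectors of the combined matrix."""
    C = combined_matrix(whiten(X), alpha)
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```

Projecting the whitened data onto the leading or trailing eigenvectors gives candidate directions to inspect for clusters or outliers; `alpha` controls how much weight asymmetry receives relative to tail behavior.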
(Universidad EAFIT, 2025) Idárraga Ojeda, Leidy Viviana; Almonacid Hurtado, Paula María
Question answering on lease contracts under ASC (Accounting Standards Codification) 842 using large language models
(Universidad EAFIT, 2025) Armendáriz Peña, David Adrián; Olarte Hernández, Tomás
The ASC 842 standard, part of GAAP (Generally Accepted Accounting Principles) in the United States, establishes rules for recording leases in financial statements, enhancing transparency and comparability. However, its implementation poses significant challenges, such as interpreting complex contracts and extracting key information, tasks often performed manually, leading to high costs and errors. This thesis develops an automated system to address relevant questions about lease contracts using Natural Language Processing, Large Language Models, and Retrieval Augmented Generation. The goal is to reduce reliance on external consultants by identifying the information needed to draft technical accounting memos automatically. The GenAI Lifecycle methodology was employed, including text vectorization using embedding models and data storage in vector databases like Pinecone. Using lease contracts obtained from the Securities and Exchange Commission, the system was developed to answer key questions such as dates, purchase options, or renewal terms, achieving at least 70% accuracy. The results demonstrate that the system significantly reduces the time and costs associated with contract analysis, improving accuracy in compliance with ASC 842. This approach has practical implications for the accounting industry, offering a scalable solution that democratizes access to advanced artificial intelligence tools, enabling companies to efficiently manage their regulatory processes. This work represents a significant step forward in integrating artificial intelligence to solve real-world accounting problems, fostering innovation in the extraction and analysis of regulatory information.
Characterization of Phytosanitary Risks in Agricultural Crops using Multispectral Images
(Universidad EAFIT, 2025) García Montenegro, Michell; Peña Palacio, Juan Alejandro; Martínez Vargas, Juan David; Royal Academy of Engineering, Distinguished International Associates Program and RiSE group.
Predictive model to optimize the selection process for applicants to talent scholarships at Universidad EAFIT
(Universidad EAFIT, 2025) Acosta Ospina, Juan Pablo; Tabares Betancur, Marta Silvia del Socorro
Comparison of machine learning methods in time series analysis for exchange rate prediction
(Universidad EAFIT, 2025) Restrepo Vallejo, Stevens; Almonacid Hurtado, Paula María
The study of global financial markets represents a complex field of research, characterized by high competitiveness and volatility. The analysis of exchange rates serves as a focal point for investors and firms aiming to maximize profitability while minimizing risks. Although various techniques currently exist for estimating exchange rate price changes, the inherent stochastic nature of the market, coupled with the influence of political-economic factors, continues to pose significant challenges for precise and reliable data analysis. This study addresses the prediction of the prices of some of the most significant exchange rates in this market. Machine learning methods, which have demonstrated outstanding performance in the literature on time series forecasting, are compared and evaluated against a baseline linear model. The study primarily employs Random Forest models, Long Short-Term Memory (LSTM) neural networks, and a hybrid model combining Convolutional Neural Networks (CNNs) with LSTMs. Additionally, the robustness of these models is explored in the presence of outliers, with the aim of mitigating the risks associated with predictions involving highly variable data behaviors. The goal is to develop an adaptable analytical framework that enables investors and financial analysts to anticipate market movements, thereby enhancing their ability to make data-driven, informed decisions.
Estimation of the population growth of Leptopharsa gibbicarina in oil palm (case study)
(Universidad EAFIT, 2025) Salazar Hoyos, Alejandro; Restrepo Arias, Juan Felipe
Early detection of melanoma: application of image processing and deep learning techniques
(Universidad EAFIT, 2025) Lacouture Fierro, Juan David; Álvarez Barrera, Claudia Patricia
Skin cancer is the most common type of cancer worldwide, with melanoma accounting for only 1% of cases but causing most deaths associated with this disease. In the United States, 97,610 new cases of melanoma were diagnosed in 2023, with 7,990 deaths. In Colombia, the incidence of melanoma has increased significantly in recent years. According to the Cuenta de Alto Costo, 7,881 new cases were reported in 2024, with 11.94% of diagnoses concentrated in Bogotá and the Central region. Additionally, the total number of cases treated in the country increased from 53,622 in 2017 to more than 105,000 in 2021. These figures place Colombia as the country with the fourth-highest incidence of melanoma in the Americas, highlighting the urgent need to implement innovative tools for early diagnosis. This project develops a deep learning model to diagnose melanoma through medical imaging, using convolutional neural networks and advanced image processing techniques. The work covers data collection, training, and validation, aiming to deliver rapid and accurate diagnoses. The research advocates for the integration of artificial intelligence into medical practice, enabling early diagnosis in regions with limited access to specialists and alleviating the burden on the healthcare system. In conclusion, this initiative represents a milestone in dermatological care in Colombia, benefiting both high-incidence areas and rural communities.
Safe Medellín: intelligent prediction of the number of thefts from persons using time series algorithms
(Universidad EAFIT, 2025) Guerra Medina, Cindy Paola; Moreno Reyes, Nicolas Alberto
Today, we are immersed in the data revolution, an era characterized by the importance of understanding past events to predict the future and, from this, to support strategies that facilitate decision-making in advance. In this context, Colombia faces important challenges in security and coexistence that can be addressed or estimated through data analysis. In Medellín, the open data portal Medata (medata.gov.co) provides access to historical and descriptive statistics on the incidence of crimes against persons such as theft, a recurring crime that affects the security, quality of life, and economy of citizens. This project proposes the use of time series algorithms implemented in the IBM SPSS Modeler platform, a robust and flexible tool that facilitates the programming of predictive model competition (IBM, 2023, SPSS Modeler). Through its ability to identify patterns, trends, and seasonality in historical data, the project seeks to estimate the future incidence of theft from persons in the city of Medellín, disaggregating the analysis at the level of communes and neighborhoods. The projections will be made monthly for October, November, and December 2024 and will serve as input for planning preventive security strategies that prioritize the areas requiring greater attention, optimize available resources, minimize the negative impacts of crime, and generate a greater sense of tranquility and confidence among citizens.
