Master's in Data Science and Analytics (theses)

Recent submissions

Showing 1 - 20 of 160
  • Publication
    Clasificación ABC de inventarios mediante modelos de aprendizaje por refuerzo
    (Universidad EAFIT, 2025) Arrieta Salgado, Karolina; Almonacid Hurtado, Paula María
  • Publication
    Predicción de ventas para una empresa de Hardware Business-to-Business
    (Universidad EAFIT, 2025) Sánchez Cárdenas, Hernán Felipe; Almonacid Hurtado, Paula María
  • Publication
    Análisis comparativo de modelos predictivos para la estimación de PM2.5 : un enfoque basado en aprendizaje automático y predicción conformal
    (Universidad EAFIT, 2024) Camelo Valera, Matías; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
    Fine particulate matter (PM2.5) pollution poses a significant environmental and public health challenge, requiring accurate predictive models for its monitoring and control. This study compares different machine learning approaches, including Linear Regression, Random Forest, and XGBoost, with and without the inclusion of mobility variables, to estimate PM2.5 levels. Additionally, inductive conformal prediction is implemented to quantify uncertainty in the estimates and provide confidence intervals with α = 0.05. The results show that while XGBoost experiences performance deterioration during training when mobility variables are included, it achieves the best validation performance with the lowest mean absolute error and the highest coefficient of determination. Conformal prediction enabled the establishment of confidence intervals with 89.26% coverage, close to the expected 95%, ensuring model reliability across different spatial and temporal scenarios. In conclusion, the use of machine learning models combined with advanced validation and calibration techniques, such as conformal prediction, enhances the accuracy and reliability of PM2.5 estimation. However, the quality of input variables, particularly mobility-related data, remains a challenge, highlighting the need to incorporate meteorological information and improve data resolution. These findings contribute to the development of more reliable predictive tools for environmental management and air quality policy decision-making.
  • Publication
    Detección automática de acordes empleando técnicas de caracterización de audio y machine learning
    (Universidad EAFIT, 2025) Gil Urrego, Rafael Alejandro; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
    Automatic chord detection in audio tracks is essential for developing various musical applications, such as music transcription and score generation. For this reason, there has been a growing interest in the field of data science to explore different strategies to address this need. The main approach studied in recent years is based on extracting features from audio files that contain chord information. Transforming the audio signal using different frequency analysis tools has generated data with a greater ability to describe the musical components present in the processed audio track. The Mel spectrogram and the Chromagram are some of the methods used for these tasks. Additionally, classical supervised analytical models such as Support Vector Machines (SVM), Random Forest, and Convolutional Neural Networks (CNN) have been employed in several studies. These models have demonstrated a high level of accuracy in chord identification. However, in most cases, they have been limited by the number of chord classes to estimate, as an increase in the number of classes can confuse the system, typically allowing a maximum of 24. In this thesis, a system for automatic chord identification was developed by implementing different classical and modern analytical models. For audio feature extraction, the pre-trained models HuBERT and VGGish were used. These extracted features were then fed into three classical models—SVM, Random Forest, and Gradient Boosting—to compare their results with those obtained by a modern model. The HuBERT architecture was chosen as the modern baseline model since it can function both as a feature extractor and a classifier. The experiments were conducted using recordings of 48 different chord classes, all played on a digital piano, providing a solid dataset for training and evaluating the proposed system’s performance. 
The study confirmed previous research findings: to obtain accurate chord class estimations, it is crucial to improve the characterization techniques of the input audio recordings. A recurring issue identified was the lack of a detailed description of the musical components in the recordings, which affected the models’ ability to deliver optimal results. Our findings highlight that precise feature extraction is key to reducing model generalization error, enabling better chord class identification in both classical supervised approaches and modern architectures such as HuBERT. Finally, it is concluded that modern models, including those based on Transformers, have a high dependency on the quantity and diversity of the data. To achieve effective adaptability, the training data must exhibit sufficient variations within the same class. When data lack intra-class variability, these systems struggle to adapt to new recordings, especially those with background noise or distortions.
  • Publication
    Análisis del volumen útil diario del embalse de El Peñol de 2010 a 2023 a partir de datos funcionales
    (Universidad EAFIT, 2025) Giraldo Gómez, Sebastián; Ortiz Arias, Santiago
    This study analyzes the hydroelectric behavior of the El Peñol reservoir, with an emphasis on its historical dynamics. Comparisons were made with four Colombian reservoirs: El Peñol, Playas, Punchiná, and San Lorenzo. To achieve this, functional statistical techniques were applied to historical data from the period 2010-2023 provided by XM, along with information on the El Niño and La Niña phenomena obtained from the Institute of Hydrology, Meteorology, and Environmental Studies (IDEAM). The variables analyzed include the turbined volume, daily usable volume, total energy generation, and market prices, with the main objective of identifying temporal patterns, seasonal trends, and functional relationships between these variables. The analysis included the calculation of functional means, the estimation of functional variances, and the application of functional principal component analysis (functional PCA). These techniques made it possible to reduce the dimensionality of the data and understand the main factors influencing hydroelectric behavior. As part of the methodology, Fourier smoothing was used to represent the variables as continuous curves, facilitating noise removal and capturing underlying trends. This approach allowed for functional comparisons between the reservoirs, highlighting both similarities and differences in their operation. The results of this functional analysis provide a solid foundation for interpreting hydrological patterns in the Antioquia region, with special attention to the El Peñol reservoir and its impact on regional hydroelectric efficiency. This reservoir, one of the most important in the country, faces significant challenges arising from fluctuations in water availability and the effects of climate change, emphasizing the need for sustainable management strategies. In this context, functional indicators were developed to evaluate the sustainability of the reservoir’s operation and propose improvements in its management. 
This study contributes to the advancement of specific analytical tools for hydroelectric management in Colombia, also establishing a precedent for future research aimed at reservoirs with similar characteristics, both regionally and internationally.
  • Publication
    Evaluación de rendimiento de diferentes modelo grandes de lenguaje para el reconocimiento de emociones en texto
    (Universidad EAFIT, 2024) López Atehortúa, David Alejandro; Montoya Múnera, Edwin Nelson
    It is becoming more common for people to express their opinions in short texts through different media thanks to the expansion of internet access. Understanding and efficiently analyzing an individual’s sentiment from a text is useful in multiple scenarios. To that end, a branch of computer science called Natural Language Processing (NLP) has been dedicated to developing techniques for understanding human language. Traditional techniques rely on the frequency of a word or a group of consecutive words to classify a text as having positive, negative, or neutral sentiment. These techniques are limited because they fail to capture the full context of each word in a sentence, which affects their accuracy and their ability to detect a more detailed spectrum of emotions. Recently, Large Language Models (LLMs), or Transformers, revolutionized the way NLP is performed thanks to their ability to capture the context around each word in a text. This allows sentiment to be detected more precisely and even allows a text to be classified into a more specific emotion such as joy, optimism, anger, or sadness. This project aims to evaluate the performance of different LLMs to find the best-performing one for emotion detection in short English texts, using datasets commonly used in NLP research.
  • Publication
    Asesoría y prospección de visitas de clientes en agencias de autos por medio de chatbots e Inteligencia Artificial
    (Universidad EAFIT, 2025) Restrepo Acosta, Eduardo; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
  • Publication
    Predicción dinámica del valor del flete de mercado para vehículos 3s3 del puerto de Buenaventura a Bogotá : un modelo integrado con variables exógenas económicas y del sector logístico
    (Universidad EAFIT, 2025) Vélez Medina, Camilo Alejandro; García Vargas, Johan Felipe
    Logistics, especially road transportation as a fundamental part of the supply chain, directly impacts the costs and availability of products in cities. This project develops a predictive model to estimate the market value of freight transportation for 3S3-type vehicles from the port of Buenaventura, Colombia, to Bogotá, Colombia. The variable of interest, referred to as FP_mean, corresponds to the daily average freight production cost. The innovation of the model lies in its ability to integrate critical exogenous variables, such as Brent crude oil prices, the exchange rate of the dollar, sector-specific factors collected in the SICE TAC (fuel, tolls, tires, lubricants, filters, maintenance, personnel), RNDC (National Road Cargo Dispatch Registry), and the arrival of ships at the port with their respective types of cargo. Multiple advanced modeling approaches were evaluated, including ARIMA, SARIMA, Random Forest, and LSTM, with the Random Forest model incorporating exogenous variables (random_forest_exogen) standing out for its superior performance, achieving an RMSE of 211,395.42 and a MAPE of 3.20%, making it the most accurate for estimating FP_mean. Additionally, the LSTM and SARIMA models also demonstrated competitive results, striking a balance between accuracy and stability across various scenarios. These findings highlight the importance of combining advanced machine learning techniques with domain expertise in logistics.
  • Publication
    Modelo AR-MIMO : mejora del pronóstico de múltiples horizontes en series de tiempo con optimizaciones heurísticas
    (Universidad EAFIT, 2025) Arias Zuluaga, Pablo Simón; Saldarriaga Aristizábal, Pablo Andrés
  • Publication
    Implementación de modelos de machine learning para la predicción de tendencias en pares de divisas del mercado forex
    (Universidad EAFIT, 2025) Ramírez Escobar, Sebastián; Almonacid Hurtado, Paula María
  • Publication
    Reducción de ruido en señales bioacústicas : un enfoque basado en wavelets y aplicado al monitoreo de poblaciones de aves y anfibios
    (Universidad EAFIT, 2025) Carvalho Salazar, Sebastián; García Vargas, Johan Felipe
    Bioacoustic monitoring techniques enable non-invasive detection of biological populations through automatic recorders that continuously capture species vocalizations in natural habitats. This study assesses the impact of wavelet-based noise reduction on bioacoustic signal processing and evaluates its influence on benchmark classification models, specifically BirdNET for birds and AnuraSet for amphibians. Our methodology includes noise reduction preprocessing, followed by an in-depth analysis of classification performance metrics such as mel cosine similarity, temporal correlation, entropy ratio, and ROC-AUC curves. Results indicate that noise reduction enhances signal clarity and reduces false alarm rates, enabling more accurate discrimination in acoustically complex environments like urban areas and rainforests. Although the technique may suppress some subtle vocalization features, statistical analysis and radar plots suggest that adjustments to the denoising process can help optimize the balance between noise reduction and preservation of essential bioacoustic characteristics. Consequently, wavelet-based noise reduction is a robust strategy for high-interference environments, though it may be less suitable for studies requiring comprehensive capture of all vocalizations, such as endangered or low-density species. Moreover, denoising regulates confidence in incorrect predictions and preserves relevant features in correct predictions, reducing false alarms.
  • Item
    Modelo de recomendación de nuevos productos a clientes actuales
    (Universidad EAFIT, 2024) Isaza Higuera, Pablo; Sepúlveda Cano, Lina María
  • Item
    Modelación de excedencias de periodos secos y húmedos en la cuenca del río Porce mediante procesos de Poisson no homogéneos
    (Universidad EAFIT, 2024) Ferrucho Maloof, Isaac Eli; Suárez Sierra, Biviana Marccela; Carmona Duque, Alejandra María
    This study analyzes periods of precipitation deficit and excess in the Porce river basin, Colombia, from 1970 to 2023. Using the Standardized Precipitation Index (SPI) and monthly precipitation series from meteorological stations in the basin, selected for their data completeness, a model based on Non-Homogeneous Poisson Processes (NHPP) was developed and applied to identify and characterize these periods. Different NHPP configurations, with linear, power-law, and exponential intensity functions, were evaluated. The results indicate that power-law and linear models, in most cases, provide a superior fit for estimating drought and wet periods, while exponential models showed notable limitations in accurately representing extreme drought and wet events. This finding underscores the importance of choosing models that respond to the climatic and geographic particularities of the region, contributing to better water resources management and planning.
  • Item
    Currency Prediction : Stochastic hybrid diferencial equations with LSTM
    (Universidad EAFIT, 2024) Arbeláez Betancur, Hoover Arley; Marín Sánchez, Fredy Hernán
  • Item
    Automatic Electrical Meter Forecasting : a Benchmarking Between Quantum Machine Learning and Classical Machine learning
    (Universidad EAFIT, 2024) Montes Castro, Jonathan Javier; Lalinde Pulido, Juan Guillermo; Sosa-Sierra, Daniel
    This work benchmarks Quantum Long Short-Term Memory (QLSTM) against classical LSTM networks using electrical meter data (kWh) from clients of EPM, a public utility company. The results show that QLSTM models learn in half as many epochs as LSTM, as measured by the MSE cost function, while maintaining strong performance with respect to bias (Mean Absolute Percentage Error, MAPE) and variance (R^2) metrics. QLSTM leverages variational quantum circuits (VQC) to replace traditional LSTM cell gates, demonstrating the potential of quantum-hybrid algorithms in forecasting tasks. This study highlights the efficiency and accuracy advantages of quantum machine learning applied to real-world data from EPM’s electrical metering services.
Anyone who consults this repository may copy excerpts of the text provided they always cite the source, that is, the title of the work and the author. This authorization does not imply a waiver of the author's right to publish the work in whole or in part.
The University will not be liable for any claim that may arise from third parties asserting authorship of the work presented by the author.
All rights reserved.