Master's in Data Science and Analytics (theses)

Recent submissions

Showing 1 - 20 of 173
  • Publication
    Question answering on lease contracts under ASC (Accounting Standards Codification) 842 using large language models
    (Universidad EAFIT, 2025) Armendáriz Peña, David Adrián; Olarte Hernández, Tomás
    The ASC 842 standard, part of GAAP (Generally Accepted Accounting Principles) in the United States, establishes rules for recording leases in financial statements, enhancing transparency and comparability. However, its implementation poses significant challenges, such as interpreting complex contracts and extracting key information, tasks often performed manually, leading to high costs and errors. This thesis develops an automated system to address relevant questions about lease contracts using Natural Language Processing, Large Language Models, and Retrieval Augmented Generation. The goal is to reduce reliance on external consultants by automatically identifying the information needed to draft technical accounting memos. The GenAI Lifecycle methodology was employed, including text vectorization using embedding models and data storage in vector databases like Pinecone. Using lease contracts obtained from the Securities and Exchange Commission, the system was developed to answer key questions about items such as dates, purchase options, or renewal terms, achieving at least 70% accuracy. The results demonstrate that the system significantly reduces the time and costs associated with contract analysis, improving accuracy in ASC 842 compliance. This approach has practical implications for the accounting industry, offering a scalable solution that democratizes access to advanced artificial intelligence tools, enabling companies to efficiently manage their regulatory processes. This work represents a significant step forward in integrating artificial intelligence to solve real-world accounting problems, fostering innovation in the extraction and analysis of regulatory information.
  • Publication
    Characterization of Phytosanitary Risks in Agricultural Crops using Multispectral Images
    (Universidad EAFIT, 2025) García Montenegro, Michell; Peña Palacio, Juan Alejandro; Martínez Vargas, Juan David
  • Publication
    Predictive model to optimize the selection of applicants for talent scholarships (becas talento) at Universidad EAFIT
    (Universidad EAFIT, 2025) Acosta Ospina, Juan Pablo; Tabares Betancur, Marta Silvia del Socorro
  • Publication
    Comparison of machine learning methods in time series analysis for exchange rate prediction
    (Universidad EAFIT, 2025) Restrepo Vallejo, Stevens; Almonacid Hurtado, Paula María
    The study of global financial markets represents a complex field of research, characterized by high competitiveness and volatility. The analysis of exchange rates serves as a focal point for investors and firms aiming to maximize profitability while minimizing risks. Although various techniques currently exist for estimating exchange rate price changes, the inherent stochastic nature of the market, coupled with the influence of political-economic factors, continues to pose significant challenges for precise and reliable data analysis. This study addresses the prediction of the prices of some of the most significant exchange rates in this market. Machine learning methods, which have demonstrated outstanding performance in the literature on time series forecasting, are compared and evaluated against a baseline linear model. The study primarily employs Random Forest models, Long Short-Term Memory (LSTM) neural networks, and a hybrid model combining Convolutional Neural Networks (CNNs) with LSTMs. Additionally, the robustness of these models is explored in the presence of outliers, with the aim of mitigating the risks associated with predictions involving highly variable data behaviors. The goal is to develop an adaptable analytical framework that enables investors and financial analysts to anticipate market movements, thereby enhancing their ability to make data-driven, informed decisions.
  • Publication
    Estimating the population growth of Leptopharsa gibbicarina in oil palm (case study)
    (Universidad EAFIT, 2025) Salazar Hoyos, Alejandro; Restrepo Arias, Juan Felipe
  • Publication
    Early detection of melanoma: application of image processing and deep learning techniques
    (Universidad EAFIT, 2025) Lacouture Fierro, Juan David; Álvarez Barrera, Claudia Patricia
    Skin cancer is the most common type of cancer worldwide, with melanoma accounting for only 1% of cases but causing most deaths associated with this disease. In the United States, 97,610 new cases of melanoma were diagnosed in 2023, with 7,990 deaths. In Colombia, the incidence of melanoma has increased significantly in recent years. According to the Cuenta de Alto Costo, 7,881 new cases were reported in 2024, with 11.94% of diagnoses concentrated in Bogotá and the Central region. Additionally, the total number of cases treated in the country increased from 53,622 in 2017 to more than 105,000 in 2021. These figures place Colombia as the fourth country in the Americas with the highest incidence of melanoma, highlighting the urgent need to implement innovative tools for early diagnosis. This project develops a deep learning model to diagnose melanoma through medical imaging, utilizing convolutional neural networks and advanced image processing techniques. The work covers data collection, training, and validation, aiming to deliver rapid and accurate diagnoses. The research advocates for the integration of artificial intelligence into medical practice, enabling early diagnosis in regions with limited access to specialists and alleviating the burden on the healthcare system. In conclusion, this initiative represents a milestone in dermatological care in Colombia, benefiting both high-incidence areas and rural communities.
  • Publication
    Safe Medellín (Medellín seguro): intelligent prediction of the number of thefts from persons with time series algorithms
    (Universidad EAFIT, 2025) Guerra Medina, Cindy Paola; Moreno Reyes, Nicolas Alberto
    Today, we are immersed in the data revolution, an era characterized by the importance of understanding past events to predict the future and, on that basis, to support strategies that facilitate decision-making in advance. In this context, Colombia faces important challenges in terms of security and coexistence, challenges that can be addressed or estimated through data analysis. In Medellín, the open data portal Medata (medata.gov.co) allows access to historical and descriptive statistics on the incidence of crimes against persons such as theft, a recurring crime that affects the security, quality of life, and economy of citizens. This project proposes the use of time series algorithms implemented in the IBM SPSS Modeler platform, a robust and flexible tool that facilitates the programming of competitions among predictive models (IBM, 2023, SPSS Modeler). By identifying patterns, trends, and seasonality in historical data, the project seeks to estimate the future incidence of theft from persons in the city of Medellín, disaggregating the analysis at the level of communes and neighborhoods. The projections will be made on a monthly basis for October, November, and December 2024. They will serve as input for planning preventive security strategies that help prioritize the areas requiring greater attention, optimize available resources, minimize the negative impacts of crime, and generate a greater sense of tranquility and confidence among citizens.
  • Publication
    Estimating the effect of environmental variables on exportable agricultural production in Antioquia using ML models
    (Universidad EAFIT, 2025) Páez Bermúdez, Johan Stiven; García Vargas, Johan Felipe
  • Publication
    Financial well-being and credit behavior in Mexico
    (Universidad EAFIT, 2025) Patiño Hurtado, Germán Alonso; Hernández Zuluaga, Juan Felipe
  • Publication
    ABC inventory classification using reinforcement learning models
    (Universidad EAFIT, 2025) Arrieta Salgado, Karolina; Almonacid Hurtado, Paula María
  • Publication
    Sales forecasting for a business-to-business hardware company
    (Universidad EAFIT, 2025) Sánchez Cárdenas, Hernán Felipe; Almonacid Hurtado, Paula María
  • Publication
    Comparative analysis of predictive models for PM2.5 estimation: an approach based on machine learning and conformal prediction
    (Universidad EAFIT, 2024) Camelo Valera, Matías; Martínez Vargas, Juan David; Sepúlveda Cano, Lina Maria
    Fine particulate matter (PM2.5) pollution poses a significant environmental and public health challenge, requiring accurate predictive models for its monitoring and control. This study compares different machine learning approaches, including Linear Regression, Random Forest, and XGBoost, with and without the inclusion of mobility variables, to estimate PM2.5 levels. Additionally, inductive conformal prediction is implemented to quantify uncertainty in the estimates and provide confidence intervals with α = 0.05. The results show that while XGBoost experiences performance deterioration during training when mobility variables are included, it achieves the best validation performance with the lowest mean absolute error and the highest coefficient of determination. Conformal prediction enabled the establishment of confidence intervals with 89.26% coverage, close to the expected 95%, ensuring model reliability across different spatial and temporal scenarios. In conclusion, the use of machine learning models combined with advanced validation and calibration techniques, such as conformal prediction, enhances the accuracy and reliability of PM2.5 estimation. However, the quality of input variables, particularly mobility-related data, remains a challenge, highlighting the need to incorporate meteorological information and improve data resolution. These findings contribute to the development of more reliable predictive tools for environmental management and air quality policy decision-making.
  • Publication
    Automatic chord detection using audio characterization techniques and machine learning
    (Universidad EAFIT, 2025) Gil Urrego, Rafael Alejandro; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
    Automatic chord detection in audio tracks is essential for developing various musical applications, such as music transcription and score generation. For this reason, there has been a growing interest in the field of data science to explore different strategies to address this need. The main approach studied in recent years is based on extracting features from audio files that contain chord information. Transforming the audio signal using different frequency analysis tools has generated data with a greater ability to describe the musical components present in the processed audio track. The Mel spectrogram and the Chromagram are some of the methods used for these tasks. Additionally, classical supervised analytical models such as Support Vector Machines (SVM), Random Forest, and Convolutional Neural Networks (CNN) have been employed in several studies. These models have demonstrated a high level of accuracy in chord identification. However, in most cases, they have been limited by the number of chord classes to estimate, as an increase in the number of classes can confuse the system, typically allowing a maximum of 24. In this thesis, a system for automatic chord identification was developed by implementing different classical and modern analytical models. For audio feature extraction, the pre-trained models HuBERT and VGGish were used. These extracted features were then fed into three classical models—SVM, Random Forest, and Gradient Boosting—to compare their results with those obtained by a modern model. The HuBERT architecture was chosen as the modern baseline model since it can function both as a feature extractor and a classifier. The experiments were conducted using recordings of 48 different chord classes, all played on a digital piano, providing a solid dataset for training and evaluating the proposed system’s performance. 
    The study confirmed previous research findings: to obtain accurate chord class estimations, it is crucial to improve the characterization techniques of the input audio recordings. A recurring issue identified was the lack of a detailed description of the musical components in the recordings, which affected the models’ ability to deliver optimal results. Our findings highlight that precise feature extraction is key to reducing model generalization error, enabling better chord class identification in both classical supervised approaches and modern architectures such as HuBERT. Finally, it is concluded that modern models, including those based on Transformers, have a high dependency on the quantity and diversity of the data. To achieve effective adaptability, the training data must exhibit sufficient variations within the same class. When data lack intra-class variability, these systems struggle to adapt to new recordings, especially those with background noise or distortions.
  • Publication
    Analysis of the daily useful volume of the El Peñol reservoir from 2010 to 2023 using functional data
    (Universidad EAFIT, 2025) Giraldo Gómez, Sebastián; Ortiz Arias, Santiago
    This study analyzes the hydroelectric behavior of the El Peñol reservoir, with an emphasis on its historical dynamics. Comparisons were made among four Colombian reservoirs: El Peñol, Playas, Punchiná, and San Lorenzo. To achieve this, functional statistical techniques were applied to historical data from the period 2010-2023 provided by XM, along with information on the El Niño and La Niña phenomena obtained from the Institute of Hydrology, Meteorology, and Environmental Studies (IDEAM). The variables analyzed include the turbined volume, daily usable volume, total energy generation, and market prices, with the main objective of identifying temporal patterns, seasonal trends, and functional relationships between these variables. The analysis included the calculation of functional means, the estimation of functional variances, and the application of functional principal component analysis (functional PCA). These techniques made it possible to reduce the dimensionality of the data and understand the main factors influencing hydroelectric behavior. As part of the methodology, Fourier smoothing was used to represent the variables as continuous curves, facilitating noise removal and capturing underlying trends. This approach allowed for functional comparisons between the reservoirs, highlighting both similarities and differences in their operation. The results of this functional analysis provide a solid foundation for interpreting hydrological patterns in the Antioquia region, with special attention to the El Peñol reservoir and its impact on regional hydroelectric efficiency. This reservoir, one of the most important in the country, faces significant challenges arising from fluctuations in water availability and the effects of climate change, emphasizing the need for sustainable management strategies. In this context, functional indicators were developed to evaluate the sustainability of the reservoir’s operation and propose improvements in its management.
    This study contributes to the advancement of specific analytical tools for hydroelectric management in Colombia, also establishing a precedent for future research aimed at reservoirs with similar characteristics, both regionally and internationally.
  • Publication
    Performance evaluation of different large language models for emotion recognition in text
    (Universidad EAFIT, 2024) López Atehortúa, David Alejandro; Montoya Múnera, Edwin Nelson
    It is becoming more common for people to express their opinions in short texts through different media thanks to the expansion of internet access. Understanding and efficiently analyzing an individual’s sentiment from a text is a task that is useful in multiple scenarios. To this end, a branch of computer science called Natural Language Processing (NLP) has been dedicated to developing techniques to understand everything related to human language. Traditional techniques rely on the frequency of a word or a group of consecutive words to classify the text as having a positive, negative, or neutral sentiment. These techniques have limitations because they fail to capture the full context of each word in a sentence, affecting their accuracy and ability to detect a more detailed spectrum of emotions. Recently, Large Language Models (LLMs), or Transformers, revolutionized the way NLP is performed thanks to their ability to capture the context around each word in a text. This allows sentiment to be detected more precisely and even allows the text to be classified into a more specific emotion such as joy, optimism, anger, sadness, or others. This project aims to evaluate the performance of different LLMs to find the best-performing one for emotion detection in short English texts, using datasets typically used in research related to NLP models.
  • Publication
    Advising and prospecting customer visits at car dealerships using chatbots and Artificial Intelligence
    (Universidad EAFIT, 2025) Restrepo Acosta, Eduardo; Martínez Vargas, Juan David; Sepúlveda Cano, Lina María
Anyone consulting this repository may copy excerpts of the text, always citing the source, that is, the title of the work and the author. This authorization does not imply a waiver of the author's right to publish the work in whole or in part.
The University shall not be liable for any claim that may arise from third parties asserting authorship of the work presented by the author.
All rights reserved.