Browsing by subject "data integration"
Review of Bayesian statistical techniques for entity matching in large datasets (original title: "Revisión de técnicas estadísticas bayesianas para la coincidencia de entidades en conjuntos de datos grandes"). Universidad EAFIT, 2024. López Valencia, Sebastián; Suárez Sierra, Biviana Marcela.

In the context of data analysis, entity matching is a crucial task: it consists of identifying and pairing records that represent the same entity across different data sources. This work reviews several statistical techniques for addressing this problem in large datasets, with a particular focus on Bayesian methods. The theoretical framework and state of the art cover rule-based methods, text distance functions, and machine-learning-based methods. Several optimization strategies for reducing the computational cost of entity matching are also presented, including heuristics, computationally cheaper distance measures, and fast-converging learning algorithms. A notable approach is to first group entities and then compare records only within each group, which substantially reduces the number of comparisons required. The data description section details the procedures for data acquisition and preprocessing, which are essential to ensure the quality and relevance of the datasets used in the experiments. The methodology is described in detail, from business understanding through data acquisition, data understanding, and modeling. Finally, the methods and results section presents the findings obtained by applying the techniques reviewed and proposed in this thesis. The conclusions highlight the effectiveness of Bayesian techniques and suggest directions for future research.
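To make the ideas in the abstract concrete, the following is a minimal sketch that combines three of them: a text distance function (Levenshtein edit distance), grouping records before comparing them (blocking on an exact key), and a Bayesian match score. The scoring step uses the Fellegi-Sunter model read as a posterior probability, which is one common probabilistic formulation and is not necessarily the method developed in the thesis. All function names, the toy records, the m/u probabilities, the prior, and the 0.8 similarity threshold are hypothetical values chosen for illustration, and the model assumes that fields are conditionally independent given the match status.

```python
import math
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def candidate_pairs(records, key):
    """Blocking: group records by `key` and only pair records within a group."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    for group in blocks.values():
        yield from combinations(group, 2)


def match_probability(a, b, fields, m, u, prior, threshold=0.8):
    """Posterior P(match | field agreements), Fellegi-Sunter style.

    m[f] = P(field f agrees | match), u[f] = P(field f agrees | non-match).
    Fields are assumed conditionally independent given the match status.
    """
    log_odds = math.log(prior / (1 - prior))  # prior odds of a match
    for f in fields:
        if similarity(a[f], b[f]) >= threshold:  # field counts as agreeing
            log_odds += math.log(m[f] / u[f])
        else:
            log_odds += math.log((1 - m[f]) / (1 - u[f]))
    return 1.0 / (1.0 + math.exp(-log_odds))  # convert log-odds to probability


# Hypothetical toy records and parameters, for illustration only.
records = [
    {"id": 1, "name": "ana gomez", "city": "medellin"},
    {"id": 2, "name": "ana gomes", "city": "medellin"},
    {"id": 3, "name": "juan perez", "city": "bogota"},
    {"id": 4, "name": "luisa rojas", "city": "bogota"},
]
m = {"name": 0.9, "city": 0.95}
u = {"name": 0.05, "city": 0.2}

pairs = list(candidate_pairs(records, key=lambda r: r["city"]))
probs = {
    (a["id"], b["id"]): match_probability(a, b, ["name", "city"], m, u, prior=0.1)
    for a, b in pairs
}
for (ia, ib), p in probs.items():
    print(f"records {ia} and {ib}: P(match) = {p:.3f}")
```

Blocking on the city field reduces the 6 possible pairs among the four records to 2 candidate pairs. The near-duplicate names in records 1 and 2 yield a posterior of about 0.90, while records 3 and 4, which share only a city, fall to about 0.05. In practice, the m and u probabilities would be estimated from data (for example, with an EM algorithm or a fully Bayesian model) rather than fixed by hand as they are here.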