González Ayala, Camilo Andrés.

An Exploratory Analysis of Digital Information using Natural Language Processing for the Planning and Decision Making Process of Water Resources in Bolivia : [Electronic Resource] / Camilo Andrés González Ayala. - Bogotá (Colombia): Escuela Colombiana de Ingeniería Julio Garavito, 2021 - 76 p.: graphs.

Thesis (Master's in Civil Engineering with emphasis in Hydraulic Resources and Environment)


In recent years, communities have become much more involved in the planning and decision-making
processes of Integrated Water Resources Management. However, differences between competing
stakeholders hinder the identification of the variables that matter for decision-making. In
addition, the COVID-19 pandemic has prevented the face-to-face community activities in which
fundamental information for the planning process is collected. Against this backdrop, and with
the aim of complementing the characterization of a water system and providing an alternative
that supports the planning and decision-making process, this research analyzes digital
information sources from the public media, extracting useful information from articles
associated with a basin. The case study is the La Paz - Choqueyapu river basin in Bolivia.
Information related to water resources was extracted from 6 representative Bolivian newspapers.
An exploratory analysis of this information was carried out and compared with historical
records of hydrological phenomena, such as precipitation over the last decade, finding a good
correlation between the two sources. Named Entity Recognition made it possible to identify
entities associated with bodies of water, dams, authorities and communities present in the
basin.
Each article is assigned a positive or negative sentiment according to its content in order
to carry out a qualitative analysis of the basin. From the extracted articles and their
associated sentiments, sentiment classification models for the water resources context are
built using different word embedding techniques and machine learning classification
algorithms. The best-performing model was the SVM algorithm with a linear kernel and Word2vec
continuous bag-of-words embeddings, which reached 84% accuracy. This compares with the 63%
obtained with the Spanish Sentiment Analysis library, a substantial improvement in the
classification of Spanish-language texts associated with water resources. Finally, identifying
the most frequent words in positive and negative contexts highlights variables that are
important for improving the planning and decision-making process.
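
The last step described above, finding the most frequent words in positive and negative contexts, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the sample sentences, the stop-word list and the names `tokenize` and `top_words_by_sentiment` are all hypothetical and chosen only for illustration, and the sketch uses only the standard library.

```python
import re
from collections import Counter

# Hypothetical labeled snippets for illustration; NOT data from the thesis.
ARTICLES = [
    ("la represa garantiza agua para la ciudad", "positive"),
    ("nueva planta mejora el agua del río", "positive"),
    ("contaminación del río afecta a la comunidad", "negative"),
    ("la sequía reduce el agua de la represa", "negative"),
]

# Tiny illustrative stop-word list (an assumption, far from complete).
STOPWORDS = {"la", "el", "del", "de", "a", "para", "los", "las", "en", "y"}


def tokenize(text):
    """Lowercase the text, split it into word tokens and drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def top_words_by_sentiment(articles, n=3):
    """Return the n most frequent tokens for each sentiment label."""
    counts = {}
    for text, label in articles:
        # One Counter per label, updated with the tokens of each article.
        counts.setdefault(label, Counter()).update(tokenize(text))
    return {label: counter.most_common(n) for label, counter in counts.items()}


# Print the most frequent words per sentiment for the sample above.
for label, words in top_words_by_sentiment(ARTICLES).items():
    print(label, words)
```

On a real corpus, the article texts and labels would replace `ARTICLES`, and a fuller Spanish stop-word list would be needed for the frequent words to be informative.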


WATER RESOURCES DEVELOPMENT--NATURAL RESOURCES--BOLIVIA
NATURAL LANGUAGE PROCESSING
DIGITAL INFORMATION
DEGREE THESIS

628.144 / G643e